My love

Tuesday, 16 January 2018

Using corpus analysis software to analyse specialized texts

1.    What is a corpus?
In corpus linguistics, a corpus (sometimes used in the plural form "corpora") can be generally defined as a collection of naturally occurring texts in a computer-readable format which can be retrieved and analyzed using corpus analysis software.
2.    Sources of language corpora
-         http://corpus.leeds.ac.uk/protected/query.html
-         http://lextutor.ca/conc/eng/
-         http://www.arts.chula.ac.th/~ling/ParaConc/index.html
-         http://www.athel.com/para.html
3.    Designing a specialised corpus  
Corpus size
-         There are no fixed rules; the size depends on the research purpose, the availability of data, and the time available.
-         Large, general corpora may be less useful than small, focused corpora if searches are made on context-specific terms.
-         Corpora that are too small have limitations, e.g. not enough hits to support sound generalizations, or insufficient coverage of the concepts, terms, or patterns under investigation.
-         It is preferable to create a 'monitor' or 'open' corpus because specialized words and usage are dynamic.
Text extracts vs. full texts
-         The choice depends on the aim of corpus compilation.
-         Whole texts offer more coverage because the words or terms to be looked at may be randomly distributed throughout the text.
-         Specific sections may be helpful if we are looking for words or phrases under particular content areas or want to create purposeful sub-corpora.

My specialised corpus profile

Size: 68,643 words
Source of corpus data: From the Internet (https://batconservation.org/)
Number of texts: 91 texts
Medium: Written
Subject: Bat
Text type: News articles
Authorship: Written by scientists
Language: Texts written in English by native speakers
Publication date: Recent texts (retrieved in September 2017)
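
The size figures in a profile like this can be checked with a few lines of code. The following is a minimal sketch, not the method used by any particular corpus tool: the function name `corpus_profile` and its tokenization rule (runs of letters and digits, optionally joined by a hyphen or apostrophe) are assumptions, so its counts may differ slightly from those reported by concordancing software. The two sample sentences are taken from the examples in this post.

```python
import re

# Assumed tokenization rule: letters/digits, optionally joined by ' or -
WORD = re.compile(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*")

def corpus_profile(texts):
    """Return the number of texts, running words (tokens) and distinct word forms (types)."""
    tokens = [tok for text in texts for tok in WORD.findall(text)]
    types = {tok.lower() for tok in tokens}
    return {"texts": len(texts), "tokens": len(tokens), "types": len(types)}

sample = [
    "Bats are the only mammals capable of true flight.",
    "Many bat species around the world are threatened with extinction.",
]
print(corpus_profile(sample))
```

For a real corpus, `texts` would be the contents of the text files read from disk, one string per file.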

Specialized Corpus Analysis
Topic: Bat
1.   Terminology and collocations
-         turbine: turbine blade, turbine orientation, turbine nacelle
-         virus: Ebola virus, Rabies virus, Khujand virus, Aravan virus, Bat virus
-         habitat: habitat degradation, habitat destruction, swampy habitat, habitat suitability
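
Collocates such as those listed above can be found by counting the words that appear directly next to a node word. The sketch below illustrates the idea only; it is not the procedure used to produce the lists in this post. The names `tokenize` and `adjacent_collocates` are hypothetical, and the sample sentences are invented for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text.lower())

def adjacent_collocates(texts, node):
    """Count the words occurring immediately before and after `node`."""
    left, right = Counter(), Counter()
    for text in texts:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok != node:
                continue
            if i > 0:
                left[tokens[i - 1]] += 1
            if i + 1 < len(tokens):
                right[tokens[i + 1]] += 1
    return left, right

# Hypothetical sample sentences, invented for illustration only.
sample = [
    "Bats may collide with a turbine blade at night.",
    "Habitat destruction and habitat degradation threaten many species.",
    "The Ebola virus and the rabies virus were discussed.",
]
left, right = adjacent_collocates(sample, "habitat")
print(right.most_common())
```

Because the window is a single word on each side, this only captures two-word patterns such as "habitat destruction" or "Ebola virus"; a wider window or a statistical association measure would be needed for looser collocations.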


2.   Local grammar
2.1         Comparison of Adjectives

2.1.1 Comparative Degree

Lesser false vampire bat: Smaller than their other vampire bat cousins, the lesser false vampire bats make their own homes in caves and hollowed out trees.

2.1.2 Superlative Degree

For their size, bats are the slowest reproducing mammals on Earth.

The tropics have the biggest variety of bat species: Indonesia has 175 species of bats (about ten times the number of species found in the UK). Central and South America are home to almost one third of the world's bats species.


2.2         “Species” is used as both singular and plural.

This bat is a member of the species Eptesicus fuscus (Big brown bat), which are found in the U.S.

There are more than 950, and perhaps as many as 1,200 species of bats.

Lesser false vampire bat, this bat species is another tent-making bat. Though it is considered a species of "least concern" on the IUCN Red List, it is found in forests and is somewhat at risk due to habitat loss.
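
Examples like these are usually found with a keyword-in-context (KWIC) concordance, which shows each hit with a fixed amount of text on either side. The following is a minimal sketch of that idea, assuming a hypothetical function `kwic` and a context width of 30 characters; it does not reproduce the behaviour of any specific concordancer. The two sentences are taken from the examples above.

```python
import re

def kwic(texts, pattern, width=30):
    """Return (left context, match, right context) for every regex match."""
    regex = re.compile(pattern, re.IGNORECASE)
    lines = []
    for text in texts:
        for m in regex.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            lines.append((left, m.group(0), right))
    return lines

examples = [
    "This bat is a member of the species Eptesicus fuscus (Big brown bat), which are found in the U.S.",
    "There are more than 950, and perhaps as many as 1,200 species of bats.",
]
for left, hit, right in kwic(examples, r"\bspecies\b"):
    # Right-align the left context so the hits line up in one column
    print(f"{left:>30} [{hit}] {right}")
```

Reading down the aligned column makes it easier to see whether "species" follows a singular marker ("the species") or a plural one ("1,200 species").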

3.   Style
3.1         Present Simple Tense

Common vampire bats, Desmodus rotundus, are warm-weather microbats from South America that live and hunt in groups.

They are cave dwellers, and are persecuted due to an unfounded fear of rabies: They are not major carriers of the disease.

 Bats are the only mammals capable of true flight.

Most bats eat flowers, small insects, fruits, nectar, pollen and leaves, though it depends on the type of bat.


3.2         Use of passive structure

This time of year, millions of bats from a spectrum of species are hunkered down in caves, where they’ll huddle together for warmth and hibernate through the winter.

This bat is a member of the species Eptesicus fuscus (Big brown bat), which are found in the U.S.

Many bat species around the world are threatened with extinction.

Bats are classified in the order Chiroptera, derived from the Greek words "cheir" for hand and "pteron," meaning wing.
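
Passive candidates can be located roughly with a regular expression that looks for a form of "be" followed by a past participle. This is only a heuristic sketch: the participle pattern (any word ending in "-ed" plus a short, assumed list of irregular forms) will miss many irregular verbs and will wrongly match adjectives such as "red", so the hits still need manual checking. The names `PASSIVE` and `find_passives` are hypothetical.

```python
import re

BE = r"(?:am|is|are|was|were|be|been|being)"
# Rough participle guess: regular -ed forms plus a few irregular ones (assumed list)
PARTICIPLE = r"(?:\w+ed|found|known|seen|given|taken|made)"
# Allow one optional -ly adverb between "be" and the participle
PASSIVE = re.compile(rf"\b{BE}\s+(?:\w+ly\s+)?{PARTICIPLE}\b", re.IGNORECASE)

def find_passives(sentence):
    """Return every 'be + participle' sequence found in the sentence."""
    return [m.group(0) for m in PASSIVE.finditer(sentence)]

print(find_passives("Many bat species around the world are threatened with extinction."))
print(find_passives("Bats are the only mammals capable of true flight."))
```

A part-of-speech tagger would give more reliable results than this pattern, at the cost of an extra dependency.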

4.   Content knowledge

4.1         Use of acronyms
EPS = European Protected Species
BBC = British Broadcasting Corporation
IUCN = International Union for Conservation of Nature
BCT = Bat Conservation Trust
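
A list of candidate acronyms can be pulled from the corpus by searching for runs of capital letters. The sketch below assumes that an acronym is two or more consecutive capitals, which means dotted forms such as "U.S." are not captured and words typed in all capitals would be. The name `acronym_counts` is hypothetical, the first sample sentence comes from the examples in this post, and the second is invented for illustration.

```python
import re
from collections import Counter

# Assumption: an acronym is a standalone run of two or more capital letters
ACRONYM = re.compile(r"\b[A-Z]{2,}\b")

def acronym_counts(texts):
    """Count every all-capitals token of length two or more."""
    counts = Counter()
    for text in texts:
        counts.update(ACRONYM.findall(text))
    return counts

sample = [
    'Though it is considered a species of "least concern" on the IUCN Red List, '
    "it is found in forests and is somewhat at risk due to habitat loss.",
    "The BCT and the IUCN both publish guidance.",  # invented example
]
print(acronym_counts(sample).most_common())
```

The expansions themselves (for example, IUCN = International Union for Conservation of Nature) still have to be looked up or read from the surrounding text.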

4.2         Use of commas (,)

 The Indiana Bat is a medium sized species. It can range in coloring from brown, black, or gray.

 Colonies of the Indiana Bat will meet up with each other for hibernation. There can be thousands and thousands of them in one location. Typically, hibernation takes place in the state of Indiana, which is where their name comes from.

 Still other bat species feed on fish, frogs, lizards, small rodents, small birds, and even other bats. And while bats have an evil reputation for sucking blood.
4.3         Use of parentheses (round brackets)

 Bats range in size across the different species, but tend to average about 5.5 to 19 cm in length (tip to tail) with a wingspread of approximately 15 to 38 cm. Most weigh between 3.5 and 60 grams (in the U.S

 The fossil record of bats prior to the Pleistocene Epoch (about 2,600,000 to 11,700 years ago) is limited and reveals little about bat evolution.


 The order Chiroptera is readily divided into two suborders—Megachiroptera (large Old World fruit bats) and Microchiroptera (small bats).
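
Parenthetical material like this can be collected with a regular expression that captures the text between a matching pair of round brackets. This is a minimal sketch under two assumptions: brackets are not nested, and an opening bracket with no closing partner is ignored. The name `parentheticals` is hypothetical; the sentences are taken from the examples above.

```python
import re

# Match "(" ... ")" with no brackets in between, capturing the inner text
PAREN = re.compile(r"\(([^()]*)\)")

def parentheticals(text):
    """Return the text inside each complete pair of round brackets."""
    return [inner.strip() for inner in PAREN.findall(text)]

sentences = [
    "The fossil record of bats prior to the Pleistocene Epoch (about 2,600,000 to 11,700 years ago) is limited and reveals little about bat evolution.",
    "Bats range in size across the different species, but tend to average about 5.5 to 19 cm in length (tip to tail) with a wingspread of approximately 15 to 38 cm.",
]
for sentence in sentences:
    print(parentheticals(sentence))
```

Sorting the extracted strings by hand then shows what the brackets are used for in the corpus, such as dates, measurements, or glosses of technical names.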