Corpus language


A corpus language is a language that has no living speakers but for which numerous records produced by its native speakers survive.[1] Examples of corpus languages are Ancient Greek, Latin, the Egyptian language, Old English and Elamite.

Some corpus languages, such as Ancient Greek and Latin, left a very large corpus and can therefore be fully reconstructed, even though some details of pronunciation may remain unclear. Such languages can be used even today, as is the case with Sanskrit and Latin. Others have such a limited corpus that some important words, e.g. some pronouns, are not attested in it. Examples of this are Ugaritic and Gothic. Languages that are attested only by a few words, often names, and a few phrases (called Trümmersprachen in German linguistics, literally "rubble languages") can be reconstructed only in a very limited way, and their genetic relationship to other languages often remains unclear. Examples are the Lombardic language and Dadanitic, a Semitic language that may be close to Classical Arabic.

Corpus languages are studied using the methods of corpus linguistics, but corpus linguistics can also be used (and commonly is) for the study of the writings and other records of living languages.

Not all extinct languages are corpus languages, since many extinct languages have left few or no writings or other records.

Related Research Articles

In linguistics, declension is the changing of the form of a word, generally to express its syntactic function in the sentence, by way of some inflection. Declensions may apply to nouns, pronouns, adjectives, adverbs, and articles to indicate number, case, gender, and a number of other grammatical categories. Meanwhile, the inflectional change of verbs is called conjugation.

Greek language: Indo-European language

Greek is an independent branch of the Indo-European family of languages, native to Greece, Cyprus, Italy, southern Albania, and other regions of the Balkans, the Black Sea coast, Asia Minor, and the Eastern Mediterranean. It has the longest documented history of any Indo-European language, spanning at least 3,400 years of written records. Its writing system is the Greek alphabet, which has been used for approximately 2,800 years; previously, Greek was recorded in writing systems such as Linear B and the Cypriot syllabary. The alphabet arose from the Phoenician script and was in turn the basis of the Latin, Cyrillic, Coptic, Gothic, and many other writing systems.

Indo-European languages: Language family native to western and southern Eurasia

The Indo-European languages are a language family native to the overwhelming majority of Europe, the Iranian plateau, and the northern Indian subcontinent. Some European languages of this family, English, French, Portuguese, Russian, Dutch, and Spanish, have expanded through colonialism in the modern period and are now spoken across several continents. The Indo-European family is divided into several branches or sub-families, of which there are eight groups with languages still alive today: Albanian, Armenian, Balto-Slavic, Celtic, Germanic, Hellenic, Indo-Iranian, and Italic; and another nine subdivisions that are now extinct.

Language: Structured system of communication

Language is a structured system of communication that consists of grammar and vocabulary. It is the primary means by which humans convey meaning, both in spoken and written forms, and may also be conveyed through sign languages. The vast majority of human languages have developed writing systems that allow for the recording and preservation of the sounds or signs of language. Human language is characterized by its cultural and historical diversity, with significant variations observed between cultures and across time. Human languages possess the properties of productivity and displacement, which enable the creation of an infinite number of sentences, and the ability to refer to objects, events, and ideas that are not immediately present in the discourse. The use of human language relies on social convention and is acquired through learning.

Egyptian language: Language spoken in ancient Egypt

The Egyptian language or Ancient Egyptian is an extinct Afro-Asiatic language that was spoken in ancient Egypt. It is known today from a large corpus of surviving texts which were made accessible to the modern world following the decipherment of the ancient Egyptian scripts in the early 19th century. Egyptian is one of the earliest written languages, first being recorded in the hieroglyphic script in the late 4th millennium BC. It is also the longest-attested human language, with a written record spanning over 4,000 years. Its classical form is known as Middle Egyptian, the vernacular of the Middle Kingdom of Egypt which remained the literary language of Egypt until the Roman period. By the time of classical antiquity the spoken language had evolved into Demotic, and by the Roman era it had diversified into the Coptic dialects. These were eventually supplanted by Arabic after the Muslim conquest of Egypt, although Bohairic Coptic remains in use as the liturgical language of the Coptic Church.

Historical linguistics, also termed diachronic linguistics, is the scientific study of language change over time. Principal concerns of historical linguistics include:

  1. to describe and account for observed changes in particular languages
  2. to reconstruct the pre-history of languages and to determine their relatedness, grouping them into language families
  3. to develop general theories about how and why language changes
  4. to describe the history of speech communities
  5. to study the history of words, i.e. etymology

Anatolian languages: Extinct branch of Indo-European languages

The Anatolian languages are an extinct branch of Indo-European languages that were spoken in Anatolia, part of present-day Turkey. The best known Anatolian language is Hittite, which is considered the earliest-attested Indo-European language.

Ancient Greek: Forms of Greek used from around the 16th century BC

Ancient Greek includes the forms of the Greek language used in ancient Greece and the ancient world from around 1500 BC to 300 BC. It is often roughly divided into the following periods: Mycenaean Greek, Dark Ages, the Archaic period, and the Classical period.

Extinct language: Language that no longer has any speakers

An extinct language is a language that no longer has any speakers, especially if the language has no living descendants. In contrast, a dead language is one that is no longer the native language of any community, even if it is still in use, like Latin. A dormant language is a dead language that still serves as a symbol of ethnic identity to a particular group. These languages are often undergoing a process of revitalisation. Languages that currently have living native speakers are sometimes called modern languages to contrast them with dead languages, especially in educational contexts.

Lydian is an extinct Indo-European Anatolian language spoken in the region of Lydia, in western Anatolia. The language is attested in graffiti and in coin legends from the late 8th century or the early 7th century to the 3rd century BCE, but well-preserved inscriptions of significant length are so far limited to the 5th century and the 4th century BCE, during the period of Persian domination. Thus, Lydian texts are effectively contemporaneous with those in Lycian.

In linguistics, mispronunciation is the act of pronouncing a word incorrectly. The matter of what is or is not mispronunciation is a contentious one, and indeed there is some disagreement about the extent to which the term is even meaningful. Languages are pronounced in different ways by different people, depending on such factors as the area they grew up in, their level of education, and their social class. Even within groups of the same area and class, different people can have different ways of pronouncing certain words.

Diglossia: Community restriction of languages or dialects to specific settings

In linguistics, diglossia is a situation in which two dialects or languages are used by a single language community. In addition to the community's everyday or vernacular language variety (the low or L variety), a second, highly codified lect (the high or H variety) is used in certain situations such as literature, formal education, or other specific settings, but is not normally used for ordinary conversation. The H variety may have no native speakers, with speakers of the low variety attaining various degrees of fluency in it. In cases of three dialects, the term triglossia is used. When referring to two writing systems coexisting for a single language, the term digraphia is used.

Proto-Indo-European language: Ancestor of the Indo-European languages

Proto-Indo-European (PIE) is the reconstructed common ancestor of the Indo-European language family. No direct record of Proto-Indo-European exists; its proposed features have been derived by linguistic reconstruction from documented Indo-European languages.

Language revitalization, also referred to as language revival or reversing language shift, is an attempt to halt or reverse the decline of a language or to revive an extinct one. Those involved can include linguists, cultural or community groups, or governments. Some argue for a distinction between language revival and language revitalization. There has only been one successful instance of a complete language revival, the Hebrew language, creating a new generation of native speakers without any pre-existing native speakers as a model.

Language contact occurs when speakers of two or more languages or varieties interact and influence each other. The study of language contact is called contact linguistics. When speakers of different languages interact closely, it is typical for their languages to influence each other. Language contact can occur at language borders, between adstratum languages, or as the result of migration, with an intrusive language acting as either a superstratum or a substratum.

Judaeo-Romance languages are Jewish languages derived from Romance languages, spoken by various Jewish communities originating in regions where Romance languages predominate, and altered to such an extent as to gain recognition as languages in their own right. The status of many Judaeo-Romance languages is controversial because, despite manuscripts preserving transcriptions of Romance languages in the Hebrew alphabet, there is often little to no evidence that these "dialects" were actually spoken by Jews living in the various European nations.

Languages of Scotland: Languages of a geographic region

The languages of Scotland are the languages spoken or once spoken in Scotland. Each of the numerous languages spoken in Scotland during its recorded linguistic history falls into either the Germanic or Celtic language families. The classification of the Pictish language was once controversial, but it is now generally considered a Celtic language. Today, the main language spoken in Scotland is English, while Scots and Scottish Gaelic are minority languages. The dialect of English spoken in Scotland is referred to as Scottish English.

The phonology of the Proto-Indo-European language (PIE) has been reconstructed by linguists, based on the similarities and differences among current and extinct Indo-European languages. Because PIE was not written, linguists must rely on the evidence of its earliest attested descendants, such as Hittite, Sanskrit, Ancient Greek, and Latin, to reconstruct its phonology.

Languages of the Roman Empire: Languages of a geographic region

Latin and Greek were the dominant languages of the Roman Empire, but other languages were regionally important. Latin was the original language of the Romans and remained the language of imperial administration, legislation, and the military throughout the classical period. In the West, it became the lingua franca and came to be used for even local administration of the cities including the law courts. After all freeborn male inhabitants of the Empire were universally enfranchised in 212 AD, a great number of Roman citizens would have lacked Latin, though they were expected to acquire at least a token knowledge, and Latin remained a marker of "Romanness".

The evolution of languages or history of language includes the evolution, divergence and development of languages throughout time, as reconstructed based on glottochronology, comparative linguistics, written records and other historical linguistics techniques. The origin of language is a hotly contested topic, with some languages tentatively traced back to the Paleolithic. However, archaeological and written records extend the history of language into ancient times and the Neolithic.

References

  1. Langslow, D. R. (2002). "Approaching bilingualism in corpus languages". In James Noel Adams, Mark Janse, Simon Swain (eds.), Bilingualism in Ancient Society: Language Contact and the Written Text. Oxford: OUP.
