Linguistics Research Center at UT Austin


The Linguistics Research Center (LRC) at the University of Texas is a center for computational linguistics research & development. It was directed by Prof. Winfred Lehmann until his death in 2007, and subsequently by Dr. Jonathan Slocum. Since its founding, virtually all projects at the LRC have involved processing natural language texts with the aid of computers. The principal activities of the Center at present focus on Indo-European languages and comprise historical study, lexicography, and web-based teaching; staff members engage in several independent but often complementary projects in these fields using a variety of software, almost all of it developed in-house.

History

The LRC was founded by Winfred Lehmann in 1961. In the early days, research efforts at the LRC concentrated on machine translation (MT), the translation of texts from one human language to another with the aid of computers, a technology now well established in the language industry. This work was funded by the USAF and other sponsors. The LRC concentrated on German-English translation, though a copy of the Russian Master Dictionary was deposited at the LRC after the ALPAC report. After a general hiatus ca. 1975-78, new funding led to the development by Jonathan Slocum and others of a new system with the same name (the METAL MT system), but with new sets of tools for linguists and vastly greater success, resulting in the delivery of a production prototype and later a full-fledged commercial MT system. MT R&D continued at the LRC, with funding by various sponsors, until well into the 1990s.

From its early years to the present, the LRC has mounted a number of smaller projects resulting in the publication of significant works relating to Indo-European languages and/or their common ancestor, Proto-Indo-European. The hallmark of this work has been the use of computers to transcribe texts and prepare them for publication. A prominent example of the LRC using computers to prepare texts for print publication is the book by Winfred P. Lehmann, A Gothic Etymological Dictionary (Leiden: Brill, 1986). The final print-ready version was produced with the aid of a laser printer (exotic new technology, in those days) using, for the various languages included in the entries, approximately 500 special characters—many of them designed at the Center. This was the first major etymological dictionary for Indo-European languages to be produced with the aid of computers.

Current LRC projects have concentrated on transcribing early Indo-European texts, developing language lessons based on them, and publishing on the web these and other materials related to the study of Indo-European languages, of their common ancestor Proto-Indo-European, and of historical linguistics more generally.


Related Research Articles

Baltic languages Balto-Slavic languages of the Indo-European language family

The Baltic languages belong to the Balto-Slavic branch of the Indo-European language family. Baltic languages are spoken by the Balts, mainly in areas extending east and southeast of the Baltic Sea in Northern Europe.

Comparative method Technique for studying the historical development of languages, based on language comparison

In linguistics, the comparative method is a technique for studying the development of languages by performing a feature-by-feature comparison of two or more languages with common descent from a shared ancestor and then extrapolating backwards to infer the properties of that ancestor. The comparative method may be contrasted with the method of internal reconstruction in which the internal development of a single language is inferred by the analysis of features within that language. Ordinarily, both methods are used together to reconstruct prehistoric phases of languages; to fill in gaps in the historical record of a language; to discover the development of phonological, morphological and other linguistic systems and to confirm or to refute hypothesised relationships between languages.

Leonard Bloomfield was an American linguist who led the development of structural linguistics in the United States during the 1930s and the 1940s. He is considered to be the father of American distributionalism. His influential textbook Language, published in 1933, presented a comprehensive description of American structural linguistics. He made significant contributions to Indo-European historical linguistics, the description of Austronesian languages, and description of languages of the Algonquian family.

Tocharian languages Extinct Indo-European languages in Asia

The Tocharian languages, also known as Arśi-Kuči, Agnean-Kuchean or Kuchean-Agnean, are an extinct branch of the Indo-European language family spoken by inhabitants of the Tarim Basin, the Tocharians. They are known from manuscripts dating from the 5th to the 8th century AD, which were found in oasis cities on the northern edge of the Tarim Basin and the Lop Desert. The discovery of this language family in the early 20th century contradicted the formerly prevalent idea of an east–west division of the Indo-European language family on the centum–satem isogloss, and prompted reinvigorated study of the family. Mistakenly identifying the authors with the Tokharoi people of ancient Bactria (Tokharistan), early authors called these languages "Tocharian". This naming has remained, although the names Agnean and Kuchean have been proposed as a replacement.

Historical linguistics, also termed diachronic linguistics, is the scientific study of language change over time. Principal concerns of historical linguistics include:

  1. to describe and account for observed changes in particular languages
  2. to reconstruct the pre-history of languages and to determine their relatedness, grouping them into language families
  3. to develop general theories about how and why language changes
  4. to describe the history of speech communities
  5. to study the history of words, i.e. etymology

Avestan Eastern Iranian language used in Zoroastrian scripture

Avestan, also known historically as Zend, comprises two languages: Old Avestan and Younger Avestan. The languages are known only from their use as the language of Zoroastrian scripture, from which they derive their name. Both are early Iranian languages, a branch of the Indo-Iranian languages within the Indo-European family. Their immediate ancestor was the Proto-Iranian language, a sister language to the Proto-Indo-Aryan language, with both having developed from the earlier Proto-Indo-Iranian. As such, Old Avestan is quite close in grammar and lexicon to Vedic Sanskrit, the oldest preserved Indo-Aryan language.

Proto-Germanic language Ancestor of the Germanic languages

Proto-Germanic is the reconstructed proto-language of the Germanic branch of the Indo-European languages.

Hittite language Extinct Bronze Age Indo-European language

Hittite, also known as Nesite, was an Indo-European language that was spoken by the Hittites, a people of Bronze Age Anatolia who created an empire centred on Hattusa, as well as parts of the northern Levant and Upper Mesopotamia. The language, now long extinct, is attested in cuneiform, in records dating from the 17th to the 13th centuries BCE, with isolated Hittite loanwords and numerous personal names appearing in an Old Assyrian context from as early as the 20th century BCE, making it the earliest-attested of the Indo-European languages.

Etymology is the study of the history of words. By extension, the etymology of a word means its origin and development throughout history.

Comparative linguistics, or comparative-historical linguistics, is a branch of historical linguistics that is concerned with comparing languages to establish their historical relatedness.

The Indogermanisches etymologisches Wörterbuch was published in 1959 by the Austrian-German comparative linguist and Celtic languages expert Julius Pokorny. It is an updated and slimmed-down reworking of the three-volume Vergleichendes Wörterbuch der indogermanischen Sprachen.

Classical Armenian

Classical Armenian is the oldest attested form of the Armenian language. It was first written down at the beginning of the 5th century, and all Armenian literature from then through the 18th century is in Classical Armenian. Many ancient manuscripts originally written in Ancient Greek, Persian, Hebrew, Syriac and Latin survive only in Armenian translation.

Winfred P. Lehmann American linguist

Winfred Philip Lehmann was an American linguist who specialized in historical, Germanic, and Indo-European linguistics. He was for many years a professor and head of departments for linguistics at the University of Texas at Austin, and served as president of both the Linguistic Society of America and the Modern Language Association. Lehmann was also a pioneer in machine translation. He lectured a large number of future scholars at Austin, and was the author of several influential works on linguistics.

Grundriß der vergleichenden Grammatik der indogermanischen Sprachen is a major work of historical linguistics by Karl Brugmann and Berthold Delbrück, published in two editions between 1886 and 1916. Brugmann treated phonology and morphology, and Delbrück treated syntax. The grammar of Proto-Indo-European (PIE) is reconstructed from those of its daughter languages known in the late 19th century. The work represents a major step in Indo-European studies, after Franz Bopp's Comparative Grammar of 1833 and August Schleicher's Compendium of 1871. Brugmann's neogrammarian re-evaluation of PIE resulted in a view that in its essentials has remained valid to the present day.

Proto-language Common ancestor of a language family

In the tree model of historical linguistics, a proto-language is a postulated ancestral language from which a number of attested languages are believed to have descended by evolution, forming a language family. Proto-languages are usually unattested, or in some cases only partially attested. They are reconstructed by way of the comparative method.

In historical linguistics, the Germanic parent language (GPL) includes the reconstructed languages in the Germanic group referred to as Pre-Germanic Indo-European (PreGmc), Early Proto-Germanic (EPGmc), and Late Proto-Germanic (LPGmc), spoken in the 2nd and 1st millennia BC.

Ranko Matasović is a Croatian linguist, Indo-Europeanist and Celticist.

The Indo-European Etymological Dictionary is a research project of the Department of Comparative Indo-European Linguistics at Leiden University, initiated in 1991 by Peter Schrijver and others. It is financially supported by the Faculty of Humanities and Centre for Linguistics of Leiden University, Brill Publishers, and the Netherlands Organisation for Scientific Research.

Vladimir Orel

Vladimir Emmanuilovich Orël was a Russian linguist.

The Sino-Tibetan Etymological Dictionary and Thesaurus was a linguistics research project hosted at the University of California at Berkeley. The project, which focused on Sino-Tibetan historical linguistics, started in 1987 and lasted until 2015.