Quantitative linguistics

Quantitative linguistics (QL) is a sub-discipline of general linguistics and, more specifically, of mathematical linguistics. It deals with language learning, language change, and the application as well as the structure of natural languages. QL investigates languages using statistical methods; its most demanding objective is the formulation of language laws and, ultimately, of a general theory of language in the sense of a set of interrelated language laws. [1] Synergetic linguistics was specifically designed for this purpose from its very beginning. [2] QL is empirically based on the results of language statistics, a field which can be interpreted as the statistics of languages or as the statistics of any linguistic object. This field is not necessarily connected to substantial theoretical ambitions. Corpus linguistics and computational linguistics are other fields which contribute important empirical evidence.

History

The earliest QL approaches date back to the ancient Indian world. One historical source consists of applications of combinatorics to linguistic matters; [3] another is based on elementary statistical studies, which can be found under the headings colometry and stichometry. [4]

Language laws

Frequency of demonstratives in Serbo-Croatian

In QL, laws are understood as those law hypotheses which have been deduced from theoretical assumptions, are mathematically formulated, are interrelated with other laws in the field, and have been sufficiently and successfully tested on empirical data, i.e. which could not be refuted in spite of much effort to do so. Köhler writes about QL laws: "Moreover, it can be shown that these properties of linguistic elements and of the relations among them abide by universal laws which can be formulated strictly mathematically in the same way as common in the natural sciences. One has to bear in mind in this context that these laws are of stochastic nature; they are not observed in every single case (this would be neither necessary nor possible); they rather determine the probabilities of the events or proportions under study. It is easy to find counterexamples to each of the above-mentioned examples; nevertheless, these cases do not violate the corresponding laws as variations around the statistical mean are not only admissible but even essential; they are themselves quantitatively exactly determined by the corresponding laws. This situation does not differ from that in the natural sciences, which have since long abandoned the old deterministic and causal views of the world and replaced them by statistical/probabilistic models." [5]
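To illustrate what testing a mathematically formulated law hypothesis on empirical data can look like, the following minimal sketch checks a rank-frequency regularity of the Zipf type (cf. notes 7 and 8). It assumes the simple form f(r) = C / r, under which the logarithm of frequency falls linearly with the logarithm of rank with a slope near -1. The function names and the toy data are hypothetical, and an ordinary least-squares fit on log-log values is only a rough diagnostic, not the fitting procedure of any particular QL study.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return word frequencies sorted in decreasing order (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    Needs at least two frequencies, otherwise the slope is undefined.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Toy data (hypothetical): frequencies that follow f(r) = 120 / r exactly.
ideal = [120, 60, 40, 30, 24, 20]
print(round(loglog_slope(ideal), 3))  # slope of -1 for this idealised case

# Frequencies counted from a real token list will scatter around the line.
tokens = "the cat and the dog and the bird".split()
print(rank_frequency(tokens))
```

With real texts the observed points deviate from the fitted line. In line with the stochastic reading of laws quoted above, such deviations are expected and do not by themselves refute the hypothesis.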

Linguistic laws

In quantitative linguistics, linguistic laws are statistical regularities that emerge across different linguistic scales (i.e. phonemes, syllables, words or sentences), can be formulated mathematically, and have been deduced from certain theoretical assumptions. They are also required to have been tested successfully against data, that is, not to have been refuted by empirical evidence. Among the main linguistic laws proposed by various authors, the following can be highlighted: [6]

Stylistics

The study of poetic and non-poetic styles can be based on statistical methods; moreover, corresponding investigations can be conducted on the basis of the specific forms (parameters) that language laws take in texts of different styles. In such cases, QL supports research into stylistics: one of its overall aims is to obtain evidence that is as objective as possible, at least for part of the domain of stylistic phenomena, by referring to language laws. One of the central assumptions of QL is that some laws (e.g. the distribution of word lengths) require different models, or at least different parameter values of the laws (distributions or functions), depending on the text sort a text belongs to. If poetic texts are under study, QL methods form a sub-discipline of the Quantitative Study of Literature (stylometrics). [9]
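The idea that a word-length distribution may take different forms in texts of different sorts can be sketched as follows. This is only an illustration under stated assumptions: word length is measured in letters purely for simplicity (other units, such as syllables, are possible), the two snippets are made up, and comparing means is a crude stand-in for fitting and comparing actual distribution models.

```python
from collections import Counter


def word_length_distribution(text):
    """Relative frequency of each word length, measured in letters."""
    # Strip common punctuation from both ends of each whitespace-separated token.
    words = [w.strip(".,;:!?\"'()") for w in text.split()]
    lengths = [len(w) for w in words if w]
    counts = Counter(lengths)
    total = len(lengths)
    return {length: counts[length] / total for length in sorted(counts)}


def mean_length(distribution):
    """Mean word length implied by a relative-frequency distribution."""
    return sum(length * p for length, p in distribution.items())


# Two made-up snippets standing in for texts of different sorts.
prose = "The committee postponed the decision until further information arrived."
verse = "The sun is low, the sea is wide, the night is near."

for label, text in (("prose", prose), ("verse", verse)):
    dist = word_length_distribution(text)
    print(label, dist, round(mean_length(dist), 2))
```

In an actual study, each text sort would be represented by a sizeable sample, and the parameters of a fitted distribution, rather than raw means, would be compared.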

Notes

  1. Reinhard Köhler: Gegenstand und Arbeitsweise der Quantitativen Linguistik. In: Reinhard Köhler, Gabriel Altmann, Rajmund G. Piotrowski (Hrsg.): Quantitative Linguistik - Quantitative Linguistics. Ein internationales Handbuch. de Gruyter, Berlin/ New York 2005, pp. 1–16. ISBN   3-11-015578-8.
  2. Reinhard Köhler: Synergetic linguistics. In: Reinhard Köhler, Gabriel Altmann, Rajmund G. Piotrowski (Hrsg.): Quantitative Linguistik - Quantitative Linguistics. Ein internationales Handbuch. de Gruyter, Berlin/ New York 2005, pp. 760–774. ISBN   3-11-015578-8.
  3. N.L. Biggs: The Roots of Combinatorics. In: Historia Mathematica 6, 1979, pp. 109–136.
  4. Adam Pawłowski: Prolegomena to the History of Corpus and Quantitative Linguistics. Greek Antiquity. In: Glottotheory 1, 2008, pp. 48–54.
  5. cf. note 1, pp. 1–2.
  6. cf. references: Köhler, Altmann, Piotrowski (eds.) (2005)
  7. H. Guiter, M. V. Arapov (eds.): Studies on Zipf's Law. Bochum: Brockmeyer 1982. ISBN   3-88339-244-8.
  8. Zipf GK. 1935 The Psychobiology of Language: An Introduction to Dynamic Philology. Boston, MA: Houghton Mifflin.
  9. Alexander Mehler: Eigenschaften der textuellen Einheiten und Systeme. In: Reinhard Köhler, Gabriel Altmann, Rajmund G. Piotrowski (Hrsg.): Quantitative Linguistik - Quantitative Linguistics. Ein internationales Handbuch. de Gruyter, Berlin/ New York 2005, pp. 325–348, esp. Quantitative Stilistik, pp. 339–340. ISBN   3-11-015578-8; Vivien Altmann, Gabriel Altmann: Anleitung zu quantitativen Textanalysen. Methoden und Anwendungen. Lüdenscheid: RAM-Verlag 2008, ISBN   978-3-9802659-5-9.
  10. Grzybek, Peter, & Köhler, Reinhard (eds.) (2007): Exact Methods in the Study of Language and Text. Dedicated to Gabriel Altmann on the Occasion of his 75th Birthday. Berlin/ New York: Mouton de Gruyter
  11. de:Benutzer:Dr._Karl-Heinz_Best
  12. index
  13. de:Sergei Grigorjewitsch Tschebanow
  14. Best, Karl-Heinz (2009): William Palin Elderton (1877-1962). Glottometrics 19, pp. 99–101 (PDF ram-verlag.eu).
  15. Homepage_Gertraud Fenk
  16. de:Ernst Förstemann; Karl-Heinz Best: Ernst Wilhelm Förstemann (1822-1906). In: Glottometrics 12, 2006, pp. 77–86 (PDF ram-verlag.eu)
  17. Dieter Aichele: Das Werk von W. Fucks. In: Reinhard Köhler, Gabriel Altmann, Rajmund G. Piotrowski (Hrsg.): Quantitative Linguistik - Quantitative Linguistics. Ein internationales Handbuch. de Gruyter, Berlin/ New York 2005, pp. 152–158. ISBN   3-11-015578-8
  18. Peter Grzybek :: Homepage : Home / Kontakt Archived September 29, 2012, at the Wayback Machine
  19. de:Gustav Herdan
  20. "Herdan dimension - Laws in Quantitative Linguistics". Archived from the original on 2011-07-19. Retrieved 2010-05-22.
  21. de:Luděk Hřebíček
  22. de:Friedrich Wilhelm Kaeding
  23. Universität Trier: Prof. Dr. Reinhard Köhler Archived 2015-04-07 at the Wayback Machine
  24. Kordić, Snježana (2001). Wörter im Grenzbereich von Lexikon und Grammatik im Serbokroatischen [Serbo-Croatian Words on the Border Between Lexicon and Grammar]. Studies in Slavic Linguistics; 18 (in German). Munich: Lincom Europa. p. 280. ISBN   3-89586-954-6. LCCN   2005530314. OCLC   47905097. OL   2863539W. NYPL   b15245330. NCID   BA56769448.
  25. Kordić, Snježana (2005) [1st pub. 1999; 2nd pub. 2002; 3rd pub. 2005]. Der Relativsatz im Serbokroatischen [Relative Clauses in Serbo-Croatian]. Studies in Slavic Linguistics; 10 (in German). Munich: Lincom Europa. p. 330. ISBN   3-89586-573-7. OCLC   42422661. OL   2863535W. S2CID   171902446. NYPL   b14328353.
  26. Georg-August-Universität Göttingen - Lehfeldt, Werner, Prof. em. Dr
  27. Festschrift on the occasion of the 70. anniversary: Problems of General, Germanic and Slavic Linguistics. Papers for 70th Anniversary of Professor V. Levickij. Herausgegeben von Gabriel Altmann, Iryna Zadoroshna, Yuliya Matskulyak. Books, Chernivtsi 2008. (No ISBN.) Levickij dedicated: Glottometrics, Heft 16, 2008; Emmerich Kelih: Der Czernowitzer Beitrag zur Quantitativen Linguistik: Zum 70. Geburtstag von Prof. Dr. Habil. Viktor V. Levickij. In: Naukovyj Visnyk Černivec'koho Universytetu: Hermans'ka filolohija. Vypusk 407, 2008, pp. 3–10.
  28. Human-Language-Computer - staff Homepage, ZJU
  29. Karl-Heinz Best: Paul Menzerath (1883-1954). In: Glottometrics 14, 2007, pp. 86–98 (PDF ram-verlag.eu)
  30. Shizuo Mizutani; Portrait on the occasion of his 80th birthday in: Glottometrics 12, 2006 (PDF ram-verlag.eu); about Mizutani: Naoko Maruyama: Sizuo Mizutani (1926). The Founder of Japanese Quantitative Linguistics. In: Glottometrics 10, 2005, pp. 99–107 (PDF ram-verlag.eu).
  31. Charles Muller: Initiation à la statistique linguistique. Paris: Larousse 1968; German: Einführung in die Sprachstatistik. Hueber, München 1972.
  32. Rajmund G. Piotrowski, R.G. Piotrovskij; cf. Piotrowski's law: http://lql.uni-trier.de/index.php/Change_in_language Archived 2011-07-19 at the Wayback Machine
  33. de:Piotrowski-Gesetz
  34. Journal of Quantitative Linguistics 4, Nr. 1, 1997 (Festschrift in Honour of Juh. Tuldava)
  35. Dr Andrew Wilson - Linguistics and English Language at Lancaster University
  36. de:Albert Thumb
  37. de:Eberhard Zwirner

References