Brevity law

In linguistics, the brevity law (also called Zipf's law of abbreviation) is a linguistic law that qualitatively states that the more frequently a word is used, the shorter that word tends to be, and vice versa: the less frequently a word is used, the longer it tends to be. [1] It is a statistical regularity found in natural languages and other natural systems, and it has been proposed as a general rule.

The brevity law was originally formulated by the linguist George Kingsley Zipf in 1945 as a negative correlation between the frequency of a word and its size. He analyzed a written corpus in American English and showed that average word length, measured in phonemes, fell as frequency of occurrence increased. Similarly, in a Latin corpus, he found a negative correlation between the number of syllables in a word and the frequency of its appearance. In other words, the most frequent words in a language tend to be the shortest. For example, the most common words in English are the, be (in its different forms), to, of, and, and a, all of which contain 1 to 3 phonemes. Zipf claimed that this Law of Abbreviation is a universal structural property of language, hypothesizing that it arises as a result of individuals optimising form-meaning mappings under competing pressures to communicate accurately but also efficiently. [2] [3]

Since then, the law has been empirically verified for almost a thousand languages from 80 different linguistic families, using the relationship between the number of letters in a written word and its frequency in text. [4] The brevity law appears to be universal and has also been observed acoustically, when word size is measured in terms of word duration. [5] Evidence published in 2016 suggests that it also holds in the acoustic communication of other primates. [6]
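The letter-count version of the relationship can be checked on any text with a few lines of code. The following is a minimal sketch, not the procedure used in the cited studies: it counts word types in a short invented sample, then computes a Spearman rank correlation between each type's frequency and its length in letters. The function names and the sample sentence are assumptions made for illustration.

```python
from collections import Counter
import re


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def frequency_length_correlation(text):
    """Spearman correlation between word-type frequency and length in letters."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    types = list(counts)
    freqs = [counts[w] for w in types]
    lengths = [len(w) for w in types]
    # Spearman's rho is the Pearson correlation of the rank-transformed data.
    return pearson(average_ranks(freqs), average_ranks(lengths))


# Invented sample text (an assumption, not corpus data).
sample = (
    "the cat sat on the mat and the dog sat on the rug while the "
    "extraordinarily enthusiastic photographer documented everything"
)
rho = frequency_length_correlation(sample)
print(f"Spearman rho between frequency and length: {rho:.2f}")
```

For this sample the coefficient is negative, which is the direction the law predicts. A sentence this short proves nothing on its own; the published results rest on large corpora across many languages.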

Log per-million word count as a function of word length (number of characters) in the Brown Corpus, illustrating Zipf's brevity law.

The origin of this statistical pattern appears to be related to optimization principles, arising from a trade-off between two major constraints: the pressure to reduce the cost of production and the pressure to maximize transmission success. This idea is closely related to the principle of least effort, which postulates that efficiency selects a path of least resistance or "effort". The principle of reducing the cost of production might also be related to principles of optimal data compression in information theory. [7]
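The link to data compression can be illustrated with Huffman coding, a standard technique that builds a shortest-on-average prefix code from symbol frequencies. The sketch below is only an analogy and makes no claim that natural languages implement this algorithm. The function name and the word counts are invented for illustration.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        # A lone symbol still needs one bit to be transmitted.
        return {next(iter(freqs)): 1}
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(f, next(tiebreak), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol inside them
        # moves one level deeper, so its code grows by one bit.
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical word counts (assumed values, not taken from any corpus).
word_counts = {
    "the": 50,
    "of": 25,
    "language": 12,
    "linguistics": 6,
    "abbreviation": 3,
}
code_lengths = huffman_code_lengths(word_counts)
for word in sorted(word_counts, key=word_counts.get, reverse=True):
    print(f"{word:>12}  count={word_counts[word]:>2}  bits={code_lengths[word]}")
```

With these counts the most frequent item receives a 1-bit code and the two rarest receive 4-bit codes, mirroring the frequency and length trade-off that the brevity law describes for words.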


References

  1. Zipf GK. 1949 Human behavior and the principle of least effort. Cambridge, MA: Addison-Wesley
  2. Zipf GK. 1935 The Psychobiology of language, an introduction to dynamic philology. Boston, MA: Houghton–Mifflin
  3. Zipf GK. 1949 Human behavior and the principle of least effort. Cambridge, MA: Addison-Wesley
  4. Bentz C, Ferrer-i-Cancho R. 2016 Zipf’s Law of abbreviation as a language universal. Universitätsbibliothek Tübingen.
  5. Tomaschek F, Wieling M, Arnold D, Baayen RH. 2013 Word frequency, vowel length and vowel quality in speech production: an EMA study of the importance of experience. In Proc. of the 14th Annual Conf. of the International Speech Communication Association (INTERSPEECH 2013), Lyon, France, 25–29 August (eds F Bimbot et al.), pp. 1302–1306
  6. Gustison ML, Semple S, Ferrer-i-Cancho R, Bergman TJ. 2016 Gelada vocal sequences follow Menzerath’s linguistic law. Proc. Natl Acad. Sci. USA 113, E2750-E2758
  7. Kanwal J, Smith K, Culbertson J, Kirby S. 2017 Zipf’s Law of abbreviation and the principle of least effort: language users optimise a miniature lexicon for efficient communication. Cognition 165, 45–52. (doi:10.1016/j.cognition.2017.05.001)