A sprachbund (German: [ˈʃpʁaːxbʊnt], lit. "language federation"), also known as a linguistic area, area of linguistic convergence, diffusion area or language crossroads, is a group of languages that share areal features resulting from geographical proximity and language contact. The languages may be genetically unrelated, or only distantly related, but the sprachbund characteristics might give a false appearance of relatedness.


A grouping of languages that share features can only be defined as a sprachbund if the features are shared for some reason other than the genetic history of the languages. Because of this, attempts to classify languages without knowledge of their history can lead to language families being misclassified as sprachbunds and, conversely, to sprachbunds being misclassified as language families. [1]


In a 1904 paper, Jan Baudouin de Courtenay emphasised the need to distinguish between language similarities arising from a genetic relationship (rodstvo) and those arising from convergence due to language contact (srodstvo). [2] [3]

Nikolai Trubetzkoy introduced the Russian term языковой союз (yazykovoy soyuz; "language union") in a 1923 article. [4] In a paper presented to the first International Congress of Linguists in 1928, he used a German calque of this term, Sprachbund, defining it as a group of languages with similarities in syntax, morphological structure, cultural vocabulary and sound systems, but without systematic sound correspondences, shared basic morphology or shared basic vocabulary. [5] [3]

Later workers, starting with Trubetzkoy's colleague Roman Jakobson, [6] [7] have relaxed the requirement of similarities in all four of the areas stipulated by Trubetzkoy. [8] [9] [10]
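Trubetzkoy's definition and its later relaxation can be read as a simple test over two sets of criteria. The following Python sketch is only an illustration of that reading, not a method used by any of the authors cited: the names `GroupProfile`, `is_sprachbund` and the `min_domains` parameter are hypothetical, and the example profiles are invented for demonstration rather than drawn from real languages.

```python
from dataclasses import dataclass

# The four areas of similarity named in Trubetzkoy's 1928 definition.
AREAL_DOMAINS = frozenset({
    "syntax",
    "morphological_structure",
    "cultural_vocabulary",
    "sound_systems",
})

# The kinds of evidence whose presence points to common descent instead.
GENETIC_MARKERS = frozenset({
    "systematic_sound_correspondences",
    "shared_basic_morphology",
    "shared_basic_vocabulary",
})


@dataclass(frozen=True)
class GroupProfile:
    """Hypothetical summary of what a group of languages has in common."""
    similarities: frozenset       # subset of AREAL_DOMAINS
    genetic_evidence: frozenset   # subset of GENETIC_MARKERS


def is_sprachbund(profile: GroupProfile, min_domains: int = len(AREAL_DOMAINS)) -> bool:
    """Return True if the group fits the definition.

    min_domains defaults to all four areas (the strict reading); a lower
    value models the relaxed requirement. The threshold is an assumption
    of this sketch, since no specific number is given in the text.
    """
    shared = profile.similarities & AREAL_DOMAINS
    inherited = profile.genetic_evidence & GENETIC_MARKERS
    return len(shared) >= min_domains and not inherited


# Invented profiles, for illustration only.
strict_case = GroupProfile(AREAL_DOMAINS, frozenset())
partial_case = GroupProfile(frozenset({"syntax", "sound_systems"}), frozenset())
family_case = GroupProfile(AREAL_DOMAINS, frozenset({"shared_basic_vocabulary"}))
```

Under the strict reading only `strict_case` qualifies; calling `is_sprachbund(partial_case, min_domains=2)` shows how a relaxed threshold admits groups that converge in fewer areas, while `family_case` is rejected in either reading because it shows evidence of common descent.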


Western Europe

Standard Average European (SAE) is a concept introduced in 1939 by Benjamin Whorf to group the modern Indo-European languages of Europe that share common features. [11] Whorf argued that these languages were characterized by a number of similarities, including syntax and grammar, vocabulary and its use, the relationship between contrasting words and their origins, idioms, and word order, which together set them apart from many other language groups around the world that do not share these similarities, in essence creating a continental sprachbund. His point was that the disproportionate degree of knowledge of SAE languages biased linguists towards considering certain grammatical forms to be highly natural or even universal, when in fact they were peculiar to the SAE language group.

Whorf likely considered Romance and West Germanic to form the core of the SAE, i.e. the literary languages of Europe which have seen substantial cultural influence from Latin during the medieval period. The North Germanic and Balto-Slavic languages tend to be more peripheral members.

Alexander Gode, who was instrumental in the development of Interlingua, characterized it as "Standard Average European". [12] The Romance, Germanic, and Slavic control languages of Interlingua are reflective of the language groups most often included in the SAE Sprachbund.

The Standard Average European sprachbund is most likely the result of ongoing language contact during the Migration Period [13] and later, continuing through the Middle Ages and the Renaissance. Inheritance of the SAE features from Proto-Indo-European can be ruled out because Proto-Indo-European, as currently reconstructed, lacked most of the SAE features. [14]

The Balkans

The idea of areal convergence is commonly attributed to Jernej Kopitar's description in 1830 of Albanian, Bulgarian and Romanian as giving the impression of "nur eine Sprachform ... mit dreierlei Sprachmaterie", [15] which has been rendered by Victor Friedman as "one grammar with the three lexicons". [16] [17]

The Balkan Sprachbund comprises Albanian, Romanian, the South Slavic languages of the southern Balkans (Bulgarian, Macedonian and to a lesser degree Serbo-Croatian), Greek, Balkan Turkish, and Romani.

All of these except Turkish, which is a Turkic language, are Indo-European, though they come from very divergent branches. Yet they exhibit several signs of grammatical convergence, such as avoidance of the infinitive and similar formation of the future tense.

The same features are not found in other languages that are otherwise closely related, such as the other Romance languages in relation to Romanian, and the other Slavic languages such as Polish in relation to Bulgaro-Macedonian. [9] [17]

Indian subcontinent

In a classic 1956 paper titled "India as a Linguistic Area", Murray Emeneau laid the groundwork for the general acceptance of the concept of a sprachbund. In the paper, Emeneau observed that the subcontinent's Dravidian and Indo-Aryan languages shared a number of features that were not inherited from a common source, but were areal features, the result of diffusion during sustained contact. These include retroflex consonants, echo words, subject–object–verb word order, discourse markers, and the quotative. [18]

Emeneau specified the tools to establish that language and culture had fused for centuries on Indian soil to produce an integrated mosaic of structural convergence of four distinct language families: Indo-Aryan, Dravidian, Munda and Tibeto-Burman. This concept provided scholarly substance for explaining the underlying Indian-ness of apparently divergent cultural and linguistic patterns. With his further contributions, this area has now become a major field of research in language contact and convergence. [9] [19] [20]

Mainland Southeast Asia

The Mainland Southeast Asia linguistic area is one of the most dramatic of linguistic areas in terms of the surface similarity of the languages involved, to the extent that early linguists tended to group them all into a single family, although the modern consensus places them into numerous unrelated families. The area stretches from Thailand to China and is home to speakers of languages of the Sino-Tibetan, Hmong–Mien (or Miao–Yao), Tai–Kadai, Austronesian (represented by Chamic) and Mon–Khmer families. [21]

Neighbouring languages across these families, though presumed unrelated, often have similar features, which are believed to have spread by diffusion. A well-known example is the similar tone systems in Sinitic languages (Sino-Tibetan), Hmong–Mien, Tai languages (Kadai) and Vietnamese (Mon–Khmer). Most of these languages passed through an earlier stage with three tones on most syllables (but no tonal distinctions on checked syllables ending in a stop consonant), which was followed by a tone split where the distinction between voiced and voiceless consonants disappeared but in compensation the number of tones doubled. These parallels led to confusion over the classification of these languages, until André-Georges Haudricourt showed in 1954 that tone was not an invariant feature, by demonstrating that Vietnamese tones corresponded to certain final consonants in other languages of the Mon–Khmer family, and proposed that tone in the other languages had a similar origin. [21]
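The tone split described above can be illustrated with a toy model. The following Python sketch assumes a hypothetical inventory of unchecked syllables with three tone categories, labelled A, B and C purely for convenience, and a small invented set of voiced and voiceless onsets. The register suffixes `1` and `2` are notational assumptions of the sketch, and checked syllables are left out for simplicity.

```python
from dataclasses import dataclass

# Hypothetical devoicing map for the toy example:
# each voiced onset merges with its voiceless counterpart.
DEVOICE = {"b": "p", "d": "t", "g": "k", "z": "s"}


@dataclass(frozen=True)
class Syllable:
    onset: str  # initial consonant
    rime: str   # remainder of the syllable
    tone: str   # tone category label


def tone_split(syl: Syllable) -> Syllable:
    """Apply the toy split to one syllable.

    The voicing contrast on the onset is lost, and the information it
    carried is moved onto the tone: formerly voiceless onsets yield
    register 1, formerly voiced onsets yield register 2.
    """
    was_voiced = syl.onset in DEVOICE
    register = "2" if was_voiced else "1"
    new_onset = DEVOICE.get(syl.onset, syl.onset)
    return Syllable(new_onset, syl.rime, syl.tone + register)


# Earlier stage: three tones, with a voiced/voiceless contrast on the onset.
before = [Syllable(onset, "a", tone) for tone in "ABC" for onset in ("p", "b")]
after = [tone_split(s) for s in before]

tones_before = {s.tone for s in before}
tones_after = {s.tone for s in after}
onsets_after = {s.onset for s in after}
```

In this toy inventory the six syllables remain distinct after the change: the onset contrast between `p` and `b` collapses to `p` alone, while the set of tone labels grows from three to six.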

Similarly, the unrelated Khmer (Mon–Khmer), Cham (Austronesian) and Lao (Kadai) languages have almost identical vowel systems. Many languages in the region are of the isolating (or analytic) type, with mostly monosyllabic morphemes and little use of inflection or affixes, though a number of Mon–Khmer languages have derivational morphology. Shared syntactic features include classifiers, object–verb order and topic–comment structure, though in each case there are exceptions in branches of one or more families. [21]

Northeast Asia

Some linguists, such as Matthias Castrén, G. J. Ramstedt, Nicholas Poppe and Pentti Aalto, have supported the idea that the Mongolic, Turkic, and Tungusic families of Asia (and some small parts of Europe) are genetically related, in a controversial group they call Altaic. The Koreanic and Japonic languages, which some scholars such as William George Aston, Shōsaburō Kanazawa, Samuel Martin and Sergei Starostin also consider to be related, are sometimes included in the purported Altaic family. This latter hypothesis was supported by Roy Andrew Miller, John C. Street and Karl Heinrich Menges, among others. Gerard Clauson, Gerhard Doerfer, Juha Janhunen, Stefan Georg and others dispute or reject it. A common alternative explanation for similarities among these languages, such as vowel harmony and agglutination, is that they are due to areal diffusion. [22]


Comparative method

In linguistics, the comparative method is a technique for studying the development of languages by performing a feature-by-feature comparison of two or more languages with common descent from a shared ancestor and then extrapolating backwards to infer the properties of that ancestor. The comparative method may be contrasted with the method of internal reconstruction in which the internal development of a single language is inferred by the analysis of features within that language. Ordinarily, both methods are used together to reconstruct prehistoric phases of languages; to fill in gaps in the historical record of a language; to discover the development of phonological, morphological and other linguistic systems and to confirm or to refute hypothesised relationships between languages.

Language family

A language family is a group of languages related through descent from a common ancestral language or parental language, called the proto-language of that family. The term "family" reflects the tree model of language origination in historical linguistics, which makes use of a metaphor comparing languages to people in a biological family tree, or in a subsequent modification, to species in a phylogenetic tree of evolutionary taxonomy. Linguists therefore describe the daughter languages within a language family as being genetically related.

Roman Jakobson

Roman Osipovich Jakobson was a Russian Empire-born American linguist and literary theorist.

Ural-Altaic languages

Ural-Altaic, Uralo-Altaic or Uraltaic is a linguistic convergence zone and former language-family proposal uniting the Uralic and the Altaic languages. It is generally now agreed that even the Altaic languages most likely do not share a common descent: the similarities among Turkic, Mongolic and Tungusic are better explained by diffusion and borrowing. The term continues to be used for the central Eurasian typological, grammatical and lexical convergence zone. Indeed, "Ural-Altaic" may be preferable to "Altaic" in this sense. For example, J. Janhunen states that "speaking of 'Altaic' instead of 'Ural-Altaic' is a misconception, for there are no areal or typological features that are specific to 'Altaic' without Uralic."

The languages of East Asia belong to several distinct language families, with many common features attributed to interaction. In the Mainland Southeast Asia linguistic area, Chinese varieties and languages of southeast Asia share many areal features, tending to be analytic languages with similar syllable and tone structure. In the 1st millennium AD, Chinese culture came to dominate East Asia. Classical Chinese was adopted by scholars in Vietnam, Korea, and Japan. There was a massive influx of Chinese vocabulary into these and other neighboring languages. The Chinese script was also adapted to write Vietnamese, Korean, and Japanese, though in the first two the use of Chinese characters is now restricted to university learning, linguistic or historical study, artistic or decorative works and newspapers.

The Balkan sprachbund or Balkan language area is the ensemble of areal features—similarities in grammar, syntax, vocabulary and phonology—among the languages of the Balkans. Several features are found across these languages though not all apply to every single language.

Language convergence is a type of linguistic change in which languages come to structurally resemble one another as a result of prolonged language contact and mutual interference, regardless of whether those languages belong to the same language family, i.e. stem from a common genealogical proto-language. In contrast to other contact-induced language changes like creolization or the formation of mixed languages, convergence refers to a mutual process that results in changes in all the languages involved. Linguists use the term to describe changes in the linguistic patterns of the languages in contact rather than alterations of isolated lexical items.

Paleo-Balkan languages

The Paleo-Balkan languages or Palaeo-Balkan languages is a grouping of various extinct Indo-European languages that were spoken in the Balkans and surrounding areas in ancient times.

In linguistics, areal features are elements shared by languages or dialects in a geographic area, particularly when such features are not descended from a proto-language, or common ancestor language. That is, an areal feature is contrasted with genealogically determined similarity within the same language family. Features may diffuse from one dominant language to neighbouring languages.

Genetic relationship or genealogical relationship, in linguistics, is the relationship between languages that are members of the same language family.

Thraco-Illyrian is a hypothesis that the Daco-Thracian and Illyrian languages comprise a distinct branch of Indo-European. Thraco-Illyrian is also used as a term merely implying a Thracian-Illyrian interference, mixture or sprachbund, or as a shorthand way of saying that it is not determined whether a subject is to be considered as pertaining to Thracian or Illyrian. Downgraded to a geo-linguistic concept, these languages are referred to as Paleo-Balkan.

The linguistic classification of the ancient Thracian language has long been a matter of contention and uncertainty, and there are widely varying hypotheses regarding its position among other Paleo-Balkan languages. It is not contested, however, that the Thracian languages were Indo-European languages which had acquired satem characteristics by the time they are attested.

In historical linguistics, transphonologization is a type of sound change whereby a phonemic contrast that used to involve a certain feature X evolves in such a way that the contrast is preserved, yet becomes associated with a different feature Y.

The Mesoamerican language area is a sprachbund containing many of the languages natively spoken in the cultural area of Mesoamerica. This sprachbund is defined by an array of syntactic, lexical and phonological traits as well as a number of ethnolinguistic traits found in the languages of Mesoamerica, which belong to a number of language families, such as Uto-Aztecan, Mayan, Totonacan, Oto-Manguean and Mixe–Zoque languages as well as some language isolates and unclassified languages known to the region.

The Pueblo linguistic area is a Sprachbund consisting of the languages spoken in and near North American Pueblo locations. There are also many shared cultural practices in this area. For example, these cultures share many ceremonial vocabulary terms meant for prayer or song.

Mainland Southeast Asia linguistic area

The Mainland Southeast Asia linguistic area is a sprachbund including languages of the Sino-Tibetan, Hmong–Mien, Kra–Dai, Austronesian and Austroasiatic families spoken in an area stretching from Thailand to China. Neighbouring languages across these families, though presumed unrelated, often have similar typological features, which are believed to have spread by diffusion. James Matisoff referred to this area as the "Sinosphere", contrasted with the "Indosphere", but viewed it as a zone of mutual influence in the ancient period.

Eastern South Slavic

The Eastern South Slavic dialects form the eastern subgroup of the South Slavic languages. They are spoken mostly in Bulgaria, North Macedonia and adjacent areas in the neighbouring countries. They form the so-called Balkan Slavic linguistic area which encompasses the southeastern part of the dialect continuum of South Slavic.

Theodor Capidan

Theodor Capidan was an Ottoman-born Romanian linguist. An ethnic Aromanian from the Macedonia region, he studied at Leipzig before teaching school at Thessaloniki. Following the creation of Greater Romania at the end of World War I, Capidan followed his friend Sextil Pușcariu to the Transylvanian capital Cluj, where he spent nearly two decades, the most productive part of his career. He then taught in Bucharest for a further ten years and was marginalized late in life under the nascent communist regime. Capidan's major contributions involve studies of the Aromanians and the Megleno-Romanians, as well as their respective languages. His research extended to reciprocal influences between Romanian and the surrounding Slavic languages, the Eastern Romance substratum and the Balkan sprachbund, as well as toponymy. He made a significant contribution to projects for a Romanian-language dictionary and atlas.

Albanian–Romanian linguistic relationship

The Albanian–Romanian linguistic relationship is a field of research into the ethnogenesis of both peoples. The common phonological, morphological and syntactical features of the two languages have been studied for more than a century. Both languages are part of the Balkan sprachbund, but certain elements are shared only by Albanian and Romanian. Aside from Latin, and from shared Greek, Slavic and Turkish elements, other characteristics and words are attributed to the Paleo-Balkan linguistic base: Illyrian, Thracian, Dacian and/or Thraco-Illyrian, Daco-Thracian.


