The Cyrillic script family contains many specially treated two-letter combinations, or digraphs, but few of these are used in Slavic languages. In a few alphabets, trigraphs and even the occasional tetragraph or pentagraph are used.
In early Cyrillic, the digraphs ⟨оу⟩ and ⟨оѵ⟩ were used for /u/. As with the equivalent digraph in Greek, they were reduced to a typographic ligature, ⟨ꙋ⟩, and are now written ⟨у⟩. The modern letters ⟨ы⟩ and ⟨ю⟩ started out as digraphs, ⟨ъі⟩ and ⟨іо⟩. In Church Slavonic printing practice, both historical and modern, ⟨оу⟩ (which is considered a letter from the alphabet's point of view) is mostly treated as two individual characters, but ⟨ы⟩ is a single letter. For example, letter-spacing affects ⟨оу⟩ as if it were two individual letters, and never affects the components of ⟨ы⟩. In the context of Old Slavonic, ⟨шт⟩ is a digraph that can replace the letter ⟨щ⟩ and vice versa.
Modern Slavic languages written in the Cyrillic alphabet make little or no use of digraphs. There are only two true digraphs: ⟨дж⟩ for /d͡ʒ/ and ⟨дз⟩ for /d͡z/ (Belarusian, Bulgarian, Ukrainian). Sometimes these digraphs are even considered special letters of their respective alphabets. In standard Russian, however, the letters in ⟨дж⟩ and ⟨дз⟩ are always pronounced separately. Digraph-like letter pairs include combinations of consonants with the soft sign ⟨ь⟩ (the Serbian/Macedonian letters ⟨љ⟩ and ⟨њ⟩ are derived from ⟨ль⟩ and ⟨нь⟩), and ⟨жж⟩ or ⟨зж⟩ for the uncommon and optional Russian phoneme /ʑː/. Native descriptions of the Cyrillic writing system often apply the term "digraph" to the combinations ⟨ьо⟩ and ⟨йо⟩ (Bulgarian, Ukrainian), as both correspond to the single letter ⟨ё⟩ of the Russian and Belarusian alphabets (⟨ьо⟩ is used for /ʲo/, and ⟨йо⟩ for /jo/).
Cyrillic uses large numbers of digraphs only when used to write non-Slavic languages; in some languages such as Avar, these are completely regular in formation.
Many Caucasian languages use ⟨ә⟩ (Abkhaz), ⟨у⟩ (Kabardian and Adyghe), or ⟨в⟩ (Avar) for labialization, just as many of them, like Russian, use ⟨ь⟩ for palatalization. Since such sequences are decomposable, regular forms will not be listed below. (In Abkhaz, ⟨ә⟩ with sibilants is equivalent to ⟨ьә⟩, for instance ж /ʐ/, жь /ʒ/~/ʐʲ/, жә /ʒʷ/~/ʐʲʷ/, but this is predictable phonetic detail.) Similarly, long vowels written double in some languages, such as ⟨аа⟩ for Abkhaz /aː/ or ⟨аюу⟩ for Kirghiz /ajuː/ "bear", and vowels written with a glottal stop, such as Tajik аъ [aʔ~aː], are not included.
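The sense in which such sequences are decomposable can be illustrated with a minimal Python sketch. It assumes the input is a single isolated consonant sequence rather than running text (where, for example, ⟨у⟩ may also be an ordinary vowel), and it uses only the labialization markers named above. The dictionary and the function name `strip_labialization` are hypothetical names chosen for illustration.

```python
# Labialization markers as named in the text above.
# The dictionary itself is an illustrative assumption, not a standard resource.
LABIALIZATION_MARKER = {
    "Abkhaz": "ә",
    "Kabardian": "у",
    "Adyghe": "у",
    "Avar": "в",
}


def strip_labialization(sequence: str, language: str) -> tuple[str, bool]:
    """Split an isolated consonant sequence into (base, is_labialized).

    Only the final labialization marker is removed; whatever precedes it
    (a single letter or another digraph) is returned unchanged as the base.
    """
    marker = LABIALIZATION_MARKER[language]
    if len(sequence) > 1 and sequence.endswith(marker):
        return sequence[:-1], True
    return sequence, False


print(strip_labialization("хьв", "Avar"))      # base is the digraph хь
print(strip_labialization("ӏу", "Kabardian"))  # base is ӏ
print(strip_labialization("жә", "Abkhaz"))     # base is ж
```

Because the base is returned unchanged, a sequence such as Avar хьв reduces to the digraph хь plus a regular labialization marker, which is why only the base forms need to be listed.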
Archi: а́а [áː], аӏ [aˤ], а́ӏ [áˤ], ааӏ [aːˤ], гв [ɡʷ], гь [h], гъ [ʁ], гъв [ʁʷ], гъӏ [ʁˤ], гъӏв [ʁʷˤ], гӏ [ʕ], е́е [éː], еӏ [eˤ], е́ӏ [éˤ], жв [ʒʷ], зв [zʷ], и́и [íː], иӏ [iˤ], кк [kː], кв [kʷ], ккв [kːʷ], кӏ [kʼ], кӏв [kʷʼ], къ [qʼ], къв [qʼʷ], ккъ [qːʼ], къӏ [qˤʼ], ккъӏ [qːˤʼ], къӏв [qʷˤʼ], ккъӏв [qːʷˤʼ], кь [kʟ̥ʼ], кьв [kʟ̥ʷʼ], лъ [ɬ], ллъ [ɬː], лъв [ɬʷ], ллъв [ɬːʷ], лӏ [kʟ̥], лӏв [kʟ̥ʷ], о́о [óː], оӏ [oˤ], о́ӏ [óˤ], ооӏ [oːˤ], пп [pː], пӏ [pʼ], сс [sː], св [sʷ], тт [tː], тӏ [tʼ], тв [tʷ], твӏ [tʼʷ], у́у [úː], уӏ [uˤ], у́ӏ [úˤ], хх [χː], хв [χʷ], ххв [χːʷ], хӏ [ħ], хьӏ [χˤ], ххьӏ [χːˤ], хьӏв [χʷˤ], ххьӏв [χːʷˤ], хъ [q], хъв [qʷ], хъӏ [qˤ], хъӏв [qʷˤ], цв [t͡sʷ], цӏ [t͡sʼ], ццӏ [t͡sː], чв [t͡ʃʷ], чӏ [t͡ʃʼ], чӏв [t͡ʃʼʷ], шв [ʃʷ], щв [ʃːʷ], ээ [əː], эӏ [əˤ]
Avar uses ⟨в⟩ for labialization, as in хьв /xʷ/. Other digraphs are:
The ь digraphs are spelled this way even before vowels, as in гьабуна /habuna/ "made", not *гябуна.
Note that three of these are tetragraphs. However, gemination for the 'strong' consonants in Avar orthography is sporadic, and the simple letters or digraphs are frequently used in their place.
The Belarusian language has the following digraphs:
Chechen uses the following digraphs:
The vowel digraphs are used for front vowels in other Dagestanian languages and also in the local Turkic languages Kumyk and Nogay. ⟨Ӏ⟩ digraphs for ejectives are common across the North Caucasus, as is гӏ for /ɣ~ʁ~ʕ/.
Kabardian and Adyghe both use ⟨у⟩ for labialization, as in ӏу /ʔʷ/. гу is /ɡʷ/, though г is /ɣ/; ку is /kʷ/, despite the fact that к is not used outside loan words.
Other digraphs are:
Labialized, the trigraph becomes the unusual tetragraph кхъу /qʷ/.
Tabasaran uses gemination for its 'strong' consonants, but this has a different value with г.
It uses ⟨в⟩ for labialization of its postalveolar consonants: шв /ʃʷ/, жв /ʒʷ/, чв /tʃʰʷ/, джь /dʒʷ/, ь /tʃʼʷ/, ччь /tʃʷʰː/.
Tatar has a number of vowels which are written with ambiguous letters that are normally resolved by context, but which are resolved by discontinuous digraphs when context is not sufficient. These ambiguous vowel letters are е, front /je/ or back /jɤ/; ю, front /jy/ or back /ju/; and я, front /jæ/ or back /ja/. They interact with the ambiguous consonant letters к, velar /k/ or uvular /q/, and г, velar /ɡ/ or uvular /ʁ/.
In general, velar consonants occur before front vowels and uvular consonants before back vowels, so it is frequently not necessary to specify these values in the orthography. However, this is not always the case. A uvular followed by a front vowel, as in /qærdæʃ/ "kinsman", for example, is written with the corresponding back vowel to specify the uvular value: кардәш. The front value of а is required by vowel harmony with the following front vowel ә, so this spelling is unambiguous.
If, however, the proper value of the vowel is not recoverable through vowel harmony, then the letter ь /ʔ/ is added at the end of the syllable, as in шагыйрь /ʃaʁir/ "poet". That is, /i/ is written with ы rather than и to show that the г is pronounced /ʁ/ rather than /ɡ/, and then the ь is added to show that the ы is pronounced as if it were и, so the discontinuous digraph ы...ь is used here to write the vowel /i/. This strategy is also followed with the ambiguous letters е, ю, and я in final syllables, for instance in юнь /jyn/ "cheap". That is, the discontinuous digraphs е...ь, ю...ь, я...ь are used for /j/ plus the front vowels /e, y, æ/.
Exceptional final-syllable velars and uvulars, however, are written with simple digraphs, with ь for velars and ъ for uvulars: пакь /pak/ "pure", вәгъдә /wæʁdæ/ "promise".
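The reading rule for the ambiguous consonant letters к and г can be sketched in Python as follows. This is a minimal illustration, not a full model of Tatar orthography: it covers only the vowel letters named in the passage above (the real inventory is larger), it looks only at the single letter that follows the consonant, and it does not model vowel harmony or the discontinuous vowel digraphs. The names `consonant_value`, `FRONT_CONTEXT` and `BACK_CONTEXT` are hypothetical.

```python
# Letters that signal a velar reading: front-vowel letters and the sign ь.
# Only letters mentioned in the text are included (an assumption of this sketch).
FRONT_CONTEXT = {"ә", "и", "ь"}
# Letters that signal a uvular reading: back-vowel letters and the sign ъ.
BACK_CONTEXT = {"а", "ы", "ъ"}

# Values of the two ambiguous consonant letters, as given in the text.
VALUES = {
    "к": {"velar": "k", "uvular": "q"},
    "г": {"velar": "ɡ", "uvular": "ʁ"},
}


def consonant_value(letter: str, following: str) -> str:
    """Return the IPA value of к or г, given the letter written after it."""
    if following in FRONT_CONTEXT:
        place = "velar"
    elif following in BACK_CONTEXT:
        place = "uvular"
    else:
        raise ValueError(f"letter not covered by this sketch: {following!r}")
    return VALUES[letter][place]


# Examples taken from the passage above.
print(consonant_value("к", "а"))  # кардәш: к is written before а
print(consonant_value("г", "ы"))  # шагыйрь: г is written before ы
print(consonant_value("к", "ь"))  # пакь: к is followed by ь
print(consonant_value("г", "ъ"))  # вәгъдә: г is followed by ъ
```

The sketch shows why spelling a front vowel with a back-vowel letter is enough to mark a uvular: the consonant is read from the written letter, and the true vowel quality is then recovered separately, through vowel harmony or a final ь.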
The Ukrainian language has the following digraphs:
In the Cyrillization of Mandarin, there are digraphs цз and чж, which correspond to Pinyin z/j and zh. Final n is written нь, while н stands for final ng. юй is yu, but ю is you; ю- is yu-, and -уй is -ui.
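The initial and nasal-coda correspondences stated above can be written out as a small Python sketch. It covers only those correspondences, not a complete transcription scheme, and the names `INITIALS` and `nasal_coda` are hypothetical. The one design point worth noting is that final ng must be tested before final n, since every Pinyin final ending in ng also ends in a letter sequence containing n.

```python
# Pinyin initials mentioned in the text and their Cyrillic digraphs.
INITIALS = {"z": "цз", "j": "цз", "zh": "чж"}


def nasal_coda(pinyin_final: str) -> str:
    """Return the Cyrillic spelling of the nasal coda of a Pinyin final.

    Returns an empty string when the final has no nasal coda.
    """
    if pinyin_final.endswith("ng"):  # check the longer coda first
        return "н"
    if pinyin_final.endswith("n"):
        return "нь"
    return ""


print(INITIALS["zh"])     # чж
print(nasal_coda("ang"))  # н
print(nasal_coda("an"))   # нь
```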