Cyrillic (Unicode block)

Cyrillic
Range: U+0400..U+04FF (256 code points)
Plane: BMP
Scripts: Cyrillic (254 characters), Inherited (2 characters)
Major alphabets: Russian, Ukrainian, Belarusian, Bulgarian, Serbian, Macedonian, Abkhaz
Assigned: 256 code points
Unused: 0 reserved code points
Source standards: ISO 8859-5
Unicode version history:
1.0.0 (1991): 192 (+192)
1.0.1 (1992): 188 (-4)
1.1 (1993): 226 (+38)
3.0 (1999): 238 (+12)
3.2 (2002): 246 (+8)
4.1 (2005): 248 (+2)
5.0 (2006): 255 (+7)
5.1 (2008): 256 (+1)
Unicode documentation: Code chart, Web page
Note: Four characters (two pairs of uppercase and lowercase letters) were removed from the Cyrillic block in version 1.0.1 during unification with ISO 10646. [1] [2] [3]
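The per-version changes in the history above can be checked as a running total. The following is a minimal sketch; the names `HISTORY` and `running_totals` are illustrative and not part of any Unicode tooling, and the figures are copied from the version history listed above.

```python
from itertools import accumulate

# (version, year, net change in assigned code points), as listed above.
HISTORY = [
    ("1.0.0", 1991, +192),
    ("1.0.1", 1992, -4),   # four characters removed during ISO 10646 unification
    ("1.1", 1993, +38),
    ("3.0", 1999, +12),
    ("3.2", 2002, +8),
    ("4.1", 2005, +2),
    ("5.0", 2006, +7),
    ("5.1", 2008, +1),
]


def running_totals(history):
    """Return the cumulative count of assigned code points after each version."""
    return list(accumulate(change for _, _, change in history))


totals = running_totals(HISTORY)
for (version, year, change), total in zip(HISTORY, totals):
    print(f"{version} ({year}): {total} ({change:+d})")
```

The final total equals the block size of 256, which is consistent with the block having no reserved code points since version 5.1.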

Cyrillic is a Unicode block containing the characters used to write the most widely used languages with a Cyrillic orthography. The core of the block is based on the ISO 8859-5 standard, with additions for minority languages and historic orthographies.
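Because the block is a single contiguous range, membership can be tested with a simple comparison on the code point. The sketch below assumes only the range U+0400..U+04FF stated above; the function name `in_cyrillic_block` is hypothetical. Note that it tests block membership, not the Cyrillic script as a whole, since Cyrillic characters are also encoded in other blocks.

```python
CYRILLIC_START = 0x0400  # first code point of the block
CYRILLIC_END = 0x04FF    # last code point of the block (inclusive)


def in_cyrillic_block(ch: str) -> bool:
    """Return True if the single character `ch` lies in U+0400..U+04FF."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return CYRILLIC_START <= ord(ch) <= CYRILLIC_END


print(in_cyrillic_block("\u042f"))  # CYRILLIC CAPITAL LETTER YA, inside the block
print(in_cyrillic_block("A"))       # LATIN CAPITAL LETTER A, outside the block
print(in_cyrillic_block("\u0500"))  # first code point of Cyrillic Supplement, outside
```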

Block

Cyrillic [1]
Official Unicode Consortium code chart (PDF)
       0 1 2 3 4 5 6 7 8 9 A B C D E F
U+040x Ѐ Ё Ђ Ѓ Є Ѕ І Ї Ј Љ Њ Ћ Ќ Ѝ Ў Џ
U+041x А Б В Г Д Е Ж З И Й К Л М Н О П
U+042x Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э Ю Я
U+043x а б в г д е ж з и й к л м н о п
U+044x р с т у ф х ц ч ш щ ъ ы ь э ю я
U+045x ѐ ё ђ ѓ є ѕ і ї ј љ њ ћ ќ ѝ ў џ
U+046x Ѡ ѡ Ѣ ѣ Ѥ ѥ Ѧ ѧ Ѩ ѩ Ѫ ѫ Ѭ ѭ Ѯ ѯ
U+047x Ѱ ѱ Ѳ ѳ Ѵ ѵ Ѷ ѷ Ѹ ѹ Ѻ ѻ Ѽ ѽ Ѿ ѿ
U+048x Ҁ ҁ ҂ ҃ ҄ ҅ ҆ ҇ ҈ ҉ Ҋ ҋ Ҍ ҍ Ҏ ҏ
U+049x Ґ ґ Ғ ғ Ҕ ҕ Җ җ Ҙ ҙ Қ қ Ҝ ҝ Ҟ ҟ
U+04Ax Ҡ ҡ Ң ң Ҥ ҥ Ҧ ҧ Ҩ ҩ Ҫ ҫ Ҭ ҭ Ү ү
U+04Bx Ұ ұ Ҳ ҳ Ҵ ҵ Ҷ ҷ Ҹ ҹ Һ һ Ҽ ҽ Ҿ ҿ
U+04Cx Ӏ Ӂ ӂ Ӄ ӄ Ӆ ӆ Ӈ ӈ Ӊ ӊ Ӌ ӌ Ӎ ӎ ӏ
U+04Dx Ӑ ӑ Ӓ ӓ Ӕ ӕ Ӗ ӗ Ә ә Ӛ ӛ Ӝ ӝ Ӟ ӟ
U+04Ex Ӡ ӡ Ӣ ӣ Ӥ ӥ Ӧ ӧ Ө ө Ӫ ӫ Ӭ ӭ Ӯ ӯ
U+04Fx Ӱ ӱ Ӳ ӳ Ӵ ӵ Ӷ ӷ Ӹ ӹ Ӻ ӻ Ӽ ӽ Ӿ ӿ
Notes
1. As of Unicode version 15.1
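The layout of the chart above (16 rows of 16 code points, each row labelled by its code point with the last hexadecimal digit replaced by "x") can be reproduced programmatically. This is a minimal sketch rather than the method used to produce the official chart; `chart_row` and `chart_rows` are hypothetical helper names.

```python
BLOCK_START = 0x0400
BLOCK_END = 0x04FF


def chart_row(first: int) -> str:
    """Format one chart row for the 16 code points starting at `first`."""
    if first % 16 != 0:
        raise ValueError("a row must start at a multiple of 16")
    label = f"U+{first >> 4:03X}x"  # e.g. 0x0410 -> "U+041x"
    cells = " ".join(chr(first + offset) for offset in range(16))
    return f"{label} {cells}"


def chart_rows() -> list[str]:
    """Return all 16 rows of the block, in code point order."""
    return [chart_row(first) for first in range(BLOCK_START, BLOCK_END + 1, 16)]


if __name__ == "__main__":
    print("\n".join(chart_rows()))
```

How the output looks depends on the font and terminal, particularly for the combining marks in row U+048x. The chart also shows the case layout of the basic alphabet: the uppercase letters in rows U+041x and U+042x sit exactly 0x20 below their lowercase counterparts in rows U+043x and U+044x.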

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Cyrillic block:

References

  1. "Unicode 1.0.1 Addendum" (PDF). The Unicode Standard. 1992-11-03. Retrieved 2016-07-09.
  2. "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
  3. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.