Cyrillic Extended-D

Cyrillic Extended-D
Range: U+1E030..U+1E08F (96 code points)
Plane: SMP
Scripts: Cyrillic
Assigned: 63 code points
Unused: 33 reserved code points
Unicode version history: 15.0 (2022): 63 (+63)
Unicode documentation: Code chart ∣ Web page
Note: [1] [2]

Cyrillic Extended-D is a Unicode block containing superscript and subscript Cyrillic characters used in Cyrillic-based phonetic transcription. [3] [4] The block contains the first Cyrillic characters defined outside of the Basic Multilingual Plane (BMP).
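Because the block lies outside the BMP, each of its code points is above U+FFFF, so it takes a surrogate pair (two code units) in UTF-16 and four bytes in UTF-8. The following is a minimal sketch of that, using the block range U+1E030..U+1E08F; the names `BLOCK_START`, `BLOCK_END`, and `in_cyrillic_extended_d` are illustrative assumptions, not part of any library.

```python
# Block boundaries for Cyrillic Extended-D (U+1E030..U+1E08F).
BLOCK_START = 0x1E030
BLOCK_END = 0x1E08F


def in_cyrillic_extended_d(ch: str) -> bool:
    """Return True if the single character ch falls inside the block range."""
    return BLOCK_START <= ord(ch) <= BLOCK_END


first = chr(BLOCK_START)

# The code point is beyond the Basic Multilingual Plane (above U+FFFF).
outside_bmp = ord(first) > 0xFFFF

# UTF-16 uses 2-byte code units; a non-BMP character needs two of them.
utf16_units = len(first.encode("utf-16-be")) // 2

# UTF-8 needs four bytes for any code point at or above U+10000.
utf8_bytes = len(first.encode("utf-8"))

print(in_cyrillic_extended_d(first), outside_bmp, utf16_units, utf8_bytes)
```

Note that the range check only tests block membership; it does not tell assigned code points apart from reserved ones.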

Block

Cyrillic Extended-D [1] [2]
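Official Unicode Consortium code chart (PDF)
         0 1 2 3 4 5 6 7 8 9 A B C D E F
U+1E03x  𞀰 𞀱 𞀲 𞀳 𞀴 𞀵 𞀶 𞀷 𞀸 𞀹 𞀺 𞀻 𞀼 𞀽 𞀾 𞀿
U+1E04x  𞁀 𞁁 𞁂 𞁃 𞁄 𞁅 𞁆 𞁇 𞁈 𞁉 𞁊 𞁋 𞁌 𞁍 𞁎 𞁏
U+1E05x  𞁐 𞁑 𞁒 𞁓 𞁔 𞁕 𞁖 𞁗 𞁘 𞁙 𞁚 𞁛 𞁜 𞁝 𞁞 𞁟
U+1E06x  𞁠 𞁡 𞁢 𞁣 𞁤 𞁥 𞁦 𞁧 𞁨 𞁩 𞁪 𞁫 𞁬 𞁭 - -
U+1E07x  - - - - - - - - - - - - - - - -
U+1E08x  - - - - - - - - - - - - - - -  𞂏
Notes
1. As of Unicode version 15.1
2. Cells marked "-" indicate non-assigned code points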

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Cyrillic Extended-D block:

Version: 15.0
Final code points [a]: U+1E030..1E06D, 1E08F
Count: 63
L2 ID and document:
L2/21-107: Miller, Kirk (2021-06-07), Unicode request for Cyrillic modifier letters
L2/21-142: Miller, Kirk (2021-06-25), Addendum to L2/21-107, Cyrillic modifier letters
L2/21-130: Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Liang, Hai (2021-07-26), "1. Cyrillic", Recommendations to UTC #168 July 2021 on Script Proposals
L2/21-123: Cummings, Craig (2021-08-03), "B.1 Section 1, Cyrillic", Draft Minutes of UTC Meeting 168
L2/22-010: Miller, Kirk (2022-01-07), Addendum II to L2/21-107, Cyrillic modifier letters
L2/22-023: Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Constable, Peter (2022-01-22), "1b. Cyrillic Modifier Letters", Recommendations to UTC #170 January 2022 on Script Proposals
L2/22-016: Constable, Peter (2022-04-21), "D.1 1b Cyrillic Modifier Letters", UTC #170 Minutes
  a. Proposed code points and character names may differ from final code points and names
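The final code points recorded for version 15.0 (U+1E030..1E06D plus U+1E08F) can be checked against the counts given for the block: 96 code points in total, 63 assigned, 33 reserved. A small sketch of that arithmetic follows; the variable names are illustrative, and the ranges are typed in by hand from this article rather than read from the Unicode Character Database.

```python
# Full block range: U+1E030..U+1E08F (inclusive).
block = range(0x1E030, 0x1E08F + 1)

# Assigned code points as of Unicode 15.0:
# the contiguous run U+1E030..U+1E06D plus the single code point U+1E08F.
assigned = list(range(0x1E030, 0x1E06D + 1)) + [0x1E08F]

block_size = len(block)
assigned_count = len(assigned)
reserved_count = block_size - assigned_count

# The reserved code points are those in the block that are not assigned.
reserved = [cp for cp in block if cp not in set(assigned)]

print(block_size, assigned_count, reserved_count)  # 96 63 33
print(f"U+{reserved[0]:04X}..U+{reserved[-1]:04X}")  # U+1E06E..U+1E08E
```

The reserved code points form a single contiguous gap, U+1E06E..U+1E08E, between the main run of assigned characters and the lone character at U+1E08F.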

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
  3. Miller, Kirk (2021-06-07). "L2/21-107: Unicode request for Cyrillic modifier letters" (PDF).
  4. The Unicode Standard (PDF). 15.0.0. The Unicode Consortium. 2022. ISBN 978-1-936213-32-0.