Old Sogdian (Unicode block)

Last updated
Old Sogdian
RangeU+10F00..U+10F2F
(48 code points)
Plane SMP
Scripts Old Sogdian
Assigned40 code points
Unused8 reserved code points
Unicode version history
11.0 (2018)40 (+40)
Unicode documentation
Code chart ∣ Web page
Note: [1] [2]

Old Sogdian is a Unicode block containing characters for a group of related, non-cursive Sogdian writing systems used to write historic Sogdian in the 3rd to 5th centuries CE. [3]

Contents

Block

Old Sogdian [1] [2]
Official Unicode Consortium code chart (PDF)
 0123456789ABCDEF
U+10F0x𐼀𐼁𐼂𐼃𐼄𐼅𐼆𐼇𐼈𐼉𐼊𐼋𐼌𐼍𐼎𐼏
U+10F1x𐼐𐼑𐼒𐼓𐼔𐼕𐼖𐼗𐼘𐼙𐼚𐼛𐼜𐼝𐼞𐼟
U+10F2x𐼠𐼡𐼢𐼣𐼤𐼥𐼦𐼧
Notes
1. ^ As of Unicode version 15.1
2. ^ Grey areas indicate non-assigned code points

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Old Sogdian block:

Font

There is a Unicode font encoding Old Sogdian - Noto Sans Old Sogdian.

Related Research Articles

<span class="mw-page-title-main">Sogdian alphabet</span> Alphabet for use with the Sogdian language of central Asia

The Sogdian alphabet was originally used for the Sogdian language, a language in the Iranian family used by the people of Sogdia. The alphabet is derived from Syriac, a descendant script of the Aramaic alphabet. The Sogdian alphabet is one of three scripts used to write the Sogdian language, the others being the Manichaean alphabet and the Syriac alphabet. It was used throughout Central Asia, from the edge of Iran in the west, to China in the east, from approximately 100–1200 A.D.

Geometric Shapes is a Unicode block of 96 symbols at code point range U+25A0–25FF.

Combining Diacritical Marks is a Unicode block containing the most common combining characters. It also contains the character "Combining Grapheme Joiner", which prevents canonical reordering of combining characters, and despite the name, actually separates characters that would otherwise be considered a single grapheme in a given context. Its block name in Unicode 1.0 was Generic Diacritical Marks.

Spacing Modifier Letters is a Unicode block containing characters for the IPA, UPA, and other phonetic transcriptions. Included are the IPA tone marks, and modifiers for aspiration and palatalization. The word spacing indicates that these characters occupy their own horizontal space within a line of text. Its block name in Unicode 1.0 was simply Modifier Letters.

Phonetic Extensions is a Unicode block containing phonetic characters used in the Uralic Phonetic Alphabet, Old Irish phonetic notation, the Oxford English Dictionary and American dictionaries, and Americanist and Russianist phonetic notations. Its character set is continued in the following Unicode block, Phonetic Extensions Supplement.

Block Elements is a Unicode block containing square block symbols of various fill and shading. Used along with block elements are box-drawing characters, shade characters, and terminal graphic characters. These can be used for filling regions of the screen and portraying drop shadows. Its block name in Unicode 1.0 was Blocks.

Control Pictures is a Unicode block containing characters for graphically representing the C0 control codes, and other control characters. Its block name in Unicode 1.0 was Pictures for Control Codes.

Specials is a short Unicode block of characters allocated at the very end of the Basic Multilingual Plane, at U+FFF0–FFFF. Of these 16 code points, five have been assigned since Unicode 3.0:

Enclosed Alphanumerics is a Unicode block of typographical symbols of an alphanumeric within a circle, a bracket or other not-closed enclosure, or ending in a full stop.

Arabic Presentation Forms-A is a Unicode block encoding contextual forms and ligatures of letter variants needed for Persian, Urdu, Sindhi and Central Asian languages. This block also allocates 32 noncharacters in Unicode, designed specifically for internal use.

Arabic Presentation Forms-B is a Unicode block encoding spacing forms of Arabic diacritics, and contextual letter forms. The special codepoint ZWNBSP is also here, which is only meant for a byte order mark. The block name in Unicode 1.0 was Basic Glyphs for Arabic Language; its characters were re-ordered in the process of merging with ISO 10646 in Unicode 1.0.1 and 1.1.

Tibetan is a Unicode block containing characters for the Tibetan, Dzongkha, and other languages of China, Bhutan, Nepal, Mongolia, northern India, eastern Pakistan and Russia.

Cherokee is a Unicode block containing the syllabic characters for writing the Cherokee language. When Cherokee was first added to Unicode in version 3.0 it was treated as a unicameral alphabet, but in version 8.0 it was redefined as a bicameral script. The Cherokee block contains all the uppercase letters plus six lowercase letters. The Cherokee Supplement block, added in version 8.0, contains the rest of the lowercase letters. For backwards compatibility, the Unicode case folding algorithm—which usually converts a string to lowercase characters—maps Cherokee characters to uppercase.

CJK Compatibility Forms is a Unicode block containing vertical glyph variants for east Asian compatibility. Its block name in Unicode 1.0 was CNS 11643 Compatibility, in reference to CNS 11643.

Byzantine Musical Symbols is a Unicode block containing characters for representing Byzantine music in ekphonetic notation.

Halfwidth and Fullwidth Forms is the name of a Unicode block U+FF00–FFEF, provided so that older encodings containing both halfwidth and fullwidth characters can have lossless translation to/from Unicode. It is the second-to-last block of the Basic Multilingual Plane, followed only by the short Specials block at U+FFF0–FFFF. Its block name in Unicode 1.0 was Halfwidth and Fullwidth Variants.

Manichaean is a Unicode block containing characters historically used for writing Sogdian, Parthian, and the dialects of Fars.

Cherokee Supplement is a Unicode block containing the syllabic characters for writing the Cherokee language. When Cherokee was first added to Unicode in version 3.0 it was treated as a unicameral alphabet, but in version 8.0 it was redefined as a bicameral script. The Cherokee Supplement block contains lowercase letters only, whereas the Cherokee block contains all the uppercase letters, together with six lowercase letters. For backwards compatibility, the Unicode case folding algorithm—which usually converts a string to lowercase characters—maps Cherokee characters to uppercase.

Makasar is a Unicode block containing characters for Makasar script . The script was used historically in South Sulawesi, Indonesia for writing the Makassarese language.

Sogdian is a Unicode block containing characters used to write the Sogdian language from the 7th to 14th centuries CE.

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
  3. "Chapter 14: South and Central Asia-III, Ancient Scripts". The Unicode Standard, Version 11.0 (PDF). Mountain View, CA: Unicode, Inc. June 2018. ISBN   978-1-936213-19-1.