Old Sogdian (Unicode block)

Old Sogdian
Range	U+10F00..U+10F2F (48 code points)
Plane	SMP
Scripts	Old Sogdian
Assigned	40 code points
Unused	8 reserved code points
Unicode version history
11.0 (2018)	40 (+40)
Unicode documentation
	Code chart ∣ Web page
	Note: ^[1]^[2]

Last updated July 27, 2024

Old Sogdian is a Unicode block containing characters for a group of related, non-cursive Sogdian writing systems used to write historic Sogdian in the 3rd to 5th centuries CE.^[3]

Block

Old Sogdian^[1]^[2]
Official Unicode Consortium code chart (PDF)
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
U+10F0x	𐼀‎	𐼁‎	𐼂‎	𐼃‎	𐼄‎	𐼅‎	𐼆‎	𐼇‎	𐼈‎	𐼉‎	𐼊‎	𐼋‎	𐼌‎	𐼍‎	𐼎‎	𐼏‎
U+10F1x	𐼐‎	𐼑‎	𐼒‎	𐼓‎	𐼔‎	𐼕‎	𐼖‎	𐼗‎	𐼘‎	𐼙‎	𐼚‎	𐼛‎	𐼜‎	𐼝‎	𐼞‎	𐼟‎
U+10F2x	𐼠‎	𐼡‎	𐼢‎	𐼣‎	𐼤‎	𐼥‎	𐼦‎	𐼧‎
Notes
1. ^ As of Unicode version 15.1
2. ^ Grey areas indicate non-assigned code points
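The figures in the chart (a 48-code-point range with 40 assigned and 8 reserved positions) can be checked against a local Unicode database. The following is a minimal sketch in Python, assuming the interpreter's `unicodedata` module ships Unicode 11.0 or later; the constant and function names are illustrative and are not taken from the Unicode Standard.

```python
import unicodedata

# Block boundaries as given in the chart above (inclusive range).
BLOCK_START = 0x10F00
BLOCK_END = 0x10F2F


def in_old_sogdian_block(ch):
    """Return True if the single character ch lies in U+10F00..U+10F2F."""
    return BLOCK_START <= ord(ch) <= BLOCK_END


def assigned_code_points():
    """List the code points in the block that have a character name.

    Unassigned (reserved) code points have no name, so
    unicodedata.name() returns the supplied default for them.
    The result depends on the Unicode version bundled with Python.
    """
    return [
        cp
        for cp in range(BLOCK_START, BLOCK_END + 1)
        if unicodedata.name(chr(cp), None) is not None
    ]


if __name__ == "__main__":
    total = BLOCK_END - BLOCK_START + 1
    assigned = assigned_code_points()
    print(f"Unicode database version: {unicodedata.unidata_version}")
    print(f"block size: {total}")
    print(f"assigned:   {len(assigned)}")
    print(f"reserved:   {total - len(assigned)}")
    for cp in assigned:
        print(f"U+{cp:04X}  {unicodedata.name(chr(cp))}")
```

With an older Unicode database the assigned count will be lower (zero before version 11.0), so the reported database version should be read alongside the counts.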

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Old Sogdian block:

Version	Final code points^[a]	Count	L2 ID	WG2 ID	Document
11.0	U+10F00..10F27	40	L2/00-128		Bunz, Carl-Martin (2000-03-01), Scripts from the Past in Future Versions of Unicode
			L2/01-007		Bunz, Carl-Martin (2000-12-21), "Inscriptional Alphabets (Middle Persian, Parthian) and Sogdian vs. Aramaic", Iranianist Meeting Report: Symposium on Encoding Iranian Scripts in Unicode
			L2/02-009		Bunz, Carl-Martin (2001-11-23), "Sogdian script", 2nd Iranian Meeting Report
			L2/15-149		Anderson, Deborah; Whistler, Ken; McGowan, Rick; Pournader, Roozbeh; Pandey, Anshuman; Glass, Andrew (2015-05-03), "8. Old Sogdian", Recommendations to UTC #143 May 2015 on Script Proposals
			L2/15-089R		Pandey, Anshuman (2015-11-03), Preliminary Proposal to Encode the Old Sogdian Script
			L2/16-037		Anderson, Deborah; Whistler, Ken; McGowan, Rick; Pournader, Roozbeh; Glass, Andrew; Iancu, Laurențiu (2016-01-22), "10. Old Sogdian", Recommendations to UTC #146 January 2016 on Script Proposals
			L2/16-312R	N4814	Pandey, Anshuman (2016-12-01), Proposal to encode the Old Sogdian script
			L2/17-037		Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Glass, Andrew; Iancu, Laurențiu; Moore, Lisa; Liang, Hai; Ishida, Richard; Misra, Karan; McGowan, Rick (2017-01-21), "13. Old Sogdian", Recommendations to UTC #150 January 2017 on Script Proposals
			L2/17-016		Moore, Lisa (2017-02-08), "D.11", UTC #150 Minutes
			L2/17-362		Moore, Lisa (2018-02-02), "Consensus 153-C41", UTC #153 Minutes
a. ^ Proposed code points and character names may differ from the final code points and names

Font

One Unicode font that supports Old Sogdian is Noto Sans Old Sogdian.

References

[1] "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
[2] "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
[3] "Chapter 14: South and Central Asia-III, Ancient Scripts". The Unicode Standard, Version 11.0 (PDF). Mountain View, CA: Unicode, Inc. June 2018. ISBN 978-1-936213-19-1.

