Avestan (Unicode block)

Avestan
Range: U+10B00..U+10B3F (64 code points)
Plane: SMP
Scripts: Avestan
Major alphabets: Pazand
Assigned: 61 code points
Unused: 3 reserved code points
Unicode version history
5.2: 61 (+61)
Note: [1] [2]

Avestan is a Unicode block containing the characters of the Avestan script, which was devised for recording the Zoroastrian religious texts, the Avesta, and was also used to write Middle Persian (Pazand).

A Unicode block is one of several contiguous ranges of numeric character codes of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purposes. Typically, proposals such as the addition of new glyphs are discussed and evaluated by considering the relevant block or blocks as a whole.
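As a rough illustration of a block as a contiguous range of code points, the Python sketch below tests membership in the Avestan block using the range listed above (U+10B00..U+10B3F) and derives the plane number from the code point. The constant and function names are hypothetical, chosen only for this example; they are not part of any Unicode library.

```python
# Block boundaries as listed for the Avestan block (U+10B00..U+10B3F).
AVESTAN_START = 0x10B00
AVESTAN_END = 0x10B3F


def in_avestan_block(ch: str) -> bool:
    """Return True if the single character ch falls inside the block range."""
    return AVESTAN_START <= ord(ch) <= AVESTAN_END


def plane_of(ch: str) -> int:
    """Return the plane number; each plane spans 0x10000 code points."""
    return ord(ch) >> 16


block_size = AVESTAN_END - AVESTAN_START + 1  # size of the contiguous range
first_char = chr(AVESTAN_START)               # first code point in the block

# Plane 1 is the Supplementary Multilingual Plane (SMP), so the character
# takes 4 bytes in UTF-8 and a surrogate pair (2 code units) in UTF-16.
utf8_len = len(first_char.encode("utf-8"))
utf16_units = len(first_char.encode("utf-16-be")) // 2

print(block_size, plane_of(first_char), utf8_len, utf16_units)
```

Because the block lies outside the Basic Multilingual Plane, software that assumes one UTF-16 code unit per character will mishandle these characters.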


The Avesta is the primary collection of religious texts of Zoroastrianism, composed in the otherwise unrecorded Avestan language.

Middle Persian or Pahlavi, also known by its endonym Parsik, is a Western Middle Iranian language which became the literary language of the Sasanian Empire. For some time after the Sasanian collapse, Middle Persian continued to function as a prestige language. It descended from Old Persian, the language of the Achaemenid Empire, and it is the linguistic ancestor of Modern Persian.

Avestan [1] [2]
Official Unicode Consortium code chart (PDF)
        0 1 2 3 4 5 6 7 8 9 A B C D E F
U+10B0x 𐬀 𐬁 𐬂 𐬃 𐬄 𐬅 𐬆 𐬇 𐬈 𐬉 𐬊 𐬋 𐬌 𐬍 𐬎 𐬏
U+10B1x 𐬐 𐬑 𐬒 𐬓 𐬔 𐬕 𐬖 𐬗 𐬘 𐬙 𐬚 𐬛 𐬜 𐬝 𐬞 𐬟
U+10B2x 𐬠 𐬡 𐬢 𐬣 𐬤 𐬥 𐬦 𐬧 𐬨 𐬩 𐬪 𐬫 𐬬 𐬭 𐬮 𐬯
U+10B3x 𐬰 𐬱 𐬲 𐬳 𐬴 𐬵 - - - 𐬹 𐬺 𐬻 𐬼 𐬽 𐬾 𐬿
Notes
1. ^ As of Unicode version 12.0
2. ^ The code points U+10B36..U+10B38 are not assigned
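The assigned and reserved counts in the chart can be checked with a short Python sketch. It relies on the standard `unicodedata` module, whose data follows whichever Unicode version ships with the interpreter; the sketch assumes that version is 5.2 or later, so that the Avestan characters are present. Unassigned code points are detected by their general category `Cn`.

```python
import unicodedata

# The full block range, U+10B00..U+10B3F inclusive.
BLOCK = range(0x10B00, 0x10B3F + 1)

# General category "Cn" marks code points with no assigned character.
assigned = [cp for cp in BLOCK if unicodedata.category(chr(cp)) != "Cn"]
reserved = [cp for cp in BLOCK if unicodedata.category(chr(cp)) == "Cn"]

# Show the first few character names as a spot check.
for cp in assigned[:3]:
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")

print(len(assigned), "assigned,", len(reserved), "reserved")
print("reserved:", [f"U+{cp:04X}" for cp in reserved])
```

If a later Unicode version assigns any of the reserved code points, the counts reported by a newer interpreter would differ from those given here.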

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Avestan block:

Version Final code points [a] Count L2 ID WG2 ID Document
5.2 U+10B00..10B35, 10B39..10B3F 61 L2/98-031 N1684 Everson, Michael (1998-01-18), Proposal to encode Avestan in the BMP of ISO/IEC 10646
L2/98-070 Aliprand, Joan; Winkler, Arnold, "3.A.3. item a. Avestan", Minutes of the joint UTC and L2 meeting from the meeting in Cupertino, February 25-27, 1998
L2/98-286 N1703 Umamaheswaran, V. S.; Ksar, Mike (1998-07-02), "8.9.4", Unconfirmed Meeting Minutes, WG 2 Meeting #34, Redmond, WA, USA; 1998-03-16--20
L2/00-128 Bunz, Carl-Martin (2000-03-01), Scripts from the Past in Future Versions of Unicode
L2/01-007 Bunz, Carl-Martin (2000-12-21), "Avestan", Iranianist Meeting Report: Symposium on Encoding Iranian Scripts in Unicode
L2/02-009 Bunz, Carl-Martin (2001-11-23), "Avestan and Pahlavi scripts", 2nd Iranian Meeting Report
L2/02-450 Gippert, Jost (2002-11-29), 3rd Iranian Unicode Conference: Conference material (29-11-2002)
L2/02-449 N2556 Everson, Michael (2002-12-04), Revised proposal to encode the Avestan and Pahlavi script in the UCS
L2/06-335 N3178 Everson, Michael; Pournader, Roozbeh (2006-10-20), Proposal to encode the Avestan script in the BMP of the UCS
L2/06-375 Gippert, Jost (2006-11-06), Note from Jost Gippert to Deborah Anderson in support of Avestan
L2/07-004 N3193 Everson, Michael; Baker, Peter; Dohnicht, Marcus; Emiliano, António; Haugen, Odd Einar; Pedro, Susana; Perry, David J.; Pournader, Roozbeh (2007-01-09), Proposal to add Medievalist and Iranianist punctuation characters to the UCS
L2/07-015 Moore, Lisa (2007-02-08), "Consensus 110-C30", UTC #110 Minutes
L2/07-006R N3197R Everson, Michael; Pournader, Roozbeh (2007-03-22), Revised proposal to encode the Avestan script in the SMP of the UCS
L2/07-268 N3253 Umamaheswaran, V. S. (2007-07-26), "M50.35", Unconfirmed minutes of WG 2 meeting 50, Frankfurt-am-Main, Germany; 2007-04-24/27
L2/07-304 N3336 Anderson, Deborah (2007-09-13), Comments on the Avestan Separation Point
L2/08-088 N3443 Anderson, Deborah (2008-01-28), Additional Comments on the Avestan Separation Point
L2/08-173 N3444 Anderson, Deborah (2008-04-13), Expert Feedback on AVESTAN SEPARATION POINT
L2/08-155 Anderson, Deborah (2008-04-14), Expert Feedback on Avestan Separation Point by Profs. Skjaervo, Jamison, and Watkins
  a. Proposed code points and character names may differ from the final code points and names

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2016-07-09.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2016-07-09.