Cuneiform Numbers and Punctuation | |
---|---|
Range | U+12400..U+1247F (128 code points) |
Plane | SMP |
Scripts | Cuneiform |
Symbol sets | Numeric signs, fractions, punctuation |
Assigned | 116 code points |
Unused | 12 reserved code points |
Unicode version history | |
5.0 (2006) | 103 (+103) |
7.0 (2014) | 116 (+13) |
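In Unicode, the Sumero-Akkadian Cuneiform script is covered in three blocks in the Supplementary Multilingual Plane (SMP): Cuneiform (U+12000–U+123FF), Cuneiform Numbers and Punctuation (U+12400–U+1247F), and Early Dynastic Cuneiform (U+12480–U+1254F).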
The sample glyphs in the chart file published by the Unicode Consortium [3] show the characters in their Classical Sumerian form (Early Dynastic period, mid-3rd millennium BCE). The characters as written during the 2nd and 1st millennia BCE, the era during which the vast majority of cuneiform texts were written, are considered font variants of the same characters.
The final proposal for Unicode encoding of the script was submitted in June 2004 by two cuneiform scholars working with an experienced Unicode proposal writer. [4] The base character inventory is derived from the list of Ur III signs compiled by the Cuneiform Digital Library Initiative of UCLA, based on the inventories of Miguel Civil, Rykle Borger (2003), and Robert Englund. Rather than ordering the signs directly by glyph shape and complexity, following the numbering of an existing catalogue, the Unicode order of glyphs is based on the Latin alphabetical order of each sign's 'main' Sumerian transliteration, as a practical approximation.
Cuneiform Numbers and Punctuation [1] [2], Official Unicode Consortium code chart (PDF)
 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
U+1240x | 𒐀 | 𒐁 | 𒐂 | 𒐃 | 𒐄 | 𒐅 | 𒐆 | 𒐇 | 𒐈 | 𒐉 | 𒐊 | 𒐋 | 𒐌 | 𒐍 | 𒐎 | 𒐏 |
U+1241x | 𒐐 | 𒐑 | 𒐒 | 𒐓 | 𒐔 | 𒐕 | 𒐖 | 𒐗 | 𒐘 | 𒐙 | 𒐚 | 𒐛 | 𒐜 | 𒐝 | 𒐞 | 𒐟 |
U+1242x | 𒐠 | 𒐡 | 𒐢 | 𒐣 | 𒐤 | 𒐥 | 𒐦 | 𒐧 | 𒐨 | 𒐩 | 𒐪 | 𒐫 | 𒐬 | 𒐭 | 𒐮 | 𒐯 |
U+1243x | 𒐰 | 𒐱 | 𒐲 | 𒐳 | 𒐴 | 𒐵 | 𒐶 | 𒐷 | 𒐸 | 𒐹 | 𒐺 | 𒐻 | 𒐼 | 𒐽 | 𒐾 | 𒐿 |
U+1244x | 𒑀 | 𒑁 | 𒑂 | 𒑃 | 𒑄 | 𒑅 | 𒑆 | 𒑇 | 𒑈 | 𒑉 | 𒑊 | 𒑋 | 𒑌 | 𒑍 | 𒑎 | 𒑏 |
U+1245x | 𒑐 | 𒑑 | 𒑒 | 𒑓 | 𒑔 | 𒑕 | 𒑖 | 𒑗 | 𒑘 | 𒑙 | 𒑚 | 𒑛 | 𒑜 | 𒑝 | 𒑞 | 𒑟 |
U+1246x | 𒑠 | 𒑡 | 𒑢 | 𒑣 | 𒑤 | 𒑥 | 𒑦 | 𒑧 | 𒑨 | 𒑩 | 𒑪 | 𒑫 | 𒑬 | 𒑭 | 𒑮 | |
U+1247x | 𒑰 | 𒑱 | 𒑲 | 𒑳 | 𒑴 | | | | | | | | | | | |
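The block arithmetic and the chart layout can be reproduced with a short script. This is a minimal sketch, not an official tool: the names `in_block` and `chart_row` are illustrative, and the count of assigned code points depends on the Unicode version bundled with the Python interpreter (general category `Cn` is treated as "unassigned").

```python
import unicodedata

BLOCK_START = 0x12400
BLOCK_END = 0x1247F  # inclusive upper bound of the block

def in_block(cp: int) -> bool:
    """Return True if the code point lies in Cuneiform Numbers and Punctuation."""
    return BLOCK_START <= cp <= BLOCK_END

def is_assigned(cp: int) -> bool:
    """Assumption: general category 'Cn' means unassigned in this interpreter's database."""
    return unicodedata.category(chr(cp)) != "Cn"

block_size = BLOCK_END - BLOCK_START + 1  # 0x80 = 128 code points
assigned = sum(is_assigned(cp) for cp in range(BLOCK_START, BLOCK_END + 1))
reserved = block_size - assigned

def chart_row(row_start: int) -> str:
    """Format one 16-cell chart row, leaving unassigned cells empty."""
    cells = [chr(cp) if is_assigned(cp) else "" for cp in range(row_start, row_start + 16)]
    return f"U+{row_start >> 4:04X}x | " + " | ".join(cells)

if __name__ == "__main__":
    print(f"{block_size} code points: {assigned} assigned, {reserved} reserved")
    for start in range(BLOCK_START, BLOCK_END + 1, 16):
        print(chart_row(start))
```

With an interpreter whose Unicode database is at version 7.0 or later, the assigned and reserved counts should agree with the figures given in the infobox.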
The following table allows matching of Borger's 1981 and 2003 numbering with Unicode characters. [5] The "primary" transliteration column has the glyphs' Sumerian values as given by the official glyph name, slightly modified here for legibility by using traditional Assyriological symbols such as "x" rather than "TIMES". The exact Unicode names can be unambiguously recovered by prefixing "CUNEIFORM [NUMERIC] SIGN", substituting "TIMES" for "x", "PLUS" for "+", "OVER" for "/", "ASTERISK" for "*", "H" for "Ḫ", and "SH" for "Š", and switching to uppercase.
Sign | Code point | Name | Borger (2003) | Borger (1981) | Comments |
---|---|---|---|---|---|
𒀸 | U+12038 | one AŠ | 001 | 1; from the general Cuneiform block, not this block | |
𒐀 | U+12400 | two AŠ | 002 | 2 | 2, = U+1212C |
𒐁 | U+12401 | three AŠ | 004 | 3, EŠ6 | |
𒐂 | U+12402 | four AŠ | 215 | 124,42 | 4, LIMMU2, LIMM2, TAB.TAB |
𒐃 | U+12403 | five AŠ | 216 | 5, IA7, TAB.TAB.AŠ | |
𒐄 | U+12404 | six AŠ | 217 | 6, AŠ4, TAB.TAB.TAB | |
𒐅 | U+12405 | seven AŠ | 218 | 7, IMIN2, TAB.TAB.TAB.AŠ | |
𒐆 | U+12406 | eight AŠ | 219 | 8, USSU2, TAB.TAB.TAB.TAB | |
𒐇 | U+12407 | nine AŠ | 220 | 9, ILIMMU2, TAB.TAB.TAB.TAB.AŠ | |
𒐈 | U+12408 | three DIŠ | 834 | 593 | 3, 180, EŠ5 |
𒐉 | U+12409 | four DIŠ | 851; 852; 853 | 316 | 4, 240, ZA, LIMMU5, NIGIDALIMMU, = U+1235D |
𒐊 | U+1240A | five DIŠ | 861 | 598a | 5, 300, IA2 |
𒐋 | U+1240B | six DIŠ | 862 | 598b | 6, 360, AŠ3 |
𒐌 | U+1240C | seven DIŠ | 863 | 598c | 7, 420 |
𒐍 | U+1240D | eight DIŠ | 864 | 598d | 8, 480 |
𒐎 | U+1240E | nine DIŠ | 9, 540 | ||
𒐏 | U+1240F | four U | 713 | 474 | 40, NIMIN |
𒐐 | U+12410 | five U | 714 | 475 | 50, NINNU |
𒐑 | U+12411 | six U | 715 | 476 | 60 |
𒐒 | U+12412 | seven U | 716 | 477 | 70 |
𒐓 | U+12413 | eight U | 717 | 478 | 80 |
𒐔 | U+12414 | nine U | 718 | 479 | 90 |
𒐕 | U+12415 | one GEŠ2 | |||
𒐖 | U+12416 | two GEŠ2 | |||
𒐗 | U+12417 | three GEŠ2 | |||
𒐘 | U+12418 | four GEŠ2 | |||
𒐙 | U+12419 | five GEŠ2 | |||
𒐚 | U+1241A | six GEŠ2 | |||
𒐛 | U+1241B | seven GEŠ2 | |||
𒐜 | U+1241C | eight GEŠ2 | |||
𒐝 | U+1241D | nine GEŠ2 | |||
𒐞 | U+1241E | one GEŠU | 824 | 534 | GEŠ2.U; 600 or 70 |
𒐟 | U+1241F | two GEŠU | 1200 or 80 | ||
𒐠 | U+12420 | three GEŠU | 1800 or 90 | ||
𒐡 | U+12421 | four GEŠU | 2400 or 100 | ||
𒐢 | U+12422 | five GEŠU | 3000 or 110 | ||
𒐣 | U+12423 | two ŠAR2 | |||
𒐤 | U+12424 | three ŠAR2 | |||
𒐥 | U+12425 | three ŠAR2 variant form | |||
𒐦 | U+12426 | four ŠAR2 | |||
𒐧 | U+12427 | five ŠAR2 | |||
𒐨 | U+12428 | six ŠAR2 | |||
𒐩 | U+12429 | seven ŠAR2 | |||
𒐪 | U+1242A | eight ŠAR2 | |||
𒐫 | U+1242B | nine ŠAR2 | |||
𒐬 | U+1242C | one ŠARU | 653 | 409 | 36,000 |
𒐭 | U+1242D | two ŠARU | 72,000 | ||
𒐮 | U+1242E | three ŠARU | 108,000 | ||
𒐯 | U+1242F | three ŠARU variant form | 108,000 | ||
𒐰 | U+12430 | four ŠARU | 144,000 | ||
𒐱 | U+12431 | five ŠARU | 180,000 | ||
𒐲 | U+12432 | ŠAR2 x GAL.DIŠ | 651 | 408 | 216,000 |
𒐳 | U+12433 | ŠAR2 x GAL.MIN | 652 | 408 | 432,000 |
𒐴 | U+12434 | one BURU | 662 | 350,8 | U gunû |
𒐵 | U+12435 | two BURU | |||
𒐶 | U+12436 | three BURU | |||
𒐷 | U+12437 | three BURU variant form | |||
𒐸 | U+12438 | four BURU | |||
𒐹 | U+12439 | five BURU | |||
𒐺 | U+1243A | EŠ16 | 505 | 3, = U+1203C | |
𒐻 | U+1243B | EŠ21 | 210 | 3 | |
𒐼 | U+1243C | LIMMU | 859; 860 | 4, NIG2, GAR, NINDA | |
𒐽 | U+1243D | LIMMU4 | 506 | 4 | |
𒐾 | U+1243E | ||||
𒐿 | U+1243F | ||||
𒑀 | U+12440 | AŠ9 | 536 | 6, EŠ16.EŠ16 | |
𒑁 | U+12441 | IMIN3 | 537 | 7, UMUN9 | |
𒑂 | U+12442 | IMIN | 863 | 7 | |
𒑃 | U+12443 | IMIN variant form | 866 | 7 | |
𒑄 | U+12444 | USSU | 867 | 8 | |
𒑅 | U+12445 | USSU3 | 538 | 8 | |
𒑆 | U+12446 | ILIMMU | 868 | 9 | |
𒑇 | U+12447 | ILIMMU3 | 539 | 9, EŠ16.EŠ16.EŠ16 | |
𒑈 | U+12448 | ILIMMU4 | 577 | 9 | |
𒑉 | U+12449 | DIŠ / DIŠ / DIŠ | 865v | 9 | |
𒑊 | U+1244A | two AŠ tenû | 593 | ||
𒑋 | U+1244B | three AŠ tenû | 629 | ||
𒑌 | U+1244C | four AŠ tenû | 854 | 379; 380 | ZA tenû, ERIM tenû |
𒑍 | U+1244D | five AŠ tenû | |||
𒑎 | U+1244E | six AŠ tenû | |||
𒑏 | U+1244F | one BAN2 | 122 | = U+12047 | |
𒑐 | U+12450 | two BAN2 | |||
𒑑 | U+12451 | three BAN2 | |||
𒑒 | U+12452 | four BAN2 | |||
𒑓 | U+12453 | four BAN2 variant form | |||
𒑔 | U+12454 | five BAN2 | |||
𒑕 | U+12455 | five BAN2 variant form | |||
𒑖 | U+12456 | NIGIDAMIN | 847, 848 | ||
𒑗 | U+12457 | NIGIDAEŠ | 850 | ||
𒑘 | U+12458 | one EŠE3 | = U+12041, U+12300 | ||
𒑙 | U+12459 | two EŠE3 | = U+12049 | ||
𒑚 | U+1245A | one third | 826 | 571 | ŠUŠANA |
𒑛 | U+1245B | two thirds | 832 | 572 | |
𒑜 | U+1245C | five sixths | 838 | 573 | KINGUSILA |
𒑝 | U+1245D | one third variant form | |||
𒑞 | U+1245E | two thirds variant form | |||
𒑟 | U+1245F | one eighth | |||
𒑠 | U+12460 | one quarter | |||
𒑡 | U+12461 | Old Assyrian one sixth | 630 | Kültepe only | |
𒑢 | U+12462 | Old Assyrian one quarter | |||
𒑰 | U+12470 | Old Assyrian word divider | |||
𒑱 | U+12471 | vertical colon | 592 | Glossenkeil | |
𒑲 | U+12472 | diagonal colon | 592 | Glossenkeil | |
𒑳 | U+12473 | diagonal tricolon | |||
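The name-recovery rule stated before the table can be sketched in code. The function below is a hypothetical helper (`to_unicode_name` is not part of any library) that applies only the substitutions listed in that rule; it does not handle other diacritics such as "û", and rows whose official names carry extra words may not match.

```python
import unicodedata

# Operator symbols from the rule; applied to whole whitespace-separated tokens only.
SYMBOLS = {"x": "TIMES", "+": "PLUS", "/": "OVER", "*": "ASTERISK"}

# Letter substitutions from the rule (both cases, written as escapes).
LETTERS = {
    "\u1E2A": "H",   # Ḫ
    "\u1E2B": "H",   # ḫ
    "\u0160": "SH",  # Š
    "\u0161": "SH",  # š
}

def to_unicode_name(table_name: str, numeric: bool = True) -> str:
    """Turn a table name such as 'two AŠ' into a candidate Unicode character name."""
    text = unicodedata.normalize("NFC", table_name)
    tokens = [SYMBOLS.get(tok, tok) for tok in text.split()]
    text = " ".join(tokens)
    for src, dst in LETTERS.items():
        text = text.replace(src, dst)
    prefix = "CUNEIFORM NUMERIC SIGN" if numeric else "CUNEIFORM SIGN"
    return f"{prefix} {text.upper()}"

if __name__ == "__main__":
    candidate = to_unicode_name("two A\u0160")
    official = unicodedata.name("\U00012400")
    print(candidate, "|", official, "|", candidate == official)
```

Comparing the result against `unicodedata.name` for each row is a simple way to find the entries where the table name and the official name diverge.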
The following Unicode-related documents record the purpose and process of defining specific characters in the Cuneiform Numbers and Punctuation block:
Version | Final code points | Count | L2 ID | WG2 ID | Document |
---|---|---|---|---|---|
5.0 | U+12400..12462, 12470..12473 | 103 | L2/00-128 | | Bunz, Carl-Martin (2000-03-01), Scripts from the Past in Future Versions of Unicode |
 | | | L2/00-153 | | Bunz, Carl-Martin (2000-04-26), Further comments on historic scripts |
 | | | L2/00-398 | | Snyder, Dean (2000-11-07), Cuneiform: From Clay Tablet to Computer |
 | | | L2/00-419 | N2297 | Everson, Michael (2000-11-20), Legacy cuneiform font implementations and the ICE project |
 | | | L2/03-162 | N2585 | Everson, Michael; Feuerherm, Karljürgen (2003-05-25), Basic principles for the encoding of Sumero-Akkadian Cuneiform |
 | | | L2/03-415 | | Snyder, Dean (2003-11-01), Proposal to Encode the Sumero-Akkadian Cuneiform Script in the UCS |
 | | | L2/03-393R | N2664R | Everson, Michael; Feuerherm, Karljürgen; Tinney, Steve (2003-11-03), Preliminary proposal to encode Cuneiform script in the SMP of the UCS |
 | | | L2/03-416 | | Anderson, Lloyd (2003-11-03), The Cuneiform Encoding Proposal -- a View of its Current Status |
 | | | L2/04-080 | | Tinney, Steve (2004-01-24), Rationale for changes to N2664R |
 | | | L2/04-036 | N2698 | Everson, Michael; Feuerherm, Karljürgen; Tinney, Steve (2004-01-29), Revised proposal to encode Cuneiform script in the SMP of the UCS |
 | | | L2/04-041 | | Anderson, Lloyd (2004-01-29), Fitting Cuneiform Encoding to Cuneiform Script |
 | | | L2/04-059 | | Feuerherm, Karljürgen (2004-01-30), Short Response to L2/04-041 "Fitting Cuneiform Encoding to Cuneiform Script" |
 | | | L2/04-063 | | Gewecke, Tom (2004-01-30), Re: Cuneiform at UTC |
 | | | L2/04-056 | | Veldhuis, Niek (2004-01-31), Letter re "Cuneiform Unicode" |
 | | | L2/04-057 | | Jones, Charles E. (2004-02-01), Letter re "Cuneiform" |
 | | | L2/04-058 | | Michalowski, Piotr (2004-02-01), Letter re "cuneiform unicode" |
 | | | L2/04-064 | | Cooper, Jerry (2004-02-01), Letter re "unicode proposal" |
 | | | L2/04-066 | | Durusau, Patrick (2004-02-02), Letter re "Proposal N2698" |
 | | | L2/04-081 | | Black, Jeremy (2004-02-02), Letter re "cuneiform Unicode proposal" |
 | | | L2/04-086 | | Anderson, Lloyd (2004-02-03), Notes for verbal presentation to UTC meeting, 3 February 2004 |
 | | | L2/04-099 | | Anderson, Lloyd (2004-02-09), Unification of cuneiform numbers |
 | | | L2/04-225 | | Anderson, Lloyd (2004-06-07), Proposed modifications to introductory text of N2798 = L204-189 Proposal for Cuneiform Encoding |
 | | | L2/04-189 | N2786 | Everson, Michael; Feuerherm, Karljürgen; Tinney, Steve (2004-06-08), Final proposal to encode Cuneiform script |
 | | | L2/04-223R | | Anderson, Lloyd (2004-06-11), Proposed modifications to delete and add signs to N2798 = L204-189 Proposal for Cuneiform Encoding |
 | | | L2/04-354 | | McGowan, Rick (2004-09-20), Cuneiform Properties |
 | | | L2/05-135 | | Tinney, Steve (2005-05-10), Corrections to N2786 |
 | | | L2/05-174 | | Everson, Michael (2005-07-28), Irish comments on Cuneiform |
 | | | L2/05-108R | | Moore, Lisa (2005-08-26), "Cuneiform (C.17)", UTC #103 Minutes |
 | | | | N2953 | Umamaheswaran, V. S. (2006-02-16), "M47.12", Unconfirmed minutes of WG 2 meeting 47, Sophia Antipolis, France; 2005-09-12/15 |
 | | | L2/12-112 | | Moore, Lisa (2012-05-17), "Consensus 131-C30", UTC #131 / L2 #228 Minutes, Change the numeric values for 1240F..12414 to 40..90, for Unicode 6.2. |
 | | | L2/12-240 | | Davis, Mark (2012-07-20), Property Issues for U6.2 |
 | | | L2/12-239 | | Moore, Lisa (2012-08-14), "Consensus 132-C19", UTC #132 Minutes, Give U+12432 and U+12433 the numeric type "numeric" and the numeric values 216,000, and 432,000 respectively. Make U+12456 and 12457 have the numeric type "numeric" and value "-1". |
 | | | L2/12-328 | | Anderson, Deborah (2012-10-16), Numeric value fixes for two cuneiform characters |
 | | | L2/12-343R2 | | Moore, Lisa (2012-12-04), "Consensus 133-C30", UTC #133 Minutes, Change the numeric value of U+12456 to 2 and U+12457 to 3, for Unicode 6.3. |
7.0 | U+12463..1246E, 12474 | 13 | L2/12-002 | N4178R | Everson, Michael; Tinney, Steve (2012-01-16), Proposal for additions and corrections to Sumero-Akkadian Cuneiform |
 | | | L2/12-207R | N4277R | Everson, Michael; Tinney, Steve (2012-07-31), Proposal for additions and corrections to Sumero-Akkadian Cuneiform |
 | | | L2/12-239 | | Moore, Lisa (2012-08-14), "C.3", UTC #132 Minutes |