Coptic Epact Numbers

Range: U+102E0..U+102FF (32 code points)
Plane: SMP
Scripts: Common (27 char.), Inherited (1 char.)
Symbol sets: Number forms
Assigned: 28 code points
Unused: 4 reserved code points
Unicode version history: 7.0 (2014): 28 (+28)
Unicode documentation: Code chart ∣ Web page
Note: [1] [2]

Coptic Epact Numbers is a Unicode block containing Old Coptic number forms.

In some regions these numerals were used in place of the letters of the Coptic alphabet, which otherwise served to write numbers, [3] a practice common in much of the world at the time, as with Roman numerals. They were used most extensively in the Bohairic dialect of the Coptic language, which became the liturgical language of Egyptian Christians. The block contains separate characters for each of the digits 1 to 9 (0 was not indicated), each of the tens from 10 to 90, and each of the hundreds from 100 to 900. Numbers were composed from left to right by successively adding the values that each character represented. A thousands mark diacritic multiplies the character it is attached to by one thousand (so 5 with a thousands mark is 5,000, and 900 with a thousands mark is 900,000). Two thousands marks together (visually similar to a tanween al-kasra in Arabic) multiply by one million in the same fashion, mirroring other Coptic conventions of indicating higher orders by repeating marks.
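The additive rule and the thousands mark described above can be sketched in Python. This is a minimal illustration, not a reference implementation: the names `epact_value` and `decode_epact` are hypothetical, and the code assumes the block layout of the Unicode code chart (thousands mark at U+102E0, digits 1 to 9 at U+102E1..U+102E9, tens at U+102EA..U+102F2, hundreds at U+102F3..U+102FB) and that each mark follows the character it modifies in encoded text, as is usual for combining marks.

```python
THOUSANDS_MARK = "\U000102E0"  # assumed: combining thousands mark


def epact_value(ch: str) -> int:
    """Return the base value of a single Coptic Epact number character."""
    cp = ord(ch)
    if 0x102E1 <= cp <= 0x102E9:   # digits 1..9
        return cp - 0x102E0
    if 0x102EA <= cp <= 0x102F2:   # tens 10..90
        return (cp - 0x102E9) * 10
    if 0x102F3 <= cp <= 0x102FB:   # hundreds 100..900
        return (cp - 0x102F2) * 100
    raise ValueError(f"not a Coptic Epact number character: U+{cp:04X}")


def decode_epact(text: str) -> int:
    """Sum the values of the characters; each mark multiplies its base by 1000."""
    total = 0
    current = None  # value of the character currently being modified by marks
    for ch in text:
        if ch == THOUSANDS_MARK:
            if current is None:
                raise ValueError("thousands mark without a base character")
            current *= 1000  # two marks in a row give a factor of 1,000,000
        else:
            if current is not None:
                total += current
            current = epact_value(ch)
    if current is not None:
        total += current
    return total


five = "\U000102E5"          # digit 5
nine_hundred = "\U000102FB"  # number 900
print(decode_epact(five + THOUSANDS_MARK))          # 5 with one mark
print(decode_epact(nine_hundred + THOUSANDS_MARK))  # 900 with one mark
```

Because the system is purely additive, the decoder does not need to check the order of the characters; a stricter version could reject sequences that are not written from the largest value to the smallest.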

Coptic Epact Numbers [1] [2]
Official Unicode Consortium code chart (PDF)
        0 1 2 3 4 5 6 7 8 9 A B C D E F
U+102Ex 𐋠 𐋡 𐋢 𐋣 𐋤 𐋥 𐋦 𐋧 𐋨 𐋩 𐋪 𐋫 𐋬 𐋭 𐋮 𐋯
U+102Fx 𐋰 𐋱 𐋲 𐋳 𐋴 𐋵 𐋶 𐋷 𐋸 𐋹 𐋺 𐋻
Notes
1. As of Unicode version 15.1
2. Empty cells (grey areas in the official chart) indicate non-assigned code points
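The chart can also be reproduced programmatically. The following sketch uses Python's standard `unicodedata` module to list the assigned code points in the block; the helper name `block_rows` is hypothetical, and the output depends on the Unicode version bundled with the interpreter (the block requires Unicode 7.0 or later).

```python
import unicodedata

BLOCK_START, BLOCK_END = 0x102E0, 0x102FF  # block range, inclusive


def block_rows():
    """Return (code point, name, general category) for each assigned code point."""
    rows = []
    for cp in range(BLOCK_START, BLOCK_END + 1):
        ch = chr(cp)
        category = unicodedata.category(ch)
        if category == "Cn":  # Cn = unassigned (reserved) code point
            continue
        rows.append((cp, unicodedata.name(ch, "<unnamed>"), category))
    return rows


for cp, name, category in block_rows():
    print(f"U+{cp:05X}  {category}  {name}")
```

Skipping the `Cn` category is what separates the 28 assigned code points from the 4 reserved ones at the end of the block.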

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Coptic Epact Numbers block:

| Version | Final code points [a] | Count | L2 ID | WG2 ID | Document |
| 7.0 | U+102E0..102FB | 28 | L2/09-163R | | Pandey, Anshuman (2009-09-15), Proposal to Encode Coptic Numerals in ISO/IEC 10646 |
| | | | L2/10-114 | N3786 | Pandey, Anshuman (2010-04-10), Towards an Encoding for Coptic Numbers in the UCS |
| | | | L2/10-108 | | Moore, Lisa (2010-05-19), "C.8.1", UTC #123 / L2 #220 Minutes |
| | | | L2/10-206R | N3843R | Pandey, Anshuman (2010-06-21), Final Proposal to Encode Coptic Numbers |
| | | | L2/10-333 | N3886 | Everson, Michael; Emmel, Stephen (2010-09-08), Towards the encoding of a complete set of Coptic numbers |
| | | | L2/10-421R | N3958R | Pandey, Anshuman (2010-11-01), Request to Rename 'Coptic Numbers' to 'Coptic Epact Numerals' |
| | | | L2/10-416R | | Moore, Lisa (2010-11-09), "Consensus 125-C9", UTC #125 / L2 #222 Minutes, Change the block name (for U+102E0 - U+102FF) from "Coptic Numbers" to "Coptic Epact Numbers" and the character names from "Coptic..." to "Coptic Epact..." for range U+102E0 - U+102FF. |
| | | | L2/11-035 | | Lazrek, Azzeddine (2011-01-08), Opposition about encode Coptic Epact numeral system |
| | | | L2/11-065 | | Anderson, Deborah (2011-02-09), Comparison of Coptic Epact vs. Rumi digits |
| | | | L2/11-062R | N3990 | Pandey, Anshuman (2011-02-14), Final Proposal to Encode Coptic Epact Numbers |
| | | | L2/11-016 | | Moore, Lisa (2011-02-15), "C.16.1", UTC #126 / L2 #223 Minutes |
| | | | | N3903 | "M57.16", Unconfirmed minutes of WG2 meeting 57, 2011-03-31 |
| | | | | N4103 | "T.2. Coptic Numbers", Unconfirmed minutes of WG 2 meeting 58, 2012-01-03 |
  a. Proposed code points and character names may differ from final code points and names

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
  3. Pandey, Anshuman (2011-02-14). "N3990: Final Proposal to Encode Coptic Epact Numbers in ISO/IEC 10646" (PDF). ISO/IEC JTC1/SC2/WG2.