Phoenician (Unicode block)

Phoenician
Range: U+10900..U+1091F (32 code points)
Plane: SMP
Scripts: Phoenician
Assigned: 29 code points
Unused: 3 reserved code points
Unicode version history:
  5.0 (2006): 27 (+27)
  5.2 (2009): 29 (+2)
Unicode documentation: Code chart ∣ Web page
Note: [1] [2]

Phoenician is a Unicode block containing characters used across the Mediterranean world from the 12th century BCE to the 3rd century CE. The Phoenician alphabet was added to the Unicode Standard in July 2006 with the release of version 5.0. An alternative proposal to handle it as a font variation of Hebrew was rejected.

The Unicode block for Phoenician is U+10900–U+1091F. It is intended for the representation of text in Paleo-Hebrew, Archaic Phoenician, Phoenician, Early Aramaic, Late Phoenician cursive, Phoenician papyri, Siloam Hebrew, Hebrew seals, Ammonite, Moabite and Punic. [3]
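As a minimal sketch of what the range means in practice, the following Python snippet tests whether a character falls inside the block and computes its plane. The names `in_phoenician_block` and `plane_of` are illustrative helpers, not part of any library; the only facts assumed are the block boundaries given above and that each Unicode plane holds 0x10000 code points, so the SMP is plane 1.

```python
# Block boundaries as stated in the text.
PHOENICIAN_START = 0x10900
PHOENICIAN_END = 0x1091F


def in_phoenician_block(ch: str) -> bool:
    """Return True if the single character ch lies in U+10900..U+1091F."""
    return PHOENICIAN_START <= ord(ch) <= PHOENICIAN_END


def plane_of(ch: str) -> int:
    """Each plane holds 0x10000 code points, so the plane is the code point >> 16."""
    return ord(ch) >> 16


# Number of code points the block spans (assigned or not).
block_size = PHOENICIAN_END - PHOENICIAN_START + 1

if __name__ == "__main__":
    alef = "\U00010900"
    print(in_phoenician_block(alef), plane_of(alef), block_size)
```

A Hebrew letter such as א (U+05D0) fails the range test, which reflects the decision to encode Phoenician separately rather than as a font variation of Hebrew.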

The letters are encoded from U+10900 𐤀 aleph through U+10915 𐤕 taw. U+10916 𐤖, U+10917 𐤗, U+10918 𐤘 and U+10919 𐤙 encode the numerals 1, 10, 20 and 100, respectively; U+1091A 𐤚 and U+1091B 𐤛, added in version 5.2, encode the numerals 2 and 3. U+1091F 𐤟 is the word separator.
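These assignments can be cross-checked against the Unicode Character Database bundled with Python, as in the hedged sketch below. It assumes an interpreter whose `unicodedata` module covers Unicode 5.0 or later; `describe` and the `NUMERALS` table are illustrative names. Note that the official character names may transliterate the letters differently from the labels used in this article.

```python
import unicodedata

# Numerals named in the text, mapped to the values the text gives for them.
NUMERALS = {0x10916: 1, 0x10917: 10, 0x10918: 20, 0x10919: 100}
WORD_SEPARATOR = 0x1091F


def describe(cp: int) -> str:
    """Format a code point together with its official Unicode name."""
    name = unicodedata.name(chr(cp), "<no name in this Python build>")
    return f"U+{cp:04X} {name}"


# Numeric values as recorded in the interpreter's Unicode database.
numeric_values = {cp: unicodedata.numeric(chr(cp)) for cp in NUMERALS}

if __name__ == "__main__":
    # The 22 letters occupy U+10900 through U+10915.
    for cp in range(0x10900, 0x10916):
        print(describe(cp))
    for cp, value in numeric_values.items():
        print(describe(cp), "=", value)
    print(describe(WORD_SEPARATOR))
```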

Characters

Phoenician [1] [2]
Official Unicode Consortium code chart (PDF)
         0 1 2 3 4 5 6 7 8 9 A B C D E F
U+1090x  𐤀 𐤁 𐤂 𐤃 𐤄 𐤅 𐤆 𐤇 𐤈 𐤉 𐤊 𐤋 𐤌 𐤍 𐤎 𐤏
U+1091x  𐤐 𐤑 𐤒 𐤓 𐤔 𐤕 𐤖 𐤗 𐤘 𐤙 𐤚 𐤛 - - - 𐤟
Notes
1. As of Unicode version 15.0
2. A hyphen (-) stands in for the chart's grey areas, which indicate non-assigned code points
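The chart can be regenerated programmatically, as in this sketch. It assumes the running Python's `unicodedata` reflects Unicode 5.2 or later (so that all 29 characters are known) and treats general category `Cn` as "not assigned"; `is_assigned` and `chart_row` are illustrative names, and unassigned positions are printed as a hyphen.

```python
import unicodedata

# The block spans U+10900..U+1091F inclusive.
BLOCK = range(0x10900, 0x10920)


def is_assigned(cp: int) -> bool:
    """General category "Cn" marks a code point as unassigned."""
    return unicodedata.category(chr(cp)) != "Cn"


assigned = [cp for cp in BLOCK if is_assigned(cp)]
unassigned = [cp for cp in BLOCK if not is_assigned(cp)]


def chart_row(base: int) -> str:
    """Render one 16-cell row of the chart, with "-" for unassigned cells."""
    cells = [chr(cp) if is_assigned(cp) else "-" for cp in range(base, base + 16)]
    return f"U+{base >> 4:04X}x " + " ".join(cells)


if __name__ == "__main__":
    print("        " + " ".join("0123456789ABCDEF"))
    for base in (0x10900, 0x10910):
        print(chart_row(base))
    print(len(assigned), "assigned;", len(unassigned), "reserved")
```

Because Phoenician is written right to left, a terminal that applies bidirectional reordering may display the cells of each row in reverse visual order even though the underlying string is in code point order.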

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Phoenician block:

Version | Final code points [a] | Count | L2 ID | WG2 ID | Document
5.0 | U+10900..10919, 1091F | 27 | | N1579 | Everson, Michael (1997-05-27), Proposal for encoding the Phoenician script
 | | | L2/97-288 | N1603 | Umamaheswaran, V. S. (1997-10-24), "8.24.1", Unconfirmed Meeting Minutes, WG 2 Meeting # 33, Heraklion, Crete, Greece, 20 June – 4 July 1997
 | | | L2/99-013 | N1932 | Everson, Michael (1998-11-23), Revised proposal for encoding the Phoenician script in the UCS
 | | | L2/99-224 | N2097, N2025-2 | Röllig, W. (1999-07-23), Comments on proposals for the Universal Multiple-Octed Coded Character Set
 | | | | N2133 | Response to comments on the question of encoding Old Semitic scripts in the UCS (N2097), 1999-10-04
 | | | L2/00-010 | N2103 | Umamaheswaran, V. S. (2000-01-05), "10.4", Minutes of WG 2 meeting 37, Copenhagen, Denmark: 1999-09-13–16
 | | | L2/04-149 | | Kass, James; Anderson, Deborah W.; Snyder, Dean; Lehmann, Reinhard G.; Cowie, Paul James; Kirk, Peter; Cowan, John; Khalaf, S. George; Richmond, Bob (2004-05-25), Miscellaneous Input on Phoenician Encoding Proposal
 | | | L2/04-141R2 | N2746R2 | Everson, Michael (2004-05-29), Final proposal for encoding the Phoenician script in the UCS
 | | | L2/04-177 | | Anderson, Deborah (2004-05-31), Expert Feedback on Phoenician
 | | | L2/04-178 | N2772 | Anderson, Deborah (2004-06-04), Additional Support for Phoenician
 | | | L2/04-181 | | Keown, Elaine (2004-06-04), REBUTTAL to "Final proposal for encoding the Phoenician script in the UCS"
 | | | L2/04-190 | N2787 | Everson, Michael (2004-06-06), Additional examples of the Phoenician script in use
 | | | L2/04-187 | | McGowan, Rick (2004-06-07), Phoenician Recommendation
 | | | L2/04-206 | N2793 | Kirk, Peter (2004-06-07), Response to the revised "Final proposal for encoding the Phoenician script" (L2/04-141R2)
 | | | L2/04-213 | | Rosenne, Jony (2004-06-07), Responses to Several Hebrew Related Items
 | | | L2/04-217R | | Keown, Elaine (2004-06-07), Proposal to add Archaic Mediterranean Script block to ISO 10646
 | | | L2/04-226 | | Durusau, Patrick (2004-06-07), Statement of the Society of Biblical Literature on WG2 N2746R2
 | | | L2/04-218 | N2792 | Snyder, Dean (2004-06-08), Response to the Proposal to Encode Phoenician in Unicode
 | | | L2/05-009 | N2909 | Anderson, Deborah (2005-01-19), Letters in support of Phoenician
5.2 | U+1091A..1091B | 2 | | N3353 | Umamaheswaran, V. S. (2007-10-10), "M51.14", Unconfirmed minutes of WG 2 meeting 51 Hanzhou, China; 2007-04-24/27
 | | | L2/07-206 | N3284 | Everson, Michael (2007-07-25), Proposal to add two numbers for the Phoenician script
 | | | L2/07-225 | | Moore, Lisa (2007-08-21), "Phoenician", UTC #112 Minutes
  a. Proposed code points and character names may differ from final code points and names

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
  3. "Middle-East scripts II: Ancient Scripts" (PDF). The Unicode Standard: Version 13.0 – Core Specification. The Unicode Consortium. 2020. Retrieved 2021-01-28.