Greek Extended

Range: U+1F00..U+1FFF (256 code points)
Plane: BMP
Scripts: Greek
Major alphabets: polytonic Greek
Assigned: 233 code points
Unused: 23 reserved code points
Unicode version history
1.1: 233 (+233)
Note: [1] [2]

Greek Extended is a Unicode block containing the accented vowels necessary for writing polytonic Greek. The regular, unaccented Greek characters, as well as the characters with tonos and diaeresis, are found in the Greek and Coptic block. Greek Extended was encoded in version 1.1 of the Unicode Standard. As an alternative to Greek Extended, combining characters can be used to represent the accents and breathing marks of polytonic Greek.
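The equivalence between a combining sequence and a precomposed Greek Extended character can be checked with Python's standard `unicodedata` module. The following is a minimal sketch; the choice of alpha with psili and oxia (U+1F04) is only an illustrative example, and the variable names are not taken from any specification.

```python
import unicodedata

# Base letter alpha followed by two combining marks:
# U+0313 COMBINING COMMA ABOVE (psili, smooth breathing) and
# U+0301 COMBINING ACUTE ACCENT (oxia).
combining_sequence = "\u03b1\u0313\u0301"

# The precomposed equivalent in the Greek Extended block.
precomposed = "\u1f04"  # GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA

# NFC composes the sequence; NFD decomposes the precomposed character.
composed = unicodedata.normalize("NFC", combining_sequence)
decomposed = unicodedata.normalize("NFD", precomposed)

print(unicodedata.name(precomposed))
print(composed == precomposed)
print(decomposed == combining_sequence)
```

Both comparisons should hold, because the two spellings are canonically equivalent.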

A Unicode block is one of several contiguous ranges of numeric character codes of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purposes. Typically, proposals such as the addition of new glyphs are discussed and evaluated by considering the relevant block or blocks as a whole.

The diaeresis and the umlaut are two homoglyphic diacritical marks that consist of two dots ( ¨ ) placed over a letter, usually a vowel. When that letter is an i or a j, the diacritic replaces the tittle: ï.

Combining Diacritical Marks is a Unicode block containing the most common combining characters. It also contains the character "Combining Grapheme Joiner", which prevents canonical reordering of combining characters, and despite the name, actually separates characters that would otherwise be considered a single grapheme in a given context.

Contents

In this block, the letters with oxia (acute accent) and no other accent do not survive any of the Unicode normalization forms. Decomposition of U+1F71 ά GREEK SMALL LETTER ALPHA WITH OXIA, for example, yields U+03B1 α GREEK SMALL LETTER ALPHA followed by U+0301 ◌́ COMBINING ACUTE ACCENT, while composition yields the same letter with tonos, U+03AC ά GREEK SMALL LETTER ALPHA WITH TONOS, from the Greek and Coptic block.
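The behaviour of the oxia letters under normalization can be observed directly. This is a small sketch using Python's standard `unicodedata` module; the helper `code_points` is a hypothetical name introduced only for readable output.

```python
import unicodedata

ALPHA_WITH_OXIA = "\u1f71"   # Greek Extended block
ALPHA_WITH_TONOS = "\u03ac"  # Greek and Coptic block


def code_points(text):
    """Return the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


nfd = unicodedata.normalize("NFD", ALPHA_WITH_OXIA)
nfc = unicodedata.normalize("NFC", ALPHA_WITH_OXIA)

print(code_points(nfd))  # ['U+03B1', 'U+0301']
print(code_points(nfc))  # ['U+03AC']

# U+1F71 itself is absent from the result of every normalization form.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert ALPHA_WITH_OXIA not in unicodedata.normalize(form, ALPHA_WITH_OXIA)
```

One practical consequence is that text containing the oxia letters should be normalized before comparison, since a search for U+03AC will otherwise miss U+1F71.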

The acute accent is a diacritic used in many modern written languages with alphabets based on the Latin, Cyrillic, and Greek scripts.

Greek Extended [1] [2]
Official Unicode Consortium code chart (PDF)
Code chart: 16 rows (U+1F0x through U+1FFx) by 16 columns (0 through F); for example, U+1F3F is Ἷ and U+1FBF is ᾿.
Notes
1. ^ As of Unicode version 12.0
2. ^ Grey areas indicate non-assigned code points
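The 16-by-16 layout of the code chart can be regenerated from the Unicode data bundled with Python. The sketch below is an illustration rather than the official chart: the function name `chart_rows` and the use of a middle dot for unassigned code points are assumptions made here, and how the glyphs render depends on the fonts available.

```python
import unicodedata

BLOCK_START = 0x1F00
BLOCK_END = 0x1FFF


def chart_rows():
    """Build one text line per row of 16 code points in the block."""
    rows = []
    for row_start in range(BLOCK_START, BLOCK_END + 1, 16):
        cells = []
        for cp in range(row_start, row_start + 16):
            ch = chr(cp)
            # General category "Cn" marks an unassigned (reserved) code point.
            cells.append(ch if unicodedata.category(ch) != "Cn" else "·")
        rows.append(f"U+{row_start >> 4:03X}x " + " ".join(cells))
    return rows


# Column header, indented to line up with the row labels.
print(" " * 7 + " ".join("0123456789ABCDEF"))
for line in chart_rows():
    print(line)
```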

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Greek Extended block:

Version: 1.1
Final code points [a]: U+1F00..1F15, 1F18..1F1D, 1F20..1F45, 1F48..1F4D, 1F50..1F57, 1F59, 1F5B, 1F5D, 1F5F..1F7D, 1F80..1FB4, 1FB6..1FC4, 1FC6..1FD3, 1FD6..1FDB, 1FDD..1FEF, 1FF2..1FF4, 1FF6..1FFE
Count: 233
Documents (UTC ID, L2 ID, WG2 ID, document):
(to be determined)
L2 ID X3L2/95-090, WG2 ID N1253 (doc, txt): Umamaheswaran, V. S.; Ksar, Mike (1995-09-09), "4.2", Unconfirmed Minutes of WG 2 Meeting # 28 in Helsinki, Finland; 1995-06-26--27
UTC ID UTC/1999-017: Davis, Mark (1999-06-02), Data cross-checks (for Agenda)
L2 ID L2/99-176R: Moore, Lisa (1999-11-04), "Data Cross-Checks", Minutes from the joint UTC/L2 meeting in Seattle, June 8-10, 1999
  a. Proposed code points and character names may differ from final code points and names
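The count of 233 assigned code points, and the 23 reserved ones, can be cross-checked against the list of final code point ranges given for version 1.1. This is a minimal sketch; `parse_ranges` is a hypothetical helper written for this check, and the range string is copied from the history entry.

```python
# Final code point ranges for version 1.1, as listed in the history entry.
ASSIGNED = (
    "1F00..1F15, 1F18..1F1D, 1F20..1F45, 1F48..1F4D, 1F50..1F57, "
    "1F59, 1F5B, 1F5D, 1F5F..1F7D, 1F80..1FB4, 1FB6..1FC4, "
    "1FC6..1FD3, 1FD6..1FDB, 1FDD..1FEF, 1FF2..1FF4, 1FF6..1FFE"
)


def parse_ranges(spec):
    """Turn 'AAAA..BBBB, CCCC' into a list of inclusive (start, end) pairs."""
    ranges = []
    for item in spec.split(","):
        first, _, last = item.strip().partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # a single code point
        ranges.append((start, end))
    return ranges


assigned = sum(end - start + 1 for start, end in parse_ranges(ASSIGNED))
block_size = 0x1FFF - 0x1F00 + 1

print(assigned, block_size - assigned)  # 233 23
```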

Related Research Articles

Alpha is the first letter of the Greek alphabet. In the system of Greek numerals, it has a value of 1.

The Coptic alphabet is the script used for writing the Coptic language. The repertoire of glyphs is based on the Greek alphabet augmented by letters borrowed from the Egyptian Demotic and is the first alphabetic script used for the Egyptian language. There are several Coptic alphabets, as the Coptic writing system may vary greatly among the various dialects and subdialects of the Coptic language.

Mu or my is the 12th letter of the Greek alphabet. In the system of Greek numerals it has a value of 40. Mu was derived from the Egyptian hieroglyphic symbol for water, which had been simplified by the Phoenicians and named after their word for water, to become 𐤌 (mem). Letters that arose from mu include the Roman M and the Cyrillic М.

In the polytonic orthography of Ancient Greek, the rough breathing is a diacritical mark used to indicate the presence of an /h/ sound before a vowel or diphthong, or after rho. It remained in the polytonic orthography even after the Hellenistic period, when the sound disappeared from the Greek language. In the monotonic orthography of Modern Greek, in use since 1982, it is not used at all.

A precomposed character is a Unicode entity that can also be defined as a sequence of one or more other characters. A precomposed character may typically represent a letter with a diacritical mark, such as é. Technically, é (U+00E9) is a character that can be decomposed into an equivalent string of the base letter e (U+0065) and combining acute accent (U+0301). Similarly, ligatures are precompositions of their constituent letters or graphemes.

The Greek alphabet has been used to write the Greek language since the late ninth or early eighth century BC. It is derived from the earlier Phoenician alphabet, and was the first alphabetic script to have distinct letters for vowels as well as consonants. In Archaic and early Classical times, the Greek alphabet existed in many different local variants, but, by the end of the fourth century BC, the Eucleidean alphabet, with twenty-four letters, ordered from alpha to omega, had become standard and it is this version that is still used to write Greek today. These twenty-four letters are: Α α, Β β, Γ γ, Δ δ, Ε ε, Ζ ζ, Η η, Θ θ, Ι ι, Κ κ, Λ λ, Μ μ, Ν ν, Ξ ξ, Ο ο, Π π, Ρ ρ, Σ σ/ς, Τ τ, Υ υ, Φ φ, Χ χ, Ψ ψ, and Ω ω.

Windows-1253 is a Windows code page used to write modern Greek. It is not capable of supporting the older polytonic Greek.

Iota subscript diacritical mark

The iota subscript is a diacritic mark in the Greek alphabet shaped like a small vertical stroke or miniature iota ⟨ι⟩ placed below the letter. It can occur with the vowel letters eta ⟨η⟩, omega ⟨ω⟩, and alpha ⟨α⟩. It represents the former presence of an offglide after the vowel, forming a so-called "long diphthong". Such diphthongs, phonologically distinct from the corresponding normal or "short" diphthongs, were a feature of ancient Greek in the pre-classical and classical eras.

Unicode has a certain amount of duplication of characters. These are pairs of single Unicode code points that are canonically equivalent. The reason for this is compatibility with legacy systems.

The orthography of the Greek language ultimately has its roots in the adoption of the Greek alphabet in the 9th century BC. Some time prior to that, one early form of Greek, Mycenaean, was written in Linear B, although there was a lapse of several centuries between the time Mycenaean stopped being written and the time when the Greek alphabet came into use.

Beta Code is a method of representing the characters and formatting found in ancient Greek texts using only ASCII characters. Its aim is to be not merely a romanization of the Greek alphabet, but to represent faithfully a wide variety of source texts, including formatting as well as rare or idiosyncratic characters.

Unicode supports several phonetic scripts and notations through the existing writing systems and the addition of extra blocks with phonetic characters. These phonetic extras are derived from an existing script, usually Latin, Greek or Cyrillic. In Unicode there is no "IPA script". Apart from IPA, extensions to the IPA, and obsolete and nonstandard IPA symbols, these blocks also contain characters from the Uralic Phonetic Alphabet and the Americanist Phonetic Alphabet.

Unicode equivalence is the specification by the Unicode character encoding standard that some sequences of code points represent essentially the same character. This feature was introduced in the standard to allow compatibility with preexisting standard character sets, which often included similar or identical characters.

Latin Extended-A is a Unicode block and is the third block of the Unicode standard. It encodes Latin letters from the Latin ISO character sets other than Latin-1 and also legacy characters from the ISO 6937 standard.

Greek orthography has used a variety of diacritics starting in the Hellenistic period. The more complex polytonic orthography notates Ancient Greek phonology. The simpler monotonic orthography, introduced in 1982, corresponds to Modern Greek phonology, and requires only two diacritics.

In Unicode, braille is represented in a block called Braille Patterns (U+2800..U+28FF). The block contains all 256 possible patterns of an 8-dot braille cell, thereby including the complete 6-dot cell range.

Greek and Coptic is the Unicode block for representing modern (monotonic) Greek. It was originally also used for writing Coptic, using the similar Greek letters in addition to the uniquely Coptic additions. Beginning with version 4.1 of the Unicode Standard, a separate Coptic block has been included in Unicode, allowing for mixed Greek/Coptic text that is stylistically contrastive, as is the convention in scholarly works. Writing polytonic Greek requires the use of combining characters or the precomposed vowel + tone characters in the Greek Extended block.

Coptic is a Unicode block used with the Greek and Coptic block to write the Coptic language. Prior to version 4.1 of the Unicode Standard, Coptic text was written exclusively with the Greek and Coptic block, but Greek and Coptic letter forms are contrastive in many scholarly works, necessitating their disunification. Any specifically Coptic letters in the Greek and Coptic block are not reproduced in the Coptic block.

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2016-07-09.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2016-07-09.