Combining Diacritical Marks Extended

Range: U+1AB0..U+1AFF (80 code points)
Plane: BMP
Scripts: Inherited
Assigned: 17 code points
Unused: 63 reserved code points
Unicode version history:
7.0 (2014): 15 (+15)
13.0 (2020): 17 (+2)
Note: [1] [2]

Combining Diacritical Marks Extended is a Unicode block containing diacritical marks used in German dialectology (Teuthonista). [3]
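
The range, size, and plane figures in the summary above can be checked with simple arithmetic. The following is a minimal Python sketch, not an official tool: the names `in_block`, `plane`, and `assigned_in_block` are hypothetical helpers introduced here for illustration. It assumes that a general category of `Cn` in Python's bundled `unicodedata` module marks an unassigned code point, so the assigned count it reports depends on the Unicode version shipped with the interpreter and may differ from the 17 listed for Unicode 13.0.

```python
import unicodedata

# Block boundaries as given in the summary (inclusive on both ends).
BLOCK_START = 0x1AB0
BLOCK_END = 0x1AFF


def in_block(cp):
    """Return True if the code point lies inside U+1AB0..U+1AFF."""
    return BLOCK_START <= cp <= BLOCK_END


def plane(cp):
    """Return the plane number; each plane spans 0x10000 code points."""
    return cp >> 16


def assigned_in_block():
    """List code points in the block that the local database treats as assigned.

    Assumption: category "Cn" means "not assigned" in this Python's Unicode data.
    """
    return [
        cp
        for cp in range(BLOCK_START, BLOCK_END + 1)
        if unicodedata.category(chr(cp)) != "Cn"
    ]


if __name__ == "__main__":
    size = BLOCK_END - BLOCK_START + 1
    assigned = assigned_in_block()
    print("block size:", size)
    print("plane:", plane(BLOCK_START))
    print("assigned locally:", len(assigned),
          "(Unicode", unicodedata.unidata_version + ")")
    print("reserved locally:", size - len(assigned))
```

Plane 0 corresponds to the BMP, and the reserved count is simply the block size minus the assigned count.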

Combining Diacritical Marks Extended [1] [2]
Official Unicode Consortium code chart (PDF)
       0 1 2 3 4 5 6 7 8 9 A B C D E F
U+1ABx ◌᪰ ◌᪱ ◌᪲ ◌᪳ ◌᪴ ◌᪵ ◌᪶ ◌᪷ ◌᪸ ◌᪹ ◌᪺ ◌᪻ ◌᪼ ◌᪽ ◌᪾ ◌ᪿ
U+1ACx ◌ᫀ
U+1ADx
U+1AEx
U+1AFx
Notes
1. As of Unicode version 13.0
2. Grey areas indicate non-assigned code points
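
Each cell in the chart shows a combining mark attached to the dotted circle U+25CC, which serves as a placeholder base. The following Python sketch illustrates how such a cell is formed and how a mark from this block attaches to an ordinary letter. The helper names `chart_cell` and `mark` are hypothetical, and the NFC check rests on the assumption that no precomposed characters exist for these marks, so normalization should leave the sequence unchanged.

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"  # placeholder base used in the chart cells


def chart_cell(cp):
    """Build a chart cell: dotted circle followed by the combining mark."""
    return DOTTED_CIRCLE + chr(cp)


def mark(base, cp):
    """Attach a combining mark to a base character (the mark follows the base)."""
    return base + chr(cp)


# First chart row: U+1AB0 through U+1ABF, sixteen cells.
row = [chart_cell(cp) for cp in range(0x1AB0, 0x1AC0)]

# The letter "a" carrying the mark at U+1AB0.
marked = mark("a", 0x1AB0)

if __name__ == "__main__":
    # Code points are printed as U+XXXX so the output does not depend on
    # the terminal being able to render the marks.
    print(len(row), "cells in the first row")
    print([f"U+{ord(ch):04X}" for ch in marked])
    # Assumed: no precomposed form exists, so NFC keeps both code points.
    print(unicodedata.normalize("NFC", marked) == marked)
```

The marked letter remains two code points long: the base letter followed by the combining mark.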

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Combining Diacritical Marks Extended block:

Version | Final code points [a] | Count | L2 ID | WG2 ID | Document
7.0 | U+1AB0..1ABE | 15 | L2/08-428 | N3555 | Everson, Michael (2008-11-27), Exploratory proposal to encode Germanicist, Nordicist, and other phonetic characters in the UCS
 | | | L2/10-346 | N3907 | Everson, Michael; Wandl-Vogt, Eveline; Dicklberger, Alois (2010-09-23), Preliminary proposal to encode "Teuthonista" phonetic characters in the UCS
 | | | L2/11-137 | N4031 | Everson, Michael; Wandl-Vogt, Eveline; Dicklberger, Alois (2011-05-09), Proposal to encode "Teuthonista" phonetic characters in the UCS
 | | | L2/11-203 | N4082 | Everson, Michael; et al. (2011-05-27), Support for "Teuthonista" encoding proposal
 | | | L2/11-202 | N4081 | Everson, Michael; Dicklberger, Alois; Pentzlin, Karl; Wandl-Vogt, Eveline (2011-06-02), Revised proposal to encode "Teuthonista" phonetic characters in the UCS
 | | | L2/11-240 | N4106 | Everson, Michael; Pentzlin, Karl (2011-06-09), Report on the ad hoc re "Teuthonista" (SC2/WG2 N4081) held during the SC2/WG2 meeting at Helsinki
 | | | L2/11-261R2 | | Moore, Lisa (2011-08-16), "Consensus 128-C38", UTC #128 / L2 #225 Minutes, Approve 85 characters for German dialectology...
 | | | | N4103 | "11.16 Teuthonista phonetic characters", Unconfirmed minutes of WG 2 meeting 58, 2012-01-03
 | | | L2/12-269 | N4296 | Request to change the names of three Teuthonista characters under ballot, 2012-07-26
13.0 | U+1ABF..1AC0 | 2 | L2/19-075R | N5036R | Everson, Michael (2019-05-05), Proposal to add six phonetic characters for Scots to the UCS
 | | | L2/19-173 | | Anderson, Deborah; et al. (2019-04-29), "Phonetic characters for Scots", Recommendations to UTC #159 April-May 2019 on Script Proposals
 | | | L2/19-122 | | Moore, Lisa (2019-05-08), "C.6", UTC #159 Minutes
 | | | | N5122 | "M68.05", Unconfirmed minutes of WG 2 meeting 68, 2019-12-31
 | | | L2/20-052 | | Pournader, Roozbeh (2020-01-15), Changes to Identifier_Type of some Unicode 13.0 characters
 | | | L2/20-015 | | Moore, Lisa (2020-01-23), "B.13.4 Changes to Identifier_Type of some Unicode 13.0 characters", Draft Minutes of UTC Meeting 162
  a. Proposed code points and character names may differ from final code points and names

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2016-07-09.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2016-07-09.
  3. Everson, Michael; Dicklberger, Alois; Pentzlin, Karl; Wandl-Vogt, Eveline (2011-06-02). "Revised proposal to encode 'Teuthonista' phonetic characters in the UCS" (PDF).