Combining Diacritical Marks Supplement

Last updated
Combining Diacritical Marks Supplement
RangeU+1DC0..U+1DFF
(64 code points)
Plane BMP
Scripts Inherited
Major alphabets UPA
Symbol setsMedieval letter diacritics
Assigned63 code points
Unused1 reserved code points
Unicode version history
4.1 (2005)4 (+4)
5.0 (2006)13 (+9)
5.1 (2008)41 (+28)
5.2 (2009)42 (+1)
6.0 (2010)43 (+1)
7.0 (2014)58 (+15)
9.0 (2016)59 (+1)
10.0 (2017)63 (+4)
Note: [1] [2]

Combining Diacritical Marks Supplement is a Unicode block containing combining characters for the Uralic Phonetic Alphabet, Medievalist notations, and German dialectology (Teuthonista). [3] It is an extension of the diacritic characters found in the Combining Diacritical Marks block.

Contents

Block

Combining Diacritical Marks Supplement [1] [2]
Official Unicode Consortium code chart (PDF)
 0123456789ABCDEF
U+1DCx
U+1DDx
U+1DEx
U+1DFx᷿
Notes
1. ^ As of Unicode version 13.0
2. ^ Grey area indicates non-assigned code point

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Combining Diacritical Marks Supplement block:

Related Research Articles

In digital typography, combining characters are characters that are intended to modify other characters. The most common combining characters in the Latin script are the combining diacritical marks.

Unicode has subscripted and superscripted versions of a number of characters including a full set of Arabic numerals. These characters allow any polynomial, chemical and certain other equations to be represented in plain text without using any form of markup like HTML or TeX.

As of Unicode version 13.0 Cyrillic script is encoded across several blocks, all in the BMP:

Combining Diacritical Marks is a Unicode block containing the most common combining characters. It also contains the character "Combining Grapheme Joiner", which prevents canonical reordering of combining characters, and despite the name, actually separates characters that would otherwise be considered a single grapheme in a given context. Its block name in Unicode 1.0 was Generic Diacritical Marks.

Over a thousand characters from the Latin script are encoded in the Unicode Standard, grouped in several basic and extended Latin blocks. The extended ranges contain mainly precomposed letters plus diacritics that are equivalently encoded with combining diacritics, as well as some ligatures and distinct letters, used for example in the orthographies of various African languages and the Vietnamese alphabet. Latin Extended-C contains additions for Uighur and the Claudian letters. Latin Extended-D comprises characters that are mostly of interest to medievalists. Latin Extended-E mostly comprises characters used for German dialectology (Teuthonista).

Unicode supports several phonetic scripts and notations through the existing writing systems and the addition of extra blocks with phonetic characters. These phonetic extras are derived of an existing script, usually Latin, Greek or Cyrillic. In Unicode there is no "IPA script". Apart from IPA, extensions to the IPA and obsolete and nonstandard IPA symbols, these blocks also contain characters from the Uralic Phonetic Alphabet and the Americanist Phonetic Alphabet.

Phonetic Extensions is a Unicode block containing phonetic characters used in the Uralic Phonetic Alphabet, Old Irish phonetic notation, the Oxford English dictionary and American dictionaries, and Americanist and Russianist phonetic notations. Its character set is continued in the following Unicode block, Phonetic Extensions Supplement.

GNU FreeFont Font family

GNU FreeFont is a family of free OpenType, TrueType and WOFF vector fonts, implementing as much of the Universal Character Set (UCS) as possible, aside from the very large CJK Asian character set. The project was initiated in 2002 by Primož Peterlin and is now maintained by Steve White.

Phonetic Extensions Supplement is a Unicode block containing characters for specialized and deprecated forms of the International Phonetic Alphabet.

Combining Diacritical Marks for Symbols is a Unicode block containing arrows, dots, enclosures, and overlays for modifying symbol characters.

Macron below, U+0331◌̱COMBINING MACRON BELOW, is a combining diacritical mark that is used in various orthographies.

In the Unicode standard, a plane is a continuous group of 65,536 (216) code points. There are 17 planes, identified by the numbers 0 to 16, which corresponds with the possible values 00–1016 of the first two positions in six position hexadecimal format (U+hhhhhh). Plane 0 is the Basic Multilingual Plane (BMP), which contains most commonly used characters. The higher planes 1 through 16 are called "supplementary planes". The last code point in Unicode is the last code point in plane 16, U+10FFFF. As of Unicode version 13.0, seven of the planes have assigned code points (characters), and five are named.

Combining Half Marks is a Unicode block containing diacritic mark parts for spanning multiple characters.

The Latin-1 Supplement is the second Unicode block in the Unicode standard. It encodes the upper range of ISO 8859-1: 80 (U+0080) - FF (U+00FF). Controls C1 (0080–009F) are not graphic. This block ranges from U+0080 to U+00FF, contains 128 characters and includes the C1 controls, Latin-1 punctuation and symbols, 30 pairs of majuscule and minuscule accented Latin characters and 2 mathematical operators.

IPA Extensions is a block (0250–02AF) of the Unicode standard that contains full size letters used in the International Phonetic Alphabet (IPA). Both modern and historical characters are included, as well as former and proposed IPA signs and non-IPA phonetic letters. Additional characters employed for phonetics, like the palatalization sign, are encoded in the blocks Phonetic Extensions (1D00–1D7F) and Phonetic Extensions Supplement (1D80–1DBF). Diacritics are found in the Spacing Modifier Letters (02B0–02FF) and Combining Diacritical Marks (0300–036F) blocks. Its block name in Unicode 1.0 was Standard Phonetic.

Teuthonista is a phonetic transcription system used predominantly for the transcription of (High) German dialects. It is very similar to other Central European transcription systems from the early 20th century. The base characters are mostly based on the Latin alphabet, which can be modified by various diacritics.

Greek and Coptic is the Unicode block for representing modern (monotonic) Greek. It was originally used for writing Coptic, using the similar Greek letters, in addition to the uniquely Coptic additions. Beginning with version 4.1 of the Unicode Standard, a separate Coptic block has been included in Unicode, allowing for mixed Greek/Coptic text that is stylistically contrastive, as is convention in scholarly works. Writing polytonic Greek requires the use of combining characters or the precomposed vowel + tone characters in the Greek Extended character block.

Katakana is a Unicode block containing katakana characters for the Japanese and Ainu languages.

Combining Diacritical Marks Extended is a Unicode block containing diacritical marks used in German dialectology (Teuthonista).

References

  1. "Unicode character database". The Unicode Standard. Retrieved 2016-07-09.
  2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2016-07-09.
  3. Everson, Michael; Dicklberger, Alois; Pentzlin, Karl; Wandl-Vogt, Eveline (2011-06-02). "Revised proposal to encode "Teuthonista" phonetic characters in the UCS" (PDF).