Combining Diacritical Marks

Range: U+0300..U+036F (112 code points)
Plane: BMP
Scripts: Inherited
Major alphabets: IPA, UPA
Symbol sets: accents, diacritics
Assigned: 112 code points
Unused: 0 reserved code points

Unicode version history
1.0.0 (1991): 66 (+66)
1.0.1 (1992): 68 (+2)
1.1 (1993): 72 (+4)
3.0 (1999): 82 (+10)
3.2 (2002): 96 (+14)
4.0 (2003): 107 (+11)
4.1 (2005): 112 (+5)

Unicode documentation: Code chart, Web page
Note: Two characters were moved from the Greek and Coptic block to the Combining Diacritical Marks block in version 1.0.1 during the process of unifying with ISO 10646. [1] [2] [3]

Combining Diacritical Marks is a Unicode block containing the most common combining characters. It also contains the character "Combining Grapheme Joiner" (U+034F), which blocks canonical reordering of combining characters and, despite its name, separates characters that would otherwise be considered a single grapheme in a given context. The block's name in Unicode 1.0 was Generic Diacritical Marks. [4]
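The behaviour described above can be illustrated with a minimal sketch using Python's standard unicodedata module. The choice of marks (acute accent and dot below) and the variable names are assumptions made for illustration; they do not come from the Unicode documentation cited here.

```python
import unicodedata

# Illustrative marks from this block (names chosen for this sketch).
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, canonical combining class 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, canonical combining class 220
CGJ = "\u034f"        # COMBINING GRAPHEME JOINER, canonical combining class 0


def nfd(text: str) -> str:
    """Return the canonical decomposition (NFD) of text."""
    return unicodedata.normalize("NFD", text)


# A base letter followed by a combining mark composes under NFC.
composed = unicodedata.normalize("NFC", "e" + ACUTE)

# Without CGJ, normalization sorts adjacent marks by combining class,
# so the dot below (220) moves in front of the acute accent (230).
reordered = nfd("a" + ACUTE + DOT_BELOW)

# CGJ has combining class 0, so the marks on either side of it
# are not reordered across it.
blocked = nfd("a" + ACUTE + CGJ + DOT_BELOW)

print(composed == "\u00e9")
print([f"U+{ord(c):04X}" for c in reordered])
print([f"U+{ord(c):04X}" for c in blocked])
```

Comparing the last two printed lists shows the effect of the joiner: the order of the two marks changes only in the sequence that lacks it.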

Block

Combining Diacritical Marks [1]
Official Unicode Consortium code chart (PDF)
      0 1 2 3 4 5 6 7 8 9 A B C D E F
U+030x ̀ ́ ̂ ̃ ̄ ̅ ̆ ̇ ̈ ̉ ̊ ̋ ̌ ̍ ̎ ̏
U+031x ̐ ̑ ̒ ̓ ̔ ̕ ̖ ̗ ̘ ̙ ̚ ̛ ̜ ̝ ̞ ̟
U+032x ̠ ̡ ̢ ̣ ̤ ̥ ̦ ̧ ̨ ̩ ̪ ̫ ̬ ̭ ̮ ̯
U+033x ̰ ̱ ̲ ̳ ̴ ̵ ̶ ̷ ̸ ̹ ̺ ̻ ̼ ̽ ̾ ̿
U+034x ̀ ́ ͂ ̓ ̈́ ͅ ͆ ͇ ͈ ͉ ͊ ͋ ͌ ͍ ͎  CGJ 
U+035x ͐ ͑ ͒ ͓ ͔ ͕ ͖ ͗ ͘ ͙ ͚ ͛ ͜ ͝ ͞ ͟
U+036x ͠ ͡ ͢ ͣ ͤ ͥ ͦ ͧ ͨ ͩ ͪ ͫ ͬ ͭ ͮ ͯ
Notes
1. As of Unicode version 15.1
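As a rough cross-check of the chart, the following sketch enumerates the range U+0300..U+036F and inspects each code point with Python's unicodedata module. The helper name in_combining_diacritical_marks is hypothetical, and the reported general category depends on the Unicode data bundled with the Python version in use.

```python
import unicodedata

# Block boundaries as given in the chart (U+0300..U+036F).
BLOCK_START, BLOCK_END = 0x0300, 0x036F


def in_combining_diacritical_marks(ch: str) -> bool:
    """Hypothetical helper: True if ch lies in the block's code point range."""
    return BLOCK_START <= ord(ch) <= BLOCK_END


# Every code point in the block, in chart order (7 rows of 16).
block = [chr(cp) for cp in range(BLOCK_START, BLOCK_END + 1)]

# General categories found in the block; "Mn" means nonspacing mark.
categories = {unicodedata.category(ch) for ch in block}

print(len(block))
print(categories)
```

A range check like this only tests block membership; it says nothing about combining marks that live in other blocks, such as Combining Diacritical Marks Supplement.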

Character table

Code  Glyph  Decimal  Description
U+0300   ̀  768  Combining Grave Accent
U+0301   ́  769  Combining Acute Accent
U+0302   ̂  770  Combining Circumflex Accent
U+0303   ̃  771  Combining Tilde
U+0304   ̄  772  Combining Macron
U+0305   ̅  773  Combining Overline
U+0306   ̆  774  Combining Breve
U+0307   ̇  775  Combining Dot Above
U+0308   ̈  776  Combining Diaeresis
U+0309   ̉  777  Combining Hook Above
U+030A   ̊  778  Combining Ring Above
U+030B   ̋  779  Combining Double Acute Accent
U+030C   ̌  780  Combining Caron
U+030D   ̍  781  Combining Vertical Line Above
U+030E   ̎  782  Combining Double Vertical Line Above
U+030F   ̏  783  Combining Double Grave Accent
U+0310   ̐  784  Combining Candrabindu
U+0311   ̑  785  Combining Inverted Breve
U+0312   ̒  786  Combining Turned Comma Above
U+0313   ̓  787  Combining Comma Above
U+0314   ̔  788  Combining Reversed Comma Above
U+0315   ̕  789  Combining Comma Above Right
U+0316   ̖  790  Combining Grave Accent Below
U+0317   ̗  791  Combining Acute Accent Below
U+0318   ̘  792  Combining Left Tack Below
U+0319   ̙  793  Combining Right Tack Below
U+031A   ̚  794  Combining Left Angle Above
U+031B   ̛  795  Combining Horn
U+031C   ̜  796  Combining Left Half Ring Below
U+031D   ̝  797  Combining Up Tack Below
U+031E   ̞  798  Combining Down Tack Below
U+031F   ̟  799  Combining Plus Sign Below
U+0320   ̠  800  Combining Minus Sign Below
U+0321   ̡  801  Combining Palatalized Hook Below
U+0322   ̢  802  Combining Retroflex Hook Below
U+0323   ̣  803  Combining Dot Below
U+0324   ̤  804  Combining Diaeresis Below
U+0325   ̥  805  Combining Ring Below
U+0326   ̦  806  Combining Comma Below
U+0327   ̧  807  Combining Cedilla
U+0328   ̨  808  Combining Ogonek
U+0329   ̩  809  Combining Vertical Line Below
U+032A   ̪  810  Combining Bridge Below
U+032B   ̫  811  Combining Inverted Double Arch Below
U+032C   ̬  812  Combining Caron Below
U+032D   ̭  813  Combining Circumflex Accent Below
U+032E   ̮  814  Combining Breve Below
U+032F   ̯  815  Combining Inverted Breve Below
U+0330   ̰  816  Combining Tilde Below
U+0331   ̱  817  Combining Macron Below
U+0332   ̲  818  Combining Low Line
U+0333   ̳  819  Combining Double Low Line
U+0334   ̴  820  Combining Tilde Overlay
U+0335   ̵  821  Combining Short Stroke Overlay
U+0336   ̶  822  Combining Long Stroke Overlay
U+0337   ̷  823  Combining Short Solidus Overlay
U+0338   ̸  824  Combining Long Solidus Overlay
U+0339   ̹  825  Combining Right Half Ring Below
U+033A   ̺  826  Combining Inverted Bridge Below
U+033B   ̻  827  Combining Square Below
U+033C   ̼  828  Combining Seagull Below
U+033D   ̽  829  Combining X Above
U+033E   ̾  830  Combining Vertical Tilde
U+033F   ̿  831  Combining Double Overline
U+0340   ̀  832  Combining Grave Tone Mark
U+0341   ́  833  Combining Acute Tone Mark
U+0342   ͂  834  Combining Greek Perispomeni
U+0343   ̓  835  Combining Greek Koronis
U+0344   ̈́  836  Combining Greek Dialytika Tonos
U+0345   ͅ  837  Combining Greek Ypogegrammeni
U+0346   ͆  838  Combining Bridge Above
U+0347   ͇  839  Combining Equals Sign Below
U+0348   ͈  840  Combining Double Vertical Line Below
U+0349   ͉  841  Combining Left Angle Below
U+034A   ͊  842  Combining Not Tilde Above
U+034B   ͋  843  Combining Homothetic Above
U+034C   ͌  844  Combining Almost Equal To Above
U+034D   ͍  845  Combining Left Right Arrow Below
U+034E   ͎  846  Combining Upwards Arrow Below
U+034F   ͏  847  Combining Grapheme Joiner
U+0350   ͐  848  Combining Right Arrowhead Above
U+0351   ͑  849  Combining Left Half Ring Above
U+0352   ͒  850  Combining Fermata
U+0353   ͓  851  Combining X Below
U+0354   ͔  852  Combining Left Arrowhead Below
U+0355   ͕  853  Combining Right Arrowhead Below
U+0356   ͖  854  Combining Right Arrowhead And Up Arrowhead Below
U+0357   ͗  855  Combining Right Half Ring Above
U+0358   ͘  856  Combining Dot Above Right
U+0359   ͙  857  Combining Asterisk Below
U+035A   ͚  858  Combining Double Ring Below
U+035B   ͛  859  Combining Zigzag Above
U+035C   ͜  860  Combining Double Breve Below
U+035D   ͝  861  Combining Double Breve
U+035E   ͞  862  Combining Double Macron
U+035F   ͟  863  Combining Double Macron Below
U+0360   ͠  864  Combining Double Tilde
U+0361   ͡  865  Combining Double Inverted Breve
U+0362   ͢  866  Combining Double Rightwards Arrow Below
U+0363   ͣ  867  Combining Latin Small Letter A
U+0364   ͤ  868  Combining Latin Small Letter E
U+0365   ͥ  869  Combining Latin Small Letter I
U+0366   ͦ  870  Combining Latin Small Letter O
U+0367   ͧ  871  Combining Latin Small Letter U
U+0368   ͨ  872  Combining Latin Small Letter C
U+0369   ͩ  873  Combining Latin Small Letter D
U+036A   ͪ  874  Combining Latin Small Letter H
U+036B   ͫ  875  Combining Latin Small Letter M
U+036C   ͬ  876  Combining Latin Small Letter R
U+036D   ͭ  877  Combining Latin Small Letter T
U+036E   ͮ  878  Combining Latin Small Letter V
U+036F   ͯ  879  Combining Latin Small Letter X
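The code, decimal and description columns of the table can be derived from the code point alone. The sketch below does this with Python's unicodedata module; the function name table_row is hypothetical, and title-casing the official uppercase character names is an assumption made to match the capitalization used in the table.

```python
import unicodedata


def table_row(code_point: int) -> tuple:
    """Hypothetical helper: (code, decimal, description) for one code point."""
    # Official character names are uppercase; .title() mirrors the
    # capitalization used in the table above.
    name = unicodedata.name(chr(code_point)).title()
    return (f"U+{code_point:04X}", code_point, name)


# Rebuild the three derivable columns for the whole block.
rows = [table_row(cp) for cp in range(0x0300, 0x0370)]

for code, decimal, description in rows[:3]:
    print(code, decimal, description)
```

The glyph column is left out on purpose, since a combining mark needs a base character (or a space) to render against.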

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Combining Diacritical Marks block:

References

  1. "Unicode 1.0.1 Addendum" (PDF). The Unicode Standard. 1992-11-03. Retrieved 2016-07-09.
  2. "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
  3. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
  4. "3.8: Block-by-Block Charts" (PDF). The Unicode Standard. version 1.0. Unicode Consortium.