List of precomposed Latin characters in Unicode

This is a list of precomposed Latin characters in Unicode. [1] [2] Unicode typefaces may be needed for these to display correctly.

Letters with diacritics

Aa Ææ Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn Oo Øø Pp Qq Rr Ssſ Tt Uu Vv Ww Xx Yy Zz Ʒʒ
acute Áá Ǽǽ Ćć Éé Ǵǵ Íí Ḱḱ Ĺĺ Ḿḿ Ńń Óó Ǿǿ Ṕṕ Ŕŕ Śś Úú Ẃẃ Ýý Źź
acute and dot above Ṥṥ
breve Ăă Ĕĕ Ğğ Ĭĭ Ŏŏ Ŭŭ
breve and acute Ắắ
breve and dot below Ặặ
breve and grave Ằằ
breve and hook above Ẳẳ
breve and tilde Ẵẵ
breve below Ḫḫ
caron Ǎǎ Čč Ďď Ěě Ǧǧ Ȟȟ Ǐǐ ǰ Ǩǩ Ľľ Ňň Ǒǒ Řř Šš Ťť Ǔǔ Žž Ǯǯ
caron and dot above Ṧṧ
cedilla Çç Ḑḑ Ȩȩ Ģģ Ḩḩ Ķķ Ļļ Ņņ Ŗŗ Şş Ţţ
cedilla and acute Ḉḉ
cedilla and breve Ḝḝ
circumflex Ââ Ĉĉ Êê Ĝĝ Ĥĥ Îî Ĵĵ Ôô Ŝŝ Ûû Ŵŵ Ŷŷ Ẑẑ
circumflex and acute Ấấ Ếế Ốố
circumflex and dot below Ậậ Ệệ Ộộ
circumflex and grave Ầầ Ềề Ồồ
circumflex and hook above Ẩẩ Ểể Ổổ
circumflex and tilde Ẫẫ Ễễ Ỗỗ
circumflex below Ḓḓ Ḙḙ Ḽḽ Ṋṋ Ṱṱ Ṷṷ
comma below Șș Țț
diaeresis Ää Ëë Ḧḧ Ïï Öö Üü Ẅẅ Ẍẍ Ÿÿ
diaeresis and acute Ḯḯ Ǘǘ
diaeresis and caron Ǚǚ
diaeresis and grave Ǜǜ
diaeresis and macron Ǟǟ Ȫȫ Ǖǖ
diaeresis below Ṳṳ
dot above Ȧȧ Ḃḃ Ċċ Ḋḋ Ėė Ḟḟ Ġġ Ḣḣ İ Ṁṁ Ṅṅ Ȯȯ Ṗṗ Ṙṙ Ṡṡẛ Ṫṫ Ẇẇ Ẋẋ Ẏẏ Żż
dot above and macron Ǡǡ Ȱȱ
dot below Ạạ Ḅḅ Ḍḍ Ẹẹ Ḥḥ Ịị Ḳḳ Ḷḷ Ṃṃ Ṇṇ Ọọ Ṛṛ Ṣṣ Ṭṭ Ụụ Ṿṿ Ẉẉ Ỵỵ Ẓẓ
dot below and dot above Ṩṩ
dot below and macron Ḹḹ Ṝṝ
double acute Őő Űű
double grave Ȁȁ Ȅȅ Ȉȉ Ȍȍ Ȑȑ Ȕȕ
grave Àà Èè Ìì Ǹǹ Òò Ùù Ẁẁ Ỳỳ
hook above Ảả Ẻẻ Ỉỉ Ỏỏ Ủủ Ỷỷ
horn Ơơ Ưư
horn and acute Ớớ Ứứ
horn and dot below Ợợ Ựự
horn and grave Ờờ Ừừ
horn and hook above Ởở Ửử
horn and tilde Ỡỡ Ữữ
inverted breve Ȃȃ Ȇȇ Ȋȋ Ȏȏ Ȓȓ Ȗȗ
macron Āā Ǣǣ Ēē Ḡḡ Īī Ōō Ūū Ȳȳ
macron and acute Ḗḗ Ṓṓ
macron and diaeresis Ṻṻ
macron and grave Ḕḕ Ṑṑ
macron below Ḇḇ Ḏḏ Ḵḵ Ḻḻ Ṉṉ Ṟṟ Ṯṯ Ẕẕ
ogonek Ąą Ęę Įį Ǫǫ Ųų
ogonek and macron Ǭǭ
ring above Åå Ůů
ring above and acute Ǻǻ
ring below Ḁḁ
tilde Ãã Ẽẽ Ĩĩ Ññ Õõ Ũũ Ṽṽ Ỹỹ
tilde and acute Ṍṍ Ṹṹ
tilde and diaeresis Ṏṏ
tilde and macron Ȭȭ
tilde below Ḛḛ Ḭḭ Ṵṵ
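
A row of the table above can be approximated from the canonical decompositions recorded in the Unicode Character Database. The following is a minimal sketch using Python's standard `unicodedata` module. The helper name `precomposed_with`, the scanned code point range, and the choice of base letters are assumptions made for illustration, and the output depends on the Unicode version bundled with the interpreter.

```python
import string
import unicodedata

COMBINING_ACUTE = "\u0301"

# Assumed set of base letters: ASCII letters plus Æ, æ, Ø, ø.
BASES = string.ascii_letters + "\u00c6\u00e6\u00d8\u00f8"


def precomposed_with(mark, first=0x00C0, last=0x1EFF):
    """Return characters whose full canonical decomposition is base + mark.

    The default range (an assumption) runs from Latin-1 Supplement
    through Latin Extended Additional.
    """
    row = []
    for cp in range(first, last + 1):
        ch = chr(cp)
        nfd = unicodedata.normalize("NFD", ch)
        # Exactly one base letter followed by exactly the requested mark.
        if len(nfd) == 2 and nfd[0] in BASES and nfd[1] == mark:
            row.append(ch)
    return row


acute_row = precomposed_with(COMBINING_ACUTE)
print(" ".join(acute_row))
```

Letters carrying two marks, such as Ṥ (acute and dot above), decompose into three code points and are therefore excluded from the single-mark row.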

Digraphs and ligatures

Other characters

Name                     Uppercase     Lowercase
angstrom sign            Å (U+212B)
a with right half ring                 ẚ (U+1E9A)
kelvin sign              K (U+212A)
l with interpunct        Ŀ             ŀ
apostrophe n                           ŉ
long s                                 ſ

A collection of precomposed Latin characters (mostly abbreviations of units of measurement) is also included in the CJK Compatibility and Enclosed CJK Letters and Months blocks of Unicode, and a set of precomposed Roman numerals is encoded as well. These characters are intended for use in East Asian text and are not meant to be mixed with ordinary Latin-script text. Several enclosed alphanumerics are also encoded in Unicode.
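
As a rough illustration, the compatibility mappings of such characters can be inspected with Python's `unicodedata` module. The sketch below picks three sample code points (a squared unit abbreviation, a Roman numeral, and a circled letter). The helper `compatibility_info` is a hypothetical name introduced here, not part of any library.

```python
import unicodedata

# Sample code points chosen for illustration.
SAMPLES = ["\u3392", "\u2167", "\u24b6"]


def compatibility_info(ch):
    """Return (decomposition tag, NFKC form) for a single character.

    The tag is an empty string when the decomposition is canonical
    or when the character has no decomposition at all.
    """
    decomp = unicodedata.decomposition(ch)
    tag = decomp.split()[0] if decomp.startswith("<") else ""
    return tag, unicodedata.normalize("NFKC", ch)


for ch in SAMPLES:
    tag, folded = compatibility_info(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {tag} -> {folded}")
```

NFKC normalization discards the distinction between the compatibility character and its plain Latin spelling, so it suits searching and comparison better than round-trip storage.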

Some characters in the Letterlike Symbols block can be substituted with characters in the ASCII range.
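
How such a substitution happens depends on the kind of decomposition involved. In the sketch below, which assumes Python's standard `unicodedata` module, the kelvin sign and angstrom sign have canonical decompositions and are replaced even by NFC, whereas the double-struck capital C only maps to a plain letter under the compatibility form NFKC. Note that the angstrom sign maps to Å (U+00C5), which lies outside the ASCII range.

```python
import unicodedata

KELVIN_SIGN = "\u212a"
ANGSTROM_SIGN = "\u212b"
DOUBLE_STRUCK_C = "\u2102"

# Canonical singletons: replaced under NFC as well as NFKC.
kelvin_nfc = unicodedata.normalize("NFC", KELVIN_SIGN)
angstrom_nfc = unicodedata.normalize("NFC", ANGSTROM_SIGN)

# Compatibility (<font>) mapping: unchanged by NFC, folded by NFKC.
c_nfc = unicodedata.normalize("NFC", DOUBLE_STRUCK_C)
c_nfkc = unicodedata.normalize("NFKC", DOUBLE_STRUCK_C)

for label, value in [
    ("kelvin sign, NFC", kelvin_nfc),
    ("angstrom sign, NFC", angstrom_nfc),
    ("double-struck C, NFC", c_nfc),
    ("double-struck C, NFKC", c_nfkc),
]:
    print(f"{label}: U+{ord(value):04X}")
```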

References

  1. "Chapter 3: Conformance, section 3.7: Decomposition" (PDF). The Unicode Standard. Retrieved 2016-09-10.
  2. "UCD: UnicodeData.txt". The Unicode Standard. Retrieved 2016-09-10.