Latin Extended-A | |
---|---|
Range | U+0100..U+017F (128 code points) |
Plane | BMP |
Scripts | Latin |
Major alphabets | Afrikaans, Catalan, Croatian, Czech, Esperanto, Greenlandic, Hungarian, Kashubian, Kurdish, Latin, Latvian, Lithuanian, Maltese, Northern Sami, Polish, Romanian, Serbian, Slovak, Slovene, Sorbian, Turkish, Welsh |
Assigned | 128 code points |
Unused | 0 reserved code points; 1 deprecated |
Source standards | ISO/IEC 8859, ISO 6937 |
Unicode version history | |
1.0.0 (1991) | 127 (+127) |
1.1 (1993) | 128 (+1) |
Chart | |
Code chart | |
Note: [1] [2] |
Latin Extended-A is the third block of the Unicode Standard. It encodes Latin letters from the ISO/IEC 8859 Latin character sets other than Latin-1 (whose letters are already encoded in the Latin-1 Supplement block), together with legacy characters from the ISO 6937 standard.
The block has been part of the Unicode Standard since version 1.0, which already contained its entire repertoire except Latin Small Letter Long S (U+017F); that character was added in version 1.1 during unification with ISO 10646. [3] In Unicode 1.0 the block was named European Latin. [4]
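The block boundaries lend themselves to a simple range check. The following is a minimal sketch in Python, assuming only the range U+0100..U+017F stated above; the names `LATIN_EXTENDED_A`, `in_latin_extended_a` and `assigned_count` are hypothetical helpers introduced for illustration, and the count relies on whatever Unicode database ships with the local Python build.

```python
import unicodedata

# Block boundaries as given in the article: U+0100..U+017F (128 code points).
LATIN_EXTENDED_A = range(0x0100, 0x0180)


def in_latin_extended_a(ch: str) -> bool:
    """Return True if the single character ch lies in Latin Extended-A."""
    return ord(ch) in LATIN_EXTENDED_A


def assigned_count() -> int:
    """Count code points in the block that the local Unicode database names."""
    return sum(1 for cp in LATIN_EXTENDED_A if unicodedata.name(chr(cp), ""))


if __name__ == "__main__":
    print(len(LATIN_EXTENDED_A))          # size of the block
    print(in_latin_extended_a("\u0151"))  # o with double acute, inside the block
    print(in_latin_extended_a("\u00e9"))  # e with acute, in Latin-1 Supplement
    print(assigned_count())
```

Using a `range` object keeps the membership test constant-time and makes the inclusive upper bound (U+017F) explicit in one place.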
Code (hex) | Grapheme | Names |
---|---|---|
European Latin | ||
U+0100 | Ā | Latin Capital letter A with macron |
U+0101 | ā | Latin Small letter A with macron |
U+0102 | Ă | Latin Capital letter A with breve |
U+0103 | ă | Latin Small letter A with breve |
U+0104 | Ą | Latin Capital letter A with ogonek |
U+0105 | ą | Latin Small letter A with ogonek |
U+0106 | Ć | Latin Capital letter C with acute |
U+0107 | ć | Latin Small letter C with acute |
U+0108 | Ĉ | Latin Capital letter C with circumflex |
U+0109 | ĉ | Latin Small letter C with circumflex |
U+010A | Ċ | Latin Capital letter C with dot above |
U+010B | ċ | Latin Small letter C with dot above |
U+010C | Č | Latin Capital letter C with caron |
U+010D | č | Latin Small letter C with caron |
U+010E | Ď | Latin Capital letter D with caron |
U+010F | ď | Latin Small letter D with caron |
U+0110 | Đ | Latin Capital letter D with stroke |
U+0111 | đ | Latin Small letter D with stroke |
U+0112 | Ē | Latin Capital letter E with macron |
U+0113 | ē | Latin Small letter E with macron |
U+0114 | Ĕ | Latin Capital letter E with breve |
U+0115 | ĕ | Latin Small letter E with breve |
U+0116 | Ė | Latin Capital letter E with dot above |
U+0117 | ė | Latin Small letter E with dot above |
U+0118 | Ę | Latin Capital letter E with ogonek |
U+0119 | ę | Latin Small letter E with ogonek |
U+011A | Ě | Latin Capital letter E with caron |
U+011B | ě | Latin Small letter E with caron |
U+011C | Ĝ | Latin Capital letter G with circumflex |
U+011D | ĝ | Latin Small letter G with circumflex |
U+011E | Ğ | Latin Capital letter G with breve |
U+011F | ğ | Latin Small letter G with breve |
U+0120 | Ġ | Latin Capital letter G with dot above |
U+0121 | ġ | Latin Small letter G with dot above |
U+0122 | Ģ | Latin Capital letter G with cedilla |
U+0123 | ģ | Latin Small letter G with cedilla |
U+0124 | Ĥ | Latin Capital letter H with circumflex |
U+0125 | ĥ | Latin Small letter H with circumflex |
U+0126 | Ħ | Latin Capital letter H with stroke |
U+0127 | ħ | Latin Small letter H with stroke |
U+0128 | Ĩ | Latin Capital letter I with tilde |
U+0129 | ĩ | Latin Small letter I with tilde |
U+012A | Ī | Latin Capital letter I with macron |
U+012B | ī | Latin Small letter I with macron |
U+012C | Ĭ | Latin Capital letter I with breve |
U+012D | ĭ | Latin Small letter I with breve |
U+012E | Į | Latin Capital letter I with ogonek |
U+012F | į | Latin Small letter I with ogonek |
U+0130 | İ | Latin Capital letter I with dot above |
U+0131 | ı | Latin Small letter dotless I |
U+0132 | Ĳ | Latin Capital Ligature IJ |
U+0133 | ĳ | Latin Small Ligature IJ |
U+0134 | Ĵ | Latin Capital letter J with circumflex |
U+0135 | ĵ | Latin Small letter J with circumflex |
U+0136 | Ķ | Latin Capital letter K with cedilla |
U+0137 | ķ | Latin Small letter K with cedilla |
U+0138 | ĸ | Latin Small letter Kra |
U+0139 | Ĺ | Latin Capital letter L with acute |
U+013A | ĺ | Latin Small letter L with acute |
U+013B | Ļ | Latin Capital letter L with cedilla |
U+013C | ļ | Latin Small letter L with cedilla |
U+013D | Ľ | Latin Capital letter L with caron |
U+013E | ľ | Latin Small letter L with caron |
U+013F | Ŀ | Latin Capital letter L with middle dot |
U+0140 | ŀ | Latin Small letter L with middle dot |
U+0141 | Ł | Latin Capital letter L with stroke |
U+0142 | ł | Latin Small letter L with stroke |
U+0143 | Ń | Latin Capital letter N with acute |
U+0144 | ń | Latin Small letter N with acute |
U+0145 | Ņ | Latin Capital letter N with cedilla |
U+0146 | ņ | Latin Small letter N with cedilla |
U+0147 | Ň | Latin Capital letter N with caron |
U+0148 | ň | Latin Small letter N with caron |
Deprecated Letter | ||
U+0149 | ŉ | Latin Small letter N preceded by apostrophe (Deprecated letter) |
European Latin | ||
U+014A | Ŋ | Latin Capital letter Eng |
U+014B | ŋ | Latin Small letter Eng |
U+014C | Ō | Latin Capital letter O with macron |
U+014D | ō | Latin Small letter O with macron |
U+014E | Ŏ | Latin Capital letter O with breve |
U+014F | ŏ | Latin Small letter O with breve |
U+0150 | Ő | Latin Capital letter O with double acute |
U+0151 | ő | Latin Small letter O with double acute |
U+0152 | Œ | Latin Capital Ligature OE |
U+0153 | œ | Latin Small Ligature OE |
U+0154 | Ŕ | Latin Capital letter R with acute |
U+0155 | ŕ | Latin Small letter R with acute |
U+0156 | Ŗ | Latin Capital letter R with cedilla |
U+0157 | ŗ | Latin Small letter R with cedilla |
U+0158 | Ř | Latin Capital letter R with caron |
U+0159 | ř | Latin Small letter R with caron |
U+015A | Ś | Latin Capital letter S with acute |
U+015B | ś | Latin Small letter S with acute |
U+015C | Ŝ | Latin Capital letter S with circumflex |
U+015D | ŝ | Latin Small letter S with circumflex |
U+015E | Ş | Latin Capital letter S with cedilla |
U+015F | ş | Latin Small letter S with cedilla |
U+0160 | Š | Latin Capital letter S with caron |
U+0161 | š | Latin Small letter S with caron |
U+0162 | Ţ | Latin Capital letter T with cedilla |
U+0163 | ţ | Latin Small letter T with cedilla |
U+0164 | Ť | Latin Capital letter T with caron |
U+0165 | ť | Latin Small letter T with caron |
U+0166 | Ŧ | Latin Capital letter T with stroke |
U+0167 | ŧ | Latin Small letter T with stroke |
U+0168 | Ũ | Latin Capital letter U with tilde |
U+0169 | ũ | Latin Small letter U with tilde |
U+016A | Ū | Latin Capital letter U with macron |
U+016B | ū | Latin Small letter U with macron |
U+016C | Ŭ | Latin Capital letter U with breve |
U+016D | ŭ | Latin Small letter U with breve |
U+016E | Ů | Latin Capital letter U with ring above |
U+016F | ů | Latin Small letter U with ring above |
U+0170 | Ű | Latin Capital letter U with double acute |
U+0171 | ű | Latin Small letter U with double acute |
U+0172 | Ų | Latin Capital letter U with ogonek |
U+0173 | ų | Latin Small letter U with ogonek |
U+0174 | Ŵ | Latin Capital letter W with circumflex |
U+0175 | ŵ | Latin Small letter W with circumflex |
U+0176 | Ŷ | Latin Capital letter Y with circumflex |
U+0177 | ŷ | Latin Small letter Y with circumflex |
U+0178 | Ÿ | Latin Capital letter Y with diaeresis |
U+0179 | Ź | Latin Capital letter Z with acute |
U+017A | ź | Latin Small letter Z with acute |
U+017B | Ż | Latin Capital letter Z with dot above |
U+017C | ż | Latin Small letter Z with dot above |
U+017D | Ž | Latin Capital letter Z with caron |
U+017E | ž | Latin Small letter Z with caron |
U+017F | ſ | Latin Small letter long S |
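A table like the one above can be regenerated from a Unicode character database rather than typed by hand. The sketch below uses Python's standard `unicodedata` module; `table_row` and `block_rows` are hypothetical helper names, and the output uses the formal upper-case character names from the database rather than the mixed-case spelling used in this article.

```python
import unicodedata


def table_row(cp: int) -> str:
    """Format one code point as 'U+XXXX | grapheme | NAME |'."""
    ch = chr(cp)
    return f"U+{cp:04X} | {ch} | {unicodedata.name(ch)} |"


def block_rows(first: int = 0x0100, last: int = 0x017F) -> list:
    """Return one formatted row per code point in the inclusive range."""
    return [table_row(cp) for cp in range(first, last + 1)]


if __name__ == "__main__":
    for row in block_rows():
        print(row)
```

Generating the rows this way also guards against display problems where a ligature or the deprecated U+0149 is silently replaced by a look-alike sequence of two characters.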
The Latin Extended-A block contains only two subheadings: European Latin and Deprecated letter. [5]
The European Latin subheading contains all but one character in the Latin Extended-A block. It is populated with accented and variant majuscule and minuscule Latin letters for writing mostly eastern European languages. [6]
The Deprecated letter subheading contains a single character, Latin Small Letter N Preceded by Apostrophe (U+0149), which was included for compatibility with the ISO/IEC 6937 standard. [5] It was deprecated as of Unicode version 5.2.0 [7] with the following comment: “U+0149 LATIN SMALL LETTER N PRECEDED BY APOSTROPHE was encoded for use in Afrikaans. The character is deprecated, and its use is strongly discouraged. In nearly all cases it is better represented by a sequence of an apostrophe followed by ‘n’.” [6] The recommended sequence looks like this: ’n.
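The recommended replacement of U+0149 by an apostrophe followed by "n" can be sketched as a simple text substitution. In the Python example below, `replace_n_apostrophe` is a hypothetical helper, and the default apostrophe (U+2019, RIGHT SINGLE QUOTATION MARK) is an assumption chosen to match the rendering in this article; callers can pass a different apostrophe. For comparison, the compatibility decomposition in the Unicode database uses U+02BC (MODIFIER LETTER APOSTROPHE) followed by "n".

```python
import unicodedata

# The deprecated character: LATIN SMALL LETTER N PRECEDED BY APOSTROPHE.
DEPRECATED_N_APOSTROPHE = "\u0149"


def replace_n_apostrophe(text: str, apostrophe: str = "\u2019") -> str:
    """Replace every U+0149 with the given apostrophe followed by 'n'.

    The default apostrophe (U+2019) is an assumption, not a requirement.
    """
    return text.replace(DEPRECATED_N_APOSTROPHE, apostrophe + "n")


def compatibility_form(text: str) -> str:
    """Apply NFKD, which maps U+0149 to U+02BC followed by 'n'."""
    return unicodedata.normalize("NFKD", text)


if __name__ == "__main__":
    sample = "Dit is \u0149 boek"
    print(replace_n_apostrophe(sample))
    print([hex(ord(c)) for c in compatibility_form("\u0149")])
```

Note that NFKD also decomposes other characters in the text (for example accented letters), so the targeted `str.replace` is the narrower of the two options.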
Type of subheading | Characters | Range of characters |
---|---|---|
European Latin | 127 characters: 62 capital/small pairs (counting İ, U+0130, and ı, U+0131, as a pair) plus the unpaired letters ĸ (U+0138), Ÿ (U+0178) and ſ (U+017F) | U+0100 to U+0148 and U+014A to U+017F |
Deprecated Letter | 1 character: Latin Small Letter N preceded by apostrophe (ŉ) | U+0149 |
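How many capital/small pairs the block contains depends on how a pair is defined. The sketch below counts pairs using Python's built-in case mappings, under the assumption that a pair means a capital letter whose lowercase form is a single character inside the block and maps back to the same capital; `case_pairs` and `unpaired` are hypothetical helper names. Under that definition, dotted İ and dotless ı are not a pair, because İ lowercases to "i" plus a combining dot and ı uppercases to plain "I".

```python
# Block boundaries as given in the article: U+0100..U+017F.
BLOCK = range(0x0100, 0x0180)


def case_pairs() -> list:
    """Return (capital, small) tuples that round-trip inside the block."""
    pairs = []
    for cp in BLOCK:
        upper = chr(cp)
        lower = upper.lower()
        if (
            lower != upper                 # the character has a lowercase form
            and len(lower) == 1            # excludes multi-character mappings
            and ord(lower) in BLOCK        # excludes partners outside the block
            and lower.upper() == upper     # the mapping round-trips
        ):
            pairs.append((upper, lower))
    return pairs


def unpaired() -> list:
    """Return the block's characters that belong to no round-trip pair."""
    paired = {ch for pair in case_pairs() for ch in pair}
    return [chr(cp) for cp in BLOCK if chr(cp) not in paired]


if __name__ == "__main__":
    print(len(case_pairs()))
    print([f"U+{ord(ch):04X}" for ch in unpaired()])
```

With this definition the characters left over are U+0130, U+0131, U+0138, U+0149, U+0178 and U+017F; treating İ and ı as a nominal pair adds one more pair to the count.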
Latin Extended-A [1] [2] Official Unicode Consortium code chart (PDF)
 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
U+010x | Ā | ā | Ă | ă | Ą | ą | Ć | ć | Ĉ | ĉ | Ċ | ċ | Č | č | Ď | ď |
U+011x | Đ | đ | Ē | ē | Ĕ | ĕ | Ė | ė | Ę | ę | Ě | ě | Ĝ | ĝ | Ğ | ğ |
U+012x | Ġ | ġ | Ģ | ģ | Ĥ | ĥ | Ħ | ħ | Ĩ | ĩ | Ī | ī | Ĭ | ĭ | Į | į |
U+013x | İ | ı | Ĳ | ĳ | Ĵ | ĵ | Ķ | ķ | ĸ | Ĺ | ĺ | Ļ | ļ | Ľ | ľ | Ŀ |
U+014x | ŀ | Ł | ł | Ń | ń | Ņ | ņ | Ň | ň | ŉ | Ŋ | ŋ | Ō | ō | Ŏ | ŏ |
U+015x | Ő | ő | Œ | œ | Ŕ | ŕ | Ŗ | ŗ | Ř | ř | Ś | ś | Ŝ | ŝ | Ş | ş |
U+016x | Š | š | Ţ | ţ | Ť | ť | Ŧ | ŧ | Ũ | ũ | Ū | ū | Ŭ | ŭ | Ů | ů |
U+017x | Ű | ű | Ų | ų | Ŵ | ŵ | Ŷ | ŷ | Ÿ | Ź | ź | Ż | ż | Ž | ž | ſ |
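The chart above arranges the block as 8 rows of 16 code points, with the row label giving the first three hexadecimal digits and the column giving the last. A minimal Python sketch of that layout follows; `chart_rows` is a hypothetical helper name, and the pipe-separated output format simply mirrors the chart as shown here.

```python
def chart_rows() -> list:
    """Build the 8 x 16 code chart rows for U+0100..U+017F."""
    rows = []
    for high in range(0x010, 0x018):       # row prefixes 010 .. 017
        # Each row holds the 16 code points sharing the same leading digits.
        cells = [chr(high * 16 + low) for low in range(16)]
        rows.append(f"U+{high:03X}x | " + " | ".join(cells) + " |")
    return rows


if __name__ == "__main__":
    header = " | " + " | ".join(f"{low:X}" for low in range(16)) + " |"
    print(header)
    for row in chart_rows():
        print(row)
```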
The following Unicode-related documents record the purpose and process of defining specific characters in the Latin Extended-A block:
Version | Final code points [a] | Count | L2 ID | Document |
---|---|---|---|---|
1.0.0 | U+0100..017E | 127 | (to be determined) | |
 | | | L2/08-275 | Freytag, Asmus (2008-07-31), Comments on the proposed deprecation of characters (public review item #122) |
 | | | L2/08-278 | Pentzlin, Karl (2008-08-04), Comments on Public Review Issue #122 |
 | | | L2/08-287 | Davis, Mark (2008-08-04), Public Review Issue #122: Proposal for Additional Deprecated Characters |
 | | | L2/08-253R2 | Moore, Lisa (2008-08-19), "Consensus 116-C13", UTC #116 Minutes, Change the deprecated property by removing 0340, 0341, 17D3, and adding 0149, 0F77, 0F79, 17A4, 2329, 232A. |
 | | | L2/08-328 (html, xls) | Whistler, Ken (2008-10-14), Spreadsheet of Deprecation and Discouragement |
 | | | L2/10-268 | Priest, Lorna (2010-07-29), Annotation additions resulting from encoding LATIN CAPITAL LETTER H WITH HOOK |
1.1 | U+017F | 1 | (to be determined) | |