The Cork (also known as T1 or EC) encoding is a character encoding used to encode glyphs in fonts. [1] It is named after the city of Cork in Ireland, where a new encoding for LaTeX was introduced during a TeX Users Group (TUG) conference in 1990. [1] It contains 256 characters and supports most Western and Eastern European languages written in the Latin alphabet. [2]
In 8-bit TeX engines, the font encoding has to match the encoding of the hyphenation patterns, and T1 is the encoding most commonly used for those patterns. [3] In LaTeX one can switch to this encoding with \usepackage[T1]{fontenc}, while in ConTeXt MkII it is already the default encoding. In modern engines such as XeTeX and LuaTeX, Unicode is fully supported and the 8-bit font encodings are obsolete.
|    | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|----|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0x | ` 0060 | ´ 00B4 | ˆ 02C6 | ˜ 02DC | ¨ 00A8 | ˝ 02DD | ˚ 02DA | ˇ 02C7 | ˘ 02D8 | ¯ 00AF | ˙ 02D9 | ¸ 00B8 | ˛ 02DB | ‚ 201A | ‹ 2039 | › 203A |
| 1x | “ 201C | ” 201D | „ 201E | « 00AB | » 00BB | – 2013 | — 2014 | ZWSP [a] 200B | ₀ [b] 2080 | ı [c] 0131 | ȷ [c] 0237 | ff FB00 | fi FB01 | fl FB02 | ffi FB03 | ffl FB04 |
| 2x | SP | ! | " | # | $ | % | & | ’ 2019 | ( | ) | * | + | , | - | . | / |
| 3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6x | ‘ 2018 | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | SHY [d] |
| 8x | Ă 0102 | Ą 0104 | Ć 0106 | Č 010C | Ď 010E | Ě 011A | Ę 0118 | Ğ 011E | Ĺ 0139 | Ľ 013D | Ł 0141 | Ń 0143 | Ň 0147 | Ŋ 014A | Ő 0150 | Ŕ 0154 |
| 9x | Ř 0158 | Ś 015A | Š 0160 | Ş 015E | Ť 0164 | Ţ 0162 | Ű 0170 | Ů 016E | Ÿ 0178 | Ź 0179 | Ž 017D | Ż 017B | IJ 0132 | İ 0130 | đ 0111 | § 00A7 |
| Ax | ă 0103 | ą 0105 | ć 0107 | č 010D | ď 010F | ě 011B | ę 0119 | ğ 011F | ĺ 013A | ľ 013E | ł 0142 | ń 0144 | ň 0148 | ŋ 014B | ő 0151 | ŕ 0155 |
| Bx | ř 0159 | ś 015B | š 0161 | ş 015F | ť 0165 | ţ 0163 | ű 0171 | ů 016F | ÿ 00FF | ź 017A | ž 017E | ż 017C | ij 0133 | ¡ 00A1 | ¿ 00BF | £ 00A3 |
| Cx | À | Á | Â | Ã | Ä | Å | Æ | Ç | È | É | Ê | Ë | Ì | Í | Î | Ï |
| Dx | Ð [e] | Ñ | Ò | Ó | Ô | Õ | Ö | Œ 0152 | Ø | Ù | Ú | Û | Ü | Ý | Þ | SS [f] 1E9E |
| Ex | à | á | â | ã | ä | å | æ | ç | è | é | ê | ë | ì | í | î | ï |
| Fx | ð | ñ | ò | ó | ô | õ | ö | œ 0153 | ø | ù | ú | û | ü | ý | þ | ß 00DF |
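Because T1 is a single-byte encoding, the table can be turned directly into a lookup table for converting between T1 byte strings and Unicode text. The Python sketch below is one way to do this, under some stated assumptions: the names LOW, MID, decode_t1, encode_t1 and unsupported are hypothetical and not part of any TeX distribution or library; each slot is mapped to the code point the table lists for it (a plain space for SP at 0x20, U+200B for 0x17, U+2080 for 0x18, U+00AD for 0x7F and U+1E9E for the SS glyph at 0xDF), even though several of these slots hold TeX-specific glyphs without an exact Unicode equivalent; and TeX input conventions such as -- for an en dash are ignored, so this is a byte-level mapping only.

```python
# Hypothetical helpers (not from any library): map T1 (Cork) bytes to Unicode
# and back, using the code points listed in the encoding table. Slots holding
# TeX-specific glyphs (0x17, 0x18, 0x7F, 0xDF) are approximated as in the table.

# Slots 0x00-0x1F: accents, quotes, dashes, special glyphs, f-ligatures.
LOW = [
    0x0060, 0x00B4, 0x02C6, 0x02DC, 0x00A8, 0x02DD, 0x02DA, 0x02C7,
    0x02D8, 0x00AF, 0x02D9, 0x00B8, 0x02DB, 0x201A, 0x2039, 0x203A,
    0x201C, 0x201D, 0x201E, 0x00AB, 0x00BB, 0x2013, 0x2014, 0x200B,
    0x2080, 0x0131, 0x0237, 0xFB00, 0xFB01, 0xFB02, 0xFB03, 0xFB04,
]

# Slots 0x80-0xBF: Central and Eastern European letters plus a few symbols.
MID = [
    0x0102, 0x0104, 0x0106, 0x010C, 0x010E, 0x011A, 0x0118, 0x011E,
    0x0139, 0x013D, 0x0141, 0x0143, 0x0147, 0x014A, 0x0150, 0x0154,
    0x0158, 0x015A, 0x0160, 0x015E, 0x0164, 0x0162, 0x0170, 0x016E,
    0x0178, 0x0179, 0x017D, 0x017B, 0x0132, 0x0130, 0x0111, 0x00A7,
    0x0103, 0x0105, 0x0107, 0x010D, 0x010F, 0x011B, 0x0119, 0x011F,
    0x013A, 0x013E, 0x0142, 0x0144, 0x0148, 0x014B, 0x0151, 0x0155,
    0x0159, 0x015B, 0x0161, 0x015F, 0x0165, 0x0163, 0x0171, 0x016F,
    0x00FF, 0x017A, 0x017E, 0x017C, 0x0133, 0x00A1, 0x00BF, 0x00A3,
]


def build_t1_table() -> list[str]:
    """Return a 256-entry list: index = T1 byte, value = Unicode character."""
    table = [chr(cp) for cp in LOW]                  # 0x00-0x1F
    table += [chr(cp) for cp in range(0x20, 0x80)]   # 0x20-0x7F: ASCII layout
    table[0x27] = "\u2019"  # right single quote instead of '
    table[0x60] = "\u2018"  # left single quote instead of `
    table[0x7F] = "\u00ad"  # slot labelled SHY in the table
    table += [chr(cp) for cp in MID]                 # 0x80-0xBF
    table += [chr(cp) for cp in range(0xC0, 0x100)]  # 0xC0-0xFF: Latin-1 layout
    table[0xD7] = "\u0152"  # Œ instead of the multiplication sign
    table[0xDF] = "\u1e9e"  # SS glyph, mapped to capital sharp s as in the table
    table[0xF7] = "\u0153"  # œ instead of the division sign
    table[0xFF] = "\u00df"  # ß instead of ÿ (ÿ sits at 0xB8)
    return table


T1_TO_UNICODE = build_t1_table()
UNICODE_TO_T1 = {ch: code for code, ch in enumerate(T1_TO_UNICODE)}


def decode_t1(data: bytes) -> str:
    """Decode T1 bytes to a Unicode string, one character per byte."""
    return "".join(T1_TO_UNICODE[b] for b in data)


def unsupported(text: str) -> list[str]:
    """Return the distinct characters of `text` with no T1 slot, in order of first use."""
    missing: list[str] = []
    for ch in text:
        if ch not in UNICODE_TO_T1 and ch not in missing:
            missing.append(ch)
    return missing


def encode_t1(text: str) -> bytes:
    """Encode text as T1 bytes; raise ValueError if any character has no slot."""
    missing = unsupported(text)
    if missing:
        raise ValueError(f"no T1 slot for {missing!r}")
    return bytes(UNICODE_TO_T1[ch] for ch in text)
```

For example, encode_t1("Žluťoučký") returns b'\x9alu\xb4ou\xa3k\xfd', one byte per letter. In contrast, unsupported("Știință"), written with the comma-below letters Ș and ț, returns ['Ș', 'ț'], because T1 provides only the cedilla forms Ş and ţ.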
The encoding supports most European languages written in the Latin alphabet. Notable exceptions are:
Languages with slightly suboptimal support include: