Macintosh Latin encoding

Macintosh Latin
Kermit: MACINTOSH-LATIN
Created by: Kermit project
Current status: Used by Kermit
Based on: Mac OS Icelandic [1]
Transforms / Encodes: ISO/IEC 8859-1, DEC MCS, PostScript Standard Encoding

Macintosh Latin is an obsolete character encoding that Kermit used to represent text on the Apple Macintosh; standard Mac OS fonts did not use it. (As of 2022, Kermit supports UTF-8, [2] though not UTF-16.) It is a modification of Mac OS Icelandic [1] that includes all characters in ISO/IEC 8859-1, DEC MCS, the PostScript Standard Encoding, and a Dutch ISO 646 variant [note 1] (with ÿ or ij serving as a substitute for ĳ). [3] Although Macintosh Latin is designed to be compatible with the standard Mac OS Roman encoding for the characters the two share, the two encodings should not be confused.
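The relationship between the two Apple encodings mentioned above can be explored with Python's standard `mac_iceland` and `mac_roman` codecs. This is only a sketch: it assumes no Macintosh Latin codec is available in the standard library, so it compares the two related encodings and not Macintosh Latin itself, and `differing_bytes` is a hypothetical helper name.

```python
def differing_bytes(enc_a: str, enc_b: str) -> dict[int, tuple[str, str]]:
    """Return {byte: (char in enc_a, char in enc_b)} for bytes that decode differently."""
    diffs = {}
    for b in range(256):
        raw = bytes([b])
        # errors="replace" guards against any byte a codec leaves undefined.
        a = raw.decode(enc_a, errors="replace")
        c = raw.decode(enc_b, errors="replace")
        if a != c:
            diffs[b] = (a, c)
    return diffs


# Compare the base encoding (Mac OS Icelandic) with Mac OS Roman.
diffs = differing_bytes("mac_iceland", "mac_roman")
for b, (icelandic, roman) in sorted(diffs.items()):
    print(f"0x{b:02X}: Icelandic {icelandic!r} vs Roman {roman!r}")
```

Every difference falls in the upper half (0x80 and above), since both encodings agree with ASCII below that.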

Layout

Each character is shown with its equivalent Unicode code point. Only the second half of the table (code points 128–255) is shown, the first half (code points 0–127) being the same as ASCII.

Macintosh Latin
    0123456789ABCDEF
8x  ÄÅÇÉÑÖÜáàâäãåçéè
9x  êëíìîïñóòôöõúùûü
Ax  Ý°¢£§×ß®©²´¨³ÆØ
Bx  ¹±¼½¥μ¾ªºæø
Cx  ¿¡¬Łƒˋ«»¦ NBSP ÀÃÕŒœ
Dx  SHY ł÷ÿŸ¤ÐðÞþ
Ex  ý·ÂÊÁËÈÍÎÏÌÓÔ
Fx  ÒÚÛÙıˆ˜¯˘˙˚¸˝˛ˇ
  Different from Mac OS Roman, matching Mac OS Icelandic
  Different from both Mac OS Icelandic and Mac OS Roman
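A minimal decoding sketch, assuming only the two rows of the table above whose sixteen cells are all present (8x and 9x). Every other high byte is mapped to U+FFFD, which is a placeholder choice for this sketch and not part of the encoding; the function and variable names are hypothetical.

```python
# Rows 8x and 9x as listed in the table; index i in a row is low nibble i.
ROW_8X = "ÄÅÇÉÑÖÜáàâäãåçéè"
ROW_9X = "êëíìîïñóòôöõúùûü"

# Partial lookup table covering bytes 0x80..0x9F only.
HIGH_HALF = {0x80 + i: ch for i, ch in enumerate(ROW_8X + ROW_9X)}


def decode_macintosh_latin_partial(data: bytes) -> str:
    """Decode bytes using ASCII for 0x00-0x7F and the partial table above."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))  # first half is the same as ASCII
        else:
            # Bytes outside rows 8x/9x are not covered by this sketch.
            out.append(HIGH_HALF.get(b, "\ufffd"))
    return "".join(out)


print(decode_macintosh_latin_partial(b"caf\x8e"))
```

A complete decoder would need all eight rows filled in from an authoritative mapping such as the Kermit code page listing cited in the references.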

Footnotes

  1. The proposal mentions a "Dutch ISO 646 variant" contributing the Florin sign (ƒ). There is no Florin sign in Code page 1019, so it appears to mean Code page 1102.

References

  1. da Cruz, Frank (2010-04-02). "Kermit and MIME Character-Set Names". Kermit Project. Columbia University.
  2. "Kermit Character-Set Names". www.kermitproject.org. Retrieved 2022-10-26.
  3. "Macintosh Kermit code page".