PostScript Standard Encoding

Alias(es): Code page 1276
Created by: Adobe

The PostScript Standard Encoding (often spelled StandardEncoding, also known by the alias PostScript [1]) is one of the character sets (or encoding vectors) used by Adobe Systems' PostScript (PS) since 1984. [2] In 1995, IBM assigned code page 1276 (CCSID 1276) to this character set. [3] [4] NeXT based the character set for its NeXTSTEP and OPENSTEP operating systems on this encoding.

Character set

The following table shows the PostScript Standard Encoding. Each character is shown with a potential Unicode equivalent. Code points 0x00 (0) to 0x7F (127) are nearly identical to ASCII. (Positions 0x27 and 0x60 hold the typographic right and left single quotation marks rather than the straight apostrophe and the grave accent; this reflects an earlier interpretation of the visual appearance of those ASCII characters than the interpretation that was formalized in Unicode; see Quotation mark § Typewriters and early computers.) The upper half of the table contains punctuation and typographic characters, currency symbols, ligatures, a selection of modified base letters used in European languages, and a selection of diacritical marks for composing accented letters.

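PostScript Standard Encoding [5] [6] [7] [8] [2] [9] [1] [10] [11]
(Unassigned code points are shown as --.)

     0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
0x   NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
1x   DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
2x   SP   !    "    #    $    %    &    ’    (    )    *    +    ,    -    .    /
3x   0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4x   @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5x   P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6x   ‘    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7x   p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL
8x   --   --   --   --   --   --   --   --   --   --   --   --   --   --   --   --
9x   --   --   --   --   --   --   --   --   --   --   --   --   --   --   --   --
Ax   --   ¡    ¢    £    ⁄    ¥    ƒ    §    ¤    '    “    «    ‹    ›    ﬁ    ﬂ
Bx   --   –    †    ‡    ·    --   ¶    •    ‚    „    ”    »    …    ‰    --   ¿
Cx   --   ˋ    ´    ˆ    ˜    ˉ    ˘    ˙    ¨    --   ˚    ¸    --   ˝    ˛    ˇ
Dx   —    --   --   --   --   --   --   --   --   --   --   --   --   --   --   --
Ex   --   Æ    --   ª    --   --   --   --   Ł    Ø    Œ    º    --   --   --   --
Fx   --   æ    --   --   --   ı    --   --   ł    ø    œ    ß    --   --   --   --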
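
As an illustration of how the table can be applied, the following Python sketch decodes bytes in the PostScript Standard Encoding to a Unicode string. The names `UPPER_HALF` and `decode_standard_encoding` are hypothetical and not part of any library. The upper-half assignments are filled in from the commonly published Adobe-to-Unicode mapping and should be checked against reference [8] before being relied on. Where more than one Unicode equivalent is plausible (for example the grave and macron accents), the choice made here is an assumption.

```python
# Upper-half assignments (byte value -> Unicode character).
# Bytes at 0x80 and above that are missing from this dict are
# treated as unassigned. The mapping is an assumption to be
# verified against the published Adobe-to-Unicode table.
UPPER_HALF = {
    0xA1: "\u00A1", 0xA2: "\u00A2", 0xA3: "\u00A3", 0xA4: "\u2044",
    0xA5: "\u00A5", 0xA6: "\u0192", 0xA7: "\u00A7", 0xA8: "\u00A4",
    0xA9: "\u0027", 0xAA: "\u201C", 0xAB: "\u00AB", 0xAC: "\u2039",
    0xAD: "\u203A", 0xAE: "\uFB01", 0xAF: "\uFB02",
    0xB1: "\u2013", 0xB2: "\u2020", 0xB3: "\u2021", 0xB4: "\u00B7",
    0xB6: "\u00B6", 0xB7: "\u2022", 0xB8: "\u201A", 0xB9: "\u201E",
    0xBA: "\u201D", 0xBB: "\u00BB", 0xBC: "\u2026", 0xBD: "\u2030",
    0xBF: "\u00BF",
    0xC1: "\u02CB", 0xC2: "\u00B4", 0xC3: "\u02C6", 0xC4: "\u02DC",
    0xC5: "\u02C9", 0xC6: "\u02D8", 0xC7: "\u02D9", 0xC8: "\u00A8",
    0xCA: "\u02DA", 0xCB: "\u00B8", 0xCD: "\u02DD", 0xCE: "\u02DB",
    0xCF: "\u02C7", 0xD0: "\u2014",
    0xE1: "\u00C6", 0xE3: "\u00AA", 0xE8: "\u0141", 0xE9: "\u00D8",
    0xEA: "\u0152", 0xEB: "\u00BA",
    0xF1: "\u00E6", 0xF5: "\u0131", 0xF8: "\u0142", 0xF9: "\u00F8",
    0xFA: "\u0153", 0xFB: "\u00DF",
}


def decode_standard_encoding(data: bytes, undefined: str = "\uFFFD") -> str:
    """Decode bytes in PostScript Standard Encoding to a Unicode string.

    Unassigned upper-half bytes are replaced by `undefined`.
    """
    out = []
    for b in data:
        if b == 0x27:
            out.append("\u2019")  # right single quotation mark, not ASCII apostrophe
        elif b == 0x60:
            out.append("\u2018")  # left single quotation mark, not ASCII grave
        elif b < 0x80:
            out.append(chr(b))    # remaining lower half matches ASCII
        else:
            out.append(UPPER_HALF.get(b, undefined))
    return "".join(out)


if __name__ == "__main__":
    # 0xAA and 0xBA are the curly double quotes; 0xAE is the fi ligature.
    print(decode_standard_encoding(b"\xaa\xaenal\xba"))
```

Replacing unassigned bytes with U+FFFD is only one possible policy; a stricter decoder might raise an error instead.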

References

  1. Czyborra, Roman (1998-06-27). "Codepage & Co". AdobeStandardEncoding. Archived from the original on 2016-12-07. Retrieved 2016-12-06.
  2. Adobe Systems Incorporated (February 1999) [1985]. PostScript Language Reference Manual (PDF) (1st printing, 3rd ed.). Addison-Wesley Publishing Company. ISBN 0-201-37922-8. Archived (PDF) from the original on 2017-02-18. Retrieved 2017-02-18. (NB. This book is informally called "red book" due to its red cover.)
  3. "Code page 1276 information document". Archived from the original on 2017-02-18.
  4. "CCSID 1276 information document". Archived from the original on 2016-03-27.
  5. Code Page CPGID 01276 (PDF), IBM
  6. Code Page CPGID 01276 (txt), IBM
  7. International Components for Unicode (ICU), ibm-1276_P100-1995.ucm, 2002-12-03
  8. "Adobe Standard Encoding to Unicode". 1.0. Unicode, Inc. 2011-07-12 [1995-05-05]. Retrieved 2017-02-25.
  9. Adobe Systems Incorporated (1990) [1985]. PostScript Language Reference Manual (2nd ed.). Addison-Wesley Publishing Company. (NB. This edition also contains a description of Display PostScript, which is no longer discussed in the third edition.)
  10. Sicherman, George (2011). "PostScript Standard Encoding". Retrieved 2023-04-20.
  11. Kostis, Kosta (2000). "Adobe StandardEncoding Encoding Vector". 1.20. Archived from the original on 2017-02-18. Retrieved 2017-02-18.