NeXT character set

NeXTSTEP Multinational
Kermit: next-multinational
Alias(es): WE8NEXTSTEP
Created by: NeXT
Extends: PostScript Standard Encoding
Transforms / Encodes: ISO-8859-1 [lower-alpha 1]
Other related encoding(s)

The NeXT character set (often aliased as NeXTSTEP encoding vector, WE8NEXTSTEP [1] or next-multinational [2]) was used by the NeXTSTEP and OPENSTEP operating systems on NeXT workstations beginning in 1988. It is based on Adobe Systems' PostScript (PS) character set, also known as Adobe Standard Encoding, with the unused code points filled with characters from ISO 8859-1 (Latin 1), although at different code points than in ISO 8859-1. [3]
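To illustrate what "at different code points" means in practice, the following minimal sketch compares a few characters under both encodings. The NeXT values are read off rows 8x, 9x and Ex of the character table in this article; the dictionary name `NEXT_CODE_POINTS` and the helper `latin1_code_point` are hypothetical names chosen for this example, and the ISO 8859-1 values come from Python's built-in `latin-1` codec.

```python
# NeXT code points for a few Latin-1 characters, read off rows 8x, 9x and Ex
# of the character table in this article (an illustrative selection only).
NEXT_CODE_POINTS = {"Ä": 0x85, "Ñ": 0x91, "÷": 0x9F, "ñ": 0xE7}


def latin1_code_point(ch: str) -> int:
    """Return the ISO 8859-1 byte value of a single character."""
    return ch.encode("latin-1")[0]


# Print both byte values side by side; they differ for every character here.
for ch, next_cp in NEXT_CODE_POINTS.items():
    print(f"{ch}: NeXT 0x{next_cp:02X}, ISO 8859-1 0x{latin1_code_point(ch):02X}")
```

Because the byte values differ, text written in one encoding and read as the other shows the wrong accented letters even though both sets contain the same characters.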

Character set

The following table shows the NeXT character set. Each character is shown with a potential Unicode equivalent. Code points 0x00 (0) to 0x7F (127) are nearly identical to ASCII.

NeXT character set [4] [5] [6] [3] [7] [8]
   0 1 2 3 4 5 6 7 8 9 A B C D E F
0x NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI
1x DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US
2x SP ! " # $ % & [3] ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ \ ] ^ _
6x [3] a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { | } ~ DEL
8x fsp À Á Â Ã Ä Å Ç È É Ê Ë Ì Í Î Ï
9x Ð Ñ Ò Ó Ô Õ Ö Ù Ú Û Ü Ý Þ µ × ÷
Ax © ¡ ¢ £ ¥ ƒ § ¤ ' [3] «
Bx ® · ¦ » [3] ¬ ¿
Cx ¹ ˋ ´ ˆ ˜ ¯ ˘ ˙ ¨ ² ˚ [3] ¸ ³ ˝ ˛ ˇ
Dx ± ¼ ½ ¾ à á â ã ä å ç è é ê ë
Ex ì Æ í ª î ï ð ñ Ł Ø Œ º ò ó ô õ
Fx ö æ ù ú û ı ü ý ł ø œ ß þ ÿ
  Differences from Adobe Standard Encoding
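As a rough sketch of how such a table can be used, the snippet below decodes NeXT-encoded bytes into a Unicode string. It is deliberately partial: only rows 8x (from 0x81), 9x and Ex are included, because those rows appear with a full set of cells in the table above. The names `NEXT_TO_UNICODE` and `decode_next` are hypothetical, bytes below 0x80 are simply treated as ASCII (an approximation, since the table is only "nearly identical" to ASCII there), and every unmapped byte becomes U+FFFD.

```python
# Partial NeXT-to-Unicode table, built only from rows 8x, 9x and Ex above.
# 0x80 is skipped because its cell ("fsp") has no single obvious equivalent.
_ROWS = {
    0x81: "ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏ",   # 0x81..0x8F
    0x90: "ÐÑÒÓÔÕÖÙÚÛÜÝÞµ×÷",  # 0x90..0x9F
    0xE0: "ìÆíªîïðñŁØŒºòóôõ",  # 0xE0..0xEF
}

NEXT_TO_UNICODE = {
    start + offset: ch
    for start, row in _ROWS.items()
    for offset, ch in enumerate(row)
}


def decode_next(data: bytes) -> str:
    """Decode NeXT-encoded bytes using the partial table above."""
    out = []
    for byte in data:
        if byte < 0x80:
            # Assumption: treat the lower half as plain ASCII.
            out.append(chr(byte))
        else:
            # Bytes missing from the partial table become U+FFFD.
            out.append(NEXT_TO_UNICODE.get(byte, "\ufffd"))
    return "".join(out)


print(decode_next(b"Se\xe7or \x85"))
```

A complete converter would need the remaining rows as well, for which the mapping tables listed in the references are the authoritative source.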

Footnotes

  1. Assuming the left single quotation mark and/or the modifier letter grave accent is unified with the backtick, the degree sign is unified with the high ring, and the soft hyphen is unified with the en dash. C1 control codes are not counted.

References

  1. Baird, Cathy; Chiba, Dan; Chu, Winson; Fan, Jessica; Ho, Claire; Law, Simon; Lee, Geoff; Linsley, Peter; Matsuda, Keni; Oscroft, Tamzin; Takeda, Shige; Tanaka, Linus; Tozawa, Makoto; Trute, Barry; Tsujimoto, Mayumi; Wu, Ying; Yau, Michael; Yu, Tim; Wang, Chao; Wong, Simon; Zhang, Weiran; Zheng, Lei; Zhu, Yan; Moore, Valarie (2002) [1996]. "Appendix A: Locale Data". Oracle9i Database Globalization Support Guide (PDF) (Release 2 (9.2) ed.). Oracle Corporation. Oracle A96529-01. Archived (PDF) from the original on 2017-02-14. Retrieved 2017-02-14.
  2. "Character sets". Kermit. Columbia University. 2000-01-01. Archived from the original on 2017-02-15. Retrieved 2017-02-15.
  3. "Keyboard Event Information - Encoding Vectors". NeXT Computer, Inc. 1995. Archived from the original on 2017-02-12. Retrieved 2017-02-12.
  4. McGowan, Rick (1999-09-23). "NextStep Encoding to Unicode". 0.1. Unicode, Inc. Retrieved 2017-02-12.
  5. Czyborra, Roman (1998-06-27). "Codepage & Co". NeXTSTEP. Archived from the original on 2016-12-07. Retrieved 2016-12-06.
  6. Flohr, Guido (2016) [2002]. "Locale::RecodeData::NEXTSTEP - Conversion routines for NEXTSTEP". CPAN libintl-perl. Archived from the original on 2017-02-18. Retrieved 2017-02-18.
  7. Kostis, Kosta (2000). "NeXTSTEP Encoding Vector". 1.20. Archived from the original on 2017-02-18. Retrieved 2017-02-18.
  8. "NeXT Character Set". Kermit. Columbia University. Retrieved 2020-06-24.