National Replacement Character Set

DEC NRCS encoding family
Figure: Invariant subset of NRCS. Red Bowen knots (⌘) denote national code points.
Alias(es): National Replacement Character Set
Preceded by: US-ASCII
Succeeded by: ISO 8859, ISO 10646
Other related encoding(s): ISO 646

The National Replacement Character Set (NRCS) was a feature supported by later models of Digital's (DEC) computer terminal systems, starting with the VT200 series in 1983. NRCS allowed individual characters in one character set to be replaced by characters from another, so that different character sets could be constructed on the fly. It was used to adapt the character set to different local languages without having to change the terminal's ROM for each country or, alternatively, include many different sets in a larger ROM. Many third-party terminals and terminal emulators supporting VT200 codes also supported NRCS.

Description

ASCII is a 7-bit standard, allowing a total of 128 characters in the character set. Some of these are reserved as control characters, leaving 95 printable characters (codes 32 to 126, including the space). This set includes upper- and lower-case letters, digits, and basic mathematical and punctuation symbols.

ASCII does not have enough room to include other common characters such as multi-national currency symbols or the various accented letters common in European languages. This led to a number of country-specific varieties of 7-bit ASCII with certain characters replaced. For instance, the UK standard simply replaced ASCII's hash mark, #, with the pound symbol, £. As a result, a given computer terminal or printer was normally sold in several models that differed only in the glyphs stored in ROM. Some of these varieties were standardized as part of ISO/IEC 646. [1] [2]

On an 8-bit clean serial link, ASCII can be expanded to support a total of 256 characters. In this case, instead of replacing characters in the original printable range from 32 to 126, new characters are added in the 128 to 255 range. This offers enough room for a single character set to include the full variety of characters used in North America and western Europe. This capability led to the introduction of the ISO/IEC 8859-1 standard character set, which contains 191 characters of what it calls the "Latin alphabet no. 1" and is normally referred to as "ISO Latin". Windows-1252 is a slightly expanded superset of ISO Latin. [2]
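The arithmetic behind the 191-character figure, and the point that an 8-bit set no longer forces a choice between # and £, can be checked with a short sketch. It relies only on Python's built-in `latin-1` codec; the variable names are illustrative.

```python
# Sketch: printable code points of ISO/IEC 8859-1, counted by range.
low = range(0x20, 0x7F)    # 0x20-0x7E: printable ASCII, including the space
high = range(0xA0, 0x100)  # 0xA0-0xFF: characters added by ISO/IEC 8859-1
printable_count = len(low) + len(high)

# In an 8-bit set both symbols have their own code, so neither replaces the other.
encoded = "#£".encode("latin-1")

print(printable_count)  # 191
print(encoded)          # b'#\xa3'
```

The hash mark keeps its ASCII code 0x23, while the pound sign is placed in the upper half at 0xA3.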

NRCS was introduced to remove the need for a different terminal in each country. It allowed characters in the basic 7-bit ASCII set to be redefined by copying the glyph from DEC's version of ISO Latin, the Multinational Character Set (MCS). This meant that the ROM had to store only two character sets, standard ASCII and MCS, and the terminal could build any required local ASCII variant on the fly. For instance, instead of having a separate "UK ASCII" version of the terminal with a modified glyph in ROM, the terminal included an NRCS with instructions to replace the hash mark glyph with the pound sign. When used in the UK, typing Shift+3 produced the pound sign; the same keys pressed on a US terminal produced the hash mark. [2]
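The replacement idea can be modelled as a small overlay applied to a base ASCII table. The following is a minimal sketch, not a description of the terminal firmware: the function names `build_charset` and `render` are hypothetical, and the only mapping used is the UK replacement of the hash mark by the pound sign described above.

```python
# Hypothetical model: a national set is ASCII plus a few replaced code points.
UK_OVERLAY = {0x23: "£"}  # hash mark position takes the pound glyph


def build_charset(overlay):
    """Return a 128-entry list mapping each 7-bit code to a character."""
    table = [chr(code) for code in range(128)]
    for code, glyph in overlay.items():
        table[code] = glyph
    return table


def render(data, table):
    """Map each received byte to the glyph the terminal would display."""
    return "".join(table[byte & 0x7F] for byte in data)


us_table = build_charset({})
uk_table = build_charset(UK_OVERLAY)

print(render(b"#3", us_table))  # #3
print(render(b"#3", uk_table))  # £3
```

The same byte, 0x23, is sent over the line in both cases; only the glyph chosen for display differs.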

The NRCS could be set through a setup command or, more commonly, by attaching a keyboard model that sent back a code when the terminal was first booted. That way, simply plugging in a UK keyboard, which had a pound sign on the 3 key, automatically set the NRCS to that same replacement. [2]

NRC Sets

DEC terminals from the VT220 onward had 12 different NRC sets in addition to standard ASCII: [2]

Character set                       Code page                        Standard
Standard ASCII [3]                  367                              ASCII, ISO 646-US IR 6
United Kingdom [4] [3]              1101 [5]                         DEC, ISO [6]
Denmark/Norway (Alternate) [4] [3]  1107 [7]                         DEC, ISO [6]
Denmark/Norway                      1105 [8]                         DEC
Dutch [4]                           1102 [9]                         DEC
Finnish [4] [3]                     1103 [10]                        DEC [6]
French [4] [3]                      1104 [11]                        DEC, ISO [6]
French Canadian [4] [3]             1020 [12]                        DEC [6]
German [4] [3]                      1011 [13], 20106 [14] [15] [16]  ISO 646-DE IR 21 [17] [18], DIN 66003
Italian [4] [3]                     1012 [19]                        ISO 646-IT IR 15 [17] [18], UNI 0204-70
Portuguese [2] [3] [nb 1]           -                                DEC [6]
Spanish [4] [3]                     1023 [20]                        DEC, ISO [6]
Swedish [4] [3]                     1106 [21]                        DEC, ISO [6]
Swiss [4] [3]                       1021 [22]                        DEC [6]

Character set               0x23  0x40              0x5B        0x5C  0x5D   0x5E  0x5F  0x60  0x7B        0x7C  0x7D  0x7E
Standard ASCII              #     @                 [           \     ]      ^     _     `     {           |     }     ~
United Kingdom              £     @                 [           \     ]      ^     _     `     {           |     }     ~
Denmark/Norway (Alternate)  #     @                 Æ           Ø     Å      ^     _     `     æ           ø     å     ~
Denmark/Norway              #     Ä                 Æ           Ø     Å      Ü     _     ä     æ           ø     å     ü
Dutch                       £     ¾                 ij [4] [9]  ½     | [9]  ^     _     `     ¨           ƒ     ¼     ´ [9]
Finnish                     #     @                 Ä           Ö     Å      Ü     _     é     ä           ö     å     ü
French                      £     à                 ° [11]      ç     §      ^     _     `     é           ù     è     ¨ [11]
French Canadian             #     à                 â           ç     ê      î     _     ô     é           ù     è     û
German                      #     §                 Ä           Ö     Ü      ^     _     `     ä           ö     ü     ß
Italian                     £     §                 ° [19]      ç     é      ^     _     ù     à           ò     è     ì
Portuguese                  #     @ [2] [3] [nb 1]  Ã           Ç     Õ      ^     _     `     ã           ç     õ     ~
Spanish                     £     §                 ¡           Ñ     ¿      ^     _     `     ˚ [20] (°)  ñ     ç     ~
Swedish                     #     É                 Ä           Ö     Å      Ü     _     é     ä           ö     å     ü
Swiss                       ù     à                 é           ç     ê      î     è     ô     ä           ö     ü     û
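A consequence of the table is that the same 7-bit byte sequence reads differently depending on which NRC set is selected. The sketch below transcribes only the German and Swedish rows and decodes one byte string under each. The names `CODE_POINTS`, `NRC_SETS`, and `decode_nrc` are hypothetical and chosen for illustration.

```python
# The twelve code points that the NRC sets may replace, in table order.
CODE_POINTS = [0x23, 0x40, 0x5B, 0x5C, 0x5D, 0x5E,
               0x5F, 0x60, 0x7B, 0x7C, 0x7D, 0x7E]

# One glyph per code point, transcribed from the German and Swedish rows.
NRC_SETS = {
    "german": "#§ÄÖÜ^_`äöüß",
    "swedish": "#ÉÄÖÅÜ_éäöåü",
}


def decode_nrc(data, set_name):
    """Decode 7-bit bytes, substituting the glyphs of the chosen NRC set."""
    mapping = dict(zip(CODE_POINTS, NRC_SETS[set_name]))
    return data.decode("ascii").translate(mapping)


sample = b"[\\]{|}~"  # bytes 0x5B 0x5C 0x5D 0x7B 0x7C 0x7D 0x7E
print(decode_nrc(sample, "german"))   # ÄÖÜäöüß
print(decode_nrc(sample, "swedish"))  # ÄÖÅäöåü
```

Because the byte stream does not record which set was active, text written under one national set is misread when displayed under another.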

Notes

  1. This DEC character set is similar to ISO 646-PT2 / IR 84 (also known as IBM code page 1015), except for code point 64 (0x40), which is assigned to "@" in the DEC character set but to "´" in the ISO character set.

References

  1. Hartman Kennelly, Cynthia (1991). Unch, Jacqueline (ed.). Digital Guide To Developing International Software (1 ed.). Digital Equipment Corporation. ISBN 1-55558-063-7. EY-F577E-DP.
  2. DEC (June 1987). "Appendix E". VT320 Programming Summary. Digital Press. (The provided link goes to a digitized version that contains some subtle OCR errors and is therefore not a reliable reference for the character set mappings.)
  3. DEC (February 1992) [November 1989]. "Chapter 2: Character Encoding - National Replacement Character Sets (NRC Sets) (Worldwide Models Only)". VT420 Programmer Reference Manual (PDF) (2 ed.). Digital Equipment Corporation. p. 28. EK–VT420–RM.002. Archived (PDF) from the original on 2017-01-29. Retrieved 2017-01-29.
  4. "VT220 Programmer Reference Manual" (2 ed.). Digital Equipment Corporation (DEC). 1984 [1983].
  5. "SBCS code page information - CPGID: 01101 / Name: British NRC Set". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1992-10-01. Archived from the original on 2016-12-05. Retrieved 2016-12-05.
  6. Digital Equipment Corporation (DEC). "7. Character Sets". VT510 Video Terminal Programmer Information. Retrieved 2017-02-18.
  7. "SBCS code page information - CPGID: 01107 / Name: Norwegian/Danish NRC Alternate". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1992-10-01. Archived from the original on 2016-12-05. Retrieved 2016-12-05.
  8. "SBCS code page information - CPGID: 01105 / Name: Norwegian/Danish NRC Set". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1992-10-01. Archived from the original on 2016-12-05. Retrieved 2016-12-05.
  9. "SBCS code page information - CPGID: 01102 / Name: Dutch NRC Set". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1992-10-01. Archived from the original on 2016-12-05. Retrieved 2016-12-05.
  10. "SBCS code page information - CPGID: 01103 / Name: Finnish NRC Set". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1992-10-01. Archived from the original on 2016-12-05. Retrieved 2016-12-05.
  11. "SBCS code page information - CPGID: 01104 / Name: French NRC Set". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1992-10-01. Archived from the original on 2016-12-05. Retrieved 2016-12-05.
  12. "SBCS code page information - CPGID: 01020 / Name: Canadian (French) Variant". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1992-10-01. Archived from the original on 2016-12-05. Retrieved 2016-12-05.
  13. "SBCS code page information - CPGID: 01011 / Name: 7-Bit Germany F.R." IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1987-08-01. Archived from the original on 2016-06-17. Retrieved 2016-06-17.
  14. "Code Page Identifiers". Microsoft Developer Network. Microsoft. 2014. Archived from the original on 2016-06-19. Retrieved 2016-06-19.
  15. "Web Encodings - Internet Explorer - Encodings". WHATWG Wiki. 2012-10-23. Archived from the original on 2016-06-20. Retrieved 2016-06-20.
  16. Foller, Antonin (2014) [2011]. "German (IA5) encoding - Windows charsets". WUtils.com - Online web utility and help. Motobit Software. Archived from the original on 2016-06-20. Retrieved 2016-06-20.
  17. Bemer, Robert William (1980). "Chapter 1: Inside ASCII". General Purpose Software (PDF). Best of Interface Age. Vol. 2. Portland, OR, USA: dilithium Press. pp. 1–50. ISBN 0-918398-37-1. LCCN 79-67462. Archived from the original on 2016-08-27. Retrieved 2016-08-27, from: Bemer, Robert William (May 1978). "Inside ASCII - Part I". Interface Age. Portland, OR, USA: dilithium Press. 3 (5): 96–102., Bemer, Robert William (June 1978). "Inside ASCII - Part II". Interface Age. Portland, OR, USA: dilithium Press. 3 (6): 64–74., Bemer, Robert William (July 1978). "Inside ASCII - Part III". Interface Age. Portland, OR, USA: dilithium Press. 3 (7): 80–87.
  18. "HP PCL/PJL Reference PCL 5 Comparison Guide" (PDF) (2 ed.). Hewlett-Packard Company, LP. June 2003. HP part-number 502-0378. Archived from the original (PDF) on 2016-08-10. Retrieved 2016-08-10.
  19. "SBCS code page information - CPGID: 01012 / Name: 7-Bit Italy". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1987-08-01. Archived from the original on 2016-06-17. Retrieved 2016-06-17.
  20. "SBCS code page information - CPGID: 01023 / Name: Spain Variant". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1992-10-01. Archived from the original on 2016-12-05. Retrieved 2016-12-05.
  21. "SBCS code page information - CPGID: 01106 / Name: Swedish British NRC Set". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1992-10-01. Archived from the original on 2016-12-05. Retrieved 2016-12-05.
  22. "SBCS code page information - CPGID: 01021 / Name: Switzerland Variant". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1992-10-01. Archived from the original on 2016-12-05. Retrieved 2016-12-05.