Xerox Character Code Standard

Xerox Character Code Standard (XCCS)
Language(s): English, French, German, Russian, Chinese, Japanese, Korean
Created by: Xerox

The Xerox Character Code Standard (XCCS) is a historical 16-bit character encoding that was created by Xerox [1] in 1980 for the exchange of information between elements of the Xerox Network Systems Architecture. [2] It encodes the characters required for languages using the Latin, Arabic, Hebrew, Greek and Cyrillic scripts, the Chinese, Japanese and Korean writing systems, and technical symbols. [3]

It can be viewed as an early precursor of, and inspiration for, the Unicode Standard. [4] [1]

The International Character Set (ICS) is compatible with XCCS. [5]

The XCCS 2.0 (1990) revision covers the Latin, Arabic, Hebrew, Gothic, Armenian, Runic, Georgian, Greek, Cyrillic, Hiragana, Katakana and Bopomofo scripts, as well as technical and mathematical symbols. [6]

Code charts

Character sets overview

XCCS Lead byte
0123456789ABCDEF
0x 00
1x
2x 21 22 23 24 25 26 27 28
3x 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
4x 40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
5x 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
6x 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F
7x 70 71 72 73 74
8x
9x
Ax
Bx
Cx
Dx
Ex E0 E1 E2 E3 EE EF
Fx F0 F1 FE FF
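
As a rough illustration of how the overview chart can be read, the following Python sketch splits a 16-bit code into its lead byte (the character set number) and its trailing byte, then checks the lead byte against the values listed in the chart. The names `ASSIGNED_LEAD_BYTES`, `split_code` and `is_listed_set` are hypothetical, and the sketch assumes that the lead byte occupies the high-order 8 bits of the code. It does not model how XCCS text is serialized as a byte stream.

```python
# Lead bytes listed in the overview chart (transcribed from the chart;
# the contents of each character set are not modelled here).
ASSIGNED_LEAD_BYTES = (
    {0x00}
    | set(range(0x21, 0x29))                  # 0x21-0x28
    | set(range(0x30, 0x75))                  # 0x30-0x74
    | {0xE0, 0xE1, 0xE2, 0xE3, 0xEE, 0xEF}
    | {0xF0, 0xF1, 0xFE, 0xFF}
)


def split_code(code: int) -> tuple[int, int]:
    """Split a 16-bit code into (lead byte, trailing byte).

    Assumption: the lead byte is the high-order 8 bits of the code.
    """
    if not 0 <= code <= 0xFFFF:
        raise ValueError("expected a 16-bit value")
    return code >> 8, code & 0xFF


def is_listed_set(code: int) -> bool:
    """Return True if the code's lead byte appears in the overview chart."""
    lead, _ = split_code(code)
    return lead in ASSIGNED_LEAD_BYTES
```

For example, `split_code(0x2641)` yields the pair `(0x26, 0x41)`, that is, position 0x41 of the character set prefixed with 0x26.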

Character set 0x00

XCCS (prefixed with 0x00)
0123456789ABCDEF
0x
1x
2x  SP   ! " # ¤ % & ʼ ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ \ ] ^ _
6x ` a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { | } ~
8x
9x
Ax ¡ ¢ £ $ ¥ § «
Bx ° ± ² ³ × µ · ÷ » ¼ ½ ¾ ¿
Cx ` ´ ˆ ˜ ¯ ˘ ˙ ¨ ˚ ¸ ˍ ˝ ˛ ˇ
Dx ¹ © ®
Ex Æ Ð ª Ħ ȷ IJ Ŀ Ł Ø Œ º Þ Ŧ Ŋ ʼn
Fx ĸ æ đ ð ħ ı ij ŀ ł ø œ ß þ ŧ ŋ
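
The chart for character set 0x00 resembles ASCII in rows 2x to 7x but differs in at least two positions: 0x24 shows the currency sign ¤ rather than the dollar sign, and 0x27 shows ʼ. The following Python sketch builds a lookup table from those rows as transcribed from the chart. The names `SET0_ROWS`, `SET0_TO_UNICODE` and `decode_set0` are hypothetical, the choice of U+02BC for position 0x27 follows the glyph displayed in the chart, and the placement of the 15 entries of row 7x at 0x70 to 0x7E is an assumption.

```python
# Rows 2x-7x of the character set 0x00 chart, transcribed left to right.
# Assumption: the 15 entries of row 7x occupy 0x70-0x7E, leaving 0x7F empty.
SET0_ROWS = {
    0x20: " !\"#\u00A4%&\u02BC()*+,-./",   # 0x24 is the currency sign, not "$"
    0x30: "0123456789:;<=>?",
    0x40: "@ABCDEFGHIJKLMNO",
    0x50: "PQRSTUVWXYZ[\\]^_",
    0x60: "`abcdefghijklmno",
    0x70: "pqrstuvwxyz{|}~",
}

# Map each byte value to the Unicode character shown at that chart position.
SET0_TO_UNICODE = {
    base + offset: ch
    for base, row in SET0_ROWS.items()
    for offset, ch in enumerate(row)
}


def decode_set0(data: bytes) -> str:
    """Decode bytes in the range 0x20-0x7E of character set 0x00.

    Raises KeyError for any byte outside the transcribed rows.
    """
    return "".join(SET0_TO_UNICODE[b] for b in data)
```

With this table, a byte that an ASCII reader would take for a dollar sign decodes to the currency sign: `decode_set0(b"$")` returns `"\u00a4"`.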

Character set 0x21

XCCS (prefixed with 0x21)
0123456789ABCDEF
0x
1x
2x IDSP · ´ ¨
3x
4x
5x ×
6x ÷ ° ¥
7x ¢ £ §
8x
9x
Ax
Bx
Cx
Dx
Ex
Fx

Character set 0x22

XCCS (prefixed with 0x22)
0123456789ABCDEF
0x
1x
2x
3x
4x ¬
5x
6x
7x
8x
9x
Ax
Bx
Cx
Dx
Ex
Fx

Character set 0x23

XCCS (prefixed with 0x23)
0123456789ABCDEF
0x
1x
2x
3x
4x
5x
6x
7x
8x
9x
Ax
Bx
Cx
Dx ̣
Ex
Fx

Character set 0x24

XCCS (prefixed with 0x24)
0123456789ABCDEF
0x
1x
2x
3x
4x
5x
6x
7x
8x
9x
Ax
Bx
Cx
Dx
Ex
Fx

Character set 0x25

XCCS (prefixed with 0x25)
0123456789ABCDEF
0x
1x
2x
3x
4x
5x
6x
7x
8x
9x
Ax
Bx
Cx
Dx
Ex
Fx

Character set 0x26

XCCS (prefixed with 0x26)
0123456789ABCDEF
0x
1x
2x Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο
3x Π Ρ Σ Τ Υ Φ Χ Ψ Ω ;
4x α β γ δ ε ζ η θ ι κ λ μ ν ξ ο
5x π ρ σ τ υ φ χ ψ ω
6x
7x ς
8x
9x
Ax
Bx ΄ ΅
Cx
Dx
Ex
Fx

Character set 0x27

XCCS (prefixed with 0x27)
0123456789ABCDEF
0x
1x
2x А Б В Г Д Е Ё Ж З И Й К Л М Н
3x О П Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э
4x Ю Я
5x а б в г д е ё ж з и й к л м н
6x о п р с т у ф х ц ч ш щ ъ ы ь э
7x ю я
8x
9x
Ax
Bx
Cx
Dx
Ex
Fx

Character set 0x28

XCCS (prefixed with 0x28)
0123456789ABCDEF
0x
1x
2x
3x
4x
5x
6x
7x
8x
9x
Ax
Bx
Cx
Dx
Ex
Fx

Character set 0x30

XCCS (prefixed with 0x30)
0123456789ABCDEF
0x
1x
2x
3x
4x
5x 禿
6x
7x
8x
9x
Ax
Bx
Cx
Dx
Ex
Fx

Character set 0x31

XCCS (prefixed with 0x31)
0123456789ABCDEF
0x
1x
2x
3x
4x
5x
6x 沿
7x
8x
9x
Ax
Bx
Cx
Dx
Ex
Fx

Character set 0xE0

XCCS (prefixed with 0xE0)
0123456789ABCDEF
0x
1x
2x
3x
4x
5x
6x א ב ג ד ה ו ז ח ט י ך כ ל ם מ
7x ן נ ס ע ף פ ץ צ ק ר ש ת
8x
9x
Ax
Bx
Cx
Dx
Ex
Fx

Character set 0xE1

XCCS (prefixed with 0xE1)
0123456789ABCDEF
0x
1x
2x
3x
4x ء آ أ ؤ إ ئ ا ب ة ت ث ج ح خ د
5x ذ ر ز س ش ص ض ط ظ ع غ
6x ـ ف ق ك ل م ن ه و ى ي ً ٌ ٍ َ ُ
7x ِ ّ ْ ٓ ٔ ٕ ٖ ٗ ٘ ٙ ٚ
8x
9x
Ax
Bx
Cx
Dx
Ex
Fx

Character set 0xE2

XCCS (prefixed with 0xE2)
0123456789ABCDEF
0x
1x
2x
3x
4x
5x
6x
7x
8x
9x
Ax
Bx
Cx
Dx j ʎ ŋ k ɡ
Ex x ɣ ɰ g ɴ ƞ q ɢ χ ʁ ʀ ħ ʕ ʔ h ɦ
Fx

Character set 0xE3

XCCS (prefixed with 0xE3)
0123456789ABCDEF
0x
1x
2x
3x
4x
5x
6x
7x
8x
9x
Ax
Bx
Cx
Dx HCF
Ex
Fx HVF

Character set 0xEE

XCCS (prefixed with 0xEE)
0123456789ABCDEF
0x
1x
2x NBSP 3/MSP 4/MSP HSP PSP ENSP EMSP FSP .
3x
4x
5x
6x
7x / / / / |
8x
9x
Ax
Bx
Cx
Dx
Ex
Fx

Character set 0xEF

XCCS (prefixed with 0xEF)
0123456789ABCDEF
0x
1x
2x ' /
3x
4x
5x
6x ¬ ¦
7x
8x
9x
Ax ƒ
Bx
Cx
Dx
Ex
Fx

Character set 0xF0

XCCS (prefixed with 0xF0)
0123456789ABCDEF
0x
1x
2x
3x
4x
5x
6x
7x
8x
9x
Ax
Bx
Cx
Dx
Ex
Fx

Character set 0xF1

XCCS (prefixed with 0xF1)
0123456789ABCDEF
0x
1x
2x Á À Â É Ü Î Ä Å Ó Ò Ú Ù Ç Í Ì
3x Æ Ø Œ
4x
5x Ö
6x
7x
8x
9x
Ax á à â é ü î ä å ó ò ú ù ç í ì
Bx æ ø œ
Cx
Dx ö
Ex
Fx

References

  1. Haralambous, Yannis (September 2007). Fonts & Encodings. Translated by Horne, P. Scott (1st ed.). Sebastopol, California, USA: O'Reilly Media, Inc. p. 53. ISBN 978-0-596-10242-5.
  2. "Xerox System Network Architecture General Information Manual". Xerox Corporation. April 1985. pp. 57–63. Retrieved 2016-10-25.
  3. Centerlind, Tomas (1987-06-18). "International Character Code Standard for the BE2" (PDF). Information Technology Center (ITC), Carnegie Mellon University. CMU-ITC-87-091. Archived (PDF) from the original on 2016-11-25. Retrieved 2016-10-25.
  4. Becker, Joseph D. (1998-09-10) [1988-08-29]. "Unicode 88" (PDF). unicode.org (10th anniversary reprint ed.). Unicode Consortium. Archived (PDF) from the original on 2016-11-25. Retrieved 2016-10-25. In 1978, the initial proposal for a set of "Universal Signs" was made by Bob Belleville at Xerox PARC. Many persons contributed ideas to the development of a new encoding design. Beginning in 1980, these efforts evolved into the Xerox Character Code Standard (XCCS) by the present author, a multilingual encoding which has been maintained by Xerox as an internal corporate standard since 1982, through the efforts of Ed Smura, Ron Pellar, and others.
    Unicode arose as the result of eight years of working experience with XCCS. Its fundamental differences from XCCS were proposed by Peter Fenwick and Dave Opstad (pure 16-bit codes), and by Lee Collins (ideographic character unification). Unicode retains the many features of XCCS whose utility have been proved over the years in an international line of communication multilingual system products.
  5. Salmons, Jim; Babitsky, Timlynn (1992). International OOP Directory. COOT, Inc. pp. 3–98.
  6. Whistler, Kenneth. "Re: Questions about Unicode history". Retrieved 6 October 2017.

Further reading