Stanford Extended ASCII


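Stanford Extended ASCII (SEASCII) is a variant of the 7-bit ASCII character set developed at the Stanford Artificial Intelligence Laboratory (SAIL/SU-AI) in the early 1970s. [1] Not all of its code points carry the same symbols as ASCII: for example, the character used as the assignment operator in SAIL source code printed as a left arrow at Stanford but appeared as a plain underscore at other PDP-10 sites. [1]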


Carnegie Mellon University, the Massachusetts Institute of Technology, and the University of Southern California also had their own modified versions of ASCII. [1]

Character set

Each character is given with a potential Unicode equivalent.

SEASCII [2] [3] [1]
   0 1 2 3 4 5 6 7 8 9 A B C D E F
0x · α β /^ ¬ ε π λ γ δ ± /
1x _ ~ /
2x  SP   ! " # $ % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ \ ]
6x ` a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { | } ^
  Differences from ASCII
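As a minimal sketch of how such a chart could be applied, the Python below decodes 7-bit SEASCII bytes to Unicode through an override table. The names `SEASCII_OVERRIDES` and `seascii_to_unicode` are hypothetical, and the single override shown (a left arrow in the slot where ASCII has the underscore, 0x5F) is an assumption based on the note in reference [1]; it is not a complete or authoritative mapping. Codes without an override fall back to ASCII, which is only correct where the two sets agree.

```python
# Hypothetical, partial override table: SEASCII code -> Unicode character.
# ASSUMPTION: the left arrow sits at 0x5F (the ASCII underscore position),
# following the note in reference [1]. Other differing codes are omitted.
SEASCII_OVERRIDES = {
    0x5F: "\u2190",  # LEFTWARDS ARROW
}


def seascii_to_unicode(data: bytes, overrides=SEASCII_OVERRIDES) -> str:
    """Decode 7-bit bytes, applying overrides and falling back to ASCII."""
    out = []
    for code in data:
        if code > 0x7F:
            # SEASCII is a 7-bit set, so anything above 0x7F is invalid here.
            raise ValueError(f"not a 7-bit code: 0x{code:02X}")
        # Use the override if one exists, otherwise the ASCII character.
        out.append(overrides.get(code, chr(code)))
    return "".join(out)


if __name__ == "__main__":
    # A SAIL-style assignment: the byte 0x5F is shown as a left arrow.
    print(seascii_to_unicode(b"X_1"))
```

Keeping the differences in a separate dictionary means that a fuller table, such as one taken from reference [3], could be substituted without changing the decoding loop.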


Related Research Articles

ASCII American character encoding standard

ASCII, abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Most modern character-encoding schemes are based on ASCII, although they support many additional characters.

ASCII art Computer art form using text characters

ASCII art is a graphic design technique that uses computers for presentation and consists of pictures pieced together from the 95 printable characters defined by the ASCII Standard from 1963 and ASCII compliant character sets with proprietary extended characters. The term is also loosely used to refer to text based visual art in general. ASCII art can be created with any text editor, and is often used with free-form languages. Most examples of ASCII art require a fixed-width font such as Courier for presentation.

In computing and telecommunication, a control character or non-printing character (NPC) is a code point in a character set that does not represent a written symbol. Control characters are used as in-band signaling to cause effects other than the addition of a symbol to the text. All other characters are mainly printing, printable, or graphic characters, except perhaps for the "space" character.

Extended Binary Coded Decimal Interchange Code is an eight-bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems. It descended from the code used with punched cards and the corresponding six-bit binary-coded decimal code used with most of IBM's computer peripherals of the late 1950s and early 1960s. It is supported by various non-IBM platforms, such as Fujitsu-Siemens' BS2000/OSD, OS-IV, MSP, and MSP-EX, the SDS Sigma series, Unisys VS/9, Unisys MCP and ICL VME.

Unicode Character encoding standard

Unicode, formally The Unicode Standard, is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, which is maintained by the Unicode Consortium, defines as of the current version (14.0) 144,697 characters covering 159 modern and historic scripts, as well as symbols, emoji, and non-visual control and formatting codes.

UTF-8 is a variable-width character encoding used for electronic communication. Defined by the Unicode Standard, the name is derived from Unicode Transformation Format – 8-bit.

Internationalized domain name Type of Internet domain name

An internationalized domain name (IDN) is an Internet domain name that contains at least one label displayed in software applications, in whole or in part, in a non-Latin script or alphabet, such as Arabic, Bengali, Chinese, Cyrillic, Devanagari, Greek, Hebrew, Hindi, Tamil or Thai, or in Latin alphabet-based characters with diacritics or ligatures, such as French, German, Italian, Polish, Portuguese or Spanish. These writing systems are encoded by computers in multibyte Unicode. Internationalized domain names are stored in the Domain Name System (DNS) as ASCII strings using Punycode transcription.

PETSCII Character encoding on Commodore computers

PETSCII, also known as CBM ASCII, is the character set used in Commodore Business Machines (CBM)'s 8-bit home computers, starting with the PET from 1977 and including the C16, C64, C116, C128, CBM-II, Plus/4, and VIC-20.

An underscore, _, also called an underline, low line or low dash, is a line drawn under a segment of text. In proofreading, underscoring is a convention that says "set this text in italic type", traditionally used on manuscript or typescript as an instruction to the printer. Its use to add emphasis in modern documents is a deprecated practice. The underscore character, _, originally appeared on the typewriter and was primarily used to emphasise words as in the proofreader's convention. To produce an underscored word, the word was typed, the typewriter carriage was moved back to the beginning of the word, and the word was overtyped with the underscore character.

SAIL, the Stanford Artificial Intelligence Language, was developed by Dan Swinehart and Bob Sproull of the Stanford AI Lab in 1970. It was originally a large ALGOL 60-like language for the PDP-10 and DECSYSTEM-20.

Extended Unix Code (EUC) is a multibyte character encoding system used primarily for Japanese, Korean, and simplified Chinese.

ArmSCII Set of obsolete single-byte character encodings

ArmSCII or ARMSCII is a set of obsolete single-byte character encodings for the Armenian alphabet defined by Armenian national standard 166–9. ArmSCII is an acronym for Armenian Standard Code for Information Interchange, similar to ASCII for the American standard. It has been superseded by the Unicode standard.

The Internationalized Resource Identifier (IRI) is an internet protocol standard which builds on the Uniform Resource Identifier (URI) protocol by greatly expanding the set of permitted characters. It was defined by the Internet Engineering Task Force (IETF) in 2005 in RFC 3987. While URIs are limited to a subset of the US-ASCII character set, IRIs may additionally contain most characters from the Universal Character Set, including Chinese, Japanese, Korean, and Cyrillic characters.

Unified Hangul Code Windows character encoding for Korean

Unified Hangul Code (UHC), or Extended Wansung, also known under Microsoft Windows as Code Page 949, is the Microsoft Windows code page for the Korean language. It is an extension of Wansung Code to include all 11172 non-partial Hangul syllables present in Johab. This corresponds to the pre-composed syllables available in Unicode 2.0 and later.

JIS X 0201 Japanese single byte character encoding

JIS X 0201, a Japanese Industrial Standard developed in 1969, was the first Japanese electronic character set to become widely used. It is either a 7-bit encoding or an 8-bit encoding, although the 8-bit form is dominant for modern use. The full name of this standard is 7-bit and 8-bit coded character sets for information interchange (7ビット及び8ビットの情報交換用符号化文字集合).

GNU FreeFont Font family

GNU FreeFont is a family of free OpenType, TrueType and WOFF vector fonts, implementing as much of the Universal Character Set (UCS) as possible, aside from the very large CJK Asian character set. The project was initiated in 2002 by Primož Peterlin and is now maintained by Steve White.

The Basic Latin or C0 Controls and Basic Latin Unicode block is the first block of the Unicode standard, and the only block which is encoded in one byte in UTF-8. The block contains all the letters and control codes of the ASCII encoding. It ranges from U+0000 to U+007F, contains 128 characters and includes the C0 controls, ASCII punctuation and symbols, ASCII digits, both the uppercase and lowercase of the English alphabet and a control character.

The programming language APL uses a number of symbols, rather than words from natural language, to identify operations, similarly to mathematical symbols. Prior to the wide adoption of Unicode, a number of special-purpose EBCDIC and non-EBCDIC code pages were used to represent the symbols required for writing APL.

The PostScript Standard Encoding is one of the character sets used by Adobe Systems' PostScript (PS) since 1984 (1982). In 1995, IBM assigned code page 1276 to this character set. NeXT based the character set for its NeXTSTEP and OPENSTEP operating systems on this one.

In computing, the caret is the name used familiarly for the character ^, the 'freestanding' circumflex, provided on QWERTY keyboards using ⇧ Shift+6. The symbol has a variety of uses in programming and mathematics. This nomenclature arose from its visual similarity to the original proofreader's caret and is the name for the keyboard symbol that has come to predominate outside the publishing industry. Although some websites call the symbol an "ASCII caret", the formal ASCII standard X3.64.1977 calls it a circumflex.

References

  1. Beebe, Nelson H. F. (2005). "Proceedings of the Practical TeX 2005 Conference: The design of TeX and METAFONT: A retrospective" (PDF). TUGboat. Salt Lake City, Utah, USA: University of Utah, Department of Mathematics. 26 (1): 39-40. Retrieved 2017-03-07. The underscore operator in SAIL source-code assignments printed as a left arrow in the Stanford variant of ASCII, but PDP-10 sites elsewhere just saw it as a plain underscore. However, its use as the assignment operator meant that it could not be used as an extended letter to make compound names more readable, as is now common in many other programming languages. The left arrow in the Stanford variant of ASCII was not the only unusual character. (NB. Shows a table of Stanford extended ASCII following that described in RFC 698.)
  2. Mock, T. (1975-07-23). "RFC 698: Telnet extended ASCII option". RFC 698. NIC #32964. Archived from the original on 2017-03-07. Retrieved 2017-03-07. (NB. Replaced by RFC 5198.)
  3. Cowan, John Woldemar (1999-09-08). "Stanford Extended ASCII to Unicode". 0.1. Unicode, Inc.
