ANSEL

Alias(es)	ISO-IR 231
Standard	ANSI/NISO Z39.47 (withdrawn)
Classification	Extended ASCII, 8-bit encoding
Extends	US-ASCII
Extensions	MARC Extended Latin, GEDCOM ANSEL

Last updated July 25, 2023

ANSEL, the American National Standard for Extended Latin Alphabet Coded Character Set for Bibliographic Use, was a character set used in text encoding. It provided a table of coded values for the representation of characters of the extended Latin alphabet in machine-readable form for thirty-five languages written in the Latin alphabet and for fifty-one romanized languages. ANSEL adds 63 graphic characters to ASCII,^[1] including 29 combining diacritic characters.

The initial revision of ANSEL was released in 1985, and before 1993 it was registered as Registration #231 in the ISO International Register of Coded Character Sets to be Used with Escape Sequences.^[2] The standard was reaffirmed in 2003, but ANSI administratively withdrew it effective 14 February 2013.^[3]

Because rendering ANSEL text required hardware capable of overprinting accents, it never became a widely used extended ASCII encoding.

Code page layout

The following table shows ANSI/NISO Z39.47-1993 (R2003).^[3] Non-ASCII characters are shown with their Unicode code points. In ANSEL, a combining diacritic precedes the spacing character on which it is to be superimposed;^[1] in Unicode, by contrast, the combining diacritic follows the base character.

ANSI/NISO Z39.47-1993 (R2003)
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
0x	NUL	SOH	STX	ETX	EOT	ENQ	ACK	BEL	BS	HT	LF	VT	FF	CR	SO	SI
1x	DLE	DC1	DC2	DC3	DC4	NAK	SYN	ETB	CAN	EM	SUB	ESC	FS	GS	RS	US
2x	SP	!	"	#	$	%	&	'	(	)	*	+	,	-	.	/
3x	0	1	2	3	4	5	6	7	8	9	:	;	<	=	>	?
4x	@	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
5x	P	Q	R	S	T	U	V	W	X	Y	Z	[	\	]	^	_
6x	`	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o
7x	p	q	r	s	t	u	v	w	x	y	z	{	|	}	~	DEL
8x
9x
Ax		Ł 0141	Ø 00D8	Đ 0110	Þ 00DE	Æ 00C6	Œ 0152	ʹ 02B9	· 00B7	♭ 266D	® 00AE	± 00B1	Ơ 01A0	Ư 01AF	ʼ 02BC
Bx	ʻ 02BB	ł 0142	ø 00F8	đ 0111	þ 00FE	æ 00E6	œ 0153	ʺ 02BA	ı 0131	£ 00A3	ð 00F0		ơ 01A1	ư 01B0
Cx	° 00B0	ℓ 2113	℗ 2117	© 00A9	♯ 266F	¿ 00BF	¡ 00A1
Dx
Ex	◌̉ 0309	◌̀ 0300	◌́ 0301	◌̂ 0302	◌̃ 0303	◌̄ 0304	◌̆ 0306	◌̇ 0307	◌̈ 0308	◌̌ 030C	◌̊ 030A	◌︠ FE20	◌︡ FE21	◌̕ 0315	◌̋ 030B	◌̐ 0310
Fx	◌̧ 0327	◌̨ 0328	◌̣ 0323	◌̤ 0324	◌̥ 0325	◌̳ 0333	◌̲ 0332	◌̦ 0326	◌̜ 031C	◌̮ 032E	◌︢ FE22	◌︣ FE23			◌̓ 0313
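
The ordering rule stated above the table (combining diacritic before the base character in ANSEL, after it in Unicode) can be illustrated with a small decoder. The following Python sketch is not part of the standard: the names `ANSEL_HIGH`, `is_combining`, and `ansel_to_unicode` are hypothetical, the code points are copied from the table above, and substituting U+FFFD for unassigned bytes is an assumption of this example.

```python
# Code points copied row by row from the Z39.47-1993 (R2003) table above.
# None marks a position that the table leaves empty.
_ROWS = {
    0xA0: [None, 0x0141, 0x00D8, 0x0110, 0x00DE, 0x00C6, 0x0152, 0x02B9,
           0x00B7, 0x266D, 0x00AE, 0x00B1, 0x01A0, 0x01AF, 0x02BC, None],
    0xB0: [0x02BB, 0x0142, 0x00F8, 0x0111, 0x00FE, 0x00E6, 0x0153, 0x02BA,
           0x0131, 0x00A3, 0x00F0, None, 0x01A1, 0x01B0, None, None],
    0xC0: [0x00B0, 0x2113, 0x2117, 0x00A9, 0x266F, 0x00BF, 0x00A1] + [None] * 9,
    0xE0: [0x0309, 0x0300, 0x0301, 0x0302, 0x0303, 0x0304, 0x0306, 0x0307,
           0x0308, 0x030C, 0x030A, 0xFE20, 0xFE21, 0x0315, 0x030B, 0x0310],
    0xF0: [0x0327, 0x0328, 0x0323, 0x0324, 0x0325, 0x0333, 0x0332, 0x0326,
           0x031C, 0x032E, 0xFE22, 0xFE23, None, None, 0x0313, None],
}

# Flatten the rows into a byte -> Unicode code point mapping.
ANSEL_HIGH = {
    start + offset: cp
    for start, row in _ROWS.items()
    for offset, cp in enumerate(row)
    if cp is not None
}


def is_combining(byte: int) -> bool:
    """True for assigned bytes in rows Ex and Fx (the combining diacritics)."""
    return 0xE0 <= byte <= 0xFE and byte in ANSEL_HIGH


def ansel_to_unicode(data: bytes) -> str:
    """Decode ANSEL bytes, moving each diacritic after its base character."""
    out = []
    pending = []  # diacritics seen but not yet attached to a base character
    for byte in data:
        if is_combining(byte):
            pending.append(chr(ANSEL_HIGH[byte]))
            continue
        if byte < 0x80:
            ch = chr(byte)  # ASCII range is unchanged
        else:
            # Assumption: unassigned bytes become U+FFFD.
            ch = chr(ANSEL_HIGH.get(byte, 0xFFFD))
        out.append(ch)
        out.extend(pending)  # keep the diacritics in their original order
        pending.clear()
    out.extend(pending)  # dangling diacritics at end of input
    return "".join(out)
```

For instance, the bytes E2 65 (acute accent, then e) decode to U+0065 U+0301 under this sketch. The mapping holds 63 entries, 29 of them combining, which matches the counts given in the introduction.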

Use

GEDCOM

The GEDCOM specification for exchanging genealogical data names ANSEL (ANSI/NISO Z39.47-1985) as a valid text encoding for GEDCOM files and extends it with the additional characters shown in the following table.^[4]^[5]

Hex	Unicode	Glyph	Description
0xBE	25A1	□	empty box
0xBF	25A0	■	black box
0xCD	0065	e	midline e
0xCE	006F	o	midline o
0xCF	00DF	ß	es zet
0xFC	0338	̸	diacritic slash through char
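
A decoder that already has a base ANSEL mapping could layer these six positions on top of it. The Python sketch below shows one way to do that, assuming the base table is a plain dictionary from byte value to Unicode code point; `GEDCOM_EXTRA` and `with_gedcom_extensions` are hypothetical names, and the two-entry `base` table is only a stand-in for a full ANSEL mapping.

```python
# The six GEDCOM additions, copied from the table above (byte -> code point).
GEDCOM_EXTRA = {
    0xBE: 0x25A1,  # empty box
    0xBF: 0x25A0,  # black box
    0xCD: 0x0065,  # midline e
    0xCE: 0x006F,  # midline o
    0xCF: 0x00DF,  # es zet
    0xFC: 0x0338,  # diacritic slash through char (a combining mark)
}


def with_gedcom_extensions(base_table: dict) -> dict:
    """Return a copy of base_table with the GEDCOM additions merged in."""
    merged = dict(base_table)  # leave the caller's table untouched
    for byte, code_point in GEDCOM_EXTRA.items():
        if byte in merged:
            # The additions are expected to fill positions the base set
            # leaves empty, so a clash signals a wrong base table.
            raise ValueError(f"byte 0x{byte:02X} is already assigned")
        merged[byte] = code_point
    return merged


# Stand-in base table with two entries taken from the ANSEL code page.
base = {0xA1: 0x0141, 0xB1: 0x0142}
table = with_gedcom_extensions(base)
```

Because 0xFC is a diacritic, a decoder using the merged table would presumably treat it like the other combining characters and place it after its base character in Unicode output.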

MARC 21

The Extended Latin character set from MARC 21 is synchronized with ANSEL^[2] but additionally supports the eszett (ß) character at C7 and the euro sign (€) at C8.^[6]
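
Note that MARC 21 and GEDCOM assign the eszett to different bytes (C7 here, CF in the GEDCOM table), so an encoder has to know which variant it is writing. The following Python sketch makes that choice explicit; `MARC21_EXTRA`, `GEDCOM_ESZETT`, and `byte_for` are hypothetical names, and the data comes only from the positions quoted in this article.

```python
# MARC 21 additions to ANSEL as described above (byte -> code point).
MARC21_EXTRA = {
    0xC7: 0x00DF,  # eszett
    0xC8: 0x20AC,  # euro sign
}

# GEDCOM places the same letter at a different byte.
GEDCOM_ESZETT = 0xCF


def byte_for(ch: str, extra: dict):
    """Return the byte that encodes ch in the given extension, or None."""
    for byte, code_point in extra.items():
        if code_point == ord(ch):
            return byte
    return None


marc_eszett = byte_for("\u00df", MARC21_EXTRA)
# The two variants disagree on where the eszett lives.
variants_agree = marc_eszett == GEDCOM_ESZETT
```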

References

1. Extended Latin Alphabet Coded Character Set for Bibliographic Use (PDF) (National information standard specification). 1993 (R2003). Bethesda, Maryland: NISO Press. 3 May 1993. ISBN 1-880124-02-5. ISSN 1041-5653. OCLC 25546245. OL 12137795M. ANSI/NISO Z39.47-1993 (R2003). Archived from the original (PDF) on 14 March 2014. Retrieved 5 May 2014.
2. "International Register Of Coded Character Sets To Be Used With Escape Sequences (Registration Listing Ordered By Registration Number)". International Register Of Coded Character Sets To Be Used With Escape Sequences. Information Technology Standards Commission of Japan. Archived from the original on 9 April 2014. Retrieved 5 May 2014.
3. "Project Overview: ANSI/NISO Z39.47-1993 (R2003) Extended Latin Alphabet Coded Character Set for Bibliographic Use (ANSEL) (Inactive)". National Information Standards Organization. Archived from the original on 14 March 2014. Retrieved 5 May 2014.
4. The Church of Jesus Christ of Latter-day Saints, Family History Department (2 December 1995). "Appendix D: ANSEL Character Set". The GEDCOM Standard Release 5.5 (Information standard specification). Salt Lake City, Utah: The Church of Jesus Christ of Latter-day Saints. pp. 87–89.
5. The Church of Jesus Christ of Latter-day Saints, Family History Department (4 November 1993). The GEDCOM Standard Release 5.3 (Information standard specification). Salt Lake City, Utah: The Church of Jesus Christ of Latter-day Saints. pp. 67–72.
6. "MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media: Code Table Extended Latin (ANSEL)". Library Standards at the Library of Congress. Library of Congress. December 2007.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
