Windows-1252

Windows-1252
MIME / IANA	windows-1252
Alias(es)	cp1252 (code page 1252)
Language(s)	All supported by ISO/IEC 8859-1 plus full support for French and Finnish and ligature forms for English; e.g. Danish (except for a rare exceptional letter), Irish, Italian, Norwegian, Portuguese, Spanish, Swedish, German (missing uppercase ẞ), Icelandic, Faroese, Luxembourgish, Albanian, Estonian, Swahili, Tswana, Catalan, Basque, Occitan, Rotokas, Toki Pona, Lojban, Romansh, Dutch (except the Ĳ/ĳ character, substituted by IJ/ij or ÿ), and Slovene (except the č character, substituted by ç).
Created by	Microsoft
Standard	WHATWG Encoding Standard
Classification	extended ASCII, Windows-125x
Extends	ISO 8859-1 (excluding C1 controls)
Transforms / Encodes	ISO 8859-15

Last updated June 01, 2024

Windows-1252 or CP-1252 (code page 1252) is a single-byte character encoding of the Latin alphabet that was used by default in Microsoft Windows for English and many Romance and Germanic languages including Spanish, Portuguese, French, and German. This character-encoding scheme is used throughout the Americas, Western Europe, Oceania, and much of Africa. Initially the same as ISO 8859-1, it began to diverge starting in Windows 2.0.

It is the most-used single-byte character encoding in the world. As of April 2024, 1.2%^[2] of all websites declare ISO 8859-1, which all modern browsers treat as Windows-1252 (as required by the HTML5 standard^[3]), and a further 0.3% declare Windows-1252 itself,^[2]^[4] for a total of 1.5% (also measured as 15 of the top 1000 websites^[5]). Some countries and languages show higher usage than the global average: in 2024, usage for Brazil was 3.8%,^[6] and for Germany 2.8%^[7]^[8] (these figures are the sums of ISO-8859-1 and CP-1252 declarations).

Details

This character encoding is a superset of ISO 8859-1 in terms of printable characters, but differs from the IANA's ISO-8859-1 by adding additional characters in the 0x80 to 0x9F (hex) range (the ISO standards reserve this range for C1 control codes). Notable additional characters include curly quotation marks and all printable characters from ISO 8859-15. It is known to Windows by the code page number 1252, and by the IANA-approved name "windows-1252".
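The divergence in the 0x80 to 0x9F range can be illustrated with a minimal sketch. It assumes Python's built-in `cp1252` and `latin-1` codecs as stand-ins for Windows-1252 and ISO-8859-1; the sample byte values are chosen only for illustration.

```python
# Four bytes from the 0x80-0x9F range: euro sign, curly double quotes, OE ligature.
raw = bytes([0x80, 0x93, 0x94, 0x8C])

as_cp1252 = raw.decode("cp1252")    # printable characters
as_latin1 = raw.decode("latin-1")   # C1 control codes

print([f"U+{ord(c):04X}" for c in as_cp1252])  # ['U+20AC', 'U+201C', 'U+201D', 'U+0152']
print([f"U+{ord(c):04X}" for c in as_latin1])  # ['U+0080', 'U+0093', 'U+0094', 'U+008C']

# Outside that range the two codecs agree, e.g. for 0xA0-0xFF.
upper = bytes(range(0xA0, 0x100))
upper_half_same = upper.decode("cp1252") == upper.decode("latin-1")
print(upper_half_same)  # True
```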

Starting in the 1990s, many Microsoft products that could produce HTML included Windows-1252-exclusive characters, but marked the encoding as ISO-8859-1 or ASCII, or left it undeclared.^[citation needed] Such characters would often render incorrectly on non-Windows operating systems (commonly as question marks, blanks, or boxes).^[9]^[10] In particular, typographers' quotes (curly variants of the straight apostrophes and quotation marks in US-ASCII) were common in files produced by Windows applications such as Microsoft Word because of the smart quotes feature, which can automatically convert straight apostrophes and quotation marks to the curly variants.^[11] To work around this, by 2000 most web browsers and e-mail clients treated the charsets ISO-8859-1 and US-ASCII as Windows-1252;^[citation needed] this behavior is now required by the HTML5 specification.^[3] Undeclared charsets in HTML are also assumed to be Windows-1252.^[12]
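The label handling described above might be sketched as follows. The helper `effective_codec` and the set `WINDOWS_1252_LABELS` are hypothetical names introduced for illustration, and the set is only a small subset of the labels a browser would accept; the authoritative list is in the WHATWG Encoding Standard.

```python
# Hypothetical subset of labels that are resolved to windows-1252.
WINDOWS_1252_LABELS = {
    "ascii", "us-ascii", "iso-8859-1", "latin1", "cp1252", "windows-1252",
}

def effective_codec(declared):
    """Return the Python codec name to decode with for a declared charset."""
    if declared is None:
        # Undeclared charsets are assumed to be Windows-1252.
        return "cp1252"
    label = declared.strip().lower()
    if label in WINDOWS_1252_LABELS:
        return "cp1252"
    # Any other label is passed through unchanged; it may or may not be
    # a codec name that Python recognises.
    return label

# A page that declares ISO-8859-1 but contains Windows-1252 curly quotes.
body = b"\x93smart quotes\x94"
text = body.decode(effective_codec("ISO-8859-1"))
print(text)
```

Decoding the same bytes with a literal ISO-8859-1 codec would yield C1 control codes at both ends instead of quotation marks, which is the rendering problem the paragraph describes.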

Historically, the phrase "ANSI Code Page" was used in Windows to refer to non-DOS encodings; the intention was that most of these would be ANSI standards such as ISO-8859-1. Even though Windows-1252 was the first and by far most popular code page named so in Microsoft Windows parlance, the code page has never been an ANSI standard. Microsoft explains, "The term ANSI as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community."^[13]

In LaTeX packages, CP-1252 is referred to as "ansinew".

IBM uses code page 1252 (CCSID 1252 and euro sign extended CCSID 5348) for Windows-1252.^[14]^[15]^[16]

It is called "WE8MSWIN1252" by Oracle Database.^[17]

Codepage layout

The following table shows Windows-1252. Differences from ISO-8859-1 have the Unicode code point number below the character, based on the Unicode.org mapping of Windows-1252 with "best fit".

Windows-1252 (CP1252)^[18]^[19]^[20]^[21]^[22]
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
0_	NUL	SOH	STX	ETX	EOT	ENQ	ACK	BEL	BS	HT	LF	VT	FF	CR	SO	SI
1_	DLE	DC1	DC2	DC3	DC4	NAK	SYN	ETB	CAN	EM	SUB	ESC	FS	GS	RS	US
2_	SP	!	"	#	$	%	&	'	(	)	*	+	,	-	.	/
3_	0	1	2	3	4	5	6	7	8	9	:	;	<	=	>	?
4_	@	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
5_	P	Q	R	S	T	U	V	W	X	Y	Z	[	\	]	^	_
6_	`	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o
7_	p	q	r	s	t	u	v	w	x	y	z	{	|	}	~	DEL
8_	€ 20AC		‚ 201A	ƒ 0192	„ 201E	… 2026	† 2020	‡ 2021	ˆ 02C6	‰ 2030	Š 0160	‹ 2039	Œ 0152		Ž 017D
9_		‘ 2018	’ 2019	“ 201C	” 201D	• 2022	– 2013	— 2014	˜ 02DC	™ 2122	š 0161	› 203A	œ 0153		ž 017E	Ÿ 0178
A_	NBSP	¡	¢	£	¤	¥	¦	§	¨	©	ª	«	¬	SHY	®	¯
B_	°	±	²	³	´	µ	¶	·	¸	¹	º	»	¼	½	¾	¿
C_	À	Á	Â	Ã	Ä	Å	Æ	Ç	È	É	Ê	Ë	Ì	Í	Î	Ï
D_	Ð	Ñ	Ò	Ó	Ô	Õ	Ö	×	Ø	Ù	Ú	Û	Ü	Ý	Þ	ß
E_	à	á	â	ã	ä	å	æ	ç	è	é	ê	ë	ì	í	î	ï
F_	ð	ñ	ò	ó	ô	õ	ö	÷	ø	ù	ú	û	ü	ý	þ	ÿ

According to the information on Microsoft's and the Unicode Consortium's websites, positions 81, 8D, 8F, 90, and 9D are unused; however, the Windows API MultiByteToWideChar maps these to the corresponding C1 control codes. The "best fit" mapping documents this behavior, too.^[18]
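The two behaviours can be contrasted with a short sketch. It assumes Python's `cp1252` codec, which leaves the five positions unassigned and raises an error in strict mode; the function `decode_cp1252_with_c1` is a hypothetical helper that imitates the C1 fallback described above, not a reimplementation of `MultiByteToWideChar`.

```python
# The five positions listed as unused.
UNASSIGNED = (0x81, 0x8D, 0x8F, 0x90, 0x9D)

def strict_fails(byte):
    """Return True if Python's strict cp1252 decoder rejects this byte."""
    try:
        bytes([byte]).decode("cp1252")
    except UnicodeDecodeError:
        return True
    return False

def decode_cp1252_with_c1(data):
    """Decode as cp1252, mapping the unused positions to C1 control codes."""
    out = []
    for b in data:
        if b in UNASSIGNED:
            out.append(chr(b))  # e.g. 0x81 becomes U+0081
        else:
            out.append(bytes([b]).decode("cp1252"))
    return "".join(out)

rejected = [b for b in range(256) if strict_fails(b)]
print([hex(b) for b in rejected])

decoded = decode_cp1252_with_c1(b"\x80\x81")
print([f"U+{ord(c):04X}" for c in decoded])
```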

History

The first version of the codepage was used in Microsoft Windows 1.0. It matched the ISO-8859-1 standard (including leaving code points 0xD7 and 0xF7 undefined, as they were not in the standard at that time).
The second version of the codepage was introduced in Microsoft Windows 2.0. In this version, code points 0xD7, 0xF7, 0x91, and 0x92 are defined.
The third version of the codepage was introduced in Microsoft Windows 3.1. It defined all code points used in the final version except the euro sign and the Z with caron character pair.
The final version was introduced in Microsoft Windows 98. It defined all of the code points listed above.

OS/2 extensions

The OS/2 operating system supports an encoding by the name of Code page 1004 (CCSID 1004) or "Windows Extended".^[23]^[24] This mostly matches code page 1252, with the exception of certain C0 control characters being replaced by diacritic characters.

Code page 1004 (differing rows only)^[25]^[26]^[27]^[28]
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
0_	NUL	SOH	STX	ETX	ˉ 02C9	˘ 02D8	˙ 02D9	BEL	˚ 02DA	HT	˝ 02DD	˛ 02DB	ˇ 02C7	CR	SO	SI
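As a rough sketch, the differing row above can be layered on top of an existing Windows-1252 decoder. The names `CP1004_OVERRIDES` and `decode_cp1004` are hypothetical, the overrides are taken only from the table above, and all other bytes are assumed to follow Python's `cp1252` codec (which raises an error for the five unassigned positions).

```python
# C0 positions that code page 1004 replaces with diacritic characters.
CP1004_OVERRIDES = {
    0x04: "\u02c9",  # MODIFIER LETTER MACRON
    0x05: "\u02d8",  # BREVE
    0x06: "\u02d9",  # DOT ABOVE
    0x08: "\u02da",  # RING ABOVE
    0x0A: "\u02dd",  # DOUBLE ACUTE ACCENT
    0x0B: "\u02db",  # OGONEK
    0x0C: "\u02c7",  # CARON
}

def decode_cp1004(data):
    """Decode bytes as code page 1004, falling back to cp1252 elsewhere."""
    out = []
    for b in data:
        if b in CP1004_OVERRIDES:
            out.append(CP1004_OVERRIDES[b])
        else:
            out.append(bytes([b]).decode("cp1252"))
    return "".join(out)

sample = decode_cp1004(b"\x0cA\x80")
print([f"U+{ord(c):04X}" for c in sample])  # ['U+02C7', 'U+0041', 'U+20AC']
```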

MS-DOS extensions [rare]

A rarely used graphics-extended variant of code page 1252 assigns box-drawing characters to codes 0x00 to 0x1F, as used in applications such as MS-DOS Edit and CodeView. One application that used this code page was an Intel Corporation Install/Recovery disk image utility from mid to late 1995, written for Intel's P6 User Test Program machines (US example^[29]). It was used exclusively in Intel's then EMEA region (Europe, Middle East and Africa). In time the programs were changed to use code page 850.

Graphics Extended Code Page 1252^[citation needed]
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
0_	○	■	↑	↓	→	←	║	═	╔	╗	╚	╝	░	▒	►	◄
1_	│	─	┌	┐	└	┘	├	┤	┴	┬	♦	┼	█	▄	▀	▬

Palm OS variant

Each Palm OS device supports a single language and a single character encoding, depending on its locale.^[30]

For languages such as English and French, Palm OS uses a custom character encoding based on Windows-1252. For Japanese, it instead uses a multibyte character encoding based on code page 932. Regardless of the system locale, all characters in the range 0x00 to 0x7F are guaranteed to be the same, except 0x5C, which is the yen sign in Japanese and a backslash in all other locales.^[30]

Palm OS 3.1 introduced several changes to the character encoding to better align with Windows-1252:^[31]

The special Palm OS glyphs "shortcut stroke" (0x9D) and "command stroke" (0x9E) were copied to 0x16 and 0x17, to ensure they were in the range guaranteed to be consistent between locales.^[31] Starting in Palm OS 3.3, 0x16 and 0x17 are the only code points for those characters,^[32] leaving 0x9D and 0x9E undefined.^[33]
The numeric space (0x80) and horizontal ellipsis (0x85) were copied to 0x19 and 0x18 (respectively), to ensure they were in the range guaranteed to be consistent between locales.^[31]^[32]
The Euro sign was added at 0x80, replacing what was previously the numeric space.^[32]
The playing card suits were copied to the font Symbol 9,^[31] although their original code points remain valid.^[32]^[33]

The following is the variant of Windows-1252 used by Palm OS 3.3 onward for English and several other locales.^[32] Python gives it the palmos label, describing it as the encoding for Palm OS 3.5.^[34]^[35] Differences from Windows-1252 have their Unicode code point.

Palm OS 3.3 character encoding^[33]^[35]
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
8_	€ [a]		‚	ƒ	„	… [b]	†	‡	ˆ	‰	Š	‹	Œ	♦ 2666	♣ 2663	♥ 2665
9_	♠ 2660	‘	’	“	”	•	–	—	˜	™	š	›	œ	[c]	[d]	Ÿ

Notes

[a] Prior to Palm OS 3.1, the character at code point 0x80 was U+2007 FIGURE SPACE (the Palm OS "numeric space"); starting in Palm OS 3.1, 0x80 is the Euro sign and the numeric space is at 0x19 instead.^[32]
[b] Starting in Palm OS 3.1, this character is also duplicated at 0x18.^[31]^[32]
[c] Prior to Palm OS 3.3, this code point was the Palm OS-exclusive character "shortcut stroke"; starting in Palm OS 3.3, this code point is undefined.^[31]^[32]
[d] Prior to Palm OS 3.3, this code point was the Palm OS-exclusive character "command stroke"; starting in Palm OS 3.3, this code point is undefined.^[31]^[32]
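The playing card suit positions in the table can be checked against the `palmos` codec that the text mentions. This is a small sketch assuming a standard CPython installation, where that codec ships with the standard library.

```python
# Bytes 0x8D, 0x8E, 0x8F and 0x90 hold the four suits in the Palm OS table.
suits = bytes([0x8D, 0x8E, 0x8F, 0x90]).decode("palmos")
print([f"U+{ord(c):04X}" for c in suits])  # ['U+2666', 'U+2663', 'U+2665', 'U+2660']

# 0x80 is the Euro sign, as in Windows-1252.
euro = b"\x80".decode("palmos")
print(f"U+{ord(euro):04X}")  # U+20AC
```

In Python's `cp1252` codec three of these four suit positions (0x8D, 0x8F, 0x90) are unassigned and 0x8E is Ž, so the choice of codec matters when reading Palm OS data.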

References

[1] Character Sets, Internet Assigned Numbers Authority (IANA), 2018-12-12
[2] "Historical trends in the usage statistics of character encodings for websites, December 2023". w3techs.com. Retrieved 2023-12-01.
[3] "Encoding". WHATWG. 27 January 2015. sec. 5.2 Names and labels. Archived from the original on 4 February 2015. Retrieved 4 February 2015.
[4] "Frequently Asked Questions". w3techs.com.
[5] "Usage Survey of Character Encodings broken down by Ranking". w3techs.com. Retrieved 2024-04-29.
[6] "Distribution of Character Encodings among websites that use Brazil". W3Techs. Archived from the original on 4 Apr 2024. Retrieved 2024-04-29.
[7] "Distribution of Character Encodings among websites that use .de". W3Techs. Archived from the original on 4 Apr 2024. Retrieved 2024-04-29.
[8] "Distribution of Character Encodings among websites that use German". w3techs.com. Retrieved 2023-01-16.
[9] Texin, Tex. "Comparing Characters in Windows-1252, ISO-8859-1, ISO-8859-15". I18nQA.com.
[10] van Emden, Eva (28 January 2011). "How to make typographers' quotes in HTML". vancouvereditor.com. Retrieved 7 January 2024. If you use typographers' quotes without specifying the right character encoding for your HTML file, some of your viewers are going to see question marks, boxes, or other crazy symbols instead of the beautiful curly quotes you intended them to see.
[11] "Smart quotes in Word". Microsoft Support. Microsoft. Retrieved 7 January 2024.
[12] "NetWare Web Search: Understanding Character Set Encodings". Novell Documentation. Novell. if a document does not contain a CHARSET encoding value, the default encoding for HTML documents is ISO-8859-1, also known as Latin1. The default encoding for plain text documents is US-ASCII.
[13] Wissink, Cathy (5 April 2002). "Unicode and Windows XP" (PDF). Microsoft. p. 1. Archived from the original (PDF) on 4 February 2015. Retrieved 4 February 2015.
[14] "Code page 1252 information document". IBM. 30 September 1997. Archived from the original on 2016-03-03.
[15] "CCSID 1252 information document". IBM. Archived from the original on 2016-03-26.
[16] "CCSID 5348 information document". IBM. Archived from the original on 2014-11-29.
[17] "Database Client Installation Guide". Oracle. Retrieved 2021-02-14.
[18] "Unicode mappings of Windows-1252 with 'Best Fit'". Unicode. Archived from the original on 4 February 2015. Retrieved 4 February 2015.
[19] Code Page 01252 (PDF), IBM, 1998, archived (PDF) from the original on 27 October 2023
[20] Code Page (CPGID) 01252 (txt), IBM, 1998, archived from the original on 8 April 2023
[21] International Components for Unicode (ICU), ibm-1252_P100-2000.ucm, 2002-12-03
[22] International Components for Unicode (ICU), ibm-5348_P100-1997.ucm, 2002-12-03
[23] "Code page 1004 information document". Archived from the original on 2015-06-25.
[24] "CCSID 1004 information document". Archived from the original on 2016-03-26.
[25] "Code Page 01004" (PDF). IBM. Archived from the original (PDF) on 2015-07-08. (version based on Windows 3.1 version of Windows-1252)
[26] Code Page CPGID 01004 (pdf) (PDF), IBM
[27] Code Page CPGID 01004 (txt), IBM
[28] Borgendale, Ken (2001). "Codepage 1004 - Windows Extended". OS/2 codepages by number. Archived from the original on 2018-05-13. Retrieved 2018-05-13. (version based on current version of Windows-1252)
[29] Storaasli, Olaf (1996). "Performance of the NASA equation solvers on computational mechanics applications" (PDF). Performance of NASA Equation Solvers on Computational Mechanics Applications. NASA. doi:10.2514/6.1996-1505. S2CID 15711051. Archived from the original (PDF) on 2019-05-03.
[30] "Chapter 13: Localized Applications". Palm OS Programmer's Companion (PDF). Palm Computing Platform. March 16, 2000. p. 321.
[31] "Appendix B: Compatibility Guide". Palm OS SDK Reference (PDF). Palm Computing Platform. March 16, 2000. pp. 1181–1182.
[32] Walleij, Linus. "Palm Pilot Character Sets And Unicode Mappings". GNU Recode. Datorföreningen vid Lunds Universitet och Lunds Tekniska Högskola. Retrieved 10 October 2023.
[33] Parker, Greg. "Palm OS Built-in Fonts". Sealie Software. Retrieved 10 October 2023.
[34] "codecs — Codec registry and base classes (§ Text Encodings)". The Python Standard Library—Python 3.9.4 Documentation. Python Software Foundation.
[35] Mullender, Sjoerd (13 July 2002). "Python Character Mapping Codec for Palm OS 3.5". CPython source tree. Python Software Foundation. Retrieved 9 December 2021.

External links

Microsoft's code charts for Windows-1252 ("Code Page 1252 Windows Latin 1 (ANSI)")
Unicode mapping table and code page definition with best fit mappings for Windows-1252

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

