RACE encoding

Last updated

RACE encoding is a method for encoding foreign languages that use non-English characters (Chinese, Japanese, etc.) in ASCII characters for storage in domain name system servers. [1] All names without non-English characters are unchanged. RACE codes are made up of digits, letters and dashes. [2]

Contents

RACE encoding is part of the larger scheme of the Universal Character Set specifically the ISO/IEC 10646. The assignment of characters also coincides with Unicode. [1]

Today, it is mostly abandoned in favor of punycode.

Nomenclature

RACE is an acronym for its main purpose.

Related Research Articles

ASCII American character encoding standard

ASCII, abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Most modern character-encoding schemes are based on ASCII, although they support many additional characters.

Email Method of exchanging digital messages between people over a network

Electronic mail is a method of exchanging messages ("mail") between people using electronic devices. Email entered limited use in the 1960s, but users could only send to users of the same computer, and some early email systems required the author and the recipient to both be online simultaneously, similar to instant messaging. Ray Tomlinson is credited as the inventor of email; in 1971, he developed the first system able to send mail between users on different hosts across the ARPANET, using the @ sign to link the user name with a destination server. By the mid-1970s, this was the form recognized as email.

UTF-8 is a variable-width character encoding used for electronic communication. Defined by the Unicode Standard, the name is derived from UnicodeTransformation Format – 8-bit.

The byte order mark (BOM) is a particular usage of the special Unicode character, U+FEFFBYTE ORDER MARK, whose appearance as a magic number at the start of a text stream can signal several things to a program reading the text:

In computer science, Base64 is a group of binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation. The term Base64 originates from a specific MIME content transfer encoding. Each Base64 digit represents exactly 6 bits of data. Three 8-bit bytes can therefore be represented by four 6-bit Base64 digits.

An email address identifies an email box to which email messages are delivered. A wide variety of formats were used in early email systems, but only a single format is used today, following the specifications developed for Internet mail systems since the 1980s. This article uses the term email address to refer to the addr-spec defined in RFC 5322, not to the address that is commonly used; the difference is that an address may contain a display name, a comment, or both.

Punycode is a representation of Unicode with the limited ASCII character subset used for Internet hostnames. Using Punycode, host names containing Unicode characters are transcoded to a subset of ASCII consisting of letters, digits, and hyphens, which is called the Letter-Digit-Hyphen (LDH) subset. For example, München is encoded as Mnchen-3ya.

Internationalized domain name Internet domain name that contains at least one label that is displayed in software applications, in whole or in part, in a language-specific script or alphabet

An internationalized domain name (IDN) is an Internet domain name that contains at least one label that is displayed in software applications, in whole or in part, in a language-specific script or alphabet, such as Arabic, Chinese, Cyrillic, Devanagari, Hebrew or the Latin alphabet-based characters with diacritics or ligatures, such as French. These writing systems are encoded by computers in multibyte Unicode. Internationalized domain names are stored in the Domain Name System (DNS) as ASCII strings using Punycode transcription.

ISO/IEC 8859-5:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 5: Latin/Cyrillic alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1988. It is informally referred to as Latin/Cyrillic. It was designed to cover languages using a Cyrillic alphabet such as Bulgarian, Belarusian, Russian, Serbian and Macedonian but was never widely used. It would also have been usable for Ukrainian in the Soviet Union from 1933–1990, but it is missing the Ukrainian letter ge, ґ, which is required in Ukrainian orthography before and since, and during that period outside Soviet Ukraine. As a result, IBM created Code page 1124.

VISCII is an unofficially-defined modified ASCII character encoding for using the Vietnamese language with computers. It should not be confused with the similarly-named officially registered VSCII encoding. VISCII keeps the 95 printable characters of ASCII unmodified, but it replaces 6 of the 33 control characters with printable characters. It adds 128 precomposed characters. Unicode and the Windows-1258 code page are now used for virtually all Vietnamese computer data, but legacy VSCII and VISCII files may need conversion.

Shift JIS is a character encoding for the Japanese language, originally developed by a Japanese company called ASCII Corporation in conjunction with Microsoft and standardized as JIS X 0208 Appendix 1. By 2020, 0.2% of all web pages used Shift JIS, a decline from 1.3% in July 2014.

The Internationalized Resource Identifier (IRI) is an internet protocol standard which builds on the Uniform Resource Identifier (URI) protocol by greatly expanding the set of permitted characters. It was defined by the Internet Engineering Task Force (IETF) in 2005 in RFC 3987. While URIs are limited to a subset of the ASCII character set, IRIs may additionally contain most characters from the Universal Character Set, including Chinese, Japanese, Korean, and Cyrillic characters.

Percent-encoding, also known as URL encoding, is a method to encode information in a Uniform Resource Identifier (URI) under certain circumstances.

Many email clients now offer some support for Unicode. While some use Unicode by default, many others will automatically choose between a legacy encoding and Unicode depending on the mail's content, either automatically or when the user requests it.

This article compares Unicode encodings. Two situations are considered: 8-bit-clean environments, and environments that forbid use of byte values that have the high bit set. Originally such prohibitions were to allow for links that used only seven data bits, but they remain in the standards and so software must generate messages that comply with the restrictions. Standard Compression Scheme for Unicode and Binary Ordered Compression for Unicode are excluded from the comparison tables because it is difficult to simply quantify their size.

ThaiURL is a technology enabling the use of Thai domain names in applications that have been modified to support this technology. It is one of several such systems that were marketed before the advent of IDNA.

VNI Software Company is a developer of various education, entertainment, office, and utility software packages. They are known for developing an encoding and input method for Vietnamese.

International email arises from the combined provision of internationalized domain names (IDN) and email address internationalization (EAI). The result is email that contains international characters, encoded as UTF-8, in the email header and in supporting mail transfer protocols. The most significant aspect of this is the allowance of email addresses in most of the world's writing systems, at both interface and transport levels.

Extended ASCII eight-bit or larger character encodings that include the standard seven-bit ASCII characters as well as others

Extended ASCII character encodings are eight-bit or larger encodings that include the standard seven-bit ASCII characters, plus additional characters. Using the term "extended ASCII" on its own is sometimes criticized, because it can be mistakenly interpreted to mean that the ASCII standard has been updated to include more than 128 characters or that the term unambiguously identifies a single encoding, neither of which is the case.

An internationalized country code top-level domain is a top-level domain in the Domain Name System (DNS) of the Internet. IDN ccTLDs are specially encoded domain names that are displayed in an end user application, such as a web browser, in their language-native script or alphabet, such as the Arabic alphabet, or a non-alphabetic writing system, such as Chinese characters. IDN ccTLDs are an application of the internationalized domain name system to top-level Internet domains assigned to countries, or independent geographic regions.

References