ASMO 449

Alias(es): iso-ir-89
Standard: ASMO 449, ISO 9036
Classification: 7-bit encoding, non-Latin ISO 646 modification with natural letter ordering
Succeeded by: ASMO 708 (ISO-8859-6)

ASMO 449 is a now technologically obsolete [1] 7-bit coded character set for encoding the Arabic language.

History

This character set was devised by the now defunct [2] Arab Standardization and Metrology Organization in 1982 [2] as the 7-bit standard for Arabic-speaking countries. Its design is derived [3] from the 7-bit ISO 646 (1973 version), with modifications suited to the Arabic language. In the code points from 0x41 to 0x72 (hexadecimal), Latin letters were replaced with Arabic letters. Punctuation marks that are identical in the Latin and Arabic scripts were kept, but where they differ (comma, semicolon, question mark), the Latin marks were replaced by their Arabic counterparts. Only nominal letters are encoded, with no preshaped forms, so shaping must be performed for display. The character set is not bidirectional and was intended for right-to-left writing. Symmetrical pairs of punctuation marks therefore appear reversed: ( and ), < and >, [ and ], and { and } are encoded as ) and (, > and <, ] and [, and } and {.

ASMO 449 was registered in the International Register of Coded Character Sets as IR 089 [3] in 1985 and approved as an ISO standard as ISO 9036:1987, "Information processing - Arabic 7-bit coded character set for information interchange". [4]

Character set

ASMO 449 (1982)
     0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
0x  NUL SOH STX ETX EOT ENQ ACK BEL BS  HT  LF  VT  FF  CR  SO  SI
1x  DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM  SUB ESC FS  GS  RS  US
2x  SP  !   "   #   ¤   %   &   '   )   (   *   +   ،   -   .   /
3x  0   1   2   3   4   5   6   7   8   9   :   ؛   >   =   <   ؟
4x  @   ء   آ   أ   ؤ   إ   ئ   ا   ب   ة   ت   ث   ج   ح   خ   د
5x  ذ   ر   ز   س   ش   ص   ض   ط   ظ   ع   غ   ]   \   [   ^   _
6x  ـ   ف   ق   ك   ل   م   ن   ه   و   ى   ي   ً   ٌ   ٍ   َ   ُ
7x  ِ   ّ   ْ                                   }   |   {   ~   DEL

Code points 0x73 to 0x7A are unassigned.
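
The table can be turned into a small decoder. The following is a minimal sketch, not a reference implementation. It relies on an observation that can be checked against the table: every Arabic position sits at a constant distance of 0x05E0 from its code point in the Unicode Arabic block (for example, 0x41 maps to U+0621). The names `decode_asmo449`, `ARABIC_OFFSET`, `ARABIC_CODES`, `SPECIAL` and `UNASSIGNED` are hypothetical. Two choices here are assumptions: the mirrored pairs are mapped literally as the table shows them (a bidi-aware converter might instead keep the ASCII code points and rely on mirroring at display time), and unassigned or out-of-range bytes become U+FFFD.

```python
# Sketch of an ASMO 449 -> Unicode decoder that follows the table literally.
ARABIC_OFFSET = 0x05E0  # e.g. 0x41 + 0x05E0 == 0x0621 (ARABIC LETTER HAMZA)

# Positions that hold Arabic characters in the table.
ARABIC_CODES = (
    {0x2C, 0x3B, 0x3F}        # Arabic comma, semicolon, question mark
    | set(range(0x41, 0x5B))  # hamza (0x41) through ghain (0x5A)
    | set(range(0x60, 0x73))  # tatweel (0x60) through sukun (0x72)
)

# Non-Arabic positions whose character differs from ASCII in the table.
# Assumption: mirrored pairs are mapped exactly as the table displays them.
SPECIAL = {
    0x24: "\u00A4",           # currency sign in place of '$'
    0x28: ")", 0x29: "(",
    0x3C: ">", 0x3E: "<",
    0x5B: "]", 0x5D: "[",
    0x7B: "}", 0x7D: "{",
}

UNASSIGNED = set(range(0x73, 0x7B))  # 0x73 through 0x7A


def decode_asmo449(data: bytes) -> str:
    """Decode ASMO 449 bytes to a Unicode string in logical order."""
    out = []
    for b in data:
        if b > 0x7F or b in UNASSIGNED:
            out.append("\uFFFD")  # assumption: replacement character
        elif b in ARABIC_CODES:
            out.append(chr(b + ARABIC_OFFSET))
        elif b in SPECIAL:
            out.append(SPECIAL[b])
        else:
            out.append(chr(b))    # control codes, digits, shared punctuation
    return "".join(out)
```

For instance, `decode_asmo449(bytes([0x47, 0x64]))` yields alef followed by lam (U+0627 U+0644). The result still contains only nominal letters, so shaping remains the job of the display layer.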

A variant, sometimes named ASMO 449+, [5] adds the characters NBSP at 0x75, "ﹳ" at 0x76, "لآ" at 0x77, "لأ" at 0x78, "لإ" at 0x79 and "لا" at 0x7A.
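
The extra assignments of the variant can be written as a lookup table. This is a sketch under stated assumptions: the name `ASMO449_PLUS_EXTRA` and the helper `decode_plus_extra` are hypothetical, the character at 0x76 is taken to be U+FE73 (ARABIC TAIL FRAGMENT), and the four lam-alef combinations are represented as two-character sequences rather than as presentation-form ligatures, which a real converter might prefer.

```python
# Additional code points described for the "ASMO 449+" variant.
# Assumption: lam-alef combinations are stored as lam + alef sequences.
ASMO449_PLUS_EXTRA = {
    0x75: "\u00A0",        # no-break space
    0x76: "\uFE73",        # Arabic tail fragment (assumed identification)
    0x77: "\u0644\u0622",  # lam + alef with madda above
    0x78: "\u0644\u0623",  # lam + alef with hamza above
    0x79: "\u0644\u0625",  # lam + alef with hamza below
    0x7A: "\u0644\u0627",  # lam + alef
}


def decode_plus_extra(code: int) -> str:
    """Return the variant character for a code point, or U+FFFD if none."""
    return ASMO449_PLUS_EXTRA.get(code, "\uFFFD")
```

Note that one byte can decode to two Unicode characters here, so the decoded string may be longer than the input.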

Relationship with other character sets

ASMO 449 is a 7-bit character set. Some 8-bit encodings place it (or a variant of it) in the upper half of their code space, but it should not be confused with ASMO 708. In such encodings, characters that appear to be duplicated are not redundant: those in the lower half are for left-to-right script, while those in the upper half are for right-to-left script. When ASMO 449 (or a variant of it) is placed in the upper half of an 8-bit character set, it includes Arabic digits.
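
The apparent duplication can be illustrated with a short sketch. It assumes a hypothetical 8-bit layout in which the upper half is exactly the 7-bit set shifted by 0x80; real code pages may use variants and differ in detail. The function name `split_8bit` and the labels "ltr" and "rtl" are illustrative only.

```python
def split_8bit(byte: int) -> tuple[str, int]:
    """Split an 8-bit code into (direction, 7-bit code).

    Assumption: the upper half (high bit set) is the 7-bit set shifted by
    0x80 and is meant for right-to-left script; the lower half is meant
    for left-to-right script.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in the range 0x00 to 0xFF")
    if byte & 0x80:
        return ("rtl", byte & 0x7F)
    return ("ltr", byte)
```

Under this assumption, 0x28 and 0xA8 both reduce to the 7-bit code 0x28 but carry different directions, which is why the same punctuation mark can appear twice in such a code page.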

References

  1. Computing and the Qurʾān - Some caveats, 2007, Thomas Milo
  2. Le codage informatique de l'écriture arabe : d'ASMO 449 à Unicode et ISO/CEI 10646
  3. "7-bit Arabic Code for Information Interchange, Arab standard ASMO-449, ISO 9036" (PDF). Archived from the original (PDF) on 2017-02-21. Retrieved 2017-02-20.
  4. ISO 9036:1987
  5. Printronix ACA Emulation Programmer's Reference Manual
  6. Code Table 7