Perso-Arabic Script Code for Information Interchange (PASCII) is one of the Indian government standards for encoding languages that use writing systems based on the Perso-Arabic alphabet, in particular Kashmiri, Persian, Sindhi and Urdu. The ISCII encoding was originally intended to cover both the Brahmi-derived writing systems of India and the Arabic-based systems, but it was subsequently decided to encode the Arabic-based writing systems separately.
PASCII has not been widely used outside certain government institutions and has now been rendered largely obsolete by Unicode. Unicode uses a separate block for each writing system and largely preserves the PASCII layout within each block.
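The following table shows the character set for PASCII. Each row label gives the high hexadecimal digit of a byte value and each column header gives the low digit; each cell shows the character at that position (its Unicode equivalent for codes 80 and above). Codes 00 to 7F coincide with ASCII.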
PASCII
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
0x | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | HT | LF | VT | FF | CR | SO | SI |
1x | DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS | RS | US |
2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
7x | p | q | r | s | t | u | v | w | x | y | z | { | | | } | ~ | DEL |
8x | ـ | ا | آ | ب | ٻ | ڀ | پ | ڦ | ت | ة | ٿ | ٹ [lower-alpha 1] | ٺ | ث | ج | |
9x | ڄ | ڃ | چ | ڇ | ح | خ | د | ڌ | ڈ [lower-alpha 2] | ڏ | ڍ | ذ | ر | ڑ [lower-alpha 3] | ڙﻬ | ز |
Ax | ژ | س | ش | ص | ض | ط | ظ | ع | غ | ف | ق | ڪ [lower-alpha 4] | ک | گ | ڳ | ڱ |
Bx | ل | م | ن | ں | ڻ | و | ۄ | ه | ھ | ء | ى | ؠ | ۓ | ے | َ | ِ |
Cx | ُ | ٗ | ٔ | ٕ | ٔ [lower-alpha 5] | ٟ | ّ | ٓ | ۡ | ٰ | ٖ | [lower-alpha 6] | … | ؔ | ، | ؐ |
Dx | ؓ | ؒ | ؑ | ࣗ | [lower-alpha 7] | ۙ | ۚ | ۢ | [lower-alpha 8] | ٪ | ٬ | ٫ | ۰ | ۱ | ۲ | ۳ |
Ex | ۴ | ۵ | ۶ | ۷ | ۸ | ۹ | ! | “ | ” | ‘ | ’ | ( | ) | ٭ | + | ATR |
Fx | - | / | ؛ | : | ؟ | = | ۔ | | ● [lower-alpha 8] | ٬ | DM | LB | ◌ [lower-alpha 8] | · |
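To illustrate how a single-byte code table such as the one above is applied, the following is a minimal sketch of a table-driven decoder. The names `PARTIAL_PASCII` and `decode_pascii` are hypothetical, and the mapping covers only a handful of cells read off the table as printed (the letters at 81, A1, B0 and B1, and the digits at DC to E5). It is not a complete or authoritative PASCII mapping.

```python
# Illustrative, partial byte-to-Unicode table (assumption: positions are
# taken from the table as printed above; most cells are deliberately omitted).
PARTIAL_PASCII = {
    0x81: "\u0627",  # ARABIC LETTER ALEF
    0xA1: "\u0633",  # ARABIC LETTER SEEN
    0xB0: "\u0644",  # ARABIC LETTER LAM
    0xB1: "\u0645",  # ARABIC LETTER MEEM
}
# Digits occupy DC..E5 in the table; map them to U+06F0..U+06F9.
PARTIAL_PASCII.update({0xDC + i: chr(0x06F0 + i) for i in range(10)})


def decode_pascii(data: bytes, table=PARTIAL_PASCII) -> str:
    """Decode bytes using ASCII for 00-7F and `table` for 80-FF.

    Bytes missing from the (partial) table become U+FFFD.
    """
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))  # lower half coincides with ASCII
        else:
            out.append(table.get(b, "\ufffd"))
    return "".join(out)


word = decode_pascii(bytes([0xA1, 0xB0, 0x81, 0xB1]))  # seen, lam, alef, meem
mixed = decode_pascii(b"A\xdc\xe5")                    # "A" plus two digits
```

A lookup table is used rather than arithmetic because, apart from the digits, the upper-half positions do not follow Unicode code point order.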
The Arabic alphabet, or Arabic abjad, is the Arabic script as it is codified for writing Arabic. It is written from right to left in a cursive style and includes 28 letters. Most letters have contextual letterforms.
Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using digital computers. The numerical values that make up a character encoding are known as "code points" and collectively comprise a "code space", a "code page", or a "character map".
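As a small illustration of the distinction between a code point and its encoded bytes, the sketch below looks up the Unicode code point of one Arabic letter and serialises it in two encodings. The variable names are illustrative only.

```python
letter = "\u0628"  # ARABIC LETTER BEH

# The code point is the number assigned to the character.
code_point = ord(letter)

# The same code point is stored as different byte sequences
# depending on the encoding form chosen.
utf8_bytes = letter.encode("utf-8")
utf16_bytes = letter.encode("utf-16-be")

print(f"U+{code_point:04X}", utf8_bytes.hex(), utf16_bytes.hex())
```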
Unicode, formally The Unicode Standard, is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, which is maintained by the Unicode Consortium, defines, as of version 15.0, 149,186 characters covering 161 modern and historic scripts, as well as symbols, emoji, and non-visual control and formatting codes.
Mojibake is the garbled text that is the result of text being decoded using an unintended character encoding. The result is a systematic replacement of symbols with completely unrelated ones, often from a different writing system.
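The effect can be reproduced in a few lines. This sketch encodes a string as UTF-8 and then decodes the bytes as Latin-1, which is one common way mojibake arises. Because Latin-1 maps every byte to a character, the damage here happens to be reversible.

```python
original = "caf\u00e9"  # "café"

# Written out as UTF-8: the final letter becomes two bytes, C3 A9.
raw = original.encode("utf-8")

# Read back with the wrong encoding: each byte becomes its own character.
garbled = raw.decode("latin-1")

# Reversing the mistaken step recovers the text in this particular case.
repaired = garbled.encode("latin-1").decode("utf-8")

print(garbled, "->", repaired)
```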
Han unification is an effort by the authors of Unicode and the Universal Character Set to map multiple character sets of the Han characters of the so-called CJK languages into a single set of unified characters. Han characters are shared by written Chinese (hanzi), Japanese (kanji), Korean (hanja) and Vietnamese.
Indian Script Code for Information Interchange (ISCII) is a coding scheme for representing various writing systems of India. It encodes the main Indic scripts and a Roman transliteration. The supported scripts are: Bengali–Assamese, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu. ISCII does not encode the writing systems of India that are based on Persian, but its writing system switching codes nonetheless provide for Kashmiri, Sindhi, Urdu, Persian, Pashto and Arabic. The Persian-based writing systems were subsequently encoded in the PASCII encoding.
Windows-1256 is a code page used under Microsoft Windows to write Arabic and other languages that use Arabic script, such as Persian and Urdu.
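Python's standard codecs include this code page under the name `cp1256`. The sketch below round-trips a short Arabic-script word through it and shows that a character outside its repertoire cannot be encoded. The helper `fits_cp1256` is a hypothetical name introduced for illustration.

```python
word = "\u0633\u0644\u0627\u0645"  # seen, lam, alef, meem

# As a single-byte code page, cp1256 stores one byte per character.
encoded = word.encode("cp1256")
decoded = encoded.decode("cp1256")


def fits_cp1256(text: str) -> bool:
    """Return True if every character of `text` exists in Windows-1256."""
    try:
        text.encode("cp1256")
        return True
    except UnicodeEncodeError:
        return False


arabic_ok = fits_cp1256(word)
devanagari_ok = fits_cp1256("\u0915")  # DEVANAGARI LETTER KA
```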
Code page 852 is a code page used under DOS to write Central European languages that use Latin script.
The nuqta is a diacritic mark that was introduced in Devanagari and some other Indic scripts to represent sounds not present in the original scripts. It takes the form of a dot placed below a character. The idea is inspired by the Arabic script: in the Perso-Arabic script used for Urdu, some letters share the same basic shape and differ only in the placement of their dot(s), or nuqta(s). For example, the letter ع ain, with the addition of a nuqta on top, becomes the letter غ g͟hain.
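In Unicode terms, the two scripts treat the dot differently, which the following sketch checks with Python's `unicodedata` module. In Devanagari the nuqta is a separate combining character (named NUKTA in the standard), whereas the dotted Arabic letter is an atomic character with no decomposition. The variable names are illustrative.

```python
import unicodedata

NUKTA = "\u093c"           # DEVANAGARI SIGN NUKTA (combining mark)
ka = "\u0915"              # DEVANAGARI LETTER KA
qa_sequence = ka + NUKTA   # KA followed by the nuqta
qa_precomposed = "\u0958"  # DEVANAGARI LETTER QA

nukta_name = unicodedata.name(NUKTA)

# The precomposed letter decomposes canonically into base letter + nuqta.
qa_decomposed = unicodedata.normalize("NFD", qa_precomposed)

# The Arabic letter ghain carries its dot as part of the letter itself:
# it has no decomposition into ain plus a mark.
ghain_decomposition = unicodedata.decomposition("\u063a")
```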
Gaf, or gāf, can be the name of different Perso-Arabic letters, all representing the sound /ɡ/. They are all forms of the letter kāf, with additional diacritics, such as dots and lines. There are four forms, each used in different places.
The Urdu alphabet is the right-to-left alphabet used for Urdu. It is a modification of the Persian alphabet, which is itself a derivative of the Arabic alphabet. The Urdu alphabet has up to 39 or 40 distinct letters with no distinct letter cases and is typically written in the calligraphic Nastaʿlīq script, whereas Arabic is more commonly written in the Naskh style.
Unicode equivalence is the specification by the Unicode character encoding standard that some sequences of code points represent essentially the same character. This feature was introduced in the standard to allow compatibility with preexisting standard character sets, which often included similar or identical characters.
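A short sketch of canonical equivalence, using Python's `unicodedata` module: the precomposed letter alef with madda above (U+0622) and the sequence alef (U+0627) plus combining madda (U+0653) are different code point sequences that normalisation treats as the same text. The helper `canonically_equal` is a hypothetical name.

```python
import unicodedata

precomposed = "\u0622"       # ARABIC LETTER ALEF WITH MADDA ABOVE
sequence = "\u0627\u0653"    # ALEF + ARABIC MADDAH ABOVE


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


raw_equal = precomposed == sequence           # plain comparison differs
equivalent = canonically_equal(precomposed, sequence)
recomposed = unicodedata.normalize("NFC", sequence)
```

Comparing normalised forms rather than raw strings is the usual way to make searches and comparisons insensitive to this difference.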
The Unicode Consortium and the ISO/IEC JTC 1/SC 2/WG 2 jointly collaborate on the list of the characters in the Universal Coded Character Set. The Universal Coded Character Set, most commonly called the Universal Character Set, is an international standard to map characters, discrete symbols used in natural language, mathematics, music, and other domains, to unique machine-readable data values. By creating this mapping, the UCS enables computer software vendors to interoperate and to interchange UCS-encoded text strings with one another. Because it is a universal map, it can be used to represent multiple languages at the same time. This avoids the confusion of using multiple legacy character encodings, which can result in the same sequence of codes having multiple interpretations depending on the character encoding in use, resulting in mojibake if the wrong one is chosen.
In Unicode and the UCS, a compatibility character is a character that is encoded solely to maintain round-trip convertibility with other, often older, standards. As the Unicode Glossary says:
A character that would not have been encoded except for compatibility and round-trip convertibility with other standards
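The Arabic presentation forms are a familiar case. The sketch below, which assumes Python's `unicodedata` module, shows that the isolated-form beh (U+FE8F) is left alone by canonical normalisation (NFC) but is folded to the ordinary letter beh (U+0628) by compatibility normalisation (NFKC).

```python
import unicodedata

beh_isolated = "\ufe8f"  # ARABIC LETTER BEH ISOLATED FORM
beh = "\u0628"           # ARABIC LETTER BEH

# The compatibility mapping is tagged (here "<isolated>"), which marks it
# as a compatibility rather than a canonical decomposition.
mapping = unicodedata.decomposition(beh_isolated)

nfc = unicodedata.normalize("NFC", beh_isolated)    # unchanged
nfkc = unicodedata.normalize("NFKC", beh_isolated)  # folded to U+0628
```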
In Unicode, a script is a collection of letters and other written signs used to represent textual information in one or more writing systems. Some scripts support one and only one writing system and language, for example, Armenian. Other scripts support many different writing systems; for example, the Latin script supports English, French, German, Italian, Vietnamese, Latin itself, and several other languages. Some languages have used more than one writing system and thus more than one script; for example, Turkish was written in the Arabic script before the 20th century but switched to the Latin script in the early part of that century. For a list of languages supported by each script, see the list of languages by writing system. More or less complementary to scripts are symbols and Unicode control characters.
Many scripts in Unicode, such as Arabic, have special orthographic rules that require certain combinations of letterforms to be combined into special ligature forms.
Khudabadi is a script of Sindhi generally used by some Sindhis in India to write the Sindhi language. The script originates from Khudabad, a city in Sindh, and is named after it. It is also known as Hathvanki script. Khudabadi is one of the four scripts used for writing the Sindhi language, the others being Perso-Arabic, Khojki and Devanagari script. It was used by traders and merchants to record their information and rose to importance as the script began to be used to record information kept secret from other non-Sindhi groups and Indian nations.
The Unicode Standard assigns various properties to each Unicode character and code point.
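A few of these properties can be inspected with Python's `unicodedata` module, as in the sketch below. The helper `describe` is a hypothetical name, and the module exposes only a subset of the properties defined by the standard.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect a few Unicode properties of a single character."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),    # general category
        "bidi": unicodedata.bidirectional(ch),   # bidirectional class
        "combining": unicodedata.combining(ch),  # canonical combining class
    }


beh = describe("\u0628")    # an Arabic letter
fatha = describe("\u064e")  # an Arabic vowel mark

for props in (beh, fatha):
    print(props)
```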
There are three writing systems for Saraiki, but very few of the language's speakers, even those who are literate in other languages, are able to read or write Saraiki in any writing system.
Sindhi is a language broadly spoken by the people of the historical Sindh region in the Indian subcontinent. Modern Sindhi is written in an extended Perso-Arabic script in Sindh province of Pakistan and (formally) in extended Devanagari by Sindhis in partitioned India. Historically, Sindhi was written in various forms of Landa scripts and various other Indic scripts.