Perso-Arabic Script Code for Information Interchange


Perso-Arabic Script Code for Information Interchange (PASCII) is one of the Indian government standards for encoding languages written in the Perso-Arabic alphabet, in particular Kashmiri, Persian, Sindhi and Urdu. The ISCII encoding was originally intended to cover both the Brahmi-derived writing systems of India and the Arabic-based ones, but it was subsequently decided to encode the Arabic-based writing systems separately.

PASCII has not been widely used outside certain government institutions and has now been rendered largely obsolete by Unicode. Unicode encodes Kashmiri, Persian, Sindhi and Urdu in its shared Arabic-script blocks, which do not follow the PASCII code layout.
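Converting legacy PASCII data to Unicode amounts to a byte-by-byte table lookup. The sketch below assumes, as the code chart in the next section shows, that codes 0x00 to 0x7F coincide with ASCII. The function name `decode_pascii` and the one-entry table `EXAMPLE_TABLE` are hypothetical; a real converter would need the full high-half assignments from the standard.

```python
from typing import Mapping

# U+FFFD REPLACEMENT CHARACTER, used for codes the table does not cover.
REPLACEMENT = "\ufffd"


def decode_pascii(data: bytes, high_half: Mapping[int, str]) -> str:
    """Convert PASCII-encoded bytes to a Unicode string (sketch).

    Codes 0x00-0x7F are treated as ASCII, as in the code chart.
    Codes 0x80-0xFF are looked up in `high_half`, a caller-supplied
    table; unmapped codes become U+FFFD.
    """
    chars = []
    for code in data:
        if code < 0x80:
            chars.append(chr(code))  # ASCII half passes through unchanged
        else:
            chars.append(high_half.get(code, REPLACEMENT))
    return "".join(chars)


# Hypothetical table for illustration only; NOT the real PASCII assignments.
EXAMPLE_TABLE = {0x81: "\u0627"}  # ARABIC LETTER ALEF

print(decode_pascii(b"A\x81B\xff", EXAMPLE_TABLE))
```

Table values are strings rather than single characters so that a code could map to a base letter plus a combining mark, which some of the chart's notes suggest may be necessary.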

Codepage layout

The following table shows the PASCII character set. Each row gives the first hexadecimal digit of a code and each column the second; codes 0x00 to 0x7F are the same as ASCII.

PASCII
   0 1 2 3 4 5 6 7 8 9 A B C D E F
0x NUL SOH STX ETX EOT ENQ ACK BEL   BS    HT    LF    VT    FF    CR    SO    SI   
1x DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN   EM   SUB ESC   FS    GS    RS    US  
2x  SP   ! " # $ % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ \ ] ^ _
6x ` a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { | } ~ DEL
8xـاآبٻڀپڦتةٿٹ [note 1] ٺثج
9xڄڃچڇحخدڌڈ [note 2] ڏڍذرڑ [note 3] ڙز
Axژسشصضطظعغفقڪ [note 4] کگڳڱ
Bxلمنںڻوۄهھءىؠۓےَِ
Cxُٕٗٔٔ [note 5] ّٰٟٖٓۡ [note 6] ؔ،ؐ
Dxؓؒؑ [note 7] ۙۚۢ؁ [note 8] ٪٬٫۰۱۲۳
Ex۴۵۶۷۸۹!()٭+ATR
Fx-/؛:؟=۔۝ [note 8] ٬ DM  LB  [note 8] ·
  1. This character is displayed as ٽ (U+067D) in Sindhi texts.
  2. This character is displayed as ڊ (U+068A) in Sindhi texts.
  3. This character is displayed as ڙ (U+0699) in Sindhi texts.
  4. This character is usually displayed as ک (U+06A9) in non-Sindhi texts.
  5. This code point is used for the Kashmiri hamza, an orthographic variant that Unicode does not encode separately. On an alif, this diacritic corresponds to U+0672 in Unicode.
  6. This diacritic currently has no Unicode support. On an alif, it corresponds to U+0671 in Unicode.
  7. This combining superscript letter kaf currently has no Unicode support.
  8. This punctuation mark currently has no Unicode support; the nearest equivalent is shown.
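Notes 1 to 4 describe language-dependent display, whereas Unicode gives each of these letter shapes its own code point. One possible way a converter could honor the notes is to substitute code points after decoding, keyed on the characters shown in the chart. This is an assumption about converter design, not something the notes prescribe, and `localize` is a hypothetical name.

```python
# Display substitutions taken from notes 1-4 of the code chart.
# Keys are the Unicode characters shown in the chart; values are the
# forms the notes give for the named context.
SINDHI_FORMS = {
    "\u0679": "\u067d",  # note 1: TTEH shown as TEH WITH THREE DOTS ABOVE DOWNWARDS
    "\u0688": "\u068a",  # note 2: DDAL shown as DAL WITH DOT BELOW
    "\u0691": "\u0699",  # note 3: RREH shown as REH WITH FOUR DOTS ABOVE
}
NON_SINDHI_FORMS = {
    "\u06aa": "\u06a9",  # note 4: SWASH KAF usually shown as KEHEH
}


def localize(text: str, sindhi: bool) -> str:
    """Apply the chart's per-language display forms to decoded text."""
    table = str.maketrans(SINDHI_FORMS if sindhi else NON_SINDHI_FORMS)
    return text.translate(table)


print(localize("\u0679\u06aa", sindhi=True))
print(localize("\u0679\u06aa", sindhi=False))
```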

Related Research Articles

Arabic alphabet: Alphabets for Arabic and other languages

The Arabic alphabet, or Arabic abjad, is the Arabic script as it is codified for writing Arabic. It is written from right to left in a cursive style and includes 29 letters. Most letters have contextual letterforms.
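Unicode encodes each Arabic letter once and leaves contextual shaping to the rendering engine, but it also keeps legacy presentation-form code points for the positional shapes. This sketch, using Python's standard `unicodedata` module, lists the four forms of beh and shows that compatibility normalization (NFKC) folds each back to the single letter U+0628.

```python
import unicodedata

# Presentation-form code points for BEH: isolated, final, initial, medial.
BEH_FORMS = ["\ufe8f", "\ufe90", "\ufe91", "\ufe92"]

for form in BEH_FORMS:
    # NFKC replaces each compatibility presentation form with the base letter.
    base = unicodedata.normalize("NFKC", form)
    print(f"U+{ord(form):04X} {unicodedata.name(form)} -> U+{ord(base):04X}")
```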

Character encoding: Using numbers to represent text characters

Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using digital computers. The numerical values that make up a character encoding are known as "code points" and collectively comprise a "code space", a "code page", or a "character map".
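The difference between a code point (the number assigned to a character) and the bytes used to store it can be seen with Python's built-in string methods. UTF-8 and big-endian UTF-16 here are just two example encoding forms for the same code point.

```python
text = "\u0628"  # ARABIC LETTER BEH

code_point = ord(text)            # the number assigned to the character
utf8 = text.encode("utf-8")       # two bytes in UTF-8
utf16 = text.encode("utf-16-be")  # two bytes in big-endian UTF-16

print(f"U+{code_point:04X}", utf8.hex(" "), utf16.hex(" "))
```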

Unicode: Character encoding standard

Unicode, formally The Unicode Standard, is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, which is maintained by the Unicode Consortium, defines, as of version 15.0, 149,186 characters covering 161 modern and historic scripts, as well as symbols, emoji, and non-visual control and formatting codes.

Mojibake: Garbled text as a result of incorrect character encoding

Mojibake is the garbled text that is the result of text being decoded using an unintended character encoding. The result is a systematic replacement of symbols with completely unrelated ones, often from a different writing system.
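A minimal illustration with Python's built-in codecs: Arabic-script text encoded as UTF-8 and then decoded with the wrong codec turns into Latin letters and symbols. Latin-1 is chosen here only because it maps every byte to exactly one character, which makes this particular mistake reversible.

```python
original = "\u0628\u0627\u062a"  # BEH, ALEF, TEH

# Encode correctly as UTF-8, then decode with the wrong codec (Latin-1).
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)

# Undo the wrong decoding step, then decode the bytes correctly.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)
```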

Han unification is an effort by the authors of Unicode and the Universal Character Set to map multiple character sets of the Han characters of the so-called CJK languages into a single set of unified characters. Han characters are a feature shared in common by written Chinese (hanzi), Japanese (kanji), Korean (hanja) and Vietnamese.

Indian Script Code for Information Interchange (ISCII) is a coding scheme for representing various writing systems of India. It encodes the main Indic scripts and a Roman transliteration. The supported scripts are: Bengali–Assamese, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu. ISCII does not encode the writing systems of India that are based on Persian, but its writing system switching codes nonetheless provide for Kashmiri, Sindhi, Urdu, Persian, Pashto and Arabic. The Persian-based writing systems were subsequently encoded in the PASCII encoding.

Windows-1256 is a code page used under Microsoft Windows to write Arabic and other languages that use Arabic script, such as Persian and Urdu.
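Python ships a `cp1256` codec, so whether a string fits in this code page can be checked by attempting to encode it. The sketch below assumes that Windows-1256 does not include the Sindhi letter ٻ (U+067B), which appears in row 8x of the PASCII chart; `fits_cp1256` is a hypothetical helper name.

```python
def fits_cp1256(text: str) -> bool:
    """Return True if every character of `text` exists in Windows-1256."""
    try:
        text.encode("cp1256")
    except UnicodeEncodeError:
        return False
    return True


print("\u0627\u0628".encode("cp1256"))  # ALEF, BEH as single bytes
print(fits_cp1256("\u067b"))            # BEEH, a Sindhi letter
```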

Code page 852 is a code page used under DOS to write Central European languages that use Latin script.

The nuqta is a diacritic mark that was introduced in Devanagari and some other Indic scripts to represent sounds not present in the original scripts. It takes the form of a dot placed below a character. The idea is inspired by the Arabic script; for example, some letters in Urdu share the same basic shape but differ in the placement of dot(s), or nuqta(s), in the Perso-Arabic script: the letter ع ain, with the addition of a nuqta on top, becomes the letter غ g͟hain.
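Unicode treats the two cases differently. Devanagari letters with a nukta, such as क़ (U+0958), have canonical decompositions into a base letter plus the combining sign U+093C, whereas Arabic letters distinguished by dots, such as غ (U+063A), are single code points with no decomposition. The sketch uses Python's `unicodedata` module; U+0958 is a composition exclusion, so even NFC keeps the two-code-point form.

```python
import unicodedata

qa = "\u0958"  # DEVANAGARI LETTER QA

# Canonical decomposition: KA (U+0915) followed by DEVANAGARI SIGN NUKTA (U+093C).
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", qa)])

# QA is a composition exclusion, so NFC also yields the decomposed sequence.
print(unicodedata.normalize("NFC", qa) == "\u0915\u093c")

# GHAIN has no decomposition: the dot is part of the letter's identity.
print(repr(unicodedata.decomposition("\u063a")))
```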

Gaf: Letter used to represent the /ɡ/ sound in the Persian alphabet

Gaf, or gāf, can be the name of several Perso-Arabic letters, all representing /ɡ/. They are all forms of the letter kāf with additional diacritics, such as dots and lines. There are four forms, each used in different places.

Urdu alphabet: Perso-Arabic-based alphabet of 40 letters for Urdu

The Urdu alphabet is the right-to-left alphabet used for Urdu. It is a modification of the Persian alphabet, which is itself a derivative of the Arabic alphabet. The Urdu alphabet has up to 39 or 40 distinct letters with no distinct letter cases and is typically written in the calligraphic Nastaʿlīq script, whereas Arabic is more commonly written in the Naskh style.

Unicode equivalence is the specification by the Unicode character encoding standard that some sequences of code points represent essentially the same character. This feature was introduced in the standard to allow compatibility with preexisting standard character sets, which often included similar or identical characters.
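An Arabic-script example of canonical equivalence: آ can be stored either as the precomposed letter U+0622 or as alef (U+0627) followed by the combining maddah (U+0653). The two strings compare unequal as raw code points, but normalization (here via Python's `unicodedata`) maps them to the same form.

```python
import unicodedata

precomposed = "\u0622"       # ARABIC LETTER ALEF WITH MADDA ABOVE
decomposed = "\u0627\u0653"  # ALEF + ARABIC MADDAH ABOVE

print(precomposed == decomposed)                                # different sequences
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # composed form
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # decomposed form
```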

Universal Character Set characters: Complete list of the characters available on most computers

The Unicode Consortium and the ISO/IEC JTC 1/SC 2/WG 2 jointly collaborate on the list of the characters in the Universal Coded Character Set. The Universal Coded Character Set, most commonly called the Universal Character Set, is an international standard to map characters, discrete symbols used in natural language, mathematics, music, and other domains, to unique machine-readable data values. By creating this mapping, the UCS enables computer software vendors to interoperate, and transmit—interchange—UCS-encoded text strings from one to another. Because it is a universal map, it can be used to represent multiple languages at the same time. This avoids the confusion of using multiple legacy character encodings, which can result in the same sequence of codes having multiple interpretations depending on the character encoding in use, resulting in mojibake if the wrong one is chosen.

In Unicode and the UCS, a compatibility character is a character that is encoded solely to maintain round-trip convertibility with other, often older, standards. As the Unicode Glossary says:

A character that would not have been encoded except for compatibility and round-trip convertibility with other standards
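Round-trip convertibility can be phrased as a property of a legacy-to-Unicode mapping table: if two legacy codes mapped to the same Unicode string, converting back would be ambiguous, which is the situation compatibility characters exist to avoid. This is a minimal sketch of that necessary condition; `round_trips` and `invert` are hypothetical helpers, and the tables in the usage lines are made up for illustration.

```python
from typing import Dict, Mapping


def round_trips(table: Mapping[int, str]) -> bool:
    """Necessary condition for lossless round trips:
    no two legacy codes map to the same Unicode string."""
    return len(set(table.values())) == len(table)


def invert(table: Mapping[int, str]) -> Dict[str, int]:
    """Build the Unicode-to-legacy table, refusing ambiguous mappings."""
    if not round_trips(table):
        raise ValueError("mapping is not one-to-one")
    return {text: code for code, text in table.items()}


print(round_trips({0x41: "A", 0x42: "B"}))            # True
print(round_trips({0x80: "\u0627", 0x81: "\u0627"}))  # False: two codes, one character
```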

In Unicode, a script is a collection of letters and other written signs used to represent textual information in one or more writing systems. Some scripts support one and only one writing system and language, for example, Armenian. Other scripts support many different writing systems; for example, the Latin script supports English, French, German, Italian, Vietnamese, Latin itself, and several other languages. Some languages make use of multiple alternate writing systems and thus also use several scripts; for example, Turkish was written in the Arabic script before the 20th century but transitioned to the Latin script in the early 20th century. For a list of languages supported by each script, see the list of languages by writing system. More or less complementary to scripts are symbols and Unicode control characters.

Many scripts in Unicode, such as Arabic, have special orthographic rules that require certain combinations of letterforms to be combined into special ligature forms.
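For Arabic, the mandatory lam-alef ligature is normally produced by the font's shaping rules from the separate letters lam (U+0644) and alef (U+0627). Unicode also contains precomposed ligature code points in its Arabic Presentation Forms blocks, kept for compatibility; this sketch uses Python's `unicodedata` to show that compatibility normalization breaks one back into its letters.

```python
import unicodedata

lam_alef = "\ufefb"  # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM

print(unicodedata.name(lam_alef))
print(unicodedata.decomposition(lam_alef))  # compatibility tag plus the two letters
print(unicodedata.normalize("NFKC", lam_alef) == "\u0644\u0627")
```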

Khudabadi script: Abugida

Khudabadi is a script of Sindhi generally used by some Sindhis in India to write the Sindhi language. The script originates from Khudabad, a city in Sindh, and is named after it. It is also known as Hathvanki script. Khudabadi is one of the four scripts used for writing the Sindhi language, the others being Perso-Arabic, Khojki and Devanagari script. It was used by traders and merchants to record their information and rose to importance as the script began to be used to record information kept secret from other non-Sindhi groups and Indian nations.

The Unicode Standard assigns various properties to each Unicode character and code point.
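Python's standard `unicodedata` module exposes a subset of these properties, including the character name, general category, and bidirectional class (it does not, for example, expose the Script property). The sketch queries a few characters that also appear in the PASCII chart.

```python
import unicodedata

for ch in ["\u0628", "\u067b", "\u06f4"]:  # BEH, BEEH, EXTENDED ARABIC-INDIC DIGIT FOUR
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),       # general category, e.g. Lo or Nd
        unicodedata.bidirectional(ch),  # bidi class used in right-to-left layout
    )
```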

There are three writing systems for Saraiki, but very few of the language's speakers, even those who are literate in other languages, are able to read or write Saraiki in any writing system.

Sindhi is a language broadly spoken by the people of the historical Sindh region in the Indian subcontinent. Modern Sindhi is written in an extended Perso-Arabic script in Sindh province of Pakistan and (formally) in extended-Devanagari by Sindhis in partitioned India. Historically, Sindhi was written in various forms of Landa scripts and various other Indic scripts.
