Perso-Arabic Script Code for Information Interchange

Perso-Arabic Script Code for Information Interchange (PASCII) is one of the Indian government standards for encoding languages whose writing systems are based on the Perso-Arabic alphabet, in particular Kashmiri, Persian, Sindhi and Urdu. The ISCII encoding was originally intended to cover both the Brahmi-derived writing systems of India and the Arabic-based systems, but it was subsequently decided to encode the Arabic-based writing systems separately.

PASCII has now been rendered largely obsolete by Unicode. The encoding of the Arabic script in Unicode is based on ISO/IEC 8859-6 rather than PASCII.[1]
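The relationship between ISO/IEC 8859-6 and the Unicode Arabic block can be checked with a short Python sketch. It assumes Python's built-in `iso8859_6` codec and tests whether each Arabic character of ISO/IEC 8859-6 lands at the same offset from U+0600 as its byte does from 0xA0. The byte list and the helper name `unicode_from_8859_6` are illustrative choices, not part of either standard.

```python
# Bytes of ISO/IEC 8859-6 that carry Arabic characters:
# comma, semicolon, question mark, the letters, tatweel and the harakat.
ARABIC_BYTES = [0xAC, 0xBB, 0xBF, *range(0xC1, 0xDB), *range(0xE0, 0xF3)]


def unicode_from_8859_6(byte: int) -> str:
    """Hypothetical helper: shift the byte's offset from 0xA0 onto U+0600."""
    return chr(0x0600 + (byte - 0xA0))


# Compare the arithmetic shortcut against the codec shipped with Python.
mismatches = [
    b for b in ARABIC_BYTES
    if bytes([b]).decode("iso8859_6") != unicode_from_8859_6(b)
]
print(mismatches)
```

No comparable fixed offset is suggested by the PASCII table, so converting PASCII text would rely on a lookup table rather than arithmetic.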

Codepage layout

The following table shows the character set for PASCII. Each row gives the high hexadecimal digit of a code and each column the low digit; the lower half (rows 0x to 7x) matches ASCII.

PASCII
   0 1 2 3 4 5 6 7 8 9 A B C D E F
0x NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI
1x DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US
2x SP ! " # $ % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ \ ] ^ _
6x ` a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { | } ~ DEL
8xـاآبٻڀپڦتةٿٹ [lower-alpha 1] ٺثج
9xڄڃچڇحخدڌڈ [lower-alpha 2] ڏڍذرڑ [lower-alpha 3] ڙز
Axژسشصضطظعغفقڪ [lower-alpha 4] کگڳڱ
Bxلمنںڻوۄهھءىؠۓےَِ
Cxُٕٗٔٔ [lower-alpha 5] ّٰٟٖٓۡ [lower-alpha 6] ؔ،ؐ
Dxؓؒؑ [lower-alpha 7] ۙۚۢ؁ [lower-alpha 8] ٪٬٫۰۱۲۳
Ex۴۵۶۷۸۹!()٭+ATR
Fx-/؛:؟=۔۝ [lower-alpha 8] ٬ DM  LB  [lower-alpha 8] ·
  1. This character is displayed as ٽ (Unicode code point U+067D) in Sindhi texts.
  2. This character is displayed as ڊ (Unicode code point U+068A) in Sindhi texts.
  3. This character is displayed as ڙ (Unicode code point U+0699) in Sindhi texts.
  4. This character is usually displayed as ک (Unicode code point U+06A9) in non-Sindhi texts. See also: Khē
  5. This code space is used for the Kashmiri hamza, an orthographic variation not used in Unicode. This diacritic on an alif is U+0672 in Unicode.
  6. This diacritic currently has no Unicode support.[citation needed] This diacritic on an alif is U+0671 in Unicode.
  7. This combining superscript letter kaf currently has no Unicode support.[citation needed]
  8. This punctuation mark currently has no Unicode support.[citation needed] Nearest equivalent shown.
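
As an illustration of how the table and its notes might be applied, the following Python sketch decodes PASCII bytes to Unicode. It is a minimal sketch under stated assumptions: only row Ax of the table is filled in, the Sindhi and non-Sindhi handling follows note 4 alone, and the names `cell_to_code`, `decode_pascii` and `sindhi` are hypothetical. A usable converter would need the complete table from the standard.

```python
# Row Ax of the table (codes 0xA0 to 0xAF), written as Unicode escapes.
ROW_AX = (
    "\u0698\u0633\u0634\u0635\u0636\u0637\u0638\u0639"
    "\u063A\u0641\u0642\u06AA\u06A9\u06AF\u06B3\u06B1"
)
# Partial mapping only: the remaining rows are omitted from this sketch.
PASCII_TO_UNICODE = {0xA0 + i: ch for i, ch in enumerate(ROW_AX)}

# Note 4: code 0xAB is usually shown as U+06A9 outside Sindhi texts.
NON_SINDHI_OVERRIDES = {0xAB: "\u06A9"}


def cell_to_code(row: str, column: str) -> int:
    """Combine a row label digit and a column digit into a byte value."""
    return int(row + column, 16)


def decode_pascii(data: bytes, sindhi: bool = False) -> str:
    """Decode bytes using the partial table; unknown codes become U+FFFD."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))  # lower half is plain ASCII
        elif not sindhi and b in NON_SINDHI_OVERRIDES:
            out.append(NON_SINDHI_OVERRIDES[b])
        elif b in PASCII_TO_UNICODE:
            out.append(PASCII_TO_UNICODE[b])
        else:
            out.append("\uFFFD")  # not covered by this sketch
    return "".join(out)
```

Passing the language as a flag rather than building separate tables keeps the language-dependent cells visible in one place, which mirrors how the notes describe them.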

References

  1. "The Unicode Standard v15.0 Chapter 9" (PDF).