Arabic letter mark

The Arabic letter mark (ALM) is a non-printing character used in the computerized typesetting of bidirectional text that mixes left-to-right scripts (such as Latin and Cyrillic) with right-to-left scripts (such as Persian, Arabic, Syriac and Hebrew).

Like the right-to-left mark (RLM), it changes the way adjacent characters are grouped with respect to text direction, but it differs in how it affects the resolution of bidirectional levels for nearby characters.
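The difference between the two marks shows up in their bidirectional classes. The following is a minimal sketch using Python's standard `unicodedata` module; it assumes an interpreter whose Unicode database already includes ALM (U+061C), and the helper name `bidi_class` is introduced here only for illustration.

```python
import unicodedata

ALM = "\u061c"  # ARABIC LETTER MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def bidi_class(ch: str) -> str:
    """Return the Unicode bidirectional class of a single character."""
    return unicodedata.bidirectional(ch)


# Map each mark's official character name to its bidirectional class.
classes = {unicodedata.name(ch): bidi_class(ch) for ch in (ALM, RLM)}
for name, cls in classes.items():
    print(f"{name}: {cls}")
```

ALM reports the class AL (Arabic letter) while RLM reports R (generic right-to-left). In the Unicode Bidirectional Algorithm, European digits that follow an AL character are resolved as Arabic numbers, which is one place where the two marks can lead to different results.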

Unicode

In Unicode, the ALM character is encoded at U+061C ARABIC LETTER MARK. In UTF-8 it is 0xD8 0x9C. Usage is prescribed in the Unicode Bidirectional Algorithm.
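The UTF-8 bytes can be derived from the code point by hand. The sketch below, in Python, packs U+061C into the two-byte UTF-8 form and checks the result against the built-in encoder; the function name `utf8_two_byte` is an illustrative helper, not part of any library.

```python
ALM_CODE_POINT = 0x061C


def utf8_two_byte(code_point: int) -> bytes:
    """Encode a code point in the range U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("code point is outside the two-byte UTF-8 range")
    lead = 0xC0 | (code_point >> 6)     # 110xxxxx: upper five bits
    trail = 0x80 | (code_point & 0x3F)  # 10xxxxxx: lower six bits
    return bytes([lead, trail])


manual = utf8_two_byte(ALM_CODE_POINT)
builtin = chr(ALM_CODE_POINT).encode("utf-8")
formatted = " ".join(f"0x{b:02X}" for b in manual)

print(formatted)          # 0xD8 0x9C
print(manual == builtin)  # True
```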

Related Research Articles

Arabic alphabet

The Arabic alphabet, or Arabic abjad, is the Arabic script as specifically codified for writing the Arabic language. It is written from right to left in a cursive style, and includes 28 letters, of which most have contextual letterforms. The Arabic alphabet is considered an abjad, with only consonants required to be written; due to its optional use of diacritics to notate vowels, it is considered an impure abjad.

A bidirectional text contains two text directionalities, right-to-left (RTL) and left-to-right (LTR). It generally involves text containing different types of alphabets, but the term may also refer to boustrophedon, in which the text direction alternates from row to row.

Unicode: Character encoding standard

Unicode, formally The Unicode Standard, is a text encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 15.1 of the standard defines 149,813 characters and 161 scripts used in various ordinary, literary, academic, and technical contexts.

ISO/IEC 8859-6:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 6: Latin/Arabic alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987. It is informally referred to as Latin/Arabic. It was designed to cover Arabic. Only nominal letters are encoded, no preshaped forms of the letters, so shaping processing is required for display. It does not include the extra letters needed to write most Arabic-script languages other than Arabic itself.
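The limited repertoire can be observed with Python's built-in `iso-8859-6` codec. This is a small sketch, not a statement about any particular rendering system; the helper `encodable` and the choice of sample letters are assumptions made for illustration.

```python
ALEF = "\u0627"  # ARABIC LETTER ALEF, part of the basic Arabic alphabet
PEH = "\u067e"   # ARABIC LETTER PEH, used in Persian but not in Arabic

# A letter of the Arabic language maps to a single byte.
alef_byte = ALEF.encode("iso-8859-6")


def encodable(ch: str) -> bool:
    """Report whether a character can be represented in ISO/IEC 8859-6."""
    try:
        ch.encode("iso-8859-6")
        return True
    except UnicodeEncodeError:
        return False


print(alef_byte)        # b'\xc7'
print(encodable(ALEF))  # True
print(encodable(PEH))   # False
```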

The Mandaic alphabet is a writing system primarily used to write the Mandaic language. It is thought to have evolved between the second and seventh century CE from either a cursive form of Aramaic or from Inscriptional Parthian. The exact roots of the script are difficult to determine. It was developed by members of the Mandaean faith of Lower Mesopotamia to write the Mandaic language for liturgical purposes. Classical Mandaic and its descendant Neo-Mandaic are still in limited use. The script has changed very little over centuries of use.

Complex text layout: Neighbour-dependent grapheme positioning

Complex text layout (CTL) or complex text rendering is the typesetting of writing systems in which the shape or positioning of a grapheme depends on its relation to other graphemes. The term is used in the field of software internationalization, where each grapheme is a character.

Right-to-left script: Type of writing system

In a right-to-left, top-to-bottom script, writing starts from the right of the page and continues to the left, proceeding from top to bottom for new lines. Arabic, Hebrew, and Persian are the most widespread RTL writing systems in modern times.

The Arabic star is a punctuation mark added to Unicode 1.1 because the asterisk (*) might appear similar to a Star of David in its six-lobed form.

Hebrew keyboard: Keyboard layout

A Hebrew keyboard comes in two different keyboard layouts. Most Hebrew keyboards are bilingual, with Latin characters, usually in a US QWERTY layout. Trilingual keyboard options also exist, with the third script being Arabic or Russian, due to the sizable Arabic- and Russian-speaking populations in Israel.

Lydian alphabet: Alphabet used to write the Lydian language

Lydian script was used to write the Lydian language. Like other scripts of Anatolia in the Iron Age, the Lydian alphabet is based on the Phoenician alphabet. It is related to the East Greek alphabet, but it has unique features.

The left-to-right mark (LRM) is a control character used in computerized typesetting of text containing a mix of left-to-right scripts and right-to-left scripts. It is used to set the way adjacent characters are grouped with respect to text direction.

The right-to-left mark (RLM) is a non-printing character used in the computerized typesetting of bidirectional text containing a mix of left-to-right scripts and right-to-left scripts.

Universal Character Set characters: Complete list of the characters available on most computers

The Unicode Consortium and the ISO/IEC JTC 1/SC 2/WG 2 jointly collaborate on the list of the characters in the Universal Coded Character Set. The Universal Coded Character Set, most commonly called the Universal Character Set, is an international standard to map characters, discrete symbols used in natural language, mathematics, music, and other domains, to unique machine-readable data values. By creating this mapping, the UCS enables computer software vendors to interoperate, and transmit—interchange—UCS-encoded text strings from one to another. Because it is a universal map, it can be used to represent multiple languages at the same time. This avoids the confusion of using multiple legacy character encodings, which can result in the same sequence of codes having multiple interpretations depending on the character encoding in use, resulting in mojibake if the wrong one is chosen.
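Mojibake of the kind described above is easy to reproduce. The Python sketch below encodes one Arabic letter as UTF-8 and then decodes the same bytes with the wrong legacy encoding; the choice of Latin-1 as the wrong encoding is an assumption made for illustration.

```python
original = "\u0627"  # ARABIC LETTER ALEF

# Encode correctly as UTF-8: two bytes, 0xD8 0xA7.
utf8_bytes = original.encode("utf-8")

# Decode with the wrong encoding: each byte becomes its own Latin-1 character.
misread = utf8_bytes.decode("latin-1")

# Decoding with the right encoding recovers the original text.
recovered = utf8_bytes.decode("utf-8")

print(utf8_bytes)             # b'\xd8\xa7'
print(misread)                # Ø§
print(recovered == original)  # True
```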

Many Unicode characters are used to control the interpretation or display of text, but these characters themselves have no visual or spatial representation. For example, the null character is used in C programming environments to indicate the end of a string of characters. In this way, these programs only require a single starting memory address for a string, since the string ends once the program reads the null character.
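The null-terminated convention can be observed from Python through the standard `ctypes` module, which allocates C-style character buffers. This is a sketch for illustration only; the sample byte strings are arbitrary.

```python
import ctypes

# create_string_buffer appends a terminating null byte to the initial value.
buf = ctypes.create_string_buffer(b"ALM")
raw_bytes = buf.raw    # the whole buffer, including the terminator
c_string = buf.value   # the contents read as a C string

# An embedded null byte ends the C string early, even though more data follows.
embedded = ctypes.create_string_buffer(b"abc\x00def")
truncated = embedded.value

print(raw_bytes)   # b'ALM\x00'
print(c_string)    # b'ALM'
print(truncated)   # b'abc'
```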

In Unicode, a script is a collection of letters and other written signs used to represent textual information in one or more writing systems. Some scripts support one and only one writing system and language, for example, Armenian. Other scripts support many different writing systems; for example, the Latin script supports English, French, German, Italian, Vietnamese, Latin itself, and several other languages. Some languages make use of multiple alternate writing systems and thus also use several scripts; for example, in Turkish, the Arabic script was used before the 20th century but transitioned to Latin in the early part of the 20th century. More or less complementary to scripts are symbols and Unicode control characters.

Many scripts in Unicode, such as Arabic, have special orthographic rules that require certain combinations of letterforms to be combined into special ligature forms. In English, the common ampersand (&) developed from a ligature in which the handwritten Latin letters e and t were combined. The rules governing ligature formation in Arabic can be quite complex, requiring special script-shaping technologies such as the Arabic Calligraphic Engine by Thomas Milo's DecoType.

The Universal Coded Character Set is a standard set of characters defined by the international standard ISO/IEC 10646, Information technology — Universal Coded Character Set (UCS), which is the basis of many character encodings, improving as characters from previously unrepresented writing systems are added.

The Unicode Standard assigns various properties to each Unicode character and code point.

Trojan Source is the name of a software vulnerability that abuses Unicode's bidirectional characters to display source code differently than the actual execution of the source code. The exploit utilizes how writing scripts of different reading directions are displayed and encoded on computers. It was discovered by Nicholas Boucher and Ross Anderson at Cambridge University in late 2021.
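A common defensive measure is to scan source text for bidirectional control characters before review. The Python sketch below is one possible check, not the tooling used by the discoverers; the function name `find_bidi_controls` and the exact character set (the marks, embeddings, overrides and isolates, including ALM) are assumptions chosen for illustration.

```python
import unicodedata

# Bidirectional formatting characters: ALM, LRM, RLM, the embedding and
# override controls (U+202A..U+202E), and the isolate controls (U+2066..U+2069).
BIDI_CONTROLS = {
    "\u061c", "\u200e", "\u200f",
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def find_bidi_controls(source: str) -> list:
    """Return (line, column, code point, name) for each control found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return hits


sample = "x = 1\nif x: # \u202e check\n    pass\n"
hits = find_bidi_controls(sample)
for hit in hits:
    print(hit)  # (2, 9, 'U+202E', 'RIGHT-TO-LEFT OVERRIDE')
```

Line and column numbers are 1-based and count code points, so they may not match the columns an editor displays for lines that contain combining or wide characters.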