ArabTeX is a free software package providing support for the Arabic and Hebrew alphabets to TeX and LaTeX. Written by Klaus Lagally, it can take romanized ASCII or native script input to produce quality ligatures for Arabic, Persian, Urdu, Pashto, Sindhi, Western Punjabi (Lahnda), Maghribi, Uyghur, Kashmiri, Hebrew, Judeo-Arabic, Ladino and Yiddish. ArabTeX characters are placed within a TeX/LaTeX document using the command \RL{ ... }
or the environment \begin{RLtext} ... \end{RLtext}
. ArabTeX is released under the LaTeX Project Public License v1+. [1]
\novocalize\RL{al-salAm `alaykum}
\documentclass[12pt]{article}\usepackage{arabtex}\begin{document}\setarab\fullvocalize\transtrue\arabtrue\begin{RLtext} bismi al-ll_ahi al-rra.hm_ani al-rra.hImi العربية \end{RLtext}\end{document}
\setarab
(set language specific rendering)\setfarsi
(set language specific rendering)\setuighur
(set language specific rendering)\set
... (more language conventions, see the documentation)\novocalize
(individual vowel marks can be displayed using "a, "i, "u)\vocalize
(individual vowel marks can be cancelled using "a, "i, "u)\fullvocalize
(individual vowel marks can be cancelled using "a, "i, "u)\setcode{}
(switch input encodings)\settrans{}
(switch transliteration conventions)Letter | Transliteration | Unicode name |
---|---|---|
ا | A | ARABIC LETTER ALEF |
أ | a' | ARABIC LETTER ALEF WITH HAMZA ABOVE |
ب | b | ARABIC LETTER BEH |
ت | t | ARABIC LETTER TEH |
ث | _t | ARABIC LETTER THEH |
ج | j / ^g | ARABIC LETTER JEEM |
ح | .h | ARABIC LETTER HAH |
خ | x / _h | ARABIC LETTER KHAH |
د | d | ARABIC LETTER DAL |
ذ | _d | ARABIC LETTER THAL |
ر | r | ARABIC LETTER REH |
ز | z | ARABIC LETTER ZAIN |
س | s | ARABIC LETTER SEEN |
ش | ^s | ARABIC LETTER SHEEN |
ص | .s | ARABIC LETTER SAD |
ض | .d | ARABIC LETTER DAD |
ط | .t | ARABIC LETTER TAH |
ظ | .z | ARABIC LETTER ZAH |
ع | ` | ARABIC LETTER AIN |
غ | .g | ARABIC LETTER GHAIN |
ف | f | ARABIC LETTER FEH |
ق | q | ARABIC LETTER QAF |
ك | k | ARABIC LETTER KAF |
ل | l | ARABIC LETTER LAM |
م | m | ARABIC LETTER MEEM |
ن | n | ARABIC LETTER NOON |
و | w / U | ARABIC LETTER WAW |
ه | h | ARABIC LETTER HEH |
ي | y / I | ARABIC LETTER YEH |
َ | a | ARABIC FATHA |
ُ | u / o | ARABIC DAMMA |
ِ | i / e | ARABIC KASRA |
پ | p | ARABIC LETTER PEH |
چ | ^c | ARABIC LETTER TCHEH |
ژ | ^z | ARABIC LETTER JEH |
گ | g | ARABIC LETTER GAF |
ک | .k | ARABIC LETTER KEHEH |
ی | y / I * | ARABIC LETTER FARSI YEH |
ۀ | H-i | ARABIC LETTER HEH WITH YEH |
آ | 'A | ARABIC LETTER ALEF WITH MADDA ABOVE |
ة | T | ARABIC LETTER TEH MARBUTA |
ء | ' | ARABIC LETTER HAMZA ABOVE |
ئ | 'y | ARABIC LETTER YEH WITH HAMZA ABOVE |
ؤ | u' | ARABIC LETTER WAW WITH HAMZA ABOVE |
ً | aN | ARABIC FATHATAN |
ّ | xx | ARABIC SHADDA |
، | , | ARABIC COMMA |
؛ | ; | ARABIC SEMICOLON |
؟ | ? | ARABIC QUESTION MARK |
٪ | % | ARABIC PERCENT SIGN |
| SPACE | |
. | . | FULL STOP |
| - | ZERO WIDTH JOINER |
| \hspace{0ex} | ZERO WIDTH NON-JOINER |
^* Activated by \setfarsi |
Note that one can also overcome the problem with <yah> containing dots using the \yahnodots
command.
The Arabic alphabet, or Arabic abjad, is the Arabic script as specifically codified for writing the Arabic language. It is written from right-to-left in a cursive style, and includes 28 letters, of which most have contextual letterforms. The Arabic alphabet is considered an abjad, with only consonants required to be written; due to its optional use of diacritics to notate vowels, it is considered an impure abjad.
The Hebrew alphabet, known variously by scholars as the Ktav Ashuri, Jewish script, square script and block script, is traditionally an abjad script used in the writing of the Hebrew language and other Jewish languages, most notably Yiddish, Ladino, Judeo-Arabic, and Judeo-Persian. In modern Hebrew, vowels are increasingly introduced. It is also used informally in Israel to write Levantine Arabic, especially among Druze. It is an offshoot of the Imperial Aramaic alphabet, which flourished during the Achaemenid Empire and which itself derives from the Phoenician alphabet.
LaTeX is a software system for typesetting documents. LaTeX markup describes the content and layout of the document, as opposed to the formatted text found in WYSIWYG word processors like Microsoft Word, LibreOffice Writer and Apple Pages. The writer uses markup tagging conventions to define the general structure of a document, to stylise text throughout a document, and to add citations and cross-references. A TeX distribution such as TeX Live or MiKTeX is used to produce an output file suitable for printing or digital distribution.
TeX, stylized within the system as TeX, is a typesetting program which was designed and written by computer scientist and Stanford University professor Donald Knuth and first released in 1978. The term now refers to the system of extensions - which includes software programs called TeX engines, sets of TeX macros, and packages which provide extra typesetting functionality - built around the original TeX language. TeX is a popular means of typesetting complex mathematical formulae; it has been noted as one of the most sophisticated digital typographical systems.
OpenType is a format for scalable computer fonts. Derived from TrueType, it retains TrueType's basic structure but adds many intricate data structures for describing typographic behavior. OpenType is a registered trademark of Microsoft Corporation.
The device independent file format (DVI) is the output file format of the TeX typesetting program, designed by David R. Fuchs and implemented by Donald E. Knuth in 1982. Unlike the TeX markup files used to generate them, DVI files are not intended to be human-readable; they consist of binary data describing the visual layout of a document in a manner not reliant on any specific image format, display hardware or printer. DVI files are typically used as input to a second program which translates DVI files to graphical data. For example, most TeX software packages include a program for previewing DVI files on a user's computer display; this program is a driver. Drivers are also used to convert from DVI to popular page description languages and for printing.
The Arabic script has numerous diacritics, which include consonant pointing known as iʻjām (إِعْجَام), and supplementary diacritics known as tashkīl (تَشْكِيل). The latter include the vowel marks termed ḥarakāt.
The Romanian alphabet is a variant of the Latin alphabet used for writing the Romanian language. It is a modification of the classical Latin alphabet and consists of 31 letters, five of which have been modified from their Latin originals for the phonetic requirements of the language.
When used as a diacritic mark, the term dot refers to the glyphs "combining dot above", and "combining dot below" which may be combined with some letters of the extended Latin alphabets in use in a variety of languages. Similar marks are used with other scripts.
XeTeX is a TeX typesetting engine using Unicode and supporting modern font technologies such as OpenType, Graphite and Apple Advanced Typography (AAT). It was originally written by Jonathan Kew and is distributed under the X11 free software license.
Kaph is the eleventh letter of the Semitic abjads, including Phoenician kāp 𐤊, Hebrew kāp̄כ, Aramaic kāp 𐡊, Syriac kāp̄ ܟ, and Arabic kāfك.
Ayin is the sixteenth letter of the Semitic scripts, including Phoenician ʿayin 𐤏, Hebrew ʿayinע, Aramaic ʿē 𐡏, Syriac ʿē ܥ, and Arabic ʿaynع.
Aleph is the first letter of the Semitic abjads, including Phoenician ʾālep 𐤀, Hebrew ʾālefא, Aramaic ʾālap 𐡀, Syriac ʾālap̄ ܐ, Arabic ʾalifا, and North Arabian 𐪑. It also appears as South Arabian 𐩱 and Ge'ez ʾälef አ.
The romanization of Arabic is the systematic rendering of written and spoken Arabic in the Latin script. Romanized Arabic is used for various purposes, among them transcription of names and titles, cataloging Arabic language works, language education when used instead of or alongside the Arabic script, and representation of the language in scientific publications by linguists. These formal systems, which often make use of diacritics and non-standard Latin characters and are used in academic settings or for the benefit of non-speakers, contrast with informal means of written communication used by speakers such as the Latin-based Arabic chat alphabet.
Diacritical marks of two dots¨, placed side-by-side over or under a letter, are used in several languages for several different purposes. The most familiar to English-language speakers are the diaeresis and the umlaut, though there are numerous others. For example, in Albanian, ë represents a schwa. Such diacritics are also sometimes used for stylistic reasons.
The computer program pdfTeX is an extension of Knuth's typesetting program TeX, and was originally written and developed into a publicly usable product by Hàn Thế Thành as a part of the work for his PhD thesis at the Faculty of Informatics, Masaryk University, Brno, Czech Republic. The idea of making this extension to TeX was conceived during the early 1990s, when Jiří Zlatuška and Phil Taylor discussed some developmental ideas with Donald Knuth at Stanford University. Knuth later met Hàn Thế Thành in Brno during his visit to the Faculty of Informatics to receive an honorary doctorate from Masaryk University.
Varieties of Arabic are the linguistic systems that Arabic speakers speak natively. Arabic is a Semitic language within the Afroasiatic family that originated in the Arabian Peninsula. There are considerable variations from region to region, with degrees of mutual intelligibility that are often related to geographical distance and some that are mutually unintelligible. Many aspects of the variability attested to in these modern variants can be found in the ancient Arabic dialects in the peninsula. Likewise, many of the features that characterize the various modern variants can be attributed to the original settler dialects as well as local native languages and dialects. Some organizations, such as SIL International, consider these approximately 30 different varieties to be separate languages, while others, such as the Library of Congress, consider them all to be dialects of Arabic.
Moroccan Arabic, also known as Darija, is the dialectal, vernacular form or forms of Arabic spoken in Morocco. It is part of the Maghrebi Arabic dialect continuum and as such is mutually intelligible to some extent with Algerian Arabic and to a lesser extent with Tunisian Arabic. It is spoken by 90.9% of the population of Morocco. While Modern Standard Arabic is used to varying degrees in formal situations such as religious sermons, books, newspapers, government communications, news broadcasts and political talk shows, Moroccan Arabic is the predominant spoken language of the country and has a strong presence in Moroccan television entertainment, cinema and commercial advertising. Moroccan Arabic has many regional dialects and accents as well, with its mainstream dialect being the one used in Casablanca, Rabat, Tangier, Marrakesh and Fez, and therefore it dominates the media and eclipses most of the other regional accents.
Palestinian Arabic is a dialect continuum of mutually intelligible varieties of Levantine Arabic spoken by Palestinians in Palestine, including the State of Palestine, Israel and in the Palestinian diaspora.