ISO 11940 is an ISO standard for the transliteration of Thai characters, published in 1998, updated in September 2003, and confirmed in 2008. An extension to this standard, ISO 11940-2, defines a simplified transcription based on it.
Thai | ก | ข | ฃ | ค | ฅ | ฆ | ง | จ | ฉ | ช | ซ | ฌ | ญ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ISO | k | k̄h | ḳ̄h | kh | k̛h | ḳh | ng | c | c̄h | ch | s | c̣h | ỵ |
Thai | ฎ | ฏ | ฐ | ฑ | ฒ | ณ | ด | ต | ถ | ท | ธ | น | |
ISO | ḍ | ṭ | ṭ̄h | ṯh | t̛h | ṇ | d | t | t̄h | th | ṭh | n | |
Thai | บ | ป | ผ | ฝ | พ | ฟ | ภ | ม | |||||
ISO | b | p | p̄h | f̄ | ph | f | p̣h | m | |||||
Thai | ย | ร | ฤ | ล | ฦ | ว | ศ | ษ | ส | ห | ฬ | อ | ฮ |
ISO | y | r | v | l | ł | w | ṣ̄ | s̛̄ | s̄ | h̄ | ḷ | x | ḥ |
The transliteration of the pure consonants is derived from their usual pronunciation as an initial consonant. An unmarked h is used to form digraphs denoting aspirated consonants. High and low pairs of consonants are systematically differentiated by applying a macron to the high class consonant. Further differentiation of consonants with identical phonetic function is obtained by leaving the most frequent unmarked, marking the second commonest by a dot below, marking the third commonest by a horn, and marking the fourth commonest by underlining. The use of a dot below has a similar effect to the Indological practice of distinguishing retroflex consonants by a dot below, but there are subtle differences – it is the transliterations of ธ tho thong and ศ so sala that are dotted below, not those of the corresponding retroflex consonants. The transliterations of consonants should be entered in the order base letter, macron if any, and then dot below, horn or "macron below". Only three consonants have the horn in their transliteration, ฅ kho khon, ฒ tho phuthao and ษ so ruesi, and only one consonant has an underline, ฑ tho nang montho.
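The marking and ordering rules above can be sketched in Python. This is only an illustration, not text from the standard: the function name `consonant` and its parameters are hypothetical, and `rank` stands for the frequency rank described above (0 for the most frequent letter, which stays unmarked). The combining code points are the usual Unicode ones.

```python
# Combining code points (standard Unicode values).
MACRON = "\u0304"        # COMBINING MACRON
DOT_BELOW = "\u0323"     # COMBINING DOT BELOW
HORN = "\u031b"          # COMBINING HORN
MACRON_BELOW = "\u0331"  # COMBINING MACRON BELOW (the "underline")

# Mark chosen by frequency rank among consonants with the same phonetic function.
RANK_MARKS = ["", DOT_BELOW, HORN, MACRON_BELOW]


def consonant(base, high_class=False, rank=0, aspirated=False):
    """Build a transliteration in the stated order:
    base letter, macron if any, then dot below, horn or macron below,
    followed by an unmarked h for aspirated consonants."""
    out = base
    if high_class:
        out += MACRON
    out += RANK_MARKS[rank]
    if aspirated:
        out += "h"
    return out


if __name__ == "__main__":
    print(consonant("k", high_class=True, aspirated=True))          # kho khai
    print(consonant("k", high_class=True, rank=1, aspirated=True))  # kho khuat
    print(consonant("t", rank=3, aspirated=True))                   # tho nang montho
```

The sketch handles only a single base letter plus an optional h; digraph bases such as ng are outside its scope.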
Thai | ะ | –ั | า | ำ | –ิ | –ี | –ึ | –ื | –ุ | –ู | เ | แ | โ | ใ | ไ | ฤ | ฤๅ | ฦ | ฦๅ | ย | ว | อ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ISO | a | ạ | ā | å | i | ī | ụ | ụ̄ | u | ū | e | æ | o | ı | ị | v | vɨ | ł | łɨ | y | w | x |
The letter å is the only precomposed character specified in the output of transliteration.
Lakkhangyao (ๅ) has been shown only in combination with the vowel letters ฤ and ฦ. The standard simply lists ฤ and ฦ with the consonants and lakkhangyao with the vowels. An isolated lakkhangyao would also be transliterated by a small letter "i" with stroke (ɨ), but an isolated lakkhangyao should not occur in Thai, Pāli, or Sanskrit.
The transliterations of ว wo waen and อ o ang have been included here because of their use as complete vowel symbols, but their transliteration does not depend on how they are being used and the standard simply lists them with the consonants.
Compound vowel symbols are transliterated in accordance with their constituents.
Thai | –่ | –้ | –๊ | –๋ | –็ | –์ | –๎ | –ํ | –ฺ |
---|---|---|---|---|---|---|---|---|---|
ISO | –̀ | –̂ | –́ | –̌ | –̆ | –̒ | ~ | –̊ | –̥ |
Note that yamakkan (–๎) is represented by a spacing tilde, not a superscript tilde.
Thai | ๆ | ฯ | ๏ | ฯ | ๚ | ๛ | ๐ | ๑ | ๒ | ๓ | ๔ | ๕ | ๖ | ๗ | ๘ | ๙ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ISO | « | ǂ | § | ǀ | ǁ | » | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
ISO 11940:1998 distinguishes the abbreviation symbol paiyannoi (ฯ) from the sentence terminator angkhandiao (ฯ), even though neither the national character standard TIS 620-2533 nor Unicode Version 5.0 distinguishes them. Paiyannoi is transliterated as ǂ and angkhandiao is transliterated as ǀ. Note that paiyannoi, angkhandiao and angkhankhu (๚) are transliterated by the letters used for click consonants, not by double dagger, vertical bars or dandas.
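The punctuation and digit table can be expressed as a simple character map. The following Python sketch is illustrative only: the function name and the `ambiguous` parameter are assumptions made here. Because Unicode encodes paiyannoi and angkhandiao as the same character (U+0E2F), the caller has to say which reading is intended; the sketch defaults to paiyannoi.

```python
# Unambiguous symbols from the table above.
SYMBOLS = {
    "\u0e46": "\u00ab",  # maiyamok    -> left guillemet
    "\u0e4f": "\u00a7",  # fongman     -> section sign
    "\u0e5a": "\u01c1",  # angkhankhu  -> lateral click letter
    "\u0e5b": "\u00bb",  # khomut      -> right guillemet
}
# Thai digits U+0E50..U+0E59 map to ASCII 0..9.
DIGITS = {chr(0x0E50 + d): str(d) for d in range(10)}

PAIYANNOI = "\u01c2"    # palatal click letter (abbreviation reading)
ANGKHANDIAO = "\u01c0"  # dental click letter (sentence-terminator reading)


def transliterate_symbols(text, ambiguous=PAIYANNOI):
    """Map Thai digits and punctuation; other characters pass through.
    `ambiguous` is the output chosen for U+0E2F, which the encoding
    cannot disambiguate on its own."""
    table = {**SYMBOLS, **DIGITS, "\u0e2f": ambiguous}
    return text.translate(str.maketrans(table))
```

For example, `transliterate_symbols("๒๕๔๑")` returns `"2541"`.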
In general characters are transliterated from left to right and, where characters have the same horizontal position, from top to bottom. The vertical sequencing is in fact simply specified as tone marks and thanthakhat (–์) preceding any other marks above or below the consonant. The standard denies at the end of Section 4.2 that the combination of sara u (◌ุ, ◌ู) and nikkhahit (◌ํ) can occur and then gives an example of it when specifying the transliteration of nikkhahit, but does not show the transliteration of the combination. The effect of these rules is that, except for nikkhahit, all the non-vowel marks attached to a consonant in Thai are attached to the consonant in the Roman transliteration.
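One way to read the vertical sequencing rule is as a stable reordering of the marks that follow each base character, so that tone marks and thanthakhat come first. The Python sketch below takes that reading; it is an assumption-laden illustration rather than the standard's own procedure. The function name is hypothetical, and nikkhahit is treated like any other mark even though the text above notes that it behaves differently.

```python
# Tone marks (mai ek, mai tho, mai tri, mai chattawa) and thanthakhat.
TONE_OR_THANTHAKHAT = set("\u0e48\u0e49\u0e4a\u0e4b\u0e4c")
# Other marks written above or below a consonant (vowel signs, phinthu,
# maitaikhu, nikkhahit, yamakkan).
OTHER_MARKS = set(
    "\u0e31\u0e34\u0e35\u0e36\u0e37\u0e38\u0e39\u0e3a\u0e47\u0e4d\u0e4e"
)


def reorder_marks(text):
    """Within each run of marks, move tone marks and thanthakhat to the
    front while keeping the relative order of everything else."""
    out = []
    run = []

    def flush():
        # list.sort is stable; False sorts before True.
        run.sort(key=lambda ch: ch not in TONE_OR_THANTHAKHAT)
        out.extend(run)
        run.clear()

    for ch in text:
        if ch in TONE_OR_THANTHAKHAT or ch in OTHER_MARKS:
            run.append(ch)
        else:
            flush()
            out.append(ch)
    flush()
    return "".join(out)
```

With input stored as consonant, sara ii, mai ek, the function returns consonant, mai ek, sara ii, so a later character-by-character mapping attaches the tone accent to the consonant.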
The standard concedes that attempting to transpose preposed vowels and consonants may be comforting to those used to the Roman alphabet, but recommends that preposed vowels not be transposed.
For example, ภาษาไทย (RTGS: Phasa Thai) should be transliterated to p̣hās̛̄āịthy and เชียงใหม่ (RTGS: Chiang Mai) to echīyngıh̄m̀.
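These two examples can be reproduced with a character-by-character lookup, since no transposition is involved. The Python sketch below is a partial illustration: the table name and function name are hypothetical, only the characters needed for the two words are included, and the combining marks are stored in the typing order described earlier (base letter, macron, then dot below or horn). No mark reordering is attempted, which is sufficient here because neither word has a tone mark stacked on a vowel sign.

```python
# Partial table: only the characters used in the two example words.
ISO_11940_SUBSET = {
    "\u0e20": "p\u0323h",       # pho samphao
    "\u0e32": "a\u0304",        # sara aa
    "\u0e29": "s\u0304\u031b",  # so ruesi: s, macron, horn
    "\u0e44": "i\u0323",        # sara ai maimalai
    "\u0e17": "th",             # tho thahan
    "\u0e22": "y",              # yo yak
    "\u0e40": "e",              # sara e
    "\u0e0a": "ch",             # cho chang
    "\u0e35": "i\u0304",        # sara ii
    "\u0e07": "ng",             # ngo ngu
    "\u0e43": "\u0131",         # sara ai maimuan: dotless i
    "\u0e2b": "h\u0304",        # ho hip
    "\u0e21": "m",              # mo ma
    "\u0e48": "\u0300",         # mai ek: combining grave
}


def transliterate(text):
    """Transliterate left to right, one character at a time.
    Raises KeyError for characters outside the partial table."""
    return "".join(ISO_11940_SUBSET[ch] for ch in text)


if __name__ == "__main__":
    print(transliterate("ภาษาไทย"))
    print(transliterate("เชียงใหม่"))
```

The output may differ from the strings shown above in the stored order of combining marks while still being canonically equivalent to them.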
The standard specifies the order in which the accents should be typed, but not all input systems will record accents in the order in which they are typed. Unicode specifies two normalised forms for letters with multiple accents, and transliterated text is highly likely to be stored in one of these forms. This complicates automatic back-transliteration. As Unicode-compliant processes must handle such variations correctly, the transliterations on this page have been chosen for ease of display – present day rendering systems may display equivalent forms differently.
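The effect of normalisation on accent order can be seen with Python's standard `unicodedata` module. The helper names below are hypothetical; the sketch only demonstrates that typed order and stored order can differ while remaining canonically equivalent, which is why back-transliteration should compare normalised text.

```python
import unicodedata


def canonical(text):
    """Return the canonical decomposition (NFD) of text."""
    return unicodedata.normalize("NFD", text)


def same_text(a, b):
    """True if a and b are canonically equivalent."""
    return canonical(a) == canonical(b)


# s + macron + horn, in the typing order described above.
typed = "s\u0304\u031b"
# Canonical ordering moves the horn (combining class 216)
# ahead of the macron (combining class 230).
stored = canonical(typed)

# k + macron + dot below + h; NFC composes k with the dot below
# into a single code point and leaves the macron combining.
composed = unicodedata.normalize("NFC", "k\u0304\u0323h")

if __name__ == "__main__":
    print([hex(ord(c)) for c in stored])
    print([hex(ord(c)) for c in composed])
    print(same_text(typed, "s\u031b\u0304"))
```

A back-transliterator could apply `canonical` to its input before table lookup so that both orders are accepted.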
Many fonts display novel combinations of consonants and accents badly. For example, the Institute of the Estonian Language publishes an explanation of the application of the standard to Thai on the web, and with one exception this seems to comply with the standard. The exception is that, except for the macron, accents over consonants are actually offset to the right, giving the impression that they have been entered as the corresponding non-combining characters. The standard specifies the transliterations in code points, but someone working from this free explanation could easily deduce that the spacing forms of the tone accents should be used.
The ICU implementation, recorded in Version 1.4.1 of the Common Locale Data Repository sponsored by Unicode,[1] uses a prime instead of a horn in the transliteration of consonants. This affects the transliteration of ฅ kho khon, ฒ tho phuthao and ษ so ruesi. ฏ to patak is also transliterated differently, as t̩ rather than ṭ.
This implementation transliterates ำ as ả instead of å to avoid ambiguity with the hypothetical Thai script sequence ะํ (sara a, nikkhahit). The ICU implementation transliterates ฺ phinthu as ˌ instead of –̥ to avoid problems with Unicode normalisation. This has the side effect of improving legibility when applied to an underdotted consonant.
The ICU implementation transliterates ฯ paiyannoi as ‡ (double dagger) and angkhankhu as || (two ASCII vertical bars). As the ICU implementation uses Unicode, it cannot reliably distinguish angkhandiao from paiyannoi without a semantic analysis, and makes no such attempt.
The character sequencing of the ICU implementation is different. It transposes preposed vowels with the following consonant, and processes the marks on a consonant in the order in which they are stored in memory. (Most Thai input methods ensure that the marks are stored in bottom to top order.) It does not transpose preposed vowels with complete consonant clusters; consonant clusters cannot be identified with complete accuracy, and transposing vowels with clusters would require an additional symbol to permit reliable conversion back to the Thai script.
For example, under this implementation ภาษาไทย transliterates to p̣hās̄ʹāthịy and เชียงใหม่ to cheīyngh̄ım̀.
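The transposition step described for this implementation can be sketched as a pass over the Thai text before table lookup. The Python below is not ICU's code; the function name is hypothetical, and it swaps a preposed vowel only with the single consonant that immediately follows it, never with a whole cluster.

```python
# Preposed vowels occupy U+0E40..U+0E44 (sara e, sara ae, sara o,
# sara ai maimuan, sara ai maimalai).
PREPOSED = {chr(c) for c in range(0x0E40, 0x0E45)}


def is_consonant(ch):
    """Thai consonant letters occupy U+0E01..U+0E2E."""
    return "\u0e01" <= ch <= "\u0e2e"


def transpose_preposed(text):
    """Swap each preposed vowel with the consonant that follows it."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in PREPOSED and is_consonant(chars[i + 1]):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the pair just swapped
        else:
            i += 1
    return "".join(chars)
```

Applied to ไทย, the function returns ทไย, which a left-to-right lookup then renders with the consonant before the vowel, as in the example above.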
Finally, this implementation generates transliterations in Unicode Normalisation Form C (NFC).