Sumihiri

Sumihiri is a transliteration scheme that enables writing Sinhala language text using the English alphabet. A number of tools are available to convert text written using sumihiri to Sinhala script, if desired.

History

Sumihiri was developed for a project that allowed typesetting Sinhala documents using the LaTeX document preparation system. That project involved creating a Sinhala font using Metafont, along with tools that converted text written in sumihiri into TeX commands that print the corresponding Sinhala script. A paper published in 1995 described the rules of sumihiri.

Design Criteria of sumihiri

The design of sumihiri was guided by the following principles:

  1. It must be possible to write any modern Sinhala word without ambiguity.
  2. As far as possible, common conventions already used to write Sinhala names on an English keyboard must be retained.
  3. It must be easy to write for someone familiar with the standard English keyboard.
  4. It must be possible to read and correctly pronounce text written using 'sumihiri'.
  5. Typing must be kept to a minimum.

The design of sumihiri tries to balance these goals, which sometimes conflict.

Base Letters

'sumihiri' distinguishes between the two 'ම' letters in the Sinhala word 'මම'. This is to facilitate correct pronunciation of text written using 'sumihiri'. The first 'ම', which has an 'open' sound, is written as 'ma', whereas the second 'ම', which has a 'closed' sound, is written as 'me'. Although this has no effect in printing applications (both print the letter 'ම'), sticking to this usage is highly recommended.
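Because 'a' and 'e' print the same letter, a printing tool could normalize one spelling to the other before conversion. The Python sketch below is an assumption about how that might be done and is not taken from the original sumihiri tools; the helper name `to_print_form` is hypothetical. It rewrites a lone 'e' as 'a' and leaves the long vowel 'ee' and the capital 'E' (both described under Vowel Sounds) untouched.

```python
import re

# A lone lowercase 'e' (one that is not part of the long vowel 'ee') marks
# the closed sound and prints the same letter as 'a'. Capital 'E' is a
# different vowel and is not matched because the pattern is case-sensitive.
_LONE_E = re.compile(r"(?<!e)e(?!e)")


def to_print_form(text: str) -> str:
    """Replace every lone 'e' with 'a' (hypothetical helper, for illustration)."""
    return _LONE_E.sub("a", text)


if __name__ == "__main__":
    print(to_print_form("mame"))  # prints: mama
```

The pronunciation hint is lost in the normalized form, which is why the article recommends keeping 'me' in the text that people actually read.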

Most of the base letters are obtained in an obvious way (note that 'e' could be used instead of 'a' to print the same letter):

ka - ක, ga - ග, ja - ජ, ta - ට, da - ඩ, na - න, pa - ප, ba - බ, ma - ම, ya - ය, ra - ර, la - ල, wa - ව, sa - ස, ha - හ

Some require two letters:

cha - ච, tha - ත, dha - ද, sha - ශ

A capitalized version is used for 'මහප්‍රාණ' letters:

Ka - ඛ, Cha - ඡ, Ga - ඝ, Tha - ථ, Dha - ධ, Pa - ඵ, Ba - භ

Nasal letters:

Nga - ඟ, Nda - ඬ, Ndha - ඳ, Mba - ඹ

Some others:

La - ළ, Na - ණ, Sha - ෂ, xa - ක්‍ෂ‍, qa - ඤ, GNa - ඥ

Special Letters

NG - ං, H - ඃ
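The tables above can be held in a dictionary and matched with a longest-prefix rule, so that 'Ndha' is not mistaken for 'Na' or 'Nda'. The following Python sketch is illustrative only and is not the original converter: the names `LETTERS` and `match_letter` are hypothetical, keys are stored without the trailing 'a', the conjunct 'xa' is left out for brevity, and output as Unicode code points is an assumption (the original tools produced TeX commands).

```python
# Sumihiri keys (without the trailing 'a') mapped to Sinhala code points.
LETTERS = {
    # single-letter keys: ka ga ja ta da na pa ba ma ya ra la wa sa ha
    "k": "\u0d9a", "g": "\u0d9c", "j": "\u0da2", "t": "\u0da7", "d": "\u0da9",
    "n": "\u0db1", "p": "\u0db4", "b": "\u0db6", "m": "\u0db8", "y": "\u0dba",
    "r": "\u0dbb", "l": "\u0dbd", "w": "\u0dc0", "s": "\u0dc3", "h": "\u0dc4",
    # two-letter keys: cha tha dha sha
    "ch": "\u0da0", "th": "\u0dad", "dh": "\u0daf", "sh": "\u0dc1",
    # capitalized keys: Ka Cha Ga Tha Dha Pa Ba
    "K": "\u0d9b", "Ch": "\u0da1", "G": "\u0d9d", "Th": "\u0dae",
    "Dh": "\u0db0", "P": "\u0db5", "B": "\u0db7",
    # nasal letters: Nga Nda Ndha Mba
    "Ng": "\u0d9f", "Nd": "\u0dac", "Ndh": "\u0db3", "Mb": "\u0db9",
    # others: La Na Sha qa GNa
    "L": "\u0dc5", "N": "\u0dab", "Sh": "\u0dc2", "q": "\u0da4", "GN": "\u0da5",
    # special letters: NG H
    "NG": "\u0d82", "H": "\u0d83",
}

MAX_KEY = max(len(key) for key in LETTERS)  # longest key is "Ndh"


def match_letter(text, pos=0):
    """Return (key, letter) for the longest key found at text[pos:], else None."""
    for length in range(MAX_KEY, 0, -1):
        key = text[pos:pos + length]
        if key in LETTERS:
            return key, LETTERS[key]
    return None


if __name__ == "__main__":
    for word in ("Ndha", "Nda", "Na", "na", "NG"):
        print(word, match_letter(word))
```

Trying the longest key first is what keeps the capitalized and multi-letter spellings unambiguous: 'Ndha' matches 'Ndh' before the shorter 'Nd' or 'N' is ever considered.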

Vowel Sounds

Short vowel sounds are as follows:

a - අ, z - ඇ, i - ඉ, u - උ, E - එ, o - ඔ

Long vowel sounds are as follows:

aa - ආ, zz - ඈ, ii - ඊ, uu - ඌ, ee - ඒ, oo - ඕ

Some remarks are in order. In a radical departure from other schemes, sumihiri uses the English letter 'z' to represent the 'ඇ' vowel sound. Not only is 'z' easy to type, even when it occurs frequently, but it also resembles the decoration 'ඇදපිල්ල' to some extent. Alternative transliteration schemes tend to use the two-letter combination 'ae' for the same purpose, and represent the corresponding long vowel with the combination 'ei'. In sumihiri the long vowel is 'zz', which follows the rule of doubling the letter to get the longer vowel, so the 'sumihiri' convention also means less typing.

The other remark concerns the use of 'E' for the 'එ' sound. 'E' is used because 'e' already indicates the closed 'අ' sound (see above). However, the corresponding long vowel sound is still written 'ee'.

Furthermore, the following vowel sounds exist:

 ai - ඓ, au - ඖ
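The doubling rule and its single exception can be written down directly. This is a small Python sketch under stated assumptions: the table and function names (`SHORT_VOWELS`, `LONG_VOWELS`, `DIPHTHONGS`, `long_spelling`, `match_vowel`) are hypothetical, and the stand-alone vowel letters are given as Unicode escapes rather than the TeX commands the original tools produced.

```python
# Stand-alone vowel letters keyed by their sumihiri spelling.
SHORT_VOWELS = {"a": "\u0d85", "z": "\u0d87", "i": "\u0d89",
                "u": "\u0d8b", "E": "\u0d91", "o": "\u0d94"}
LONG_VOWELS = {"aa": "\u0d86", "zz": "\u0d88", "ii": "\u0d8a",
               "uu": "\u0d8c", "ee": "\u0d92", "oo": "\u0d95"}
DIPHTHONGS = {"ai": "\u0d93", "au": "\u0d96"}

ALL_VOWELS = {**SHORT_VOWELS, **LONG_VOWELS, **DIPHTHONGS}


def long_spelling(short):
    """Double the letter to get the long vowel; 'E' is the one exception."""
    return "ee" if short == "E" else short * 2


def match_vowel(text, pos=0):
    """Return (key, letter) for the longest vowel key at text[pos:], else None."""
    for length in (2, 1):
        key = text[pos:pos + length]
        if key in ALL_VOWELS:
            return key, ALL_VOWELS[key]
    return None


if __name__ == "__main__":
    for short in SHORT_VOWELS:
        print(short, "->", long_spelling(short))
```

As with the consonants, two-letter keys are tried first so that 'aa', 'ai' and 'au' are not read as 'a' followed by something else.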

Other Letters

Other forms of Sinhala letters are created by combining vowel sounds with the base letters.

Example

Here is a complete example of sumihiri usage to create various forms of the letter 'ක':

 k - ක්, ka/ke - ක, kaa - කා, kz - කැ, kzz - කෑ, ki - කි, kii - කී, ku - කු, kuu - කූ,  kE - කෙ, kee - කේ, ko - කො, koo - කෝ, kai - කෛ, kau - කෞ
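The same list can be generated by appending a dependent vowel sign to the base letter. The Python sketch below assumes Unicode output (the original tools produced TeX commands), and the names `VOWEL_SIGNS` and `syllable` are hypothetical. An empty vowel adds the hal mark, and 'a' or 'e' adds nothing because the base letter already carries that sound.

```python
VIRAMA = "\u0dca"  # hal mark: suppresses the inherent vowel

# Dependent vowel signs keyed by their sumihiri spelling.
VOWEL_SIGNS = {
    "": VIRAMA,            # k
    "a": "", "e": "",      # ka / ke: the bare base letter
    "aa": "\u0dcf",
    "z": "\u0dd0", "zz": "\u0dd1",
    "i": "\u0dd2", "ii": "\u0dd3",
    "u": "\u0dd4", "uu": "\u0dd6",
    "E": "\u0dd9", "ee": "\u0dda",
    "o": "\u0ddc", "oo": "\u0ddd",
    "ai": "\u0ddb", "au": "\u0dde",
}


def syllable(base, vowel):
    """Combine a base letter with the sign for the given sumihiri vowel."""
    return base + VOWEL_SIGNS[vowel]


if __name__ == "__main__":
    KA = "\u0d9a"
    for vowel in VOWEL_SIGNS:
        print("k" + vowel, "-", syllable(KA, vowel))
```

Passing a different base letter to `syllable` produces the corresponding forms for any other consonant.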

Repaya, Yansaya and Rakaransaya

Example

 Rke - ර්‍ක, kYe - ක්‍ය, kRe - ක්‍ර

In general, any vowel sound can be associated with any consonant.
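In Unicode text, the repaya, yansaya and rakaransaya are conventionally encoded with the hal mark (U+0DCA) followed by a zero-width joiner (U+200D). The Python sketch below shows how the three spellings in the example could map onto those sequences. It is an assumption about a Unicode-based converter, not a description of the original TeX-based tools, and the function names are hypothetical.

```python
VIRAMA = "\u0dca"  # hal mark
ZWJ = "\u200d"     # zero-width joiner, requests the joined form
RA = "\u0dbb"
YA = "\u0dba"


def repaya(base):
    """'R' before a consonant, as in 'Rke': ra + hal + ZWJ + base letter."""
    return RA + VIRAMA + ZWJ + base


def yansaya(base):
    """'Y' after a consonant, as in 'kYe': base letter + hal + ZWJ + ya."""
    return base + VIRAMA + ZWJ + YA


def rakaransaya(base):
    """'R' after a consonant, as in 'kRe': base letter + hal + ZWJ + ra."""
    return base + VIRAMA + ZWJ + RA


if __name__ == "__main__":
    KA = "\u0d9a"
    print(repaya(KA), yansaya(KA), rakaransaya(KA))
```

Whether the joined forms display correctly depends on the font and rendering engine in use.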
