Phonemic orthography

A phonemic orthography is an orthography (system for writing a language) in which the graphemes (written symbols) correspond to the language's phonemes (the smallest units of speech that can differentiate words). Natural languages rarely have perfectly phonemic orthographies; a high degree of grapheme–phoneme correspondence can be expected in orthographies based on alphabetic writing systems, but they differ in how complete this correspondence is. English orthography, for example, is alphabetic but highly nonphonemic; it was mostly phonemic during the Middle English stage, when the modern spellings originated, but spoken English changed rapidly while the orthography remained much more stable, resulting in the modern nonphonemic situation. By contrast, the Albanian, Serbian/Croatian/Bosnian/Montenegrin, Romanian, Italian, Turkish, Spanish, Finnish, Czech, Latvian, Esperanto, Korean and Swahili orthographic systems come much closer to being consistent phonemic representations.

In less formal terms, a language with a highly phonemic orthography may be described as having regular spelling. Another terminology is that of deep and shallow orthographies, in which the depth of an orthography is the degree to which it diverges from being truly phonemic. The concept can also be applied to nonalphabetic writing systems like syllabaries.

Ideal phonemic orthography

In an ideal phonemic orthography, there would be a complete one-to-one correspondence (bijection) between the graphemes (letters) and the phonemes of the language, and each phoneme would invariably be represented by its corresponding grapheme. So the spelling of a word would unambiguously and transparently indicate its pronunciation, and conversely, a speaker knowing the pronunciation of a word would be able to infer its spelling without any doubt. That ideal situation is rare but exists in a few languages.
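The one-to-one condition can be stated as a small check. The following is a minimal sketch, assuming the orthography is given as a table from graphemes to phonemes; the function name `is_bijective` and the three-letter inventory are hypothetical and do not describe any real language.

```python
def is_bijective(grapheme_to_phoneme: dict[str, str], phonemes: set[str]) -> bool:
    """Return True if the table pairs graphemes and phonemes one-to-one."""
    values = list(grapheme_to_phoneme.values())
    # No two graphemes may stand for the same phoneme.
    one_to_one = len(values) == len(set(values))
    # Every phoneme of the language must have a grapheme.
    covers_all = set(values) == phonemes
    return one_to_one and covers_all


# Hypothetical toy inventory, for illustration only.
toy = {"a": "/a/", "p": "/p/", "t": "/t/"}
inventory = {"/a/", "/p/", "/t/"}

print(is_bijective(toy, inventory))                 # True
# A second grapheme for /t/ makes spelling unpredictable from sound.
print(is_bijective({**toy, "d": "/t/"}, inventory))  # False
```

A dictionary already guarantees that each grapheme has exactly one reading, so the check only needs to rule out shared phonemes and uncovered phonemes.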

A disputed example of an ideally phonemic orthography is that of the Serbo-Croatian language. In its alphabets (Gaj's Latin alphabet as well as the Serbian Cyrillic alphabet), there are 30 graphemes, each uniquely corresponding to one of the phonemes. This seemingly perfect yet simple phonemic orthography was achieved in the 19th century: the Cyrillic alphabet first, in 1814, by Serbian linguist Vuk Karadžić, and the Latin alphabet in 1830 by Croatian linguist Ljudevit Gaj. However, neither Gaj's Latin alphabet nor Serbian Cyrillic distinguishes short from long vowels, or the non-tonic (for which the plain short vowel is written), rising, and falling tones that Serbo-Croatian has. Tones and vowel lengths can optionally be written, in Latin script, as ⟨e⟩, ⟨ē⟩, ⟨è⟩, ⟨é⟩, ⟨ȅ⟩, and ⟨ȇ⟩, especially in dictionaries.

Another such ideal phonemic orthography is that of Esperanto, which follows the principle stated by the language's creator, L. L. Zamenhof: "one letter, one sound". [1]

Deviations from phonemic orthography

There are two distinct types of deviation from the phonemic ideal. In the first case, the exact one-to-one correspondence may be lost (for example, some phoneme may be represented by a digraph instead of a single letter), but the "regularity" is retained: there is still an algorithm (but a more complex one) for predicting the spelling from the pronunciation and vice versa. In the second case, true irregularity is introduced, as certain words come to be spelled and pronounced according to different rules from others, and prediction of spelling from pronunciation and vice versa is no longer possible.

Case 1: Regular

Pronunciation and spelling still correspond in a predictable way

Examples:

sch versus s-ch in Romansch

ng versus n + g in Welsh

ch versus çh in Manx Gaelic: this is a slightly different case where the same digraph is used for two different single phonemes.

ai versus aï in French

This is often due to the use of an alphabet that was originally used for a different language (the Latin alphabet in these examples) and so does not have single letters available for all the phonemes used in the current language (although some orthographies use devices such as diacritics to increase the number of available letters).
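The "more complex algorithm" of the regular case can be sketched as a greedy longest-match reader: digraphs are tried before single letters, and a separator blocks the digraph reading, in the spirit of the sch versus s-ch convention listed above. This is only an illustration under stated assumptions; the rule table `RULES`, the function `to_phonemes`, and the sample words are hypothetical and are not the rules of Romansh, Welsh, or any other language mentioned here.

```python
def to_phonemes(word: str, rules: dict[str, str], separator: str = "-") -> list[str]:
    """Convert spelling to phonemes by greedy longest-match lookup."""
    longest = max(len(grapheme) for grapheme in rules)
    out, i = [], 0
    while i < len(word):
        if word[i] == separator:
            # The separator only blocks a digraph; it has no sound itself.
            i += 1
            continue
        # Try the longest candidate grapheme first, then shorter ones.
        for size in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += size
                break
        else:
            raise ValueError(f"no rule for {word[i]!r} at position {i}")
    return out


# Hypothetical rule table: one digraph plus single letters.
RULES = {"sh": "ʃ", "s": "s", "h": "h", "a": "a", "i": "i", "p": "p"}

print(to_phonemes("asha", RULES))   # ['a', 'ʃ', 'a']
print(to_phonemes("as-ha", RULES))  # ['a', 's', 'h', 'a']
```

The correspondence is no longer one letter per phoneme, but it stays predictable because the same procedure works for every word.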

Case 2: Irregular

Pronunciation and spelling do not always correspond in a predictable way

Most orthographies do not reflect the changes in pronunciation known as sandhi in which pronunciation is affected by adjacent sounds in neighboring words (written Sanskrit and other Indian languages, however, reflect such changes). A language may also use different sets of symbols or different rules for distinct sets of vocabulary items such as the Japanese hiragana and katakana syllabaries (and the different treatment in English orthography of words derived from Latin and Greek).

Morphophonemic features

Alphabetic orthographies often have features that are morphophonemic rather than purely phonemic. This means that the spelling reflects to some extent the underlying morphological structure of the words, not only their pronunciation. Hence different forms of a morpheme (minimum meaningful unit of language) are often spelt identically or similarly in spite of differences in their pronunciation. That is often for historical reasons; the morphophonemic spelling reflects a previous pronunciation from before historical sound changes that caused the variation in pronunciation of a given morpheme. Such spellings can assist in the recognition of words when reading.

Some examples of morphophonemic features in orthography are described below.

Korean hangul has changed over the centuries from a highly phonemic to a largely morphophonemic orthography.

Japanese kana are almost completely phonemic but have a few morphophonemic aspects, notably in the use of ぢ di and づ du (rather than じ ji and ず zu, their pronunciation in standard Tokyo dialect) when the character is a voicing of an underlying ち or つ. This results from the rendaku sound change combined with the yotsugana merger of formerly different morae.

Russian orthography is also mostly morphophonemic, because it does not reflect vowel reduction, consonant assimilation or final-obstruent devoicing. Also, some consonant combinations have silent consonants.

Defective orthographies

A defective orthography is one that is not capable of representing all the phonemes or phonemic distinctions in a language. An example of such a deficiency in English orthography is the lack of distinction between the voiced and voiceless "th" phonemes (/ð/ and /θ/, respectively), as in this /ˈðɪs/ (voiced) and thin /ˈθɪn/ (voiceless), both of which are written with th.
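A defect of this kind can be detected mechanically once the correspondences are tabulated. The sketch below assumes a table from phonemes to their usual spelling; the function name `merged_distinctions` is hypothetical, and the sample contains only the English "th" pair from the text plus one unproblematic consonant, so it is far from a full description of English.

```python
from collections import defaultdict


def merged_distinctions(phoneme_to_grapheme: dict[str, str]) -> dict[str, list[str]]:
    """List graphemes that stand for more than one phoneme."""
    by_grapheme = defaultdict(set)
    for phoneme, grapheme in phoneme_to_grapheme.items():
        by_grapheme[grapheme].add(phoneme)
    # Keep only graphemes that hide a phonemic distinction.
    return {g: sorted(ps) for g, ps in by_grapheme.items() if len(ps) > 1}


# Tiny illustrative sample, not a complete inventory.
sample = {"ð": "th", "θ": "th", "m": "m"}

print(merged_distinctions(sample))  # {'th': ['ð', 'θ']}
```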

Comparison between languages

Languages whose current orthographies have a high grapheme-to-phoneme and phoneme-to-grapheme correspondence (excluding exceptions due to loan words and assimilation) include:

Many otherwise phonemic orthographies are slightly defective; see Defective script § Latin script. The graphemes b and v represent the same phoneme in all varieties of Spanish (except in Valencia), while in the Spanish of the Americas, /s/ can be represented by the graphemes s, c, or z. [2]

Modern Indo-Aryan languages like Hindi, Punjabi, Gujarati, Maithili and several others feature schwa deletion, where the implicit default vowel is suppressed without being explicitly marked as such. Others, like Marathi, do not have a high grapheme-to-phoneme correspondence for vowel lengths.

Bengali and Assamese, despite having somewhat shallow orthographies, have deeper orthographies than their Indo-Aryan cousins, as they feature silent consonants in places. Moreover, due to sound mergers, the same phonemes are often represented by different graphemes. This also leads to the existence of many homophones in these languages.

French, with its silent letters and its heavy use of nasal vowels and elision, may seem to lack much correspondence between spelling and pronunciation, but its rules on pronunciation, though complex, are consistent and predictable with a fair degree of accuracy. The phoneme-to-letter correspondence, on the other hand, is often low and a sequence of sounds may have multiple ways of being spelt, often with different meanings.

Orthographies such as those of German, Hungarian, Portuguese, and modern Greek (written with the Greek alphabet), as well as Korean hangul, are sometimes considered to be of intermediate depth (for example, they include many morphophonemic features, as described above). Hungarian is mainly phonemic, with the exception that ly and j represent the same sound; however, consonant and vowel length are not always written accurately, and various spellings reflect etymology rather than pronunciation.

Similarly to French, it is much easier to infer the pronunciation of a German word from its spelling than vice versa. For example, for speakers who merge /eː/ and /ɛː/, the phoneme /eː/ may be spelt e, ee, eh, ä or äh.
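The asymmetry between reading and writing can be made concrete by counting alternatives in each direction. This is a rough sketch, assuming the correspondences are listed as (spelling, phoneme) pairs; the function name `fan_out` is hypothetical, and the sample holds only the five spellings of /eː/ mentioned above, so it says nothing about the other readings these letters may have in German.

```python
from collections import defaultdict


def fan_out(pairs: list[tuple[str, str]]):
    """Group spellings by phoneme and phonemes by spelling."""
    spellings = defaultdict(set)  # phoneme -> possible spellings (writing)
    readings = defaultdict(set)   # spelling -> possible phonemes (reading)
    for grapheme, phoneme in pairs:
        spellings[phoneme].add(grapheme)
        readings[grapheme].add(phoneme)
    return dict(spellings), dict(readings)


# Only the correspondence named in the text; not a full rule set.
pairs = [("e", "eː"), ("ee", "eː"), ("eh", "eː"), ("ä", "eː"), ("äh", "eː")]
spellings, readings = fan_out(pairs)

print(len(spellings["eː"]))                     # 5 candidate spellings for a writer
print(max(len(p) for p in readings.values()))  # 1 reading per spelling in this sample
```

A writer who hears /eː/ faces five candidates in this sample, while a reader who sees any one of them has a single reading to consider.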

English orthography is highly non-phonemic. The irregularity of English spelling arises partly because the Great Vowel Shift occurred after the orthography was established; partly because English has acquired a large number of loanwords at different times, retaining their original spelling to varying degrees; and partly because the regularisation of the spelling (moving away from the situation in which many different spellings were acceptable for the same word) happened arbitrarily over a period of time without any central plan. However, even English has general, albeit complex, rules that predict pronunciation from spelling, and several of these rules are successful most of the time; rules to predict spelling from pronunciation have a higher failure rate.

Most constructed languages such as Esperanto and Lojban have mostly phonemic orthographies.

The syllabary systems of Japanese (hiragana and katakana) are examples of almost perfectly shallow orthography – exceptions include the use of ぢ and づ (discussed above) and the use of は, を, and へ to represent the sounds わ, お, and え, as relics of historical kana usage. There is also no indication of pitch accent, which results in homography of words like 箸 and 橋 (はし in hiragana), which are distinguished in speech.

Xavier Marjou [3] uses an artificial neural network to rank 17 orthographies according to their level of orthographic depth. Among the tested orthographies, Chinese and French, followed by English and Russian, are the most opaque for writing (the phoneme-to-grapheme direction), and English, followed by Dutch, is the most opaque for reading (the grapheme-to-phoneme direction). Esperanto, Arabic, Finnish, Korean, Serbo-Croatian and Turkish are very shallow both to read and to write; Italian is shallow to read and very shallow to write; Breton, German, Portuguese and Spanish are shallow to read and to write.

Realignment of orthography

With time, pronunciations change and spellings become out of date, as has happened to English and French. To remain phonemic, an orthography would need periodic updating, as has been attempted by various language regulators and proposed by other spelling reformers.

Sometimes the pronunciation of a word changes to match its spelling; this is called a spelling pronunciation. This is most common with loanwords, but occasionally occurs in the case of established native words too.

In some English personal names and place names, the relationship between the spelling of the name and its pronunciation is so distant that associations between phonemes and graphemes cannot be readily identified. In many other words, the pronunciation has subsequently evolved from a fixed spelling, so that the phonemes can be said to represent the graphemes rather than vice versa. In much technical jargon, the primary medium of communication is the written language rather than the spoken language, so the phonemes again represent the graphemes, and it is unimportant how the word is pronounced. Finally, the sounds that literate people perceive in a word are significantly influenced by its spelling. [4]

Sometimes, countries have the written language undergo a spelling reform to realign the writing with the contemporary spoken language. These can range from simple spelling changes and word forms to switching the entire writing system itself, as when Turkey switched from the Arabic alphabet to a Turkish alphabet of Latin origin.

Phonetic transcription

Methods for phonetic transcription such as the International Phonetic Alphabet (IPA) aim to describe pronunciation in a standard form. They are often used to solve ambiguities in the spelling of written language. They may also be used to write languages with no previous written form. Systems like IPA can be used for phonemic representation or for showing more detailed phonetic information (see Narrow vs. broad transcription).

Phonemic orthographies are different from phonetic transcription; whereas in a phonemic orthography, allophones will usually be represented by the same grapheme, a purely phonetic script would demand that phonetically distinct allophones be distinguished. To take an example from American English: the /t/ sound in the words "table" and "cat" would, in a phonemic orthography, be written with the same character; however, a strictly phonetic script would make a distinction between the aspirated "t" in "table", the flap in "butter", the unaspirated "t" in "stop" and the glottalized "t" in "cat" (not all these allophones exist in all English dialects). In other words, the sound that most English speakers think of as /t/ is really a group of sounds, all pronounced slightly differently depending on where they occur in a word. A perfect phonemic orthography has one letter per group of sounds (phoneme), with different letters only where the sounds distinguish words (so "bed" is spelled differently from "bet").
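The difference between the two levels can be shown by collapsing a narrow transcription into a phonemic one. The sketch below is illustrative only: the function name `phonemic`, the table `ALLOPHONE_TO_PHONEME`, and the simplified phone symbols chosen for the four variants of /t/ are assumptions, not a complete or authoritative analysis of American English.

```python
# Assumed symbols for the allophones of /t/ described above:
# aspirated, flap, plain (unaspirated), and glottalized.
ALLOPHONE_TO_PHONEME = {"tʰ": "t", "ɾ": "t", "t": "t", "ʔt": "t"}


def phonemic(phones: list[str]) -> list[str]:
    """Replace each phone with its phoneme; unknown phones pass through."""
    return [ALLOPHONE_TO_PHONEME.get(p, p) for p in phones]


# Simplified narrow transcriptions of "butter" and "stop".
print(phonemic(["b", "ʌ", "ɾ", "ɚ"]))  # ['b', 'ʌ', 't', 'ɚ']
print(phonemic(["s", "t", "ɑ", "p"]))  # ['s', 't', 'ɑ', 'p']
```

A phonemic orthography needs one symbol per value on the right-hand side of the table, whereas a strictly phonetic script needs one per key.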

A narrow phonetic transcription represents phones, the sounds humans are capable of producing, many of which will often be grouped together as a single phoneme in any given natural language, though the groupings vary across languages. English, for example, does not distinguish between aspirated and unaspirated consonants, but other languages, like Korean, Bengali and Hindi do.

The sounds of speech of all languages of the world can be written by a rather small universal phonetic alphabet. A standard for this is the International Phonetic Alphabet.

References

  1. "Bazaj elparolaj reguloj — PMEG". bertilow.com.
  2. Hualde, José Ignacio (2005). The Sounds of Spanish. Cambridge University Press. pp. 103, 146. ISBN 0-521-54538-2.
  3. Marjou, Xavier (June 2021). "OTEANN: Estimating the Transparency of Orthographies with an Artificial Neural Network". Proceedings of the Third Workshop on Computational Typology and Multilingual NLP: 1–9. arXiv:1912.13321. doi:10.18653/v1/2021.sigtyp-1.1. S2CID 209515879.
  4. David Stark. "Standardised Spelling - Pronunciation 1". The English Spelling Society. Archived from the original on 7 March 2014.