Estonian orthography

Estonian orthography is the system used for writing the Estonian language and is based on the Latin alphabet. The Estonian orthography is generally guided by phonemic principles, with each grapheme corresponding to one phoneme.

Alphabet

Due to German and Swedish influence, the Estonian alphabet (Estonian: eesti tähestik) has the letters Ä, Ö, and Ü (A, O, and U with diaeresis), which represent the vowel sounds [æ], [ø] and [y], respectively. Unlike German umlauts, they are considered, and alphabetised as, separate letters. The most distinctive letter in the Estonian alphabet, however, is Õ (O with tilde), which was added to the alphabet in the 19th century by Otto Wilhelm Masing and stands for the vowel [ɤ]. The alphabet also differs from the basic Latin alphabet by the addition of the letters Š and Ž (S and Z with caron/háček), and by the position of Z, which has been moved from the end to between S and T (more precisely, between Š and Ž).

The official Estonian alphabet has 27 letters: A, B, D, E, F, G, H, I, J, K, L, M, N, O, P, R, S, Š, Z, Ž, T, U, V, Õ, Ä, Ö, Ü. The letters F, Š, Z, Ž are so-called "foreign letters" (võõrtähed), and occur only in loanwords and foreign proper names. Occasionally, the alphabet is recited without them, and thus has only 23 letters: A, B, D, E, G, H, I, J, K, L, M, N, O, P, R, S, T, U, V, Õ, Ä, Ö, Ü.

Additionally, C, Q, W, X, and Y are used in writing foreign proper names. They do not occur in Estonian words, and are not officially part of the alphabet. Including all the foreign letters, the alphabet consists of the following 32 letters:

| Letter | IPA | Name | Notes |
|---|---|---|---|
| A a | [ɑ] | aa [ɑːː] | |
| B b | [b] | bee [beːː] | |
| C c | | tsee [tseːː] | [a] |
| D d | [d] | dee [deːː] | |
| E e | [e] | ee [eːː] | |
| F f | [f] or [fː] | eff [efːː] | [b] |
| G g | [ɡ] | gee [ɡeːː] | |
| H h | [h] | haa [hɑːː] or ašš [ɑʃːː] | |
| I i | [i] | ii [iːː] | |
| J j | [j] | jott [jotʲːː] | |
| K k | [k] or [kː] | kaa [kɑːː] | |
| L l | [l] | ell [elːː] | |
| M m | [m] | emm [emːː] | |
| N n | [n] | enn [enːː] | |
| O o | [o] | oo [oːː] | |
| P p | [p] or [pː] | pee [peːː] | |
| Q q | | kuu [kuːː] | [a] |
| R r | [r] | err [erːː] or ärr [ærːː] | |
| S s | [s] | ess [esːː] | |
| Š š | [ʃ] or [ʃː] | šaa [ʃɑːː] | [b] |
| Z z | [s] | zett [setːː] | [b] |
| Ž ž | [ʒ] | žee [ʒeːː] | [b] |
| T t | [t] or [tː] | tee [teːː] | |
| U u | [u] | uu [uːː] | |
| V v | [v] | vee [veːː] | |
| W w | | kaksisvee [kɑk.sisˈveːː] | [a] |
| Õ õ | [ɤ] | õõ [ɤːː] | |
| Ä ä | [æ] | ää [æːː] | |
| Ö ö | [ø] | öö [øːː] | |
| Ü ü | [y] | üü [yːː] | |
| X x | | iks [iksː] | [a] |
| Y y | | igrek [ˈiɡ.rek] or üpsilon [ˈyp.si.lon] | [a] |

[a] Not officially part of the alphabet; only used in foreign proper names and citations, pronounced according to their source language. Occasionally, w is used instead of v in Estonian surnames (e.g. Wõrk), as a remnant of older spelling.
[b] Only used in loanwords.
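Because Z and Ž sort between Š and T, and Õ, Ä, Ö, Ü sort after W, plain code-point sorting does not reproduce this alphabetical order. The following is a minimal sketch of a sort key built from the 32-letter sequence listed above. The names `ORDER` and `estonian_key` are illustrative, and the treatment of characters outside the alphabet (placed after all letters) is an assumption, not an official collation rule.

```python
import unicodedata

# The 32 letters in the order listed above (including the foreign letters).
ORDER = "abcdefghijklmnopqrsšzžtuvwõäöüxy"
RANK = {letter: index for index, letter in enumerate(ORDER)}


def estonian_key(word):
    """Return a sort key that follows the alphabet order given above.

    Assumption: characters outside the alphabet sort after all letters,
    and case is ignored.
    """
    # NFC makes sure letters such as "š" are single precomposed characters.
    normalized = unicodedata.normalize("NFC", word).lower()
    return [RANK.get(ch, len(ORDER)) for ch in normalized]


words = ["zooloogia", "tala", "šokolaad", "õun", "vesi", "ära", "saun", "öö", "uks"]
print(sorted(words, key=estonian_key))
```

With this key, zooloogia sorts before tala, and õun, ära, öö come after vesi, which a default string sort would not produce.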

In Blackletter script, W was used instead of V. In some reference works (e.g. the Estonian Soviet Encyclopedia), V and W were sorted as if they were one and the same letter.

Johannes Aavik suggested that the letter Ü be replaced by Y, as it has been in the Finnish alphabet.

Double letters are used to write half-long and overlong vowels and consonants, e.g. aa [ɑː] or [ɑːː], nn [nː] or [nːː], kk [kːː]. For more information, see below.

As the distinction between voiced and voiceless plosives is not native to Estonian, the names of the letters 'b', 'd', 'g' may be pronounced [peːː], [teːː], [keːː], so the letters 'b' and 'd' are also named nõrk B (weak B) and nõrk D (weak D) to distinguish them from tugev P (strong P) and tugev T (strong T). For the usage of these letters, see below.

Orthographic principles

Although the Estonian orthography is generally guided by phonemic principles, with each grapheme corresponding to one phoneme, there are some historical and morphological deviations from this: for example, the initial letter 'h' in words, the preservation of the morpheme in the declension of a word (writing b, g, d in places where p, k, t are pronounced), and the use of 'i' and 'j'. Where it is impractical or impossible to type š and ž, they are substituted with sh and zh in some written texts, although this is considered incorrect. Otherwise, the h in sh represents a voiceless glottal fricative, as in Pasha (pas-ha); this also applies to some foreign names.

Some features of the modern Estonian orthography are:

Syllabification

One consonant between two vowels belongs to the following syllable: kala 'fish' is syllabified ka-la. Consonant combinations are syllabified before the last consonant: linna 'town [gen. sg.]' is syllabified lin-na, tutvus 'acquaintance' is syllabified tut-vus. Consonant digraphs and trigraphs in foreign names are regarded as single consonants: Manchester is syllabified Man-ches-ter. Two vowels usually form a long vowel or a diphthong, e.g. laulu 'song' is syllabified lau-lu. However, a hiatus is formed at morpheme boundaries, e.g. avaus 'opening' is syllabified a-va-us, as the word is composed of the root ava- and the suffix -us. Combinations of three vowel letters represent a hiatus of a long vowel or a diphthong and another vowel, e.g. põuane 'dry, droughty, arid (lacking rain)' is syllabified põu-a-ne; but some loanwords have a hiatus of a short vowel followed by a long vowel: oaas 'oasis' is syllabified o-aas. Compound words are syllabified as combinations of their parts: vanaema 'grandmother' is syllabified va-na-e-ma, as the word is composed of vana 'old' and ema 'mother'. Etymologically compound loanwords and foreign names may be syllabified as compound or simple words: fotograaf 'photographer' is syllabified fo-to-graaf or fo-tog-raaf, Petrograd is syllabified Pet-ro-grad or Pet-rog-rad.

These syllabification rules are used for hyphenating words at the end of a line, with the additional rule that a single letter is not left on a line.
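The purely letter-based rules above can be sketched in code. The following is a rough illustration only, not a complete syllabifier: it assumes a single non-compound native word, does not know about morpheme boundaries (so it would get avaus wrong), compounds (vanaema) or foreign digraphs (Manchester), and it uses a made-up heuristic for three-vowel runs (if the last two letters are identical, the run is read as short vowel plus long vowel, otherwise as long vowel or diphthong plus vowel). The function names are illustrative.

```python
import re

VOWELS = "aeiouõäöü"
# A word is cut into alternating runs of vowel letters and other letters.
TOKEN = re.compile(f"[{VOWELS}]+|[^{VOWELS}]+")


def split_vowel_run(run):
    """Split a run of vowel letters into syllable nuclei (heuristic)."""
    if len(run) <= 2:
        return [run]  # short vowel, long vowel, or diphthong
    if len(run) == 3:
        if run[1] == run[2]:
            return [run[0], run[1:]]  # short + long, as in o-aas
        return [run[:2], run[2]]  # long/diphthong + vowel, as in põu-a-ne
    raise ValueError(f"unsupported vowel run: {run!r}")


def syllabify(word):
    """Return the syllables of a simple (non-compound) word as a list."""
    tokens = TOKEN.findall(word.lower())
    syllables, current = [], ""
    for i, tok in enumerate(tokens):
        if tok[0] in VOWELS:
            nuclei = split_vowel_run(tok)
            current += nuclei[0]
            for nucleus in nuclei[1:]:
                syllables.append(current)
                current = nucleus
        elif i == 0 or i == len(tokens) - 1:
            # Word-initial and word-final consonants stay in their syllable.
            current += tok
        else:
            # Split before the last consonant of the cluster.
            syllables.append(current + tok[:-1])
            current = tok[-1]
    if current:
        syllables.append(current)
    return syllables


for w in ["kala", "linna", "tutvus", "laulu", "põuane", "oaas"]:
    print(w, "-".join(syllabify(w)))
```

A real hyphenation routine would first split compounds into their parts and consult morphological information before applying these rules.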

Foreign words

Loanwords are normally adapted to Estonian spelling: veeb 'web', džäss 'jazz'. However, foreign words and phrases, such as Latin phrases, Italian musical terms, and exotic words, may sometimes be used in the original spelling. Such citations are typographically emphasized using italics and declined using an apostrophe: croissant'id 'croissants'.

Foreign proper names from Latin-script languages are written in their original spelling: Margaret Thatcher, Bordeaux. Names from non-Latin-script languages are written using either Estonian orthographic transcription or established romanization systems. Some geographical names (and some names of historical personalities, such as monarchs) have traditional Estonian forms (including some adapted spellings such as Viin for German Wien 'Vienna').

Derivations from foreign proper names with the suffixes -lik, -lane, -lus, -ism, -ist usually conserve the spelling of names (e. g. thatcherism, bordeaux'lane), but a few are adapted by established tradition: marksism, darvinism, luterlus. Derivations without suffixes or with other suffixes are adapted to Estonian spelling: njuuton 'newton' (physical unit), haimoriit 'maxillary sinusitis' (inflammation of antrum of Highmore), üterbium 'ytterbium', šeikspiroloog 'Shakespearologist', etc.

Expressions such as Celsiuse kraad 'degree Celsius', Cheddari juust 'Cheddar cheese' conserve the spelling of proper names (adding case endings). However, names of plants and animals are usually written in adapted forms, e.g. koloraado mardikas 'Colorado beetle'.

The apostrophe is used when adding case endings to proper names with unusual grapheme-to-phoneme correspondences (such as ending in a consonant orthographically but in a vowel phonetically, or vice versa), e.g. Provence'i (genitive of Provence).

Capitalization

Capital letters are written at the beginning of the first word in a sentence, in proper names, and in official names functioning as proper names. They may also be used in the pronouns Sina 'you (singular)' and Teie 'you (plural, also used as formal singular)' to show respect.

Names of months, days of the week, holidays, Chinese zodiac years, and titles of people such as professor are not capitalized.

Titles of books, films, etc. are written in quotation marks with only the first word and proper names capitalized.

Compound words

Compound words are written as one word, but they are often composed of genitive+nominative and are hard to distinguish from simple word combinations. A compound word is considered a single word and written together when: 1) it has a separate meaning, e.g. peatükk 'chapter' but pea tükk 'part of a head'; 2) it is different from the genitive+nominative combination, e.g. vesiveski (nominative+nominative) 'watermill'. 3) Some combinations may be written together or separately, but writing them together is preferred in more complex word phrases: erakonna liige 'member of a party' but iga erakonnaliige 'every member of the party'. Rare and long word combinations are typically written separately.

The hyphen is used: 1) in compounds where one of the parts is a letter (C-vitamiin 'vitamin C'), an initialism (teksti-TV 'text TV'), a foreign citation (nalja-show 'joke show') or a word part (kuni-sõna 'word containing kuni'); 2) in compound adjectives where the first part is a proper name; 3) in compound geographical names such as Lõuna-Eesti 'South Estonia'; 4) as a suspended hyphen, e.g. kuld- ja hõbeesemed 'gold and silver things' (also in compound words such as ekspordi-impordipank 'export-import bank'); 5) in "nominative+ablative" adverbs, e.g. päev-päevalt 'day after day'; 6) in dvandva compounds, e.g. isa-ema 'father and mother'; 7) in compound adjectives from word phrases, e.g. katselis-foneetiline 'related to experimental phonetics'; 8) in compound adjectives with coordinating meaning, e.g. eesti-inglise sõnaraamat 'Estonian-English dictionary'; 9) in double names such as Ulla-Liisa.

The hyphen can optionally be used in unusual compounds such as karusmarja-jahukaste 'gooseberry disease'; in compounds with three or four identical letters in a row (e.g. iga-aastane 'yearly', luu-uure 'bone groove'); in compounds with numbers (see below) or with signs (e.g. +-märk '+ sign'); in the construction 'genitive of a proper name + nominative' after another genitive (e.g. Venemaa Euroopa-osa 'European part of Russia'); in the colloquial construction 'genitive of a proper name + noun' instead of 'noun + proper name', e.g. Kuuse-onu instead of onu Kuusk 'Uncle Kuusk'; in ad hoc compounds such as aega-küll-meeleolu; and in words derived from proper names of two or more components, e.g. françois-villon'lik, buenos-aireslane.

Abbreviations

The abbreviation period (full stop) may be used, but it is not mandatory. Commonly used abbreviations are usually written without the abbreviation period: t, tn, or tän for tänav 'street'; vt for vaata 'see'; jpt for ja paljud teised 'and many others'. Using the abbreviation period is recommended when an abbreviation may be misread as another word: joon. for joonis 'figure, draft' but joon 'line'. If an abbreviation of a word phrase may be mistaken for a word or for another abbreviation, periods are used after every letter but the last one, and spaces are not used: e.m.a for enne meie ajaarvamist but ema 'mother', m.a.j for meie ajaarvamise järgi but maj for majandus 'economy'.

The hyphen is used in some abbreviations of compound words, e.g. ped-dr for pedagoogikadoktor 'doctor of pedagogy', kpt-ltn for kaptenleitnant 'captain lieutenant', especially in the construction abbreviation + complete word, such as rb-paneelid for raudbetoonpaneelid 'reinforced concrete panels'.

Numerals

Numerals may be written in words (üks 'one', kaks 'two', kolm 'three'...) or in figures (1, 2, 3, ...). In Estonian texts, the comma is used as the decimal separator, and the space is used as the thousands separator (in financial documents, the point can be used as the thousands separator to avoid inserting an extra digit). The point is used as a separator for dates, times of day, prices, and sports results in meters and centimeters. For prices in euros and cents, writing € 84.95 as well as 84,95 € is accepted. The time of day in hours and minutes (24-hour format) may be written using the point or the colon (without spaces): 16.15 or 16:15; but seconds are separated by the point: 16:15.25. The colon with spaces is used for ratios, e.g. 2 : 3.
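As an illustration of these separator conventions, here is a small sketch that formats a number with a comma as the decimal separator and a space as the thousands separator, and a time of day with the colon or the point. The function names `format_decimal` and `format_time` are hypothetical helpers written for this example; the choice of a plain (breaking) space and of unpadded hours are assumptions.

```python
def format_decimal(value, places=2):
    """Format a number with a decimal comma and space-separated thousands."""
    # Python's format spec gives "1,234,567.89"; swap the separators.
    text = f"{value:,.{places}f}"
    return text.replace(",", " ").replace(".", ",")


def format_time(hours, minutes, seconds=None, sep=":"):
    """Format a 24-hour time; hours and minutes use ':' or '.', seconds use '.'."""
    if sep not in (":", "."):
        raise ValueError("separator must be ':' or '.'")
    text = f"{hours}{sep}{minutes:02d}"
    if seconds is not None:
        text += f".{seconds:02d}"
    return text


print(format_decimal(1234567.891))    # thousands and decimal separators
print(format_decimal(84.95) + " €")   # price with the euro sign after the amount
print(format_time(16, 15))            # colon form
print(format_time(16, 15, sep="."))   # point form
print(format_time(16, 15, 25))        # with seconds
```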

When written in words, numerals with -teist or -teistkümment (11 to 19), -kümmend (tens) and -sada (hundreds) are written together, e. g. viisteist(kümment) 'fifteen', viiskümmend 'fifty', viissada 'five hundred'. Other compound numerals are written separately: kakskümmend viis 'twenty-five'.
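The spacing rule can be illustrated with a short sketch that spells out the numbers 1 to 999. The teens, tens and hundreds are joined into one word, and the remaining parts are separated by spaces. The function name `number_to_words` is illustrative; the unit words beyond those quoted above are standard Estonian numerals supplied for the example, only the short -teist form is produced, and 100 is rendered as sada.

```python
UNITS = ["", "üks", "kaks", "kolm", "neli", "viis",
         "kuus", "seitse", "kaheksa", "üheksa"]


def number_to_words(n):
    """Spell out an integer from 1 to 999 in Estonian (nominative case)."""
    if not 1 <= n <= 999:
        raise ValueError("only 1..999 are supported in this sketch")
    parts = []
    hundreds, rest = divmod(n, 100)
    if hundreds == 1:
        parts.append("sada")
    elif hundreds > 1:
        parts.append(UNITS[hundreds] + "sada")  # written together
    if rest == 10:
        parts.append("kümme")
    elif 11 <= rest <= 19:
        parts.append(UNITS[rest - 10] + "teist")  # written together
    else:
        tens, units = divmod(rest, 10)
        if tens:
            parts.append(UNITS[tens] + "kümmend")  # written together
        if units:
            parts.append(UNITS[units])
    # Everything else is written as separate words.
    return " ".join(parts)


for n in (15, 50, 500, 25, 125):
    print(n, number_to_words(n))
```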

For writing ordinal numbers in figures, the ordinal dot is used: 16. for kuueteistkümnes 'the sixteenth'. In some cases, ordinals are written as Roman numerals (without the ordinal dot). Roman numerals followed by a dot may be used in numbered lists.

Case forms of cardinal and ordinal numerals may be written in the form "figures+case ending" with or without a hyphen: 16s or 16-s for kuueteistkümnes 'sixteen [inessive]', 16ndas or 16-ndas for kuueteistkümnendas 'the sixteenth [inessive]'. For case endings beginning with the letter l, the hyphen is mandatory to avoid confusion with the digit 1: 16-le for kuueteistkümnele 'sixteen [allative]'. Case endings after figures are not used when a cardinal or ordinal numeral is in a case concordance with a following noun. Likewise, compound words with numbers written in figures may be written with or without the hyphen: 60vatine lamp or 60-vatine lamp for kuuekümnevatine lamp '60-watt light'.
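A minimal sketch of the hyphen rule for "figures+case ending" forms follows. The hyphen is optional in general but is always inserted when the ending begins with the letter l. The function name `with_case_ending` and its `hyphen` parameter are hypothetical, chosen only for this illustration.

```python
def with_case_ending(number, ending, hyphen=False):
    """Attach a case ending to a number written in figures.

    The hyphen is optional, except before endings that begin with "l",
    where it is mandatory to avoid confusion with the digit 1.
    """
    if hyphen or ending.startswith("l"):
        return f"{number}-{ending}"
    return f"{number}{ending}"


print(with_case_ending(16, "s"))               # without the optional hyphen
print(with_case_ending(16, "s", hyphen=True))  # with the optional hyphen
print(with_case_ending(16, "ndas"))            # ordinal, inessive
print(with_case_ending(16, "le"))              # allative: hyphen is forced
```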

Punctuation

The period (full stop) is used at the end of sentences, as an ordinal mark and sometimes as an abbreviation mark and as a number separator (see above).

The comma is used for appositions (but appositions in genitive require the comma only before them), for more than one attribute after a determined word, for enumerations (but the serial comma is not used), between coordinated or subordinated clauses, between direct speech and author's words, before and after parenthetic or vocative phrases, and before and after some other constructions. It is also used between placenames and dates in the nominative case (but not in locative cases); between a surname and a given name, if they are written in this order; before parts of an address; and as a decimal mark.

The colon is used before lists, before direct speech, before explanations, and also in writing times of day and ratios (see above).

The semicolon is used between weakly related parts of sentences, especially containing commas.

The hyphen is used for writing compound words (see above). It is also used for hyphenating words at the end of a line, for declining letters and abbreviations, and optionally for declining acronyms/initialisms, numbers, and symbols.

The dash is used when there appears a generalizing word after an enumeration; instead of the comma for accenting clauses and appositions or for relatively long parenthetical constructions; before words indicating surprise; for slight pauses (interchangeably with the ellipsis); in the meaning "from...to" (instead of the word kuni); for indicating lines or routes (when in attributive function, the hyphen is also accepted); between coordinated attributes if at least one attribute has a hyphen or a space; between remarks of a dialogue written as one line without author's words; as a marker before enumeration items. The dash is not used to indicate omission of a word that would be repeated.

The exclamation and question marks are used at the end of exclamative and interrogative sentences. Occasionally, they may be parenthesized and written after words within sentences to show doubt or surprise. The exclamation mark is also used for addressing people in letters, e.g. Austatud professor Pirk! Using the comma or the colon in this case is considered inappropriate.

The quotation marks, written as „ ”, are used for direct speech, citations, scare quotes, and names of books, documents, episodes, enterprises, etc. Names of plant sorts may be written in double or in single quotation marks (looking like apostrophes: ’ ’) and are normally italicized. For cited words and phrases, including words in a linguistic context, quotation marks or italics may be used. Quotation marks are not used in the names of institutions, periodicals, awards, wares, and vehicles.

The apostrophe is used for adding case endings and suffixes to foreign names with unusual grapheme-to-phoneme correspondences and to foreign citations in the original spelling (see above). Sometimes the apostrophe is used for adding case endings and suffixes to Estonian names, to make the original form clear: Metsa’le (allative of the surname Metsa), mutt’lik (the apostrophe is used to conserve the spelling of the surname Mutt, otherwise the double consonant would become a single consonant). Also, the apostrophe is sometimes used in poetry to indicate omission of a sound: õitsel', mull', sull' instead of õitsele, mulle, sulle are found in Lydia Koidula's poems. Single quotation marks (’ ’) are used for word meanings in a linguistic context.

The parentheses are used for parenthetical words or sentences, and also for optional parts of words in a linguistic context.

The square brackets are used for the citer's notes on citations and for showing pronunciation in linguistic and reference works.

The slash is used for division in fractions and unit symbols, for connecting alternatives, to show line breaks when citing poetry in the single-line format, and for non-calendar years. In practice, it occasionally appears in abbreviations made of more than one word (e.g. õ/a for õppeaasta 'school year'), but this usage is considered nonstandard (correct abbreviation: õa). Spaces are used before and after the slash only if it separates text fragments of more than one word.

The ellipsis is used for slight pauses and for unfinished thoughts. It is surrounded by spaces. Also, the ellipsis is used for bowdlerizing obscene words.

History

Modern Estonian orthography is based on the Newer Orthography created by Eduard Ahrens in the second half of the 19th century based on Finnish orthography. The Older Orthography it replaced was created in the 17th century by Bengt Gottfried Forselius and Johann Hornung based on standard German orthography. In the old orthography, single consonants following short vowels were written double even if they were short (kala 'fish' was written as kalla) and long vowels in an open syllable were written single (looma 'to create' was written as loma). Before Otto Wilhelm Masing introduced the letter õ in the early 19th century, its sound had not been distinguished in writing from ö. Earlier writing in Estonian had by and large used an ad hoc orthography based on Latin and Middle Low German orthography. Some influences of the standard German orthography (for example, writing 'W'/'w' instead of 'V'/'v') persisted well into the 1930s.

In Fraktur typesetting (which was common in Estonian publications before the first half of the 20th century), two kinds of the small letter s were distinguished: the short s and the long ſ. The long ſ was used at the beginning and in the middle of syllables, and the short s was used at the end of syllables. For example: kaſs 'cat' — kasſi 'cat [gen. sg., part. sg.]'.

Estonian words and names quoted in international publications from Soviet sources were often back-transliterations from the Russian transliteration. Examples are the use of я ("ya") for ä (e.g. Pyarnu (Пярну) for Pärnu), ы ("y") for õ (e.g. Pylva (Пылва) for Põlva) and ю ("yu") for ü (e.g. Pyussi (Пюсси) for Püssi). Even in the Encyclopædia Britannica one can find "ostrov Khiuma", where "ostrov" means "island" in Russian and "Khiuma" is a back-transliteration from Russian instead of "Hiiumaa" (Hiiumaa > Хийума(а) > Khiuma).
