Czech orthography

Czech alphabet (Česká abeceda)
Time period: Since Jan Hus' Orthographia bohemica (early 15th century – present)
Languages: Czech
Child systems: Slovak alphabet, Gaj's Latin alphabet
Unicode: Subset of Latin

Czech orthography is a system of rules for proper formal writing (orthography) in Czech. The earliest form of a separate Latin script specifically designed to suit Czech was devised by the Czech theologian and church reformer Jan Hus, the namesake of the Hussite movement, in one of his seminal works, De orthographia bohemica (On Bohemian orthography).

The modern Czech orthographic system is diacritic, having evolved from an earlier system which used many digraphs (although one digraph, ch, has been kept). The caron is added to standard Latin letters to express sounds which are foreign to Latin. The acute accent is used for long vowels.

Czech orthography is considered the model for many other Balto-Slavic languages using the Latin alphabet: Slovak orthography is its direct revised descendant, while the Serbo-Croatian Gaj's Latin alphabet and its Slovene descendant system are largely based on it. All of them use similar diacritics and have a similar, usually interchangeable, relationship between the letters and the sounds they represent. [1]

Alphabet

The Czech alphabet consists of 42 letters.

Czech alphabet
Majuscule forms (uppercase/capital letters)
A Á B C Č D Ď E É Ě F G H Ch I Í J K L M N
Ň O Ó P Q R Ř S Š T Ť U Ú Ů V W X Y Ý Z Ž
Minuscule forms (lowercase/small letters)
a á b c č d ď e é ě f g h ch i í j k l m n
ň o ó p q r ř s š t ť u ú ů v w x y ý z ž
Czech alphabet (detail)
Letter      Name
A a         á
Á á         dlouhé á; á s čárkou
B b
C c
Č č         čé
D d
Ď ď         ďé
E e         é
É é         dlouhé é; é s čárkou
Ě ě [b]     ije; é s háčkem
F f [a]     ef
G g [a]
H h
Ch ch       chá
I i         í; měkké i
Í í         dlouhé í; dlouhé měkké í; í s čárkou; měkké í s čárkou
J j
K k
L l         el
M m         em
N n         en
Ň ň
O o         ó
Ó ó [a]     dlouhé ó; ó s čárkou
P p
Q q         kvé
R r         er
Ř ř
S s         es
Š š
T t
Ť ť         ťé
U u         ú
Ú ú         dlouhé ú; ú s čárkou
Ů ů [b]     ů s kroužkem
V v
W w         dvojité vé
X x         iks
Y y         ypsilon; krátké tvrdé ý
Ý ý [b]     dlouhé ypsilon; dlouhé tvrdé ý; ypsilon s čárkou; tvrdé ý s čárkou
Z z         zet
Ž ž         žet
  a. The letters F, G, and Ó represent the sounds /f/, /ɡ/, and /oː/, respectively, which, when not allophones of /v/ and /k/ in the case of the first two, are used almost exclusively in words and names of foreign origin. With the increasing usage of foreign loanwords and foreign terms, they appear fairly commonly in modern Czech.
  b. The letters Ě, Ů, and Ý never occur at the beginning of a word. Their capitalized forms are used only in all caps or small caps inscriptions, such as newspaper headlines.

The letters Q, W, and X are used exclusively in foreign words, and the first two are replaced with KV and V respectively once the word becomes "naturalized" (assimilated into Czech). The digraphs dz and dž are also used mostly for foreign words and are not considered to be distinct letters in the Czech alphabet.
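
The letter inventory described above can be illustrated with a small sketch that splits a word into alphabet letters, counting the digraph ch as one letter while leaving dz and dž as two. The function name split_letters is hypothetical, and the sketch assumes that every c followed by h is the digraph.

```python
import unicodedata


def split_letters(word: str) -> list[str]:
    """Split a Czech word into alphabet letters, treating 'ch' as one letter."""
    # Normalize so that letters such as 'č' are single code points (assumption:
    # input may arrive in decomposed form).
    word = unicodedata.normalize("NFC", word)
    letters = []
    i = 0
    while i < len(word):
        if word[i:i + 2].lower() == "ch":
            letters.append(word[i:i + 2])  # digraph counts as one letter
            i += 2
        else:
            letters.append(word[i])
            i += 1
    return letters


print(split_letters("chléb"))  # ['ch', 'l', 'é', 'b']
print(split_letters("džbán"))  # ['d', 'ž', 'b', 'á', 'n']
```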

Orthographic principles

Czech orthography is primarily phonemic (rather than phonetic) because an individual grapheme usually corresponds to an individual phoneme (rather than a sound). However, some graphemes and letter groups are remnants of historical phonemes which were used in the past but have since merged with other phonemes. Some changes in the phonology have not been reflected in the orthography.

Vowels
Grapheme  IPA value   Notes
a         /a/
á         /aː/
e         /ɛ/
é         /ɛː/
ě         /ɛ/, /ʲɛ/   Marks palatalization of preceding consonant; see usage rules below
i         /ɪ/         Palatalizes preceding d, t, or n; see usage rules below
í         /iː/        Palatalizes preceding d, t, or n; see usage rules below
o         /o/
ó         /oː/        Occurs mostly in words of foreign origin.
u         /u/
ú         /uː/        See usage rules below
ů         /uː/        See usage rules below
y         /ɪ/         See usage rules below
ý         /iː/        See usage rules below
Consonants
Grapheme  IPA value     Notes
b         /b/
c         /t͡s/ [n 1]
č         /t͡ʃ/ [n 1]
d         /d/           Represents /ɟ/ before i í ě; see below
ď         /ɟ/
f         /f/           Occurs mostly in words of foreign origin.
g         /ɡ/           Occurs mostly in words of foreign origin.
h         /ɦ/
ch        /x/
j         /j/
k         /k/
l         /l/
m         /m/
n         /n/           Represents /ɲ/ before i í ě; see below
ň         /ɲ/
p         /p/
r         /r/
ř         /r̝/ [n 2]
s         /s/
š         /ʃ/
t         /t/           Represents /c/ before i í ě; see below
ť         /c/
v         /v/
x         /ks/, /ɡz/    Occurs only in words of foreign origin; pronounced /ɡz/ in words with the prefix 'ex-' before vowels or voiced consonants.
z         /z/
ž         /ʒ/
  n 1. Unofficial ligatures are sometimes used for the transcription of affricates: /ʦ/, /ʣ/, /ʧ/, /ʤ/. The actual IPA version supports using two separate letters which can be joined by a tiebar.
  n 2. The "long-leg R" ɼ is sometimes used to transcribe voiced ř (unofficially). This character was withdrawn from the IPA and replaced by the "lower-case R" with the "up-tack" diacritic mark, which denotes "raised alveolar trill".

Voicing assimilation

All obstruent consonants are subject to voicing (before voiced obstruents except v) or devoicing (before voiceless consonants and at the end of words); spelling in these cases is morphophonemic (i.e. the morpheme has the same spelling as before a vowel). An exception is the cluster sh, in which the /s/ is voiced to /z/ only in Moravian dialects, while in Bohemia the /ɦ/ is devoiced to /x/ instead (e.g. shodit /sxoɟɪt/, in Moravia /zɦoɟɪt/). Devoicing /ɦ/ changes its place of articulation: it becomes [x]. After voiceless consonants, ř is devoiced, for instance in tři 'three', which is pronounced [tr̝̊ɪ]. Written voiced or voiceless counterparts are kept according to the etymology of the word, e.g. odpadnout [ˈotpadnoʊ̯t] (to fall away): od- is a prefix, and the written d is devoiced here because of the following voiceless /p/.
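
The regressive assimilation described above can be sketched in code as a respelling that shows the pronounced voicing. This is a simplified illustration, not a full phonological model: the pairing table and the function names tokenize and assimilate are assumptions made here, the regionally variable cluster sh is left untouched, and neither final devoicing nor the devoicing of ř is handled.

```python
# Voiced obstruent letters and their voiceless counterparts (assumed pairing;
# h pairs with ch because devoiced /ɦ/ becomes [x]).
VOICED_TO_VOICELESS = {"b": "p", "d": "t", "ď": "ť", "g": "k",
                       "h": "ch", "v": "f", "z": "s", "ž": "š"}
VOICELESS_TO_VOICED = {v: k for k, v in VOICED_TO_VOICELESS.items()}
VOICELESS = set(VOICELESS_TO_VOICED) | {"c", "č"}
# v undergoes devoicing itself but does not voice a preceding consonant.
VOICING_TRIGGERS = set(VOICED_TO_VOICELESS) - {"v"}


def tokenize(word):
    """Split into letters, keeping the digraph 'ch' together."""
    tokens, i = [], 0
    while i < len(word):
        step = 2 if word[i:i + 2] == "ch" else 1
        tokens.append(word[i:i + step])
        i += step
    return tokens


def assimilate(word):
    """Respell a word so that obstruent clusters show their pronounced voicing."""
    tokens = tokenize(word.lower())
    # Work right to left so each consonant sees the final voicing of its neighbour.
    for i in range(len(tokens) - 2, -1, -1):
        cur, nxt = tokens[i], tokens[i + 1]
        if cur == "s" and nxt == "h":
            continue  # outcome differs between Bohemia and Moravia
        if nxt in VOICELESS:
            tokens[i] = VOICED_TO_VOICELESS.get(cur, cur)
        elif nxt in VOICING_TRIGGERS:
            tokens[i] = VOICELESS_TO_VOICED.get(cur, cur)
    return "".join(tokens)


print(assimilate("odpadnout"))  # otpadnout
print(assimilate("kde"))        # gde
```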

For historical reasons, the consonant [ɡ] is written k in Czech words like kde ('where', < Proto-Slavic *kъdě) or kdo ('who', < Proto-Slavic *kъto). This is because the letter g was historically used for the consonant [j]. The original Slavic phoneme /ɡ/ changed into /h/ in the Old-Czech period. Thus, /ɡ/ is not a separate phoneme (with a corresponding grapheme) in words of domestic origin; it occurs only in foreign words (e.g. graf, gram, etc.).

Final devoicing

Unlike in English, but as in German and Russian, voiced consonants are pronounced voicelessly in word-final position. In declension, they are voiced again when the word takes an ending.

Compare:

led [ˈlɛt] – ledy [ˈlɛdɪ] (ice – ices)
let [ˈlɛt] – lety [ˈlɛtɪ] (flight – flights)
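
The contrast in this pair can be reproduced with a minimal sketch that respells only the final consonant. The mapping table and the name devoice_final are illustrative assumptions; the output is a spelling-like approximation of the pronunciation, not IPA.

```python
# Assumed voiced -> voiceless letter pairs for word-final position.
FINAL_DEVOICING = {"b": "p", "d": "t", "ď": "ť", "g": "k",
                   "h": "ch", "v": "f", "z": "s", "ž": "š"}


def devoice_final(word):
    """Return the word with its final consonant respelled as pronounced."""
    if not word or word.endswith("ch"):
        return word  # 'ch' is already voiceless; do not touch its 'h'
    last = word[-1]
    return word[:-1] + FINAL_DEVOICING.get(last, last)


for base, inflected in [("led", "ledy"), ("let", "lety")]:
    print(devoice_final(base), devoice_final(inflected))
```

Under these assumptions led and let come out identical, while ledy and lety stay distinct because the consonant is no longer word-final.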

"Soft" I and "hard" Y

The letters i and y are both pronounced [ɪ], while í and ý are both pronounced [iː]. The letter y was originally pronounced [ɨ], as in contemporary Polish. However, in the 14th century, this difference in standard pronunciation disappeared, though it has been preserved in some Moravian dialects. [2] In words of native origin, "soft" i and í cannot follow "hard" consonants, while "hard" y and ý cannot follow "soft" consonants; "neutral" consonants can be followed by either vowel:

Hard and soft consonants
Soft: ž, š, č, ř, c, j, ď, ť, ň
Neutral: b, f, l, m, p, s, v, z
Hard: h, ch, k, r, d, t, n, g

When i or í is written after d, t, n in native words, these consonants are soft, as if they were written ď, ť, ň. That is, the sounds [ɟɪ, ɟiː, cɪ, ciː, ɲɪ, ɲiː] are written di, dí, ti, tí, ni, ní instead of ďi, ďí, ťi, ťí, ňi, ňí, e.g. in čeština [ˈt͡ʃɛʃcɪna]. The sounds [dɪ, diː, tɪ, tiː, nɪ, niː] are denoted, respectively, by dy, dý, ty, tý, ny, ný. In words of foreign origin, di, ti, ni are pronounced [dɪ, tɪ, nɪ], that is, as if they were written dy, ty, ny, e.g. in diktát (dictation).
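
The rule for d, t, n before i/í and y/ý can be summarized in a short sketch. The function name syllable_ipa and the native flag are hypothetical; whether a word is native or foreign has to be supplied by the caller, since the spelling alone does not reveal it.

```python
PALATAL = {"d": "ɟ", "t": "c", "n": "ɲ"}
VOWEL_IPA = {"i": "ɪ", "í": "iː", "y": "ɪ", "ý": "iː"}


def syllable_ipa(consonant, vowel, native=True):
    """IPA for d/t/n followed by i, í, y or ý."""
    if consonant not in PALATAL or vowel not in VOWEL_IPA:
        raise ValueError("sketch covers only d, t, n + i, í, y, ý")
    onset = consonant
    # In native words, written i/í marks the preceding d, t, n as palatal.
    if native and vowel in ("i", "í"):
        onset = PALATAL[consonant]
    return onset + VOWEL_IPA[vowel]


print(syllable_ipa("t", "i"))                # cɪ  (as in čeština)
print(syllable_ipa("t", "y"))                # tɪ
print(syllable_ipa("d", "i", native=False))  # dɪ  (as in diktát)
```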

Historically the letter c was hard, but this changed in the 19th century. However, in some words it is still followed by the letter y: tác (plate) – tácy (plates).

Because neutral consonants can be followed by either i or y, in some cases they distinguish homophones, e.g. být (to be) vs. bít (to beat), mýt (to wash) vs. mít (to have). At school pupils must memorize word roots and prefixes where y is written; i is written in other cases. Writing i or y in endings is dependent on the declension patterns.
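
As a rough illustration of the hard/soft/neutral classification for native words, the sketch below returns which of i or y may be written after a given consonant. The name i_y_options is hypothetical, and the sketch ignores lexical exceptions such as tácy as well as foreign words.

```python
SOFT = {"ž", "š", "č", "ř", "c", "j", "ď", "ť", "ň"}
NEUTRAL = {"b", "f", "l", "m", "p", "s", "v", "z"}
HARD = {"h", "ch", "k", "r", "d", "t", "n", "g"}


def i_y_options(consonant):
    """Return the set of spellings ('i', 'y') possible after a consonant."""
    c = consonant.lower()
    if c in SOFT:
        return {"i"}
    if c in {"d", "t", "n"}:
        # di/ti/ni are regular spellings of the soft sounds ď, ť, ň + i.
        return {"i", "y"}
    if c in HARD:
        return {"y"}
    if c in NEUTRAL:
        return {"i", "y"}  # must be memorized (být vs. bít)
    raise ValueError(f"not a classified consonant: {consonant!r}")


print(i_y_options("ž"), i_y_options("k"))
```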

Letter Ě

The letter ě is a vestige of Old Czech palatalization. The originally palatalizing phoneme /ě/ [ʲɛ] became extinct, changing to [ɛ] or [jɛ], but it is preserved as a grapheme which can never appear in the initial position.

Letter Ů

There are two ways in Czech to write long [uː]: ú and ů. The letter ů cannot occur in initial position, while ú occurs almost exclusively in initial position or at the beginning of a word root in a compound.

Historically, long ú changed into the diphthong ou [ou̯] (as also happened in the English Great Vowel Shift with words such as "house"), though not in word-initial position in the prestige form. In 1848, ou at the beginning of word roots was changed into ú in words like ouřad to reflect this. Thus, the letter ú is written at the beginning of word roots only: úhel (angle), trojúhelník (triangle), except in loanwords: skútr (scooter).

Meanwhile, historical long ó [oː] changed into the diphthong uo [ʊo]. As was common with scribal abbreviations, the letter o in the diphthong was sometimes written as a ring above the letter u, producing ů, e.g. kóň > kuoň > kůň (horse), much as the German umlaut originated. Later, the pronunciation changed into [uː], but the grapheme ů has remained. It never occurs at the beginning of words: dům (house), domů (home, homeward).

The letter ů now has the same pronunciation as the letter ú (long [uː]), but alternates with a short o when a word is inflected (e.g. nom. kůň → gen. koně, nom. dům → gen. domu), thus showing the historical evolution of the language.
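
The positional rule for the two spellings of long [uː] can be stated as a tiny sketch. Both function names are hypothetical, and the caller is assumed to know where word roots begin and whether the word is a loanword, since neither can be read off the spelling.

```python
def long_u_letter(at_root_start: bool, loanword: bool = False) -> str:
    """Pick the letter for long [uː] according to its position."""
    if at_root_start or loanword:
        return "ú"  # úhel, trojúhelník; loanwords such as skútr
    return "ů"      # dům, domů, kůň


def breaks_initial_rule(word: str) -> bool:
    """True if the word starts with ů, which never occurs word-initially."""
    return word.lower().startswith("ů")


print(long_u_letter(True), long_u_letter(False), long_u_letter(False, loanword=True))
```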

Agreement between the subject and the predicate

The predicate must always agree with the subject of the sentence in number and person (personal pronouns) and, with past and passive participles, also in gender. This grammatical principle affects the orthography (see also "Soft" I and "hard" Y); it is especially important for the correct choice and writing of plural endings of the participles.

Examples:

Gender               Sg.                  Pl.                  English
masculine animate    pes byl koupen       psi byli koupeni     a dog was bought / dogs were bought
masculine inanimate  hrad byl koupen      hrady byly koupeny   a castle was bought / castles were bought
feminine             kočka byla koupena   kočky byly koupeny   a cat was bought / cats were bought
neuter               město bylo koupeno   města byla koupena   a town was bought / towns were bought

The examples above show both past (byl, byla ...) and passive (koupen, koupena ...) participles. Gender agreement applies in the past tense and the passive voice, not in the present and future tenses of the active voice.

If a complex subject combines nouns of different genders, the masculine animate gender takes priority over all others, and the masculine inanimate and feminine genders take priority over the neuter gender.

Examples:

muži a ženy byli – men and women were
kočky a koťata byly – cats and kittens were
my jsme byli (my = we all/men) vs. my jsme byly (my = we women) – we were

Priority of genders:

masculine animate > masculine inanimate & feminine > neuter
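
The priority rule can be sketched as a lookup that picks the plural participle ending for a mixed subject. The rank numbers and the name plural_participle_ending are illustrative assumptions; the endings are those shown in the table of examples (byli, byly, byla).

```python
# Higher rank wins when genders are mixed (ranks are an illustrative encoding).
RANK = {
    "masculine animate": 3,
    "masculine inanimate": 2,
    "feminine": 2,
    "neuter": 1,
}
ENDING = {3: "i", 2: "y", 1: "a"}


def plural_participle_ending(genders):
    """Return the plural participle ending for a subject with these genders."""
    return ENDING[max(RANK[g] for g in genders)]


# muži a ženy byli
print("byl" + plural_participle_ending(["masculine animate", "feminine"]))
# kočky a koťata byly
print("byl" + plural_participle_ending(["feminine", "neuter"]))
```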

Punctuation

The use of the full stop (.), the colon (:), the semicolon (;), the question mark (?) and the exclamation mark (!) is similar to their use in other European languages. The full stop is placed after a number if it stands for ordinal numerals (as in German), e.g. 1. den (= první den) – the 1st day.

The comma is used to separate individual parts in complex-compound sentences, lists, isolated parts of sentences, etc. Its use in Czech differs from English. For instance, subordinate (dependent) clauses must always be separated from their principal (independent) clauses. A comma is not placed before a (and), i (as well as), ani (nor) and nebo (or) when they connect parts of sentences or clauses in copulative conjunctions (on the same level). It must be placed in non-copulative conjunctions (consequence, emphasis, exclusion, etc.). A comma can, however, occur in front of the word a (and) if the comma closes a comma-delimited parenthesis: Jakub, můj mladší bratr, a jeho učitel Filip byli příliš zabráni do rozhovoru. Probírali látku, která bude u zkoušky, a též, kdo na ní bude. A comma also separates clauses introduced by the composite conjunctions a proto (and therefore) and a tak (and so).

Examples:

Quotation marks: „ “. The first one, preceding the quoted text, is placed at the bottom of the line:

Other types of quotation marks: ‚‘ »«

Apostrophes are used rarely in Czech. They can denote a missing sound in non-standard speech, but it is optional, e.g. řek' or řek (= řekl, he said).

Capital letters

The first word of every sentence and all proper names are capitalized. Special cases are:

History

In the 9th century, the Glagolitic script was used; during the 11th century it was replaced by the Latin script. There are five periods in the development of the Czech Latin-based orthographic system:

Primitive orthography
For writing sounds which are foreign to the Latin alphabet, letters with similar sounds were used. The oldest known written notes in Czech originate from the 11th century; literature in this period was written predominantly in Latin. The orthography was very ambiguous at times, with c, for example, being used for c, č, and k.
Digraphic orthography
Various digraphs were used for non-Latin sounds. The system was not consistent and it also did not distinguish long and short vowels. It had some features that Polish orthography has kept, such as cz, rz instead of č, ř, but was still crippled by ambiguities, such as spelling both s and š as s/ss, z and ž as z, and sometimes even c and č both as cz, only distinguishing by context. Long vowels such as á were sometimes (but not always) written double as aa. Other features of the day included spelling j as g and v as w, as the early modern Latin alphabet had not by then distinguished j from i or v from u.
Diacritic orthography
Probably introduced by Jan Hus. The use of diacritics for long vowels ("virgula", an acute, "čárka" in Czech) and "soft" consonants ("punctus rotundus", a dot above a letter, which has survived in Polish ż) was suggested for the first time in "De orthographia bohemica" around 1406. Diacritics replaced digraphs almost completely. It was also suggested that the Prague dialect should become the standard for Czech. Jan Hus is considered to be the author of that work, but there is some uncertainty about this.
Brethren orthography
The Bible of Kralice (1579–1593), the first complete Czech translation of the Bible from the original languages by the Czech Brethren, became the model for the literary form of the language. The punctus rotundus was replaced by the caron ("háček"). There were some differences from the current orthography, e.g. the digraph ſſ was used instead of š; ay, ey, au instead of aj, ej, ou; v instead of u (at the beginning of words); w instead of v; g instead of j; and j instead of í (gegj = její, hers). Y was written always after c, s and z (e.g. cizí, foreign, was written cyzý) and the conjunction i (as well as, and) was written y.
Modern orthography
During the period of the Czech National Renaissance (end of the 18th century and the first half of the 19th century), Czech linguists (Josef Dobrovský et al.) codified some reforms in the orthography. These principles have been effective up to the present day. The later reforms in the 20th century mostly referred to introducing loanwords into Czech and their adaptation to the Czech orthography.

Computer encoding

In computing, several different coding standards have existed for this alphabet, among them:
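
As a hedged illustration of how such encodings differ, the sketch below encodes the same Czech text with four codecs available in Python's standard library. The choice of UTF-8, ISO 8859-2, Windows-1250 and CP852 is an assumption made here from general knowledge, not a list taken from this article; the helper name encode_all is hypothetical.

```python
import unicodedata

# Codec names as registered in Python's standard library.
CODECS = ("utf-8", "iso8859_2", "cp1250", "cp852")


def encode_all(text):
    """Encode text with each codec and return a dict of codec -> bytes."""
    # Single-byte codecs need precomposed letters, so normalize to NFC first.
    text = unicodedata.normalize("NFC", text)
    return {codec: text.encode(codec) for codec in CODECS}


sample = "Příliš žluťoučký kůň úpěl ďábelské ódy"
for codec, data in encode_all(sample).items():
    # Each encoding round-trips, but the byte values and lengths differ.
    assert data.decode(codec) == unicodedata.normalize("NFC", sample)
    print(f"{codec:10} {len(data)} bytes")
```

The single-byte encodings use one byte per letter but disagree on the byte values for some letters, which is why text decoded with the wrong codec shows garbled diacritics.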

References

  1. Dvornik, Francis (1962). The Slavs in European History and Civilization. Rutgers University Press. p. 287. ISBN 0813507995.
  2. Český Jazykový Atlas. Czech Language Institute, vol. 5. pp. 115–117. Retrieved 8 October 2017.
  3. "Přehled kódování češtiny". Cestina.cz. Retrieved 2013-11-19.