Substitutions of the Esperanto alphabet

There are two conventional sets of ASCII substitutions for the letters of the Esperanto alphabet that have diacritics, as well as a number of graphic work-arounds.

The diacritics of Esperanto were designed with a French manual typewriter in mind, as French was the international language at the time Esperanto was developed. French typewriters have a dead key for the circumflex that can be used in combination with any other key. In handwritten Esperanto, the diacritics pose no problem. However, since the Esperanto letters with diacritics do not appear on standard computer keyboard layouts (French computer keyboards, unlike manual typewriters, typically assign the circumflex only to letters that bear it in French orthography), various alternative methods have been devised for inputting them or substituting them in type. The original method, suggested by Zamenhof for people who did not have access to a French typewriter, was a set of digraphs in h, now known as the "Zamenhof-system" or "h-system". With the rise of computer word processing, the so-called "x-system" has become equally popular. With the advent of Unicode and more easily customized computer keyboards, the need for such workarounds has lessened.

ASCII transliteration systems

There are two alternative orthographies in common use, which replace the circumflex letters with either h digraphs or x digraphs. Another system sometimes noted is a 'QWXY system', a carry-over from an early Esperanto keyboard app named Ĉapelilo, in which the Q, W, X and Y keys were assigned to the letters ĥ, ŭ, ŝ and ĵ, and the key sequences TX and DY to the letters ĉ and ĝ. [1] There are also graphic work-arounds, such as approximating the circumflexes with carets.
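The QWXY assignments amount to a small keystroke table. The Python sketch below is a hypothetical illustration, not Ĉapelilo's own code: the function name qwxy_to_unicode, the longest-match rule (TX and DY are tried before single keys), and the restriction to lowercase input are all assumptions.

```python
# Hypothetical QWXY keystroke table, taken from the assignments described above.
# Two-key sequences come first so they take priority over single keys.
QWXY_MAP = [
    ("tx", "ĉ"),
    ("dy", "ĝ"),
    ("q", "ĥ"),
    ("w", "ŭ"),
    ("x", "ŝ"),
    ("y", "ĵ"),
]

def qwxy_to_unicode(keys: str) -> str:
    """Translate a lowercase QWXY key sequence into Esperanto letters."""
    out = []
    i = 0
    while i < len(keys):
        for seq, letter in QWXY_MAP:
            if keys.startswith(seq, i):
                out.append(letter)
                i += len(seq)
                break
        else:
            # No mapping applies: pass the key through unchanged.
            out.append(keys[i])
            i += 1
    return "".join(out)

print(qwxy_to_unicode("eqoxandyoj txiuyawde"))  # eĥoŝanĝoj ĉiuĵaŭde
```

A consequence of the longest-match assumption is that a literal t followed by ŝ cannot be typed as "tx". A real input method would need some escape for that case.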

H-system

Esperanto name: H-sistemo
Script type: Alphabet
Creator: L. L. Zamenhof
Created: 1888 [2]
IETF language tag: eo-hsistemo [3]

The original method of working around the diacritics was developed by the creator of Esperanto himself, L. L. Zamenhof. He recommended using u in place of ŭ, and digraphs with h for the circumflex letters. For example, ŝ is replaced by sh, as in shanco for ŝanco (chance). Where proper orthography has sh, the letters should be separated with an apostrophe or a hyphen, as in ses-hora (six-hour) or flug'haveno (airport). [4]
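Here is a minimal Python sketch of Unicode-to-h-system conversion. It uses the ch, gh, hh, jh and sh spellings from Zamenhof's 1888 list (reference [2]) and u for ŭ. Two parts are assumptions: the rule that inserts an apostrophe between any of c, g, h, j, s and a following h, which generalizes the sh example above, and the function name to_h_system.

```python
# Circumflex letters become base letter + "h"; ŭ simply loses its breve.
H_SYSTEM = {
    "ĉ": "ch", "ĝ": "gh", "ĥ": "hh", "ĵ": "jh", "ŝ": "sh", "ŭ": "u",
    "Ĉ": "Ch", "Ĝ": "Gh", "Ĥ": "Hh", "Ĵ": "Jh", "Ŝ": "Sh", "Ŭ": "U",
}

def to_h_system(text: str) -> str:
    out = []
    for i, ch in enumerate(text):
        # Assumed rule: a genuine pair such as "g" + "h" in flughaveno would be
        # misread as a digraph, so separate the two plain letters with "'".
        if ch in "hH" and i > 0 and text[i - 1] in "cghjsCGHJS":
            out.append("'")
        out.append(H_SYSTEM.get(ch, ch))
    return "".join(out)

print(to_h_system("ŝanco"))       # shanco
print(to_h_system("flughaveno"))  # flug'haveno
```

Words already written with a hyphen, such as ses-hora, pass through unchanged because the hyphen separates the letters.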

Unfortunately, simplistic ASCII-based sorting rules fail badly on h-digraphs, because in Esperanto alphabetical order all words beginning with ĉ should follow all words beginning with c and precede words beginning with d. The word ĉu should be placed after ci, but when sorted in the h-system, chu appears before ci, since h precedes i.
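The failure is easy to reproduce. This Python sketch assumes lowercase words made only of the 28 Esperanto letters. Its esperanto_key is a hypothetical helper that ranks letters by alphabet position, and the sketch compares that key with Python's default ordering of the h-system spellings.

```python
# The 28-letter Esperanto alphabet in collation order.
ALPHABET = "abcĉdefgĝhĥijĵklmnoprsŝtuŭvz"
RANK = {letter: i for i, letter in enumerate(ALPHABET)}

def esperanto_key(word: str) -> list[int]:
    """Rank each letter by its position in the Esperanto alphabet."""
    return [RANK[c] for c in word.lower()]

# Correct order on the accented spellings: ĉu falls between ci and da.
print(sorted(["da", "ĉu", "ci", "ce"], key=esperanto_key))
# ['ce', 'ci', 'ĉu', 'da']

# Plain string order on h-system spellings: chu jumps ahead of ci.
print(sorted(["da", "chu", "ci", "ce"]))
# ['ce', 'chu', 'ci', 'da']
```

Python's default ordering also misplaces the accented spelling. The code point of ĉ (U+0109) sorts after every ASCII letter, so sorted(["da", "ĉu", "ci"]) puts ĉu last.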

X-system

Esperanto names: X-sistemo, x-kodo
Script type: Alphabet
Created: by 1962 [5]
IETF language tag: eo-xsistemo [6]

A more recent system for typing in Esperanto is the so-called "x-system", which uses x instead of h for the digraphs, including ux for ŭ. For example, ŝ is represented by sx, as in sxi for ŝi and sxanco for ŝanco.
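Because x never occurs in Esperanto words, x-system conversion is a plain lookup in both directions. The following minimal Python sketch is hypothetical in two respects: the function names, and the choice to accept either case of x when decoding.

```python
import re

TO_X = {
    "ĉ": "cx", "ĝ": "gx", "ĥ": "hx", "ĵ": "jx", "ŝ": "sx", "ŭ": "ux",
    "Ĉ": "Cx", "Ĝ": "Gx", "Ĥ": "Hx", "Ĵ": "Jx", "Ŝ": "Sx", "Ŭ": "Ux",
}

# Reverse table: each digraph, with either "x" or "X", maps back to its letter.
FROM_X = {}
for letter, digraph in TO_X.items():
    FROM_X[digraph] = letter
    FROM_X[digraph[0] + "X"] = letter

def to_x_system(text: str) -> str:
    return "".join(TO_X.get(ch, ch) for ch in text)

def from_x_system(text: str) -> str:
    return re.sub(r"[cghjsuCGHJSU][xX]", lambda m: FROM_X[m.group(0)], text)

print(to_x_system("ŝanco"))  # sxanco
print(from_x_system("sxi"))  # ŝi
```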

X-digraphs solve both of these problems of the h-system:

  1. x is not a letter in the Esperanto alphabet, so its use introduces no ambiguity.
  2. The digraphs are now nearly always correctly sorted after their single-letter counterparts; for example, sxanco (for ŝanco) comes after super, while h-system shanco comes before it. Sorting fails only in the infrequent case where one of the letters c, g, h, j, s or u is directly followed by z, as happens in some compound or unassimilated words. For example, the compound word reuzi ("to reuse") would be sorted after reuxmatismo (for reŭmatismo, "rheumatism"), although u precedes ŭ in the Esperanto alphabet.
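Both sorting claims in this list can be checked with Python's plain string ordering, which compares code points and so matches ASCII order for these words. The helper may_missort is hypothetical. It flags the pattern described above, a letter that can begin an x-digraph followed directly by z, and it assumes lowercase input.

```python
import re

# "x" (0x78) sorts after every Esperanto letter except "z" (0x7A).
print(sorted(["super", "shanco"]))       # h-system: ['shanco', 'super'] (wrong)
print(sorted(["super", "sxanco"]))       # x-system: ['super', 'sxanco'] (right)
print(sorted(["reuzi", "reuxmatismo"]))  # x-system: ['reuxmatismo', 'reuzi'] (wrong)

# Hypothetical helper: flag words in which c, g, h, j, s or u is directly
# followed by "z", the case where plain sorting misplaces the word.
RISKY = re.compile(r"[cghjsu]z")

def may_missort(word: str) -> bool:
    return RISKY.search(word) is not None

print(may_missort("reuzi"), may_missort("sxanco"))  # True False
```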

The x-system has become as popular as the h-system, but it was long perceived as contrary to the Fundamento de Esperanto. However, in a 2007 decision, the Akademio de Esperanto issued general permission to use surrogate systems for the diacritical letters of Esperanto. The condition is that this be done only "when the circumstances do not permit the use of proper diacritics, and when due to a special need the h-system fixed in the Fundamento is not convenient." [7] This provision covers situations such as using the x-system as a technical solution (to store data in plain ASCII) while still displaying proper Unicode characters to the end user.

A practical problem of digraph substitution that the x-system does not completely resolve arises in bilingual texts. Ux for ŭ is especially problematic alongside French text, because many French words end in aux or eux. Aux, for example, is a word in both languages (aŭ, "or", in Esperanto), so any automatic conversion of the text will alter the French words as well as the Esperanto ones. A few English words such as "auxiliary" and "Euxine" can also be corrupted by such search-and-replace routines. One common solution, used for example in Wikipedia's MediaWiki software, is to type xx to escape the ux-to-ŭ conversion; for example, "auxx" produces "aux". [8] [9] A few people have also proposed using "vx" instead of "ux" for ŭ to avoid this problem, but this variant is rarely used.
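The xx escape can be sketched as a single regular-expression pass. This Python sketch illustrates the idea and is not MediaWiki's actual implementation. The function name is hypothetical, and so is the exact rule: a doubled x after c, g, h, j, s or u yields the literal letter plus one x.

```python
import re

DIACRITIC = {"c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ"}

# A digraph base letter, one "x", and an optional second "x" acting as an escape.
PATTERN = re.compile(r"([cghjsuCGHJSU])([xX])([xX])?")

def _convert(m: re.Match) -> str:
    base, x, escape = m.group(1), m.group(2), m.group(3)
    if escape:
        # Doubled x: keep the literal ASCII pair and drop one x.
        return base + x
    letter = DIACRITIC[base.lower()]
    return letter.upper() if base.isupper() else letter

def from_x_system_escaped(text: str) -> str:
    return PATTERN.sub(_convert, text)

print(from_x_system_escaped("ankaux"))  # ankaŭ
print(from_x_system_escaped("auxx"))    # aux
```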

Y-system

Ĉ = Cy
Ĝ = Gy
Ĥ = X
Ĵ = Jy
Ŝ = Sy
Ŭ = W

For example: eĥoŝanĝoj ĉiuĵaŭde ("echo-change every Thursday") becomes "exosyangyoj cyiujyawde". [10]
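The Y-system table translates directly into a lookup. In this Python sketch the lowercase forms (cy, x, w, and so on) are inferred from the example above. The capitalization rule (Ĉ becomes Cy) follows the table, and the function name to_y_system is hypothetical.

```python
# Lowercase Y-system substitutions, inferred from the table and example.
Y_SYSTEM = {"ĉ": "cy", "ĝ": "gy", "ĥ": "x", "ĵ": "jy", "ŝ": "sy", "ŭ": "w"}

def to_y_system(text: str) -> str:
    out = []
    for ch in text:
        sub = Y_SYSTEM.get(ch.lower())
        if sub is None:
            out.append(ch)
        elif ch.isupper():
            # Capitalize only the first letter, as in the table (Ĉ = Cy).
            out.append(sub[0].upper() + sub[1:])
        else:
            out.append(sub)
    return "".join(out)

print(to_y_system("eĥoŝanĝoj ĉiuĵaŭde"))  # exosyangyoj cyiujyawde
```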

Graphic work-arounds

There are several ad hoc workarounds used in email or on the internet, where the proper letters are often not supported; similar conventions arise for other non-ASCII orthographies such as German. These "slipped-hat" conventions use the caret (^) or the greater-than sign (>) to represent the circumflex. For example, ŝanco may be written ^sanco, s^anco, or s>anco. [11] However, they have generally fallen out of favor. Before the internet age, Stefano la Colla had proposed shifting the circumflex onto the following vowel, since French circumflex vowels were readily available in printing houses. That is, one would write ehôsângôj cîujâude for the nonsense phrase eĥoŝanĝoj ĉiuĵaŭde ("echo-change every Thursday"). [12] This proposal was never adopted.
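Both kinds of graphic work-around are mechanical. The Python sketch below is hypothetical: slipped_hat writes the hat beside the letter, and la_colla moves it onto the next vowel. The la_colla rules are inferred from the example above: the consonant loses its hat, the next vowel gains a circumflex, and ŭ becomes plain u. The text does not say what the proposal does when no vowel follows, as in ŝtono, so this sketch simply drops the hat. Nor does the text say how ŭ is written in slipped-hat style, so slipped_hat leaves it unchanged.

```python
import unicodedata

# Circumflex consonants and their unaccented base letters.
BASE = {"ĉ": "c", "ĝ": "g", "ĥ": "h", "ĵ": "j", "ŝ": "s"}
COMBINING_CIRCUMFLEX = "\u0302"

def plain(ch: str) -> str:
    """Strip the circumflex from ch, keeping its case."""
    base = BASE[ch.lower()]
    return base.upper() if ch.isupper() else base

def slipped_hat(text: str, marker: str = "^", before: bool = False) -> str:
    """Write the hat as a separate character before or after the letter."""
    out = []
    for ch in text:
        if ch.lower() in BASE:
            out.append(marker + plain(ch) if before else plain(ch) + marker)
        else:
            out.append(ch)
    return "".join(out)

def la_colla(text: str) -> str:
    """Move the hat onto the following vowel (inferred rules, see above)."""
    out = []
    pending_hat = False
    for ch in text:
        if ch.lower() in BASE:
            out.append(plain(ch))
            pending_hat = True
            continue
        if ch.lower() == "ŭ":
            out.append("U" if ch.isupper() else "u")
        elif pending_hat and ch.lower() in "aeiou":
            # NFC composes e.g. "o" + U+0302 into the single letter "ô".
            out.append(unicodedata.normalize("NFC", ch + COMBINING_CIRCUMFLEX))
        else:
            out.append(ch)
        pending_hat = False
    return "".join(out)

print(slipped_hat("ŝanco"), slipped_hat("ŝanco", before=True))  # s^anco ^sanco
print(la_colla("eĥoŝanĝoj ĉiuĵaŭde"))  # ehôsângôj cîujâude
```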

References

  1. Monato: internacia magazino sendependa, issue 1995/04, p. 32: 'Ĉapelilo 1.0 verkita de Pejno Simono' ('Ĉapelilo 1.0, written by Pejno Simono').
  2. Zamenhof, Ludoviko Lazaro (1888). Aldono al la "Dua Libro de l' Lingvo Internacia" (in Esperanto). Warsaw. Retrieved 12 March 2021. Excerpt (translated from Esperanto): "3) If any printing house cannot print works with the superscript marks (^) and (˘), it may replace the mark (^) with the letter "h" and omit the mark (˘) altogether. But at the beginning of such a work there must be printed: "ch=ĉ; gh=ĝ; hh=ĥ; jh=ĵ; sh=ŝ". If one needs to print something with internal marks (,), one must do so carefully, so that the reader does not take them for commas (,). Instead of the mark (,) one may also print (') or (-). For example: sign,et,o = sign'et'o = sig-net-o."
  3. Starner, David. "Registration form for 'hsistemo'" (text). IANA. Retrieved 12 March 2021.
  4. Lenio Marobin, PY3DF (2008). 'Morsa kodo kaj Esperanto rekolekto de artikoloj iam aperintaj' ('Morse code and Esperanto: a collection of previously published articles'). ILERA Bulteno, no. 70, p. 4.
  5. Eichholz, Rüdiger (1983). "Akademiaj Studoj". Akademiaj Studoj: 7. quoting from "Esperanto". Esperanto: 161. September 1962.
  6. Starner, David. "Registration form for 'xsistemo'" (text). IANA. Retrieved 12 March 2021.
  7. "Akademio de Esperanto: Oficialaj Informoj 6 - 2007 01 21". akademio-de-esperanto.org. Archived from the original on 29 March 2013. Retrieved 22 January 2013.
  8. Wikipedia:Wikipedia Signpost/2012-12-31/Interview
  9. Chuck Smith (10 January 2011). "Unicoding the Esperanto Wikipedia (Part 3 of 4)". Esperanto Language Blog. Retrieved 14 January 2013.
  10. "Esperanto", wiktionary.org, retrieved 23 July 2023
  11. "lernu!: Community / Forum / Introduction". lernu.net. Archived from the original on 16 January 2009. Retrieved 24 October 2008.
  12. Plena Analiza Gramatiko, end of section 4: Cê la sângôj okazintaj en la cî-landa vojkodo, cîuj automobilistoj zorge informigû pri la jûsaj instrukcioj.