Decipherment

In philology, decipherment is the discovery of the meaning of the symbols found in extinct languages and/or alphabets. [1]

Decipherment overlaps with cryptanalysis, a technical field that aims to decipher writings used in secret communication, known as ciphertext. A famous case is the cryptanalysis of the Enigma during World War II. Many other ciphers from past wars have only recently been cracked. [2] Unlike in language decipherment, however, actors using ciphertext intentionally lay obstacles to prevent outsiders from uncovering the meaning of the communication system. [3]

Today, at least a dozen languages remain undeciphered. [4] A notable recent decipherment was that of the Linear Elamite script. [5]

Categories

According to Gelb and Whiting, the approach to decipherment depends on which of four categories of situation an undeciphered language falls into: [3] [6]

Methods

A number of methods are available to go about deciphering an extinct writing system or language. These can be divided into approaches utilizing external or internal information. [3]

External information

Many successful decipherments have proceeded from the discovery of external information. A common example is the use of multilingual inscriptions, such as the Rosetta Stone (with the same text in three scripts: Demotic, hieroglyphic, and Greek), which enabled the decipherment of Egyptian hieroglyphs. In principle, a multilingual text may be insufficient for a decipherment, as translation is not a linear and reversible process but instead represents an encoding of the message in a different symbolic system. Translating a text from one language into a second, and then from the second language back into the first, rarely reproduces the original writing exactly. Likewise, unless the multilingual text contains a significant number of words, only limited information can be gleaned from it. [3]

Internal information

Internal approaches are multi-step. One must first ensure that the inscription under study represents real writing, as opposed to a grouping of pictorial representations or a modern-day forgery without further meaning. This is commonly approached with methods from the field of grammatology.

Prior to decipherment of meaning, one can then determine three things: the number of distinct graphemes, which in turn indicates whether the writing system is alphabetic, syllabic, or logo-syllabic, because such systems typically do not overlap in the number of graphemes they use; [6] the direction of writing (left to right, right to left, top to bottom, etc.); and whether individual words are segmented in the script, for example by a space or a special mark.

If a repetitive schematic arrangement can be identified, this can help in decipherment. For example, if the last line of a text contains a small number, it can reasonably be guessed to refer to the date, with one of the words meaning "year" and, sometimes, a royal name also appearing. Another case is a text containing many small numbers, followed by a word, followed by a larger number; here, the word likely means "total" or "sum".

After exhausting the information that can be inferred from probable content, one must move on to the systematic application of statistical tools. These include methods concerning the frequency of each symbol, the order in which symbols typically appear, whether some symbols appear at the beginning or end of words, and so on. In some situations, orthographic features of a language make specific features difficult if not impossible to decipher (especially without outside information), such as when an alphabet does not express double consonants. Additional, more complex methods also exist. Eventually, the application of such statistical methods becomes exceedingly laborious, at which point computers can be used to apply them automatically. [3]
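The counting steps described above can be sketched in a few lines of Python. This is only a minimal illustration, not a method taken from the cited sources: the function names, the toy corpus, and especially the numeric thresholds used to guess the type of writing system are assumptions chosen for the example.

```python
from collections import Counter


def guess_system_type(inventory_size, alphabet_max=40, syllabary_max=120):
    """Guess the type of writing system from the number of distinct signs.

    The two thresholds are illustrative assumptions, not values from the
    cited sources; real inventories vary and borderline cases exist.
    """
    if inventory_size <= alphabet_max:
        return "alphabetic"
    if inventory_size <= syllabary_max:
        return "syllabic"
    return "logo-syllabic"


def symbol_statistics(words):
    """Collect simple counts from a corpus of already segmented words.

    Each word is a sequence of sign identifiers (here, characters).
    """
    frequency = Counter()  # how often each sign occurs overall
    initial = Counter()    # how often each sign begins a word
    final = Counter()      # how often each sign ends a word
    bigrams = Counter()    # how often one sign directly follows another
    for word in words:
        if not word:
            continue
        frequency.update(word)
        initial[word[0]] += 1
        final[word[-1]] += 1
        bigrams.update(zip(word, word[1:]))
    return frequency, initial, final, bigrams


# Hypothetical corpus: invented strings standing in for transcribed signs.
corpus = ["kata", "tana", "kana", "nata", "taka"]
frequency, initial, final, bigrams = symbol_statistics(corpus)

print("distinct signs:", len(frequency), "->", guess_system_type(len(frequency)))
print("most frequent signs:", frequency.most_common(3))
print("most common word-final sign:", final.most_common(1))
print("most common sign pair:", bigrams.most_common(1))
```

The sketch assumes that word boundaries are already known; for a script without word dividers, the word-initial and word-final counts would not be available and only the overall frequencies and sign pairs could be computed.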

Computational approaches

Computational approaches towards the decipherment of unknown languages began to appear in the late 1990s. [7] Typically, there are two types of computational approaches used in language decipherment: approaches meant to produce translations in known languages, and approaches used to detect new information that might enable future efforts at translation. The second type is more common and includes the detection of cognates or related words, the discovery of the closest known language, word alignment, and more. [6]
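As a rough illustration of the second type of approach, the sketch below ranks candidate cognates by surface string similarity and picks the known language whose word list matches best. It is a deliberately crude stand-in: the use of Levenshtein edit distance, the function names, and the invented word lists are all assumptions made for this example, and published systems rely on more elaborate models than plain string similarity.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of signs."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Similarity in [0, 1]; 1.0 means the two strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def rank_cognate_candidates(unknown_word, known_lexicon, top=3):
    """Return the known words most similar to the unknown word."""
    scored = [(similarity(unknown_word, word), word) for word in known_lexicon]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top]


def closest_language(unknown_words, lexicons):
    """Pick the language whose lexicon gives the best average match."""
    def score(lexicon):
        best = [max(similarity(u, w) for w in lexicon) for u in unknown_words]
        return sum(best) / len(best)
    return max(lexicons, key=lambda name: score(lexicons[name]))


# Invented toy strings, not attested forms from any real language.
lexicons = {
    "language_a": ["taran", "daron", "milos"],
    "language_b": ["pekit", "ulomo", "sibba"],
}
print(rank_cognate_candidates("tarun", lexicons["language_a"]))
print(closest_language(["tarun", "milus"], lexicons))
```

Output of this kind is best read as a list of hypotheses for a specialist to examine, since chance resemblances between short words are common.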

Artificial intelligence

In recent years, there has been a growing emphasis on methods utilizing artificial intelligence for the decipherment of lost languages, especially through natural language processing (NLP) methods. Proof-of-concept methods have independently re-deciphered Ugaritic and Linear B using data from similar languages, in this case Hebrew and Ancient Greek, respectively. [8]

Deciphering pronunciation

Related to attempts to decipher the meaning of languages and alphabets are attempts to determine how extinct writing systems, or older versions of contemporary writing systems (such as English in the 1600s), were pronounced. Several methods and criteria have been developed in this regard. Important criteria include (1) rhymes and the testimony of poetry; (2) evidence from occasional spellings and misspellings; (3) interpretations of material in one language by authors writing in foreign languages; (4) information obtained from related languages; and (5) grammatical changes in spelling over time. [9]

For example, analysis of poetry focuses on the use of wordplay or literary techniques between words that have a similar sound. Shakespeare's play Romeo and Juliet contains wordplay that relies on a similar sound between the words "soul" and "soles", allowing confidence that the similar pronunciation of the two terms today also existed in Shakespeare's time. Another common source of information on pronunciation is the use of rhyme in earlier texts, such as when consecutive lines of poetry end in a similar or the same sound. This method does have some limitations, however: texts may use rhymes that rely on visual similarities between words (such as 'love' and 'remove') rather than auditory similarities, and rhymes can be imperfect. Another source of information about pronunciation comes from explicit descriptions of pronunciation in earlier texts, as in the case of the Grammatica Anglicana, which includes the following comment about the letter <o>: "In the long time it naturally soundeth sharp, and high; as in chósen, hósen, hóly, fólly [. . .] In the short time more flat, and a kin to u; as còsen, dòsen, mòther, bròther, lòve, pròve". [10] Another example comes from detailed comments on the pronunciation of Sanskrit in the surviving works of Sanskrit grammarians. [9]
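The rhyme criterion can be illustrated with a small sketch that collects the end words of consecutive lines as candidate rhyme pairs. The verse below is invented for the example, and the function names, the couplet assumption (lines rhyme in pairs), and the idea of counting how often a pair recurs are all assumptions of this sketch rather than a procedure from the cited sources.

```python
import re
from collections import Counter


def end_word(line):
    """Return the last word of a verse line, lowercased, or None."""
    words = re.findall(r"[a-z']+", line.lower())
    return words[-1] if words else None


def couplet_rhyme_pairs(lines):
    """Count end-word pairs, assuming the lines rhyme in couplets."""
    pairs = Counter()
    for first, second in zip(lines[0::2], lines[1::2]):
        a, b = end_word(first), end_word(second)
        if a and b and a != b:
            pairs[tuple(sorted((a, b)))] += 1
    return pairs


# Invented verse, used only to exercise the code.
verse = [
    "The hunter walked beneath the moon",
    "And hummed a slow and ancient tune",
    "He felt the stirrings of his love",
    "No storm could ever it remove",
]
for pair, count in couplet_rhyme_pairs(verse).items():
    print(pair, count)
```

A pair such as ('love', 'remove') shows why the counts are only candidates: the code cannot tell a rhyme based on sound from one based on spelling, so each pair still has to be weighed against the other criteria.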

Challenges

Many challenges exist in the decipherment of languages, including when: [4] [6]

Notable decipherers

| Name of scholar | Script deciphered | Date |
|---|---|---|
| Magnus Celsius | Staveless Runes | 1674 |
| Jón Ólafsson of Grunnavík | Cipher runes | 1740s |
| Jean-Jacques Barthélemy | Palmyrene alphabet | 1754 |
| Jean-Jacques Barthélemy | Phoenician alphabet | 1758 |
| Antoine-Isaac Silvestre de Sacy | Pahlavi script | 1791 |
| Jean-François Champollion | Egyptian Hieroglyphs | 1822 |
| Georg Friedrich Grotefend, Eugène Burnouf, and Henry Rawlinson | Old Persian Cuneiform | 1823 |
| Thomas Young | Demotic script | |
| Manuel Gómez-Moreno | Northeastern Iberian script | |
| James Prinsep | Brahmi, Kharosthi | |
| Edward Hincks | Mesopotamian Cuneiform | |
| Bedřich Hrozný | Hittite Cuneiform | |
| Vilhelm Thomsen | Old Turkic | |
| George Smith and Samuel Birch, et al. [11] | Cypriot syllabary | |
| Hans Bauer and Édouard Paul Dhorme [12] | Ugaritic alphabet | |
| Wáng Yìróng, Liú È, Sūn Yíràng, et al. | Oracle Bone script | |
| Aleksei Ivanovich Ivanov, Nikolai Aleksandrovich Nevsky, et al. | Tangut script | |
| Michael Ventris, John Chadwick, and Alice Kober | Linear B | |
| Yuri Knorozov and Tatiana Proskouriakoff, et al. | Maya | |
| Louis Félicien de Saulcy | Libyco-Berber script (almost fully) | |
| Jan-Olof Tjäder | "Enlarged opening script" of Ravenna (variant of the Latin alphabet) | |
| Zaza Alexidze | Caucasian Albanian alphabet | |
| François Desset [5] | Linear Elamite | |

References

  Note: Although the script, Libyco-Berber, has been almost fully deciphered, the language has not.
  1. Trask, R. L. (2000). The Dictionary of Historical and Comparative Linguistics. Fitzroy Dearborn Publishers, p. 82 ("The process of determining the relation between an extinct and unknown writing system and the language it represents. Strictly, decipherment is the elucidation of the script—that is, determining the values of the written characters")
  2. Bauer, Craig P. (2023-03-04). "The new golden age of decipherment". Cryptologia. 47 (2): 97–100. doi:10.1080/01611194.2023.2170158. ISSN 0161-1194.
  3. Gelb, I. J.; Whiting, R. M. (1975). "Methods of Decipherment". Journal of the Royal Asiatic Society. 107 (2): 95–104. doi:10.1017/S0035869X00132769. ISSN 2051-2066.
  4. Luo, Jiaming; Hartmann, Frederik; Santus, Enrico; Barzilay, Regina; Cao, Yuan (2021). "Deciphering Undersegmented Ancient Scripts Using Phonetic Prior". Transactions of the Association for Computational Linguistics. 9: 69–81. arXiv:2010.11054. doi:10.1162/tacl_a_00354. ISSN 2307-387X.
  5. Desset, François; Tabibzadeh, Kambiz; Kervran, Matthieu; Basello, Gian Pietro; Marchesi, Gianni (2022-07-01). "The Decipherment of Linear Elamite Writing". Zeitschrift für Assyriologie und vorderasiatische Archäologie. 112 (1): 11–60. doi:10.1515/za-2022-0003. ISSN 1613-1150.
  6. Braović, Maja; Krstinić, Damir; Štula, Maja; Ivanda, Antonia (2024-06-01). "A Systematic Review of Computational Approaches to Deciphering Bronze Age Aegean and Cypriot Scripts". Computational Linguistics. 50 (2): 725–779. doi:10.1162/coli_a_00514. ISSN 0891-2017.
  7. Knight, Kevin; Yamada, Kenji (1999). "A Computational Approach to Deciphering Unknown Scripts" (PDF). Unsupervised Learning in Natural Language Processing.
  8. Luo, Jiaming; Cao, Yuan; Barzilay, Regina (2019). "Neural Decipherment via Minimum-Cost Flow: From Ugaritic to Linear B". Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics. pp. 3146–3155. arXiv:1906.06718. doi:10.18653/v1/P19-1303.
  9. Campbell, Lyle (2021). Historical linguistics: an introduction (4th ed.). MIT Press. pp. 372–375. ISBN 978-0-262-53159-7.
  10. Burridge, Kate; Bergs, Alexander (2017). Understanding language change. Understanding language series. London New York: Routledge, Taylor & Francis Group. pp. 234–235. ISBN 978-0-415-71339-9.
  11. "Cypro-Syllabic".
  12. "Anatomy of a Decipherment". http://images.library.wisc.edu/WI/EFacs/transactions/WT1966/reference/wi.wt1966.adcorre.pdf
