In philology, decipherment is the discovery of the meaning of the symbols found in extinct languages and/or alphabets. [1]
Decipherment overlaps with cryptanalysis, a technical field that aims to decipher writings used in secret communication, known as ciphertext. A famous case was the cryptanalysis of the Enigma machine during World War II. Many other ciphers from past wars have only recently been cracked. [2] Unlike in language decipherment, however, actors using ciphertext intentionally lay obstacles to prevent outsiders from uncovering the meaning of the communication system. [3]
Today, at least a dozen languages remain undeciphered. [4] A notable recent decipherment was that of the Linear Elamite script. [5]
According to Gelb and Whiting, the approach of decipherment depends on four categories of situations in an undeciphered language: [3] [6]
A number of methods are available to go about deciphering an extinct writing system or language. These can be divided into approaches utilizing external or internal information. [3]
Many successful decipherments have proceeded from the discovery of external information. A common example is the use of multilingual inscriptions, such as the Rosetta Stone (with the same text in three scripts: Demotic, hieroglyphic, and Greek), which enabled the decipherment of Egyptian hieroglyphs. In principle, a multilingual text may be insufficient for a decipherment, as translation is not a linear and reversible process but instead represents an encoding of the message in a different symbolic system. Translating a text from one language into a second, and then from the second language back into the first, rarely reproduces the original writing exactly. Likewise, unless the multilingual text contains a significant number of words, only limited information can be gleaned from it. [3]
Internal approaches proceed in several steps. One must first establish that the inscription represents real writing, as opposed to a grouping of pictorial representations or a modern-day forgery without further meaning. This is commonly approached with methods from the field of grammatology. Prior to decipherment of meaning, one can then determine the number of distinct graphemes, the direction of writing (left to right, right to left, top to bottom, etc.), and whether individual words are segmented in the script (for example by a space or a special mark). The number of graphemes in turn indicates whether the writing system is alphabetic, syllabic, or logo-syllabic, because such writing systems typically do not overlap in the number of graphemes they use. [6]
If a repetitive schematic arrangement can be identified, this can help in decipherment. For example, if the last line of a text contains a small number, it can reasonably be guessed to refer to the date, with one of the words meaning "year" and, sometimes, a royal name also appearing. Another case is when the text contains many small numbers, followed by a word, followed by a larger number; here, the word likely means "total" or "sum".
After exhausting the information that can be inferred from probable content, one must move to the systematic application of statistical tools. These include methods concerning the frequency of each symbol, the order in which symbols typically appear, whether some symbols appear at the beginning or end of words, etc. In some situations, orthographic features of a writing system make it difficult if not impossible to decipher specific features (especially without outside information), such as when an alphabet does not express double consonants. Additional, more complex methods also exist.
Eventually, the application of such statistical methods becomes exceedingly laborious, at which point computers can be used to apply them automatically. [3]
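As a rough illustration of how such statistics might be automated, the following Python sketch counts symbol frequencies, word-initial and word-final symbols, and adjacent symbol pairs in a corpus that has already been segmented into words. It also guesses the script type from the size of the grapheme inventory. The function names and the cut-off values (40 and 120) are assumptions made for this example only; they are not taken from the cited sources, and real inventories do not always fall neatly on either side of such thresholds.

```python
from collections import Counter


def guess_script_type(n_graphemes, alphabet_max=40, syllabary_max=120):
    """Guess the script type from the number of distinct graphemes.

    The cut-offs are illustrative assumptions, not values from the sources.
    """
    if n_graphemes <= alphabet_max:
        return "alphabetic"
    if n_graphemes <= syllabary_max:
        return "syllabic"
    return "logo-syllabic"


def symbol_statistics(words):
    """Collect basic distributional statistics for a segmented corpus.

    `words` is a list of words, each a sequence (string, list, or tuple)
    of symbol identifiers. Empty words are ignored.
    """
    words = [w for w in words if len(w) > 0]
    frequency = Counter(sym for w in words for sym in w)
    initial = Counter(w[0] for w in words)      # symbols that start words
    final = Counter(w[-1] for w in words)       # symbols that end words
    bigrams = Counter(pair for w in words for pair in zip(w, w[1:]))
    return {
        "inventory_size": len(frequency),
        "frequency": frequency,
        "initial": initial,
        "final": final,
        "bigrams": bigrams,
    }


if __name__ == "__main__":
    # Toy corpus: each letter stands in for one sign of an unknown script.
    corpus = ["abca", "bca", "ab"]
    stats = symbol_statistics(corpus)
    print(guess_script_type(stats["inventory_size"]))
    print(stats["frequency"].most_common(3))
    print(stats["final"].most_common(1))
```

A symbol that occurs almost exclusively in word-final position, for instance, might be a candidate for a grammatical ending, although such a reading would need to be checked against other evidence.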
Computational approaches towards the decipherment of unknown languages began to appear in the late 1990s. [7] Typically, there are two types of computational approaches used in language decipherment: approaches meant to produce translations in known languages, and approaches used to detect new information that might enable future efforts at translation. The second approach is more common, and includes things such as the detection of cognates or related words, discovery of the closest known language, word alignments, and more. [6]
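One simple way to look for candidate cognates is to compare a word in the undeciphered language with words in a related known language using edit distance. The Python sketch below assumes that both word lists have already been transliterated into a common set of symbols; it is a deliberately simplified stand-in, not the method used in the cited work, and the function names and example strings are hypothetical.

```python
def levenshtein(a, b):
    """Return the minimum number of insertions, deletions, and
    substitutions needed to turn sequence `a` into sequence `b`."""
    previous = list(range(len(b) + 1))
    for i, sym_a in enumerate(a, start=1):
        current = [i]
        for j, sym_b in enumerate(b, start=1):
            cost = 0 if sym_a == sym_b else 1
            current.append(min(
                previous[j] + 1,         # delete sym_a
                current[j - 1] + 1,      # insert sym_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalise edit distance to a score between 0.0 and 1.0."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def rank_cognate_candidates(word, lexicon):
    """Rank words of a known language by similarity to `word`."""
    scored = [(candidate, similarity(word, candidate)) for candidate in lexicon]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Illustrative consonantal transliterations, used only as toy data.
    known_lexicon = ["klb", "byt", "mlk"]
    for candidate, score in rank_cognate_candidates("mlk", known_lexicon):
        print(candidate, round(score, 2))
```

Surface similarity alone produces many false matches, so a ranking of this kind would serve only as a starting point for further analysis.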
In recent years, there has been a growing emphasis on methods utilizing artificial intelligence for the decipherment of lost languages, especially through natural language processing (NLP) methods. Proof-of-concept methods have independently re-deciphered Ugaritic and Linear B using data from similar languages, in this case Hebrew and Ancient Greek. [8]
Related to attempts to decipher the meaning of languages and alphabets are attempts to determine how extinct writing systems, or older versions of contemporary writing systems (such as English in the 1600s), were pronounced. Several methods and criteria have been developed in this regard. Important criteria include (1) rhymes and the testimony of poetry; (2) evidence from occasional spellings and misspellings; (3) interpretations of material in one language by authors writing in foreign languages; (4) information obtained from related languages; and (5) grammatical changes in spelling over time. [9]
For example, analysis of poetry focuses on the use of wordplay or literary techniques between words that have a similar sound. Shakespeare's play Romeo and Juliet contains wordplay that relies on a similar sound between the words "soul" and "soles", allowing confidence that the similar pronunciation of the two terms today also existed in Shakespeare's time. Another common source of information on pronunciation is rhyme in earlier texts, such as when consecutive lines of poetry end in a similar or identical sound. This method has some limitations, however: texts may use rhymes that rely on visual similarities between words (such as 'love' and 'remove') rather than auditory similarities, and rhymes can be imperfect. Another source of information about pronunciation comes from explicit descriptions of pronunciation in earlier texts, as in the case of the Grammatica Anglicana, such as in the following comment about the letter <o>: "In the long time it naturally soundeth sharp, and high; as in chósen, hósen, hóly, fólly [. . .] In the short time more flat, and a kin to u; as còsen, dòsen, mòther, bròther, lòve, pròve". [10] Another example comes from detailed comments on the pronunciation of Sanskrit in the surviving works of Sanskrit grammarians. [9]
Many challenges exist in the decipherment of languages, including when: [4] [6]
An alphabet is a standard set of letters written to represent particular sounds in a spoken language. Specifically, letters largely correspond to phonemes as the smallest sound segments that can distinguish one word from another in a given language. Not all writing systems represent language in this way: a syllabary assigns symbols to spoken syllables, while logographies assign symbols to words, morphemes, or other semantic units.
An abugida – sometimes also called alphasyllabary, neosyllabary, or pseudo-alphabet – is a segmental writing system in which consonant–vowel sequences are written as units; each unit is based on a consonant letter, and vowel notation is secondary, similar to a diacritical mark. This contrasts with a full alphabet, in which vowels have status equal to consonants, and with an abjad, in which vowel marking is absent, partial, or optional – in less formal contexts, all three types of the script may be termed "alphabets". The terms also contrast them with a syllabary, in which a single symbol denotes the combination of one consonant and one vowel.
Linear A is a writing system that was used by the Minoans of Crete from 1800 BC to 1450 BC. Linear A was the primary script used in palace and religious writings of the Minoan civilization. It evolved into Linear B, which was used by the Mycenaeans to write an early form of Greek. It was discovered by the archaeologist Sir Arthur Evans in 1900. No texts in Linear A have yet been deciphered. Evans named the script "Linear" because its characters consisted simply of lines inscribed in clay, in contrast to the more pictographic characters in Cretan hieroglyphs that were used during the same period.
In the linguistic study of written languages, a syllabary is a set of written symbols that represent the syllables or moras which make up words.
Writing is the act of creating a persistent representation of human language. A writing system uses a set of symbols and rules to encode aspects of spoken language, such as its lexicon and syntax. However, written language may take on characteristics distinct from those of any spoken language.
Ancient Egyptian hieroglyphs were the formal writing system used in Ancient Egypt for writing the Egyptian language. Hieroglyphs combined ideographic, logographic, syllabic and alphabetic elements, with more than 1,000 distinct characters. Cursive hieroglyphs were used for religious literature on papyrus and wood. The later hieratic and demotic Egyptian scripts were derived from hieroglyphic writing, as was the Proto-Sinaitic script that later evolved into the Phoenician alphabet. Egyptian hieroglyphs are the ultimate ancestor of the Phoenician alphabet, the first widely adopted phonetic writing system. Moreover, owing in large part to the Greek and Aramaic scripts that descended from Phoenician, the majority of the world's living writing systems are descendants of Egyptian hieroglyphs—most prominently the Latin and Cyrillic scripts through Greek, and the Arabic and Brahmic scripts through Aramaic.
In a written language, a logogram, also logograph or lexigraph, is a written character that represents a semantic component of a language, such as a word or morpheme. Chinese characters as used in Chinese as well as other languages are logograms, as are Egyptian hieroglyphs and characters in cuneiform script. A writing system that primarily uses logograms is called a logography. Non-logographic writing systems, such as alphabets and syllabaries, are phonemic: their individual symbols represent sounds directly and lack any inherent meaning. However, all known logographies have some phonetic component, generally based on the rebus principle, and the addition of a phonetic component to pure ideographs is considered to be a key innovation in enabling the writing system to adequately encode human language.
The Ugaritic writing system is a cuneiform abjad with syllabic elements used from around either 1400 BCE or 1300 BCE for Ugaritic, an extinct Northwest Semitic language. It was discovered in Ugarit, modern Ras Shamra, Syria, in 1928. It has 30 letters. Other languages, particularly Hurrian, were occasionally written in the Ugaritic script in the area around Ugarit, although not elsewhere.
Cuneiform is a logo-syllabic writing system that was used to write several languages of the Ancient Near East. The script was in active use from the early Bronze Age until the beginning of the Common Era. Cuneiform scripts are marked by and named for the characteristic wedge-shaped impressions which form their signs. Cuneiform is the earliest known writing system and was originally developed to write the Sumerian language of southern Mesopotamia.
The Cypriot or Cypriote syllabary is a syllabic script used in Iron Age Cyprus, from about the 11th to the 4th centuries BCE, when it was replaced by the Greek alphabet. It has been suggested that the script remained in use as late as the 1st century BCE. A pioneer of that change was King Evagoras of Salamis. It is thought to be descended from the Cypro-Minoan syllabary, itself a variant or derivative of Linear A. Most texts using the script are in the Arcadocypriot dialect of Greek, but one bilingual inscription was also found in Amathus.
Maya script, also known as Maya glyphs, is historically the native writing system of the Maya civilization of Mesoamerica and is the only Mesoamerican writing system that has been substantially deciphered. The earliest inscriptions found which are identifiably Maya date to the 3rd century BCE in San Bartolo, Guatemala. Maya writing was in continuous use throughout Mesoamerica until the Spanish conquest of the Maya in the 16th and 17th centuries. Though modern Mayan languages are almost entirely written using the Latin alphabet rather than Maya script, there have been recent developments encouraging a revival of the Maya glyph system.
The Minoan language is the language of the ancient Minoan civilization of Crete, written in the Cretan hieroglyphs and later in the Linear A syllabary. As the Cretan hieroglyphs are undeciphered and Linear A only partly deciphered, the Minoan language is unknown and unclassified. With the existing evidence, it is impossible even to be certain that the two scripts record the same language.
The Byblos script, also known as the Byblos syllabary, Pseudo-hieroglyphic script, Proto-Byblian, Proto-Byblic, or Byblic, is an undeciphered writing system, known from ten inscriptions found in Byblos, a coastal city in Lebanon. The inscriptions are engraved on bronze plates and spatulas, and carved in stone. They were excavated by Maurice Dunand, from 1928 to 1932, and published in 1945 in his monograph Byblia Grammata. The inscriptions are conventionally dated to the second millennium BC, probably between the 18th and 15th centuries BC.
Many people have claimed to have deciphered the Phaistos Disc.
The Cypro-Minoan syllabary (CM), more commonly called the Cypro-Minoan Script, is an undeciphered syllabary used on the island of Cyprus and at its trading partners during the late Bronze Age and early Iron Age. The term "Cypro-Minoan" was coined by Arthur Evans in 1909 based on its visual similarity to Linear A on Minoan Crete, from which CM is thought to be derived. Approximately 250 objects bearing Cypro-Minoan inscriptions, such as clay balls, cylinders, and tablets, have been found. Discoveries have been made at various sites around Cyprus, as well as in the ancient city of Ugarit on the Syrian coast. It is thought to be somehow related to the later Cypriot syllabary.
Aegean script or Cretan script refers to a group of scripts that originate from the island of Crete. It may also refer to:
Mesoamerica, along with Mesopotamia and China, is one of three known places in the world where writing is thought to have developed independently. Mesoamerican scripts deciphered to date are a combination of logographic and syllabic systems. They are often called hieroglyphs due to the iconic shapes of many of the glyphs, a pattern superficially similar to Egyptian hieroglyphs. Fifteen distinct writing systems have been identified in pre-Columbian Mesoamerica, many from a single inscription. The limits of archaeological dating methods make it difficult to establish which was the earliest and hence the progenitor from which the others developed. The best documented and deciphered Mesoamerican writing system, and the most widely known, is the classic Maya script. Earlier scripts with poorer and varying levels of decipherment include the Olmec hieroglyphs, the Zapotec script, and the Isthmian script, all of which date back to the 1st millennium BC. An extensive Mesoamerican literature has been conserved, partly in indigenous scripts and partly in postconquest transcriptions in the Latin script.
Many undeciphered writing systems exist today; most date back several thousand years, although some more modern examples do exist. The term "writing systems" is used here loosely to refer to groups of glyphs which appear to have representational symbolic meaning, but which may include "systems" that are largely artistic in nature and are thus not examples of actual writing.
A writing system comprises a set of symbols, called a script, as well as the rules by which the script represents a particular language. The earliest writing was invented during the late 4th millennium BC. Throughout history, each writing system invented without prior knowledge of writing gradually evolved from a system of proto-writing that included a small number of ideographs, which were not fully capable of encoding spoken language, and lacked the ability to express a broad range of ideas.