
In linguistics, a grapheme is the smallest functional unit of a writing system. [1]


There are two main opposing grapheme concepts. [2] In the so-called referential conception, graphemes are interpreted as the smallest units of writing that correspond to sounds (more accurately, phonemes). In this concept, the ⟨sh⟩ in the written English word shake would be a grapheme because it represents the phoneme /ʃ/. This referential concept is linked to the dependency hypothesis, which claims that writing merely depicts speech. By contrast, the analogical concept defines graphemes analogously to phonemes, i.e. via written minimal pairs such as shake vs. snake. In this example, ⟨h⟩ and ⟨n⟩ are graphemes because they distinguish two words. This analogical concept is associated with the autonomy hypothesis, which holds that writing is a system in its own right and should be studied independently from speech. Both concepts have weaknesses. [3]
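The written minimal-pair test used by the analogical concept can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: two words count as a minimal pair when they have the same length and differ in exactly one position. The function names are hypothetical and not taken from the literature cited here.

```python
from typing import Tuple


def is_minimal_pair(word_a: str, word_b: str) -> bool:
    """Return True if the words have equal length and differ in exactly one position."""
    if len(word_a) != len(word_b):
        return False
    differences = sum(1 for a, b in zip(word_a, word_b) if a != b)
    return differences == 1


def contrasting_units(word_a: str, word_b: str) -> Tuple[str, str]:
    """Return the two letters that distinguish a written minimal pair."""
    if not is_minimal_pair(word_a, word_b):
        raise ValueError("the two words are not a written minimal pair")
    for a, b in zip(word_a, word_b):
        if a != b:
            return a, b
    raise AssertionError("unreachable: a minimal pair always has one difference")


print(contrasting_units("shake", "snake"))  # ('h', 'n')
```

Under this test the digraph sh never surfaces as a unit, which is why the analogical and referential concepts disagree about shake.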

Some models adhere to both concepts simultaneously by including two individual units, [4] which are given names such as graphemic grapheme for the grapheme according to the analogical conception (⟨h⟩ in shake), and phonological-fit grapheme for the grapheme according to the referential concept (⟨sh⟩ in shake). [5]

In newer concepts, in which the grapheme is interpreted semiotically as a dyadic linguistic sign, [6] it is defined as a minimal unit of writing that is lexically distinctive and also corresponds to a linguistic unit (phoneme, syllable, or morpheme). [7]

The word grapheme is derived from Ancient Greek γράφω (gráphō) 'write' and the suffix -eme, by analogy with phoneme and other names of emic units. The study of graphemes is called graphemics.

The concept of graphemes is abstract and similar to the notion in computing of a character. By comparison, a specific shape that represents any particular grapheme in a specific typeface is called a glyph. For example, the grapheme corresponding to the abstract concept of "the Arabic numeral one" has a distinct glyph with identical meaning (an allograph) in each of many typefaces (for example, a serif form as in Times New Roman and a sans-serif form as in Helvetica).


Graphemes are often notated within angle brackets: ⟨a⟩, ⟨B⟩, etc. [8] This is analogous to both the slash notation (/a/, /b/) used for phonemes and the square bracket notation used for phonetic transcriptions ([a], [b]).


In the same way that the surface forms of phonemes are speech sounds or phones (and different phones representing the same phoneme are called allophones), the surface forms of graphemes are glyphs (sometimes "graphs"), namely concrete written representations of symbols, and different glyphs representing the same grapheme are called allographs.

Thus, a grapheme can be regarded as an abstraction of a collection of glyphs that are all functionally equivalent.

For example, in written English (or other languages using the Latin alphabet), there are two different physical representations of the lowercase Latin letter "a": "a" and "ɑ". Since, however, substituting one for the other cannot change the meaning of a word, they are considered to be allographs of the same grapheme, which can be written ⟨a⟩. Italic and bold face are also allographic.
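The idea that a grapheme abstracts over functionally equivalent glyphs can be illustrated with a small lookup table. The table `ALLOGRAPHS` and the function `to_graphemes` are hypothetical names introduced for this sketch. Note one assumption: Unicode encodes the two shapes as separate characters (U+0251 also serves as a phonetic symbol), so the mapping below reflects the English-orthography view described above, not a property of Unicode itself.

```python
import unicodedata

# Hypothetical allograph table for illustration: each variant shape is
# mapped to the grapheme it realizes in ordinary English spelling.
ALLOGRAPHS = {
    "a": "a",       # U+0061, usually drawn as the double-storey form
    "\u0251": "a",  # U+0251, the single-storey form
}


def to_graphemes(text: str) -> str:
    """Replace each known allograph with its grapheme; leave other characters unchanged."""
    return "".join(ALLOGRAPHS.get(ch, ch) for ch in text)


print(unicodedata.name("\u0251"))  # LATIN SMALL LETTER ALPHA
print(to_graphemes("c\u0251t") == to_graphemes("cat"))  # True
```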

There is some disagreement as to whether capital and lower case letters are allographs or distinct graphemes. Capitals are generally found in certain triggering contexts that do not change the meaning of a word: a proper name, for example, or at the beginning of a sentence, or all caps in a newspaper headline. In other contexts, capitalization can determine meaning: compare, for example, Polish and polish, where the former is a language and the latter is for shining shoes. Some linguists consider digraphs like the ⟨sh⟩ in ship to be distinct graphemes, but these are generally analyzed as sequences of graphemes. Non-stylistic ligatures, however, such as ⟨æ⟩, are distinct graphemes, as are various letters with distinctive diacritics, such as ⟨ç⟩.
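The capitalization question has a direct counterpart in text processing. The sketch below uses Python's built-in `str.casefold` to treat capital and lower case letters as allographs; the helper name `same_ignoring_case` is hypothetical. It shows that folding case erases exactly the distinction the Polish/polish example relies on.

```python
def same_ignoring_case(first: str, second: str) -> bool:
    """Compare two spellings after case folding, i.e. treating case as allographic."""
    return first.casefold() == second.casefold()


print(same_ignoring_case("Polish", "polish"))  # True
print("Polish" == "polish")                    # False
```

Whether the first or the second comparison is appropriate depends on which analysis of capitals is adopted.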

Types of grapheme

The principal types of graphemes are logograms (more accurately termed morphograms [9]), which represent words or morphemes (for example Chinese characters, the ampersand "&" representing the word and, Arabic numerals); syllabic characters, representing syllables (as in Japanese kana); and alphabetic letters, corresponding roughly to phonemes (see next section). For a full discussion of the different types, see Writing system § Functional classification.

There are additional graphemic components used in writing, such as punctuation marks, mathematical symbols, word dividers such as the space, and other typographic symbols. Ancient logographic scripts often used silent determinatives to disambiguate the meaning of a neighboring (non-silent) word.

Relationship with phonemes

As mentioned in the previous section, in languages that use alphabetic writing systems, many of the graphemes stand in principle for the phonemes (significant sounds) of the language. In practice, however, the orthographies of such languages entail at least a certain amount of deviation from the ideal of exact grapheme–phoneme correspondence. A phoneme may be represented by a multigraph (a sequence of more than one grapheme), as the digraph sh represents a single sound in English; conversely, a single grapheme may sometimes represent more than one phoneme, as with the Russian letter я or the Spanish c. Some graphemes may not represent any sound at all (like the b in English debt or the h in most Spanish words, outside the digraph ch), and often the rules of correspondence between graphemes and phonemes become complex or irregular, particularly as a result of historical sound changes that are not necessarily reflected in spelling. "Shallow" orthographies such as those of standard Spanish and Finnish have relatively regular (though not always one-to-one) correspondence between graphemes and phonemes, while those of French and English have much less regular correspondence, and are known as deep orthographies.
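How a multigraph complicates grapheme–phoneme mapping can be seen in a toy segmenter. The following is a minimal sketch, not a real transcription system: the table `CORRESPONDENCES` is a hypothetical, heavily simplified fragment chosen for this illustration, and the greedy longest-match strategy is an assumption that would fail on many real English words.

```python
# Hypothetical, highly simplified correspondence table for illustration only;
# real English spelling-to-sound rules are far more irregular.
CORRESPONDENCES = {
    "sh": "ʃ",
    "s": "s",
    "h": "h",
    "i": "ɪ",
    "p": "p",
    "f": "f",
}


def segment(word, table=CORRESPONDENCES):
    """Split a word into spelling units, preferring the longest match at each position."""
    longest = max(len(key) for key in table)
    units = []
    i = 0
    while i < len(word):
        for size in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in table:
                units.append(chunk)
                i += size
                break
        else:
            # No unit of any length matched at this position.
            raise ValueError(f"no correspondence for {word[i]!r}")
    return units


def transcribe(word, table=CORRESPONDENCES):
    """Map each spelling unit to its sound value and join the results."""
    return "".join(table[unit] for unit in segment(word, table))


print(segment("ship"))     # ['sh', 'i', 'p']
print(transcribe("fish"))  # fɪʃ
```

A shallow orthography would need only a small table of this kind, whereas a deep orthography requires context-sensitive rules or word-by-word exceptions that a flat lookup cannot express.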

Multigraphs representing a single phoneme are normally treated as combinations of separate letters, not as graphemes in their own right. However, in some languages a multigraph may be treated as a single unit for the purposes of collation; for example, in a Czech dictionary, the section for words that start with ch comes after that for h. [10] For more examples, see Alphabetical order § Language-specific conventions.
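The Czech dictionary convention can be approximated with a custom sort key. This is a sketch under stated assumptions: `ORDER` contains only the 26 basic Latin letters plus ch, so it ignores the letters with diacritics that real Czech collation also orders, and the names `ORDER`, `RANK` and `collation_key` are hypothetical.

```python
# Simplified ordering for illustration: the basic Latin letters with the
# digraph "ch" placed after "h". Letters with diacritics are not handled.
ORDER = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
RANK = {unit: index for index, unit in enumerate(ORDER)}


def collation_key(word):
    """Build a sort key that treats "ch" as a single unit ordered after "h"."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        if word.startswith("ch", i):
            unit = "ch"
            i += 2
        else:
            unit = word[i]
            i += 1
        # Units outside ORDER are ranked after every known unit.
        key.append(RANK.get(unit, len(RANK)))
    return key


words = ["chata", "cena", "hrad", "jablko", "den"]
print(sorted(words))                     # ['cena', 'chata', 'den', 'hrad', 'jablko']
print(sorted(words, key=collation_key))  # ['cena', 'den', 'hrad', 'chata', 'jablko']
```

With the plain code-point sort, chata falls among the c words; with the custom key it moves to its own section after hrad.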



References

  1. Coulmas, F. (1996), The Blackwell Encyclopedia of Writing Systems. Oxford: Blackwell, p. 174.
  2. Kohrt, M. (1986), The term ‘grapheme’ in the history and theory of linguistics. In G. Augst (Ed.), New trends in graphemics and orthography. Berlin: De Gruyter, pp. 80–96. doi:10.1515/9783110867329.80
  3. Lockwood, D. G. (2001), Phoneme and grapheme: How parallel can they be? LACUS Forum 27, pp. 307–316.
  4. Rezec, O. (2013), Ein differenzierteres Strukturmodell des deutschen Schriftsystems. Linguistische Berichte 234, pp. 227–254.
  5. Herrick, E. M. (1994), Of course a structural graphemics is possible! LACUS Forum 21, pp. 413–424.
  6. Fedorova, L. (2013), The development of graphic representation in abugida writing: The akshara’s grammar. Lingua Posnaniensis 55:2, pp. 49–66. doi:10.2478/linpo-2013-0013
  7. Meletis, D. (2019), The grapheme as a universal basic unit of writing. Writing Systems Research. doi:10.1080/17586801.2019.1697412
  8. The Cambridge Encyclopedia of Language, second edition, Cambridge University Press, 1997, p. 196.
  9. Joyce, T. (2011), The significance of the morphographic principle for the classification of writing systems. Written Language and Literacy 14:1, pp. 58–81. doi:10.1075/wll.14.1.04joy
  10. Zeman, Dan. "Czech Alphabet, Code Page, Keyboard, and Sorting Order". Archived from the original on 15 April 2012. Retrieved 31 March 2012.