Allograph

g rendered with or without a looptail are allographs of each other

In graphemics and typography, an allograph is a glyph that is a design variant of a grapheme, such as a letter, a number, an ideograph, a punctuation mark or other typographic symbol. In graphemics, an obvious example in English (and many other writing systems) is the distinction between uppercase and lowercase letters. Allographs can vary greatly without affecting the underlying identity of the grapheme: even if the word "cat" is rendered as "cAt", it remains recognizable as the sequence of the three graphemes c, a, t. [1]

Letters and other graphemes can also have significant variations that many readers do not notice. The letter g, for example, has two common forms (glyphs) in different typefaces, and a wide variety of forms in people's handwriting. A positional example of allography is the long s (ſ), a symbol that was once widely used as a non-final allograph of the lowercase letter s.
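As a rough illustration of the "cat"/"cAt" and long s examples, Unicode case folding maps both an uppercase letter and the long s to the same folded form as the ordinary lowercase letter. The sketch below uses Python's standard `str.casefold`. The helper name `same_under_case_folding` is hypothetical, and case folding is only a proxy chosen for this illustration: it does not detect allographs in general.

```python
def same_under_case_folding(a: str, b: str) -> bool:
    """Return True if two strings are identical after Unicode case folding."""
    return a.casefold() == b.casefold()


# "cAt" and "cat" differ only in letter case.
print(same_under_case_folding("cAt", "cat"))

# The long s (U+017F) folds to the ordinary lowercase s.
print(same_under_case_folding("\u017f", "s"))

# Different graphemes remain different after folding.
print(same_under_case_folding("c", "k"))
```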

A grapheme variant can acquire a separate meaning in a specialized writing system, such as the International Phonetic Alphabet used in linguistics. Several such variants have distinct code points in Unicode and thus are not allographs for some applications. [2]

Han characters

In the Han script, there exist several graphemes that have more than one written representation. Han typefaces often contain many variants of some graphemes. Different regional standards have adopted certain character variants. For instance:

Standard    Allograph    Definition
Mainland China
Japan
Taiwan

Typography

Official dimensions of the euro sign
Allographs of the euro sign in a selection of typefaces

In typography, the term 'allograph' is used more specifically to describe the different representations of the same grapheme or character in different typefaces. [3] The resulting font elements may look quite different in shape and style from the reference character or each other, but nevertheless their meaning remains the same. [4]

In Unicode, a given character is allocated a code point: all allographs of that character have the same code point, and thus the essential meaning is retained irrespective of font choice at the time of printing or display. Typically, for example, U+0067 g LATIN SMALL LETTER G is given a loop tail in serif typefaces but not in sans-serif faces (e.g., Times New Roman: g, Helvetica: g), but its code point is constant and its meaning persists irrespective of typeface. (The code U+0261 ɡ LATIN SMALL LETTER SCRIPT G in the IPA Extensions block is specified for use with the International Phonetic Alphabet.)
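The two code points mentioned above can be inspected with Python's standard `unicodedata` module. This is a minimal sketch; the helper name `describe` is hypothetical. It shows that the ordinary g and the IPA script g are separate characters, whereas the looptail and single-storey shapes of g are a matter of font rendering and are not visible at this level.

```python
import unicodedata


def describe(ch: str) -> str:
    """Format a single character as 'U+XXXX NAME' using the Unicode database."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The ordinary lowercase g: one code point, whatever shape the font gives it.
print(describe("g"))

# The IPA script g is encoded separately in the IPA Extensions block.
print(describe("\u0261"))
```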

Homoglyph

The concept of the allograph may be compared and contrasted with that of the homoglyph: glyphs of different meaning that are visually similar. For example, the letter O and the figure 0 have similar shape but have different meanings; the three letters A, Α and А look identical but are characters from three different scripts (Latin, Greek and Cyrillic).
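The three look-alike capitals can be told apart by their code points. The sketch below again uses Python's `unicodedata`. The helper `script_prefix` is hypothetical and takes the first word of the character name as a stand-in for the script; that is a heuristic assumed for this example, not the formal Unicode Script property.

```python
import unicodedata

# Latin A, Greek Alpha and Cyrillic A: visually alike, encoded separately.
lookalikes = ["\u0041", "\u0391", "\u0410"]


def script_prefix(ch: str) -> str:
    """Return the first word of the character name (a heuristic for its script)."""
    return unicodedata.name(ch).split()[0]


for ch in lookalikes:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), script_prefix(ch))
```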

References

  1. "allograph". The Cambridge Encyclopedia of Language (second ed.). Cambridge University Press. 1997. p. 196.
  2. Kumar, Sanjeev (2012-10-15). "A Comparative Study of UTF-8, UTF-16, and UTF-32 of Unicode Code Point". The IUP Journal of Telecommunications. Rochester, NY. IV (2): 50–59. SSRN 2161812.
  3. Thomas Milo (2012). "Arabic Script Tutorial". nuqta.com. Retrieved 24 November 2019. In Arabic the abstract, nominal graphemes are represented by context-dependent allographs. Simplified support for Arabic handles contextual allographs according to two patterns, discontinuous and continuous assimilation. (Allographs and Ligatures)
  4. David Rothlein; Brenda Rapp (3 April 2017). "The role of allograph representations in font-invariant letter identification". Journal of Experimental Psychology: Human Perception and Performance. 43 (7): 1411–1429. doi:10.1037/xhp0000384. PMC 5481478. PMID 28368166.