Typographic approximation

Last updated November 17, 2024

A typographic approximation is the replacement of an element of a writing system (usually a glyph) with another glyph or glyphs. The replacement may be a nearly homographic character, a digraph, or a character string. An approximation differs from a typographical error in that it is intentional and aims to preserve the visual appearance of the original. The concept also applies to the World Wide Web and other textual information available via digital media, though usually at the level of characters rather than glyphs.

Historically, the main cause of typographic approximation was the small number of glyphs (such as letterforms and symbols) available for printing. In the age of the World Wide Web and digital typesetting, especially since the advent of Unicode and the enormous number of computer fonts, typographic approximations are usually caused either by people's limited ability to distinguish and find the needed symbols or by inadequate replacement patterns in word processors,[1] rather than by a lack of available characters.

Normative:	$3 \times 2 - 1$
Approximated:	3 x 2 - 1
An ASCII approximation of an arithmetical expression

Typewriter and line printer approximations

Merger of characters

On typewriters, several characters were merged because of the limited size of the glyph repertoire. Several modern computing characters arose from the merger of different symbols, such as the "typewriter" apostrophe, ', which can denote an apostrophe proper, ’, a single quotation mark, or the prime symbol.

Non-spacing modifiers

Some typewriters have non-spacing keys for use as diacritical marks. After the typist strikes, say, the acute accent ◌́, the carriage does not advance. This allows the typist to overstrike the mark with a spacing letter, say e, and obtain é, an accented letter. Because of the geometrical restrictions of a monospaced font, the result was not always perfect. For example, overstriking was unlikely to be a feasible method of producing uppercase accented letters, such as É.

Overstrike was used on line printers for the same purpose. This contributed to the standardization of such characters as U+0060 ` GRAVE ACCENT.

Overstriking the same letter was used to simulate boldface on line printers.
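
The overstrike techniques described above can be sketched in code. The following Python sketch is illustrative only: it assumes an output device that treats backspace and carriage return as pure movements of the print position, without erasing anything, as an impact printer would. The function names are hypothetical and are not taken from any printer driver.

```python
BACKSPACE = "\b"        # assumed to move the print position one column back
CARRIAGE_RETURN = "\r"  # assumed to return to column 0 without a line feed


def overstrike_accent(mark: str, letter: str) -> str:
    """Strike the mark, step back, then strike the letter in the same column."""
    return mark + BACKSPACE + letter


def overstrike_bold(line: str, passes: int = 2) -> str:
    """Print the same line several times in place to darken it."""
    return CARRIAGE_RETURN.join([line] * passes)


def struck_columns(data: str) -> list[str]:
    """Simulate the device: list every character struck in each column."""
    columns: list[str] = []
    pos = 0
    for ch in data:
        if ch == BACKSPACE:
            pos = max(pos - 1, 0)
        elif ch == CARRIAGE_RETURN:
            pos = 0
        else:
            if pos == len(columns):
                columns.append("")
            columns[pos] += ch  # a second strike lands on top of the first
            pos += 1
    return columns
```

For instance, `struck_columns(overstrike_accent("'", "e"))` reports a single column holding both the mark and the letter, which is the overstruck approximation of é. A typewriter dead key reaches the same result without a backspace, because the carriage simply does not advance after the mark.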

ASCII approximations

An ASCII approximation may be ugly, but it still gives some representation of several symbols. In the accompanying image, replacements of non-ASCII characters (other than the default "*") are highlighted in yellow.

Original text:

                   ASCII*Decima

ASC Dec Hex Binary
╔═╤════════════════
║ │  0  00 00000000
║☺│  1  01 00000001
║☻│  2  02 00000010
║♥│  3  03 00000011
║♦│  4  04 00000100
║♣│  5  05 00000101
║♠│  6  06 00000110
║•│  7  07 00000111
║◘│  8  08 00001000
║∘│  9  09 00001001
║◙│ 10  0a 00001010
║♂│ 11  0b 00001011

ASC Dec Hex
╔═╤════════
║►│ 16  10
║◄│ 17  11
║↕│ 18  12
║‼│ 19  13
║¶│ 20  14
║§│ 21  15
║▬│ 22  16
║↨│ 23  17
║↑│ 24  18
║↓│ 25  19
║→│ 26  1a
║←│ 27  1b

║─│196  c4 11000100
║═│205  cd 11001101

║││179  b3
║║│186  ba

The US-ASCII character set and other variants of ISO/IEC 646 contain 95 graphic characters. This repertoire is comparable to that of a (Latin-script) typewriter and is insufficient for quality typography. However, the wide availability and robustness of the ASCII character encoding prompted computer users to invent ASCII substitutes for various glyphs.

The following ASCII characters are used to approximate certain characters. Many Latin letters are homographic to letters of other scripts; those Latin letters are not listed below.

U+0020 SPACE (space): alignment and justification.
U+0022 " QUOTATION MARK: various types of double quotes, double prime ″.
U+0023 # NUMBER SIGN: sharp symbol ♯.
U+0027 ' APOSTROPHE: various types of single quotes, apostrophe ’, prime ′.
U+0028 ( LEFT PARENTHESIS and U+0029 ) RIGHT PARENTHESIS: encircled characters, such as (c) for the copyright symbol ©.
U+002A * ASTERISK: see Asterisk.
U+002B + PLUS SIGN: various symbols with strokes extending to the left, up, right and down.
U+002D - HYPHEN-MINUS: probably the ASCII character most used for approximations. A conventional representation of the hyphen, and an approximation of the dash (especially as -- and ---), the minus sign − and the line-drawing horizontal line ─ (see the image).
U+002E . FULL STOP: various dot-like symbols, see Full stop.
U+002F / SOLIDUS: see Slash (punctuation).
U+0031 1 DIGIT ONE: Turkish dotless ı, Cyrillic palochka Ӏ.
U+0033 3 DIGIT THREE: IPA reversed epsilon ɜ, Cyrillic letter З.
U+0034 4 DIGIT FOUR: Cyrillic letter Ч.
U+0038 8 DIGIT EIGHT: various non-Latin letters and symbols with a similar shape.
U+003A : COLON: see Colon (punctuation).
U+003C < LESS-THAN SIGN and U+003E > GREATER-THAN SIGN: chevrons ⟨ ⟩, angle quotes ‹ ›, horizontal arrows (especially as the digraphs <- and ->).
U+003D = EQUALS SIGN: line-drawing horizontal double line ═ (see the image), double hyphen.
U+003F ? QUESTION MARK: although not an approximation, the question mark sometimes replaces unrepresented and unrecognized characters.
U+0040 @ COMMERCIAL AT: see At sign.
U+004E N LATIN CAPITAL LETTER N: numero sign №.
U+0054 T LATIN CAPITAL LETTER T: various symbols with strokes extending to the left, right and down, but not up.
U+0055 U LATIN CAPITAL LETTER U: set union ∪.
U+0056 V LATIN CAPITAL LETTER V: logical OR ∨.
U+0058 X LATIN CAPITAL LETTER X: X mark ✗.
U+005B [ LEFT SQUARE BRACKET and U+005D ] RIGHT SQUARE BRACKET: checkbox and similar rectangular pictograms.
U+005E ^ CIRCUMFLEX ACCENT: logical AND ∧, upwards arrow ↑, and similar symbols with the wedge at the top.
U+005F _ LOW LINE: see Underscore.
U+0060 ` GRAVE ACCENT: opening single quote ‘.
U+0062 b LATIN SMALL LETTER B: flat symbol ♭.
U+006F o LATIN SMALL LETTER O: bullets and various circle-like symbols such as ∘ and ∞ (the latter using two consecutive characters).
U+0075 u LATIN SMALL LETTER U: μ, the SI prefix micro- or the lowercase Greek letter mu.
U+0076 v LATIN SMALL LETTER V: downwards arrow ↓, and similar symbols with the wedge at the bottom.
U+0078 x LATIN SMALL LETTER X: multiplication sign ×.
U+007C | VERTICAL LINE (in the image, this ASCII character is rendered as a broken bar ¦): line-drawing vertical symbols.
U+007E ~ TILDE: see Tilde.
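
A few entries from the list above can be turned into a lookup table. The following Python sketch is a minimal illustration, not a standard transliteration scheme: the table `ASCII_APPROXIMATIONS` and the function `approximate_ascii` are hypothetical names, only a handful of characters are covered, and mapping the en dash to -- and the em dash to --- is an assumed convention. Unmapped non-ASCII characters fall back to a question mark, as mentioned in the list.

```python
# Hypothetical, deliberately small table based on the list above.
ASCII_APPROXIMATIONS = {
    "\u2018": "`",    # ‘ opening single quote -> grave accent
    "\u2019": "'",    # ’ apostrophe -> typewriter apostrophe
    "\u201c": '"',    # “ opening double quote
    "\u201d": '"',    # ” closing double quote
    "\u2032": "'",    # ′ prime
    "\u2033": '"',    # ″ double prime
    "\u2013": "--",   # en dash (assumed convention)
    "\u2014": "---",  # em dash (assumed convention)
    "\u2212": "-",    # − minus sign
    "\u00d7": "x",    # × multiplication sign
    "\u00a9": "(c)",  # © copyright symbol
    "\u2190": "<-",   # ← leftwards arrow
    "\u2192": "->",   # → rightwards arrow
    "\u266f": "#",    # ♯ sharp
    "\u266d": "b",    # ♭ flat
    "\u03bc": "u",    # μ micro / mu
    "\u2116": "N",    # № numero sign
}


def approximate_ascii(text: str, fallback: str = "?") -> str:
    """Replace each non-ASCII character with its approximation or the fallback."""
    out = []
    for ch in text:
        if ch.isascii():
            out.append(ch)  # ASCII characters pass through unchanged
        else:
            out.append(ASCII_APPROXIMATIONS.get(ch, fallback))
    return "".join(out)
```

Applied to the normative expression 3 × 2 − 1, this sketch yields the approximated form 3 x 2 - 1 shown at the top of the article.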

Approximation of non-glyphs

There are various approximations for typographic alignment. For example, justification may be emulated by inserting extra spaces between words, and flush-right alignment may be achieved by padding the line with leading spaces.
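
Both kinds of space-based alignment can be sketched in a few lines of Python. This is a minimal illustration for monospaced output only; the function names are hypothetical, and giving the surplus spaces to the leftmost gaps is just one possible distribution policy.

```python
def flush_right(line: str, width: int) -> str:
    """Pad the line with leading spaces so that it ends at column `width`."""
    return line.rjust(width)


def justify(line: str, width: int) -> str:
    """Stretch the gaps between words so the line is exactly `width` wide."""
    words = line.split()
    gaps = len(words) - 1
    spaces = width - sum(len(w) for w in words)
    if gaps == 0 or spaces < gaps:
        # A single word, or a line that is already too long: leave it alone.
        return " ".join(words)
    base, extra = divmod(spaces, gaps)
    parts = []
    for i, word in enumerate(words[:-1]):
        # The first `extra` gaps receive one additional space each.
        parts.append(word + " " * (base + (1 if i < extra else 0)))
    parts.append(words[-1])
    return "".join(parts)
```

With `width=11`, the line "a bb ccc" needs five spaces over two gaps, so the first gap gets three and the second gets two.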

There are various techniques for approximating tables (historically used on text-mode displays), such as box-drawing characters.
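
Where box-drawing characters are themselves unavailable, a table frame can be approximated further with the ASCII characters +, -, = and |, in line with the list of ASCII approximations above. The Python sketch below is illustrative only: `box_table`, `approximate_box` and the translation table `BOX_TO_ASCII` are hypothetical names, and only a small subset of the box-drawing block is covered.

```python
# Hypothetical mapping from a few box-drawing characters to ASCII stand-ins.
BOX_TO_ASCII = str.maketrans({
    "─": "-", "│": "|", "═": "=", "║": "|",
    "┌": "+", "┐": "+", "└": "+", "┘": "+",
    "├": "+", "┤": "+", "┬": "+", "┴": "+", "┼": "+",
})


def box_table(rows: list[list[str]]) -> str:
    """Frame equally long rows of cells with light box-drawing characters."""
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]

    def rule(left: str, mid: str, right: str) -> str:
        return left + mid.join("─" * w for w in widths) + right

    lines = [rule("┌", "┬", "┐")]
    for row in rows:
        cells = (cell.ljust(w) for cell, w in zip(row, widths))
        lines.append("│" + "│".join(cells) + "│")
    lines.append(rule("└", "┴", "┘"))
    return "\n".join(lines)


def approximate_box(text: str) -> str:
    """Replace box-drawing characters with their ASCII approximations."""
    return text.translate(BOX_TO_ASCII)
```

Calling `approximate_box(box_table([["Dec", "Hex"], ["196", "c4"]]))` produces a frame drawn only with +, - and |, at the cost of losing the distinction between corners, tees and crossings.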

References

[1] Phin, Christopher (2008-03-29). "Ten typographic mistakes everyone makes". Archived from the original on May 3, 2012. Retrieved August 17, 2015.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
