Fallback font

A fallback font is a reserve typeface containing symbols for as many Unicode characters as possible. [1] When a display system encounters a character that is not part of the repertoire of any of the other available fonts, a symbol from a fallback font is used instead. Typically, a fallback font will contain symbols representative of the various types of Unicode characters.

Systems that do not offer a fallback font typically display black or white rectangles, question marks, the Unicode Replacement Character (U+FFFD), or nothing at all, in place of missing characters. Placing one or more fallback fonts at the end of a list of preferred fonts ensures that there are no missing characters.
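The list-based lookup described above can be sketched as follows. This is a minimal illustration, not how any particular display system is implemented: the font names, the coverage predicates, and the `pick_font` helper are all hypothetical.

```python
# Minimal sketch of a font fallback chain (all names are hypothetical).
# Each entry pairs a font name with a predicate telling whether the
# font has a glyph for a given code point.

def pick_font(char, font_stack):
    """Return the name of the first font that covers char, or None."""
    code_point = ord(char)
    for name, covers in font_stack:
        if covers(code_point):
            return name
    return None  # caller would draw a box, "?", U+FFFD, or nothing


# Assumed coverage: a Latin-only font and a catch-all fallback font.
LATIN_ONLY = ("Hypothetical Latin", lambda cp: cp <= 0x024F)
CATCH_ALL = ("Hypothetical Fallback", lambda cp: cp <= 0x10FFFF)

without_fallback = [LATIN_ONLY]
with_fallback = [LATIN_ONLY, CATCH_ALL]  # fallback goes last in the list
```

With only the Latin font in the list, `pick_font("\u4e2d", without_fallback)` returns `None`, which is the missing-character case; appending the catch-all font makes the same call return the fallback font's name.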

Unicode BMP Fallback font

[Figure: mock-up of the Unicode BMP Fallback glyph for U+0020, a box containing the digits 00 above 20.]

The Unicode BMP Fallback font is a Unicode font that was originally created for debugging purposes. It contains a glyph for every character in the Unicode Basic Multilingual Plane. Each glyph consists of a box containing the four hexadecimal digits corresponding to the Unicode value. [2] The example to the left is a mock-up of the glyph for a space character (U+0020).

Unlike the Unicode Last Resort font, the Unicode BMP Fallback font displays a distinct glyph for each Unicode character it covers, but it cannot display all Unicode characters. Because four hexadecimal digits can represent only 65,536 values (0000 to FFFF, that is, 0 to 65,535), the font is limited to the code points of the Unicode Basic Multilingual Plane.
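The labelling scheme and its four-digit limit can be illustrated with a short sketch. The function name `bmp_fallback_label` is hypothetical, and splitting the digits into two rows of two is an assumption based on the mock-up for U+0020; the font itself is not consulted.

```python
# Sketch of the text a BMP Fallback-style glyph would show.
# Assumption: the four hex digits are drawn as two rows of two digits.

BMP_SIZE = 16 ** 4  # four hexadecimal digits give 65,536 distinct values


def bmp_fallback_label(char):
    """Return (top_row, bottom_row) for a BMP character, else None."""
    code_point = ord(char)
    if code_point >= BMP_SIZE:
        return None  # outside the BMP: four digits cannot express it
    digits = f"{code_point:04X}"
    return digits[:2], digits[2:]
```

For a space character this gives `("00", "20")`, while a character outside the Basic Multilingual Plane, such as U+1F600, gives `None`.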

Unicode Last Resort font

Sample glyphs from Apple's Last Resort font.

As of Unicode version 5.0, the Unicode Consortium provides a fallback font to represent types of Unicode characters. This is a version of the macOS Last Resort system font, modified to work on non-Apple platforms and made available by Apple via the Unicode Consortium. [3]

The symbols provided by the Unicode Last Resort font group characters into categories based on their location in the Unicode code space and give the user a hint about which font or script is required to view the unavailable characters. Each symbol is a square with rounded corners and a bold outline. On the left and right sides of the outline, the Unicode range that the character belongs to is given in hexadecimal digits. The top and bottom carry one or two descriptions of the Unicode block. A symbol representative of the block is centered inside the square. [4]

Unlike the Unicode BMP Fallback font or the GNU Unifont, the Unicode Last Resort font displays the same glyph for many different Unicode characters. This one-glyph-per-block generalization allows the font to cover every character in Unicode even though the total number of Unicode characters exceeds the capacity of an sfnt (TrueType and OpenType) font structure, whose 16-bit glyph index can address at most 65,536 glyphs. Unicode now has over 100,000 defined characters, with a potential code space of over one million code points, 17 times the sfnt glyph limit. The Unicode Last Resort font will therefore not break as Unicode continues to grow and the Basic Multilingual Plane (BMP) and the supplementary planes fill up further.
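The arithmetic behind this limit can be checked directly. The sketch below uses only the 16-bit glyph index mentioned above and the fact that the Unicode code space runs from U+0000 to U+10FFFF (17 planes of 65,536 code points); the constant and function names are illustrative.

```python
# Arithmetic behind the sfnt glyph limit (names are illustrative).

GLYPH_INDEX_BITS = 16
MAX_GLYPHS = 2 ** GLYPH_INDEX_BITS           # glyph slots in one sfnt font

PLANES = 17                                   # U+0000 through U+10FFFF
CODE_POINTS_PER_PLANE = 2 ** 16
CODE_SPACE = PLANES * CODE_POINTS_PER_PLANE   # total Unicode code points


def fits_one_glyph_per_character(character_count):
    """True if a font could give every character its own glyph."""
    return character_count <= MAX_GLYPHS
```

Here `CODE_SPACE` is 1,114,112 and `MAX_GLYPHS` is 65,536, so `fits_one_glyph_per_character(100_000)` is already false, whereas a font that needs only one glyph per block stays far below the limit.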

Apple's Last Resort font

Apple's Last Resort font is a system font for the Macintosh operating systems that is identical to the Unicode Last Resort font (which was created for the Unicode Consortium by Apple). [3]

Apple's Last Resort font was first included in Mac OS 8.5 in 1998, for the benefit of applications using Apple Type Services for Unicode Imaging (ATSUI). It is also used in macOS. In 2001, for Mac OS X 10.1, the design was revised to include the border text, and the font was re-digitized and extended by Michael Everson of Evertype, who continues to update it with each new release of Unicode.

Unicode Consortium versions

Since version 13.000, the font family has been released under the SIL Open Font License 1.1. [5]

The family includes Last Resort and Last Resort High-Efficiency. Last Resort High-Efficiency uses a Format 13 (many-to-one range mappings) 'cmap' (character to glyph index mapping) table, which reduces the size of the font but may not be compatible with some environments.
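The difference between a many-to-one range mapping and a sequential one can be sketched as below. This is an illustration of the idea only, not a parser for real 'cmap' data: the group values are made up, and `lookup_glyph` is a hypothetical helper. In a Format 13 style group every code point in the range maps to the same glyph, whereas in a Format 12 style group the glyph index advances with the code point.

```python
from bisect import bisect_right

# Each group is (start_code_point, end_code_point, glyph_id).
# The ranges and glyph ids below are invented for illustration.
GROUPS = [
    (0x0000, 0x007F, 1),
    (0x0400, 0x04FF, 2),
]

NOTDEF = 0  # glyph index 0 is the conventional "missing glyph"


def lookup_glyph(groups, code_point, many_to_one):
    """Map a code point to a glyph id using sorted range groups."""
    starts = [start for start, _end, _glyph in groups]
    i = bisect_right(starts, code_point) - 1
    if i < 0:
        return NOTDEF
    start, end, glyph_id = groups[i]
    if code_point > end:
        return NOTDEF
    if many_to_one:
        return glyph_id                       # one glyph for the whole range
    return glyph_id + (code_point - start)    # one glyph per code point
```

With `many_to_one=True`, both U+0041 and U+007A resolve to glyph 1, so a whole block costs a single table entry and a single glyph, which is why the table stays small.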

Releases

Source: [6]

  • 13.000 (2020-10-08): Supports Unicode Version 13.0.0. Only a Format 13 cmap table is included.
  • 13.001 (2020-10-22): Added Last Resort High-Efficiency (LRHE). Both fonts include a Format 4 cmap table; in addition, Last Resort includes a Format 12 cmap table and LRHE includes a Format 13 cmap table.
  • 14.000 (2021-12-01): Supports Unicode Version 14.0.0. Added 12 glyphs for 12 new blocks. Modified 2 glyphs in 2 existing blocks (Ahom, Tangut Supplement).
  • 15.000 (2022-09-13): Supports Unicode Version 15.0.0. 'meta' table was removed. Added 7 glyphs for 7 new blocks. Modified 6 glyphs in 6 existing blocks (Egyptian Hieroglyph Format Controls, Number Forms, Mathematical Operators Supplement, Variation Selectors, CJK Unified Ideographs Extension F, Variation Selectors Supplement).
  • 15.100 (2023-09-11): Supports Unicode Version 15.1.0. Added one new glyph that corresponds to the newly added CJK Unified Ideographs Extension I block; 627 mappings that correspond to the 627 new characters in Unicode Version 15.1 were changed.

GNU Unifont

The GNU Unifont is a font that contains a glyph for every character in the Unicode Basic Multilingual Plane. [7] Unlike the glyphs of the Unicode BMP Fallback font or the Unicode Last Resort font, the glyphs in GNU Unifont are low-resolution bitmap approximations of each character. The renderings are of low quality but adequate to distinguish one code point from another.

The goal of the GNU Unifont project is to "lower our expectations about the font quality to a reasonable degree" in order to obtain complete coverage of all Unicode characters. [8] To achieve this goal, all glyphs are 16 pixels in height and either 8 or 16 pixels in width.
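A fixed 16-pixel height with a width of 8 or 16 pixels makes such glyphs easy to store as plain bit rows. The sketch below assumes a row-by-row hexadecimal encoding in the style of Unifont's .hex source files (32 hex digits for an 8-pixel-wide glyph, 64 for a 16-pixel-wide one); the `decode_glyph` helper and the sample bitmap are illustrative and are not taken from the font itself.

```python
HEIGHT = 16  # every glyph is 16 pixels tall


def decode_glyph(hex_rows):
    """Turn a hex-encoded bitmap into 16 text rows of '.' and '#'."""
    if len(hex_rows) not in (32, 64):
        raise ValueError("expected 32 or 64 hex digits")
    width = len(hex_rows) * 4 // HEIGHT       # 8 or 16 pixels
    digits_per_row = width // 4
    rows = []
    for r in range(HEIGHT):
        chunk = hex_rows[r * digits_per_row:(r + 1) * digits_per_row]
        bits = format(int(chunk, 16), f"0{width}b")
        rows.append(bits.replace("0", ".").replace("1", "#"))
    return rows


# Illustrative 8x16 bitmap resembling a capital letter A (assumed data).
SAMPLE = "0000000018242442427E424242420000"

if __name__ == "__main__":
    print("\n".join(decode_glyph(SAMPLE)))
```

The width is inferred from the length of the encoded string, so the same routine handles both the half-width and the full-width case.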

A TrueType version of the GNU Unifont is available for free. [9]


References

  1. Wichary, Marcin (September 29, 2020). "When fonts fail". Figma. Retrieved February 5, 2021.
  2. "Unicode BMP Fallback font". SIL International. March 20, 2008. Retrieved June 10, 2019.
  3. "Last Resort font". Apple Computer. November 2, 2002. Archived from the original on October 23, 2011. Retrieved August 27, 2011.
  4. "Last Resort Font Glyph Table". Apple Computer. February 2, 2002. Archived from the original on October 20, 2011. Retrieved August 28, 2011.
  5. "Last Resort Font". The Unicode Consortium. February 10, 2022. Retrieved February 11, 2022.
  6. Releases
  7. "GNU Unifont Glyphs". Unifoundry.com. September 7, 2008. Retrieved August 28, 2011.
  8. Czyborra, Roman (September 29, 1998). "Proposal for a GNU Unicode Font". Retrieved August 28, 2011.
  9. González Miranda, Luis Alejandro (January 23, 2008). "GNU Unifont in TrueType format". Archived from the original on December 30, 2011. Retrieved August 28, 2011.