TeX font metric

Filename extension: .tfm
Internet media type: application/x-tex-tfm (unofficial)
Developed by: Donald E. Knuth
Type of format: font metric

TeX font metric (TFM) is a font file format used by the TeX typesetting system. It is a font metric format rather than an outline font format such as TrueType: it provides only the information needed to typeset with the font, such as each character's width, height and depth, while the actual glyphs are stored elsewhere. This approach is not unique to TeX; Adobe's AFM files and Windows' PFM files (NTF with the modern Windows PostScript driver) separate metrics from glyphs in the same way.


TFM files contain all of the information TeX needs to produce its device-independent (DVI) output. The actual glyphs are then inserted by the eventual DVI output driver or previewer, using, for instance, TrueType fonts, or fonts in the bitmap PK format derived from a METAFONT source. The format is designed to be extremely compact: in the original Computer Modern distribution, every font's TFM file is smaller than 2 kB. [1]

Specification

The canonical specification of the TFM format is embedded in the source code of the program TFtoPL. [2]

A TFM file is broken down into a series of four-byte words, which can contain data fields of various lengths. Any data field that is more than one byte long is stored in big-endian order, so exactly the same file is produced regardless of the architecture of the computer generating it.


The six-word (24-byte) file header contains twelve unsigned 16-bit integers which describe the length of the file, the range of character codes contained in the font, and the size of each of the tables. A single TFM file describes between 0 and 256 characters, inclusive.
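As a rough illustration of the header layout described above, the following Python sketch unpacks the twelve big-endian 16-bit integers. The field names (lf, lh, bc, ec, nw, nh, nd, ni, nl, nk, ne, np) are the ones used in the TFtoPL documentation; the function name parse_tfm_header and the sample bytes are invented for this example and do not come from any real font.

```python
import struct

# Field names as used in the TFtoPL documentation; every length is counted in 4-byte words.
HEADER_FIELDS = ("lf", "lh", "bc", "ec", "nw", "nh", "nd", "ni", "nl", "nk", "ne", "np")

def parse_tfm_header(data: bytes) -> dict:
    """Decode the first 24 bytes of a TFM file into a dict of twelve integers."""
    if len(data) < 24:
        raise ValueError("a TFM file starts with a 24-byte header")
    # '>' selects big-endian order, 'H' is an unsigned 16-bit integer.
    values = struct.unpack(">12H", data[:24])
    header = dict(zip(HEADER_FIELDS, values))
    # The file length must equal the six header words, one char_info word
    # per character code in bc..ec, and the sizes of the remaining tables.
    num_chars = header["ec"] - header["bc"] + 1
    table_words = sum(header[k] for k in ("lh", "nw", "nh", "nd", "ni", "nl", "nk", "ne", "np"))
    header["consistent"] = header["lf"] == 6 + num_chars + table_words
    return header

# Hypothetical sample: two characters (codes 65 and 66) and small tables.
sample = struct.pack(">12H", 24, 2, 65, 66, 2, 2, 2, 1, 0, 0, 0, 7)
header = parse_tfm_header(sample)
```

A reader of real files would apply the same function to the first 24 bytes of the file and reject it if the consistency check fails.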


The body of the TFM file consists of a series of ten tables, each one except for the first laid out as an array of fixed-length fields. The format makes heavy use of the fix_word, a 32-bit signed fixed-point number with 12 bits to the left of the binary point and 20 bits to the right. The first table, header, contains a checksum designed to prevent a document compiled into a DVI file with one set of fonts from being printed with a different set, as well as ASCII descriptions of the character coding scheme (e.g., ASCII or TeX text) and the font family. It also contains the font's design size; all following fix_word values are interpreted as multiplication factors for this.
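The fix_word representation can be sketched in a few lines of Python. This is only an illustration under the layout stated above (a signed 32-bit big-endian integer scaled by 2^20); the function names are hypothetical.

```python
import struct

def fix_word_to_float(raw: bytes) -> float:
    """Interpret four big-endian bytes as a signed fixed-point number
    with 20 fractional bits (12 bits, including the sign, before the point)."""
    (value,) = struct.unpack(">i", raw)  # signed 32-bit, big-endian
    return value / 2**20

def scaled_dimension(raw: bytes, design_size_pt: float) -> float:
    """Scale a fix_word by the design size to get a dimension in points."""
    return fix_word_to_float(raw) * design_size_pt

# 0.75 is stored as 0.75 * 2**20 = 786432 = 0x000C0000.
three_quarters = fix_word_to_float(bytes([0x00, 0x0C, 0x00, 0x00]))
# -1.0 is stored in two's complement as 0xFFF00000.
minus_one = fix_word_to_float(bytes([0xFF, 0xF0, 0x00, 0x00]))
```

With a design size of 10 points, a stored value of 0.75 therefore corresponds to a dimension of 7.5 points.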

Figure: File structure of a TFM file

The next table, char_info, consists of one word per character, and contains indexes into the width, height, depth and italic correction tables. This is a device to save space, because width values, for instance, are frequently duplicated. To fit all of these indexes into a single word, the height and depth indexes, whose values are duplicated most often, are limited to four bits each, and the italic correction index to six bits. As a result, any given TFM file can hold at most sixteen different character heights, sixteen different character depths and sixty-four different italic corrections. The word holds one more index, which can point into the lig_kern table or to information about extensible characters, depending on a two-bit tag value. Extensible characters use a series of repeated characters to construct a single large one of arbitrary size, usually a large delimiter such as a parenthesis or bracket.
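The packing of a char_info word can be sketched as follows. The byte layout assumed here (an 8-bit width index, then the 4-bit height and depth indexes sharing one byte, then the 6-bit italic index and 2-bit tag sharing one byte, then a one-byte remainder) is the one given in the TFtoPL documentation; the class and function names are hypothetical.

```python
from typing import NamedTuple

class CharInfo(NamedTuple):
    width_index: int    # 8 bits
    height_index: int   # 4 bits
    depth_index: int    # 4 bits
    italic_index: int   # 6 bits
    tag: int            # 2 bits, selects the meaning of remainder
    remainder: int      # 8 bits

def unpack_char_info(word: bytes) -> CharInfo:
    """Split one four-byte char_info word into its six fields."""
    if len(word) != 4:
        raise ValueError("a char_info entry is exactly one four-byte word")
    b0, b1, b2, b3 = word
    return CharInfo(
        width_index=b0,
        height_index=b1 >> 4,     # high nibble of the second byte
        depth_index=b1 & 0x0F,    # low nibble of the second byte
        italic_index=b2 >> 2,     # upper six bits of the third byte
        tag=b2 & 0x03,            # lower two bits of the third byte
        remainder=b3,
    )

# Made-up sample word, not taken from a real font.
info = unpack_char_info(bytes([0x2A, 0xB3, 0x15, 0x07]))
```

The four-bit and six-bit fields are what impose the limits of sixteen heights, sixteen depths and sixty-four italic corrections.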

There then follow the four tables width, height, depth and italic, which contain values (in fix_word format) referred to by indexes in char_info.
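To show how the shared tables save space, the sketch below resolves per-character indexes against small width, height, depth and italic tables and scales the results by the design size. All table values and character assignments are invented for illustration, and the tables are assumed to have already been converted from fix_word form to floats.

```python
def resolve_metrics(char_indexes, width, height, depth, italic, design_size_pt):
    """Map each character code to its dimensions in points.

    char_indexes maps a character code to a (width, height, depth, italic)
    tuple of indexes into the four tables.
    """
    metrics = {}
    for code, (w, h, d, i) in char_indexes.items():
        metrics[code] = {
            "width": width[w] * design_size_pt,
            "height": height[h] * design_size_pt,
            "depth": depth[d] * design_size_pt,
            "italic": italic[i] * design_size_pt,
        }
    return metrics

# Hypothetical tables; entry 0 of each table is zero in this sample.
width = [0.0, 0.5, 0.75]
height = [0.0, 0.5, 0.625]
depth = [0.0, 0.25]
italic = [0.0]
# "A" and "V" share the same width and height entries.
char_indexes = {
    ord("A"): (2, 2, 0, 0),
    ord("V"): (2, 2, 0, 0),
    ord("y"): (1, 1, 1, 0),
}
metrics = resolve_metrics(char_indexes, width, height, depth, italic, 10.0)
```

Here two characters reuse one stored width, so the table holds three values although three characters are described with twelve dimensions in total.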

Ligatures and kerning are represented using a simple programming language consisting of fixed-length four-byte operations in the lig_kern table; it makes use of kerning values (specified as fix_words) in the kern table, which follows it.
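A single lig_kern instruction can be decoded roughly as shown below. The four byte names (skip_byte, next_char, op_byte, remainder) and the rule that an op_byte of 128 or more marks a kern step follow the TFtoPL documentation. This sketch is deliberately simplified: it ignores the special cases for boundary characters and for programs that start beyond the first 256 instructions, and the function name and dictionary keys are hypothetical.

```python
def decode_lig_kern(word: bytes) -> dict:
    """Decode one four-byte lig_kern instruction (simplified sketch)."""
    if len(word) != 4:
        raise ValueError("a lig_kern instruction is exactly four bytes")
    skip_byte, next_char, op_byte, remainder = word
    step = {
        "next_char": next_char,          # the instruction applies if this character follows
        "last_step": skip_byte >= 128,   # 128 or more ends the program for this character
    }
    if op_byte >= 128:
        # Kern step: the kerning amount is an entry in the kern table.
        step["kind"] = "kern"
        step["kern_index"] = 256 * (op_byte - 128) + remainder
    else:
        # Ligature step: remainder is the code of the character to insert.
        step["kind"] = "ligature"
        step["inserted_char"] = remainder
    return step

# Made-up instructions, not taken from a real font.
kern_step = decode_lig_kern(bytes([128, ord("e"), 128, 2]))
lig_step = decode_lig_kern(bytes([0, ord("i"), 0, 12]))
```

A full interpreter would walk the program starting at the position given by the character's remainder field and stop at the first matching instruction or at the last step.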

Extensible characters are specified in the exten table, using a series of four-byte words giving the top, middle, bottom and repeated sections of an extensible character. For instance, the character at left below would be obtained by setting (top, mid, bot, rep) to the character codes for (/, <, \, |). Any of the first three character codes can be set to zero to indicate that the piece is absent. For instance, if mid were set to 0 in the previous example, the result would change from the brace drawn at left to the parenthesis drawn to its right.

/     /
|     |
|     |
<     |
|     |
|     |
\     \

Of course, the font would use specially designed characters for this, instead of reusing existing ones, but the principle is the same.
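The assembly of the two shapes drawn above can be mimicked with a small Python sketch. It works on printable characters rather than real glyph pieces, uses None (instead of the character code 0 used in the TFM file) to mark an absent piece, and takes the number of repetitions as an explicit argument; the function name and that argument are hypothetical.

```python
def build_extensible(top, mid, bot, rep, repeats):
    """Return the pieces of an extensible character from top to bottom.

    `repeats` copies of `rep` fill each repeated section; there are two
    such sections when a middle piece is present and one otherwise.
    """
    pieces = []
    if top is not None:
        pieces.append(top)
    pieces.extend([rep] * repeats)
    if mid is not None:
        pieces.append(mid)
        pieces.extend([rep] * repeats)
    if bot is not None:
        pieces.append(bot)
    return pieces

# The brace at left: top, two repeats, middle, two repeats, bottom.
brace = build_extensible("/", "<", "\\", "|", 2)
# The parenthesis at right: no middle piece, five repeats.
paren = build_extensible("/", None, "\\", "|", 5)
print("\n".join(brace))
```

Both results are seven pieces tall, matching the two columns in the illustration.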

The final table, param, contains a series of specifically defined fix_word values, including the font's x-height and the amount of italic slant (to determine how far to shift accents). Certain coding schemes such as TeX math symbols and TeX math extension define extra parameters which appear after these.

Property lists

There is a human-readable equivalent to the TFM format called PL, for property list. There is an exact correspondence between a TFM file and a PL file: one can be freely converted to the other and back again with no loss of information using the tftopl and pltotf programs. The PL format, optimized for usability instead of space, does not make the same use of references that the TFM format does. For instance, many characters in a font may use the same character width, which would be represented only once in the TFM format, and this value would be referenced by each character, since the index would be significantly smaller than the full-precision numerical value. In the PL format, however, the full value is written out each time it appears.

For example, this is the code for the upper-case letter Y in Computer Modern Roman, ten point:

(CHARACTER C Y
   (CHARWD R 0.750002)
   (CHARHT R 0.683332)
   (CHARIC R 0.025)
   (COMMENT
      (KRN C e R -0.083334)
      (KRN C o R -0.083334)
      (KRN C r R -0.083334)
      (KRN C a R -0.083334)
      (KRN C A R -0.083334)
      (KRN C u R -0.083334)
      )
   )

The kerning values inside the COMMENT are copied from another section of the PL file to make it easier to read, which is itself redundant. Note that the full numeric value of each kerning constant is written out every time it appears, instead of being stored once and referred to by a much smaller index.

Notes

  1. "CTAN:/tex-archive/fonts/cm/tfm/". Comprehensive TeX Archive Network. 1996-07-08. Retrieved 2006-07-30.
  2. Knuth, Donald E. (February 2008). "TFtoPL" (WEB source code; extract full documentation using WEAVE). Version 3.2. Retrieved 2010-10-31.
