PC Screen Font


PC Screen Font (PSF) is a bitmap font format used by the Linux kernel for console fonts. The file format is documented in the source code of the Linux kbd utilities [1] and is also described by Eindhoven University of Technology. [2]


File format

File header

Every PSF file begins with a header. There are two header types: PSF1 and PSF2. All multi-byte integers are stored in little-endian order (least significant byte first).

PSF1 header

| Length (bytes) | Description | Notes |
|---|---|---|
| 2 | Magic bytes | Always 36 04 |
| 1 | PSF font mode | Various font flags, see Font modes |
| 1 | Glyph size | Glyph size in bytes, 8-bit unsigned integer. For PSF1, the glyph size always equals the glyph height |

PSF2 header

| Length (bytes) | Description | Notes |
|---|---|---|
| 4 | Magic bytes | Always 72 b5 4a 86 |
| 4 | Version | 32-bit unsigned integer, currently always 0 |
| 4 | Header size | 32-bit unsigned integer, size of the header in bytes (usually 32) |
| 4 | Flags | 32-bit unsigned integer, see Font flags |
| 4 | Length | 32-bit unsigned integer, number of glyphs |
| 4 | Glyph size | 32-bit unsigned integer, number of bytes per glyph |
| 4 | Height | 32-bit unsigned integer, height of each glyph |
| 4 | Width | 32-bit unsigned integer, width of each glyph |

All PSF1 glyphs are 8 pixels wide, so each glyph row occupies exactly one byte.
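As an illustration, the two header layouts could be read roughly as follows. This is a minimal Python sketch, not the kbd implementation; the function name `parse_psf_header` and the dictionary keys are hypothetical and chosen only for this example.

```python
import struct

PSF1_MAGIC = b"\x36\x04"
PSF2_MAGIC = b"\x72\xb5\x4a\x86"


def parse_psf_header(data: bytes) -> dict:
    """Return the header fields of a PSF1 or PSF2 font as a dict.

    The key names are illustrative; they are not taken from any library.
    """
    if data[:4] == PSF2_MAGIC:
        if len(data) < 32:
            raise ValueError("truncated PSF2 header")
        # Seven little-endian 32-bit unsigned integers follow the magic bytes.
        (version, header_size, flags, length,
         glyph_size, height, width) = struct.unpack_from("<7I", data, 4)
        return {
            "type": 2, "version": version, "header_size": header_size,
            "flags": flags, "length": length, "glyph_size": glyph_size,
            "height": height, "width": width,
        }
    if data[:2] == PSF1_MAGIC:
        if len(data) < 4:
            raise ValueError("truncated PSF1 header")
        # Two single-byte fields follow the magic bytes: mode and glyph size.
        mode, glyph_size = struct.unpack_from("<2B", data, 2)
        return {
            "type": 1, "header_size": 4, "flags": mode,
            # Bit 0x01 of the mode selects 512 glyphs instead of 256.
            "length": 512 if mode & 0x01 else 256,
            "glyph_size": glyph_size,
            # PSF1 glyphs are 8 pixels wide, one byte per row.
            "height": glyph_size, "width": 8,
        }
    raise ValueError("not a PSF file")
```

The function only inspects the first bytes of the file, so it can be called on a buffer that also contains the glyph data.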

Font modes

The font mode in a PSF1 header is an 8-bit unsigned integer containing various flags about the font.

Font mode bits

| Value (hex) | Name | Meaning |
|---|---|---|
| 0x01 | PSF1_MODE512 | If this bit is set, the font face has 512 glyphs. If it is unset, the font face has 256 glyphs. |
| 0x02 | PSF1_MODEHASTAB | If this bit is set, the font face has a Unicode table. |
| 0x04 | PSF1_MODEHASSEQ | Equivalent to PSF1_MODEHASTAB |
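The mode bits can be decoded with a few bit tests. The following Python sketch assumes the bit values listed above; the class `Psf1Mode`, its member names, and the helper `describe_mode` are hypothetical names used only for illustration.

```python
from enum import IntFlag


class Psf1Mode(IntFlag):
    """PSF1 font mode bits (member names are illustrative)."""
    GLYPHS_512 = 0x01     # font has 512 glyphs instead of 256
    HAS_TABLE = 0x02      # font has a unicode table
    HAS_SEQUENCES = 0x04  # treated as equivalent to HAS_TABLE


def describe_mode(mode: int) -> tuple[int, bool]:
    """Return (glyph_count, has_unicode_table) for a PSF1 mode byte."""
    glyph_count = 512 if mode & Psf1Mode.GLYPHS_512 else 256
    has_table = bool(mode & (Psf1Mode.HAS_TABLE | Psf1Mode.HAS_SEQUENCES))
    return glyph_count, has_table
```

For instance, a mode byte of 0x03 describes a 512-glyph font with a unicode table.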

Font flags

The font flags field in a PSF2 header is a 32-bit unsigned integer containing various flags about the font. Currently only one flag is defined.

Font flag bits

| Value (hex) | Name | Meaning |
|---|---|---|
| 0x00000001 | PSF2_HAS_UNICODE_TABLE | If this bit is set, the font face has a Unicode table. |

File bitmaps

The actual glyph data immediately follows the header. Each bit in each glyph represents one pixel in the font: 0 for undrawn, 1 for drawn. Each row of each glyph is padded to a whole number of bytes. For example, a 12x12 font would have 2 bytes per row. The letter 'A' in a 12x12 PSF bitmap may look like this:

               padding
      Font data    |
    +----------+ +--+
    000001100000 0000
    000011110000 0000
    000110011000 0000
    001100001100 0000
    011000000110 0000
    110000000011 0000
    111111111111 0000
    111111111111 0000
    110000000011 0000
    110000000011 0000
    110000000011 0000
    110000000011 0000

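Each row here holds twelve bits of actual data followed by four bits of padding, which fill the row out to a whole number of bytes. Within each row, pixels are stored leftmost column first, starting at the most significant bit of the first byte.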
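The row layout can be made concrete with a small decoder. This is a Python sketch under the assumptions stated above (rows padded to whole bytes, leftmost pixel in the most significant bit); the helper names `bytes_per_row` and `glyph_rows` are hypothetical.

```python
def bytes_per_row(width: int) -> int:
    """Number of bytes used to store one glyph row of `width` pixels."""
    return (width + 7) // 8


def glyph_rows(glyph: bytes, width: int, height: int) -> list[str]:
    """Render a glyph bitmap as text rows: '#' for drawn, '.' for undrawn."""
    stride = bytes_per_row(width)
    rows = []
    for y in range(height):
        row = glyph[y * stride:(y + 1) * stride]
        # Expand every byte to 8 bits, most significant bit first.
        bits = "".join(f"{byte:08b}" for byte in row)
        # Drop the padding bits at the end of the row.
        rows.append(bits[:width].replace("0", ".").replace("1", "#"))
    return rows
```

Applied to the first two rows of the 12x12 example, which are stored as the bytes 06 00 and 0f 00, `glyph_rows(bytes([0x06, 0x00, 0x0F, 0x00]), 12, 2)` yields `.....##.....` and `....####....`.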

The Unicode table

If a PSF file contains a Unicode table, then every glyph has an entry in the table: the first glyph corresponds to the first entry, the second glyph to the second entry, and so on. The format of an entry depends on the type of the PSF header.

The PSF1 Unicode table

Each entry in the Unicode table of a PSF1 file is a series of 16-bit little-endian unsigned integers terminated by 0xffff. The integers at the start of the entry are individual Unicode characters, each of which is represented by the corresponding glyph. These continue until the value 0xfffe is encountered. From that point on, the integers no longer stand for individual characters: each 0xfffe starts a new sequence of Unicode characters that, taken together, corresponds to the glyph, and the last sequence ends at the terminating 0xffff. For example, the following series of 16-bit little-endian unsigned integers would be interpreted as follows:

0xdead 0xbeef  0xfffe 0x3141 0x5926  0xfffe 0x1234 0x5678  0xffff 

Whenever U+dead, U+beef, the sequence U+3141 U+5926, or the sequence U+1234 U+5678 is seen, the glyph corresponding to this Unicode table entry is displayed.
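The interpretation of a PSF1 entry can be sketched in Python as follows. This is an illustrative reading of the rules above, not code from kbd; the function name `parse_psf1_entry` and its return shape are assumptions made for the example.

```python
import struct


def parse_psf1_entry(table: bytes, offset: int = 0):
    """Parse one PSF1 unicode table entry starting at `offset`.

    Returns (singles, sequences, next_offset). A truncated table
    raises struct.error.
    """
    singles: list[int] = []
    sequences: list[list[int]] = []
    current = None  # stays None until the first 0xfffe is seen
    while True:
        (value,) = struct.unpack_from("<H", table, offset)
        offset += 2
        if value == 0xFFFF:      # end of this entry
            break
        if value == 0xFFFE:      # start of a new sequence
            current = []
            sequences.append(current)
        elif current is None:    # still in the single-character part
            singles.append(value)
        else:                    # inside a sequence
            current.append(value)
    return singles, sequences, offset
```

Run on the example entry, this returns the single characters `[0xdead, 0xbeef]`, the sequences `[[0x3141, 0x5926], [0x1234, 0x5678]]`, and the offset of the next entry, so calling it repeatedly walks the table one glyph at a time.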

The PSF2 Unicode table

Entries in the Unicode table of a PSF2 file have the same structure as those in a PSF1 file, except that Unicode characters are encoded in UTF-8, sequences begin with the single byte 0xfe rather than the two-byte value 0xfffe, and entries end with the single byte 0xff rather than the two-byte value 0xffff.
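Because the bytes 0xfe and 0xff never occur in valid UTF-8, a PSF2 entry can be split on them directly. The following Python sketch shows one way to do this; the function name `parse_psf2_entry` and its return shape are hypothetical, mirroring the PSF1 case.

```python
def parse_psf2_entry(table: bytes, offset: int = 0):
    """Parse one PSF2 unicode table entry starting at `offset`.

    Returns (singles, sequences, next_offset). Raises ValueError if the
    terminating 0xff byte is missing.
    """
    end = table.index(b"\xff", offset)           # entry terminator
    parts = table[offset:end].split(b"\xfe")     # 0xfe starts each sequence
    # Everything before the first 0xfe is a run of single characters.
    singles = [ord(ch) for ch in parts[0].decode("utf-8")]
    # Each remaining part is one multi-character sequence.
    sequences = [[ord(ch) for ch in part.decode("utf-8")]
                 for part in parts[1:]]
    return singles, sequences, end + 1
```

As with the PSF1 sketch, the returned offset points at the next entry, so the whole table can be read by calling the function once per glyph.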


References

  1. The Linux Kernel Organization (2007-01-28). "psf.h".
  2. TUE (1999). "psf format".