Device independent file format

Device-independent (DVI)
Evince previewing a DVI file. Referenced images are not displayed because they are not part of the DVI file; they are added later by a print driver such as dvips.
Filename extension: .dvi
Internet media type: application/x-dvi (unofficial)
Developed by: David R. Fuchs
Type of format: document

The device independent file format (DVI) is the output file format of the TeX typesetting program, designed by David R. Fuchs and implemented by Donald E. Knuth in 1982. [1] Unlike the TeX markup files used to generate them, DVI files are not intended to be human-readable; they consist of binary data describing the visual layout of a document in a manner not reliant on any specific image format, display hardware or printer. DVI files are typically used as input to a second program (called a DVI driver) which translates DVI files to graphical data. For example, most TeX software packages include a program for previewing DVI files on a user's computer display; this program is a driver. Drivers are also used to convert from DVI to popular page description languages (e.g. PostScript, PDF) and for printing.


TeX markup may be at least partially reverse-engineered from DVI files, although this process is unlikely to produce high-level constructs identical to those present in the original markup, especially if the original markup used high-level TeX extensions (e.g. LaTeX).

DVI differs from PostScript and PDF in that it does not support any form of font embedding; it merely references external font names. (Both PostScript and PDF can embed fonts inside the document.) For a DVI file to be printed or even properly previewed, the fonts it references must already be installed. Like PDF, DVI uses a limited machine-like language that is guaranteed to terminate, rather than a full, Turing-complete programming language like PostScript.

As of 2004, the TUG DVI Driver Standards Committee had compiled the specifications that a DVI driver must implement. [2] The compilation appears to be based on a much shorter TUGboat article of the same name from 1992. [3] These documents do not state the byte order; multi-byte values in a DVI file are in fact big-endian (most significant byte first), as inspecting any DVI file shows.
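Because multi-byte parameters are stored most significant byte first, a reader can decode them with Python's `int.from_bytes` in `"big"` byte order. The following is a minimal sketch; the helper names `read_unsigned` and `read_signed` are hypothetical, and whether a given parameter is signed depends on the command, as listed in Knuth's DVItype documentation.

```python
def read_unsigned(data: bytes, pos: int, n: int) -> tuple:
    """Read an n-byte unsigned big-endian integer; return (value, new_pos)."""
    if pos + n > len(data):
        raise ValueError("truncated DVI data")
    return int.from_bytes(data[pos:pos + n], "big"), pos + n


def read_signed(data: bytes, pos: int, n: int) -> tuple:
    """Read an n-byte two's-complement big-endian integer; return (value, new_pos)."""
    if pos + n > len(data):
        raise ValueError("truncated DVI data")
    return int.from_bytes(data[pos:pos + n], "big", signed=True), pos + n
```

For example, the four bytes `FF FF FF FE` decode to -2 as a signed value, while `01 00` decodes to 256 as an unsigned one.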

Specification

The DVI format was designed to be compact and easily machine-readable. To this end, a DVI file is a sequence of commands that form "a machine-like language", in Knuth's words. [1] Each command begins with an eight-bit opcode, followed by zero or more bytes of parameters. For example, each opcode from 0x00 through 0x7F (decimal 0 to 127), set_char_i, typesets the character with code i and moves the implicit cursor right by that character's width. In contrast, opcode 0xF7 (decimal 247), pre (the preamble, which must be the first command in the DVI file), takes fourteen bytes of fixed parameters, plus an optional comment of up to 255 bytes.
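As an illustration of the opcode-plus-parameters layout, the sketch below decodes the preamble (`pre`, then `i[1] num[4] den[4] mag[4] k[1] x[k]`, where `x` is the comment of length `k`) and recognizes the `set_char_i` range. The function names are hypothetical, and the sketch assumes the whole file is already in memory as `bytes`.

```python
PRE = 247  # 0xF7, the preamble opcode


def parse_preamble(data: bytes) -> dict:
    """Decode the DVI preamble: pre i[1] num[4] den[4] mag[4] k[1] x[k]."""
    # The opcode plus fourteen fixed parameter bytes must be present.
    if len(data) < 15 or data[0] != PRE:
        raise ValueError("not a DVI file: missing or truncated preamble")
    ident = data[1]                                    # format identification byte
    num = int.from_bytes(data[2:6], "big", signed=True)
    den = int.from_bytes(data[6:10], "big", signed=True)
    mag = int.from_bytes(data[10:14], "big", signed=True)
    k = data[14]                                       # comment length, 0..255
    if len(data) < 15 + k:
        raise ValueError("truncated preamble comment")
    comment = data[15:15 + k].decode("ascii", errors="replace")
    return {"id": ident, "num": num, "den": den, "mag": mag,
            "comment": comment, "next": 15 + k}


def is_set_char(opcode: int) -> bool:
    """Opcodes 0..127 (set_char_0 .. set_char_127) typeset the character with that code."""
    return 0 <= opcode <= 0x7F
```

TeX itself writes `i = 2`, `num = 25400000` and `den = 473628672`, which makes one DVI unit equal to one scaled point (2^-16 pt); the returned `next` offset is where parsing of the first page begins.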

At a higher level, a DVI file consists of a preamble, one or more pages, and a postamble. Six state variables are maintained as a tuple of signed 32-bit integers, (h, v, w, x, y, z). h and v are the current horizontal and vertical offsets from the upper-left corner (increasing v moves down the page); w and x hold horizontal spacing values, and y and z hold vertical ones.

These variables can be pushed onto and popped from a stack. The current font f is also held as an integer value, but it is not pushed and popped with the other state variables when the push or pop opcodes are encountered. Font spacing information is loaded from TFM files. The fonts themselves are not embedded in the DVI file; each is only referenced by an integer number defined in the relevant fnt_def_i command (fnt_def1 through fnt_def4). Each font is defined exactly twice: once before it is first selected, and once in the postamble. f can hold an integer of up to four bytes, though in practice TeX only ever outputs font numbers in the range 0 through 255.
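The register behaviour described above, in particular that `push` and `pop` save and restore (h, v, w, x, y, z) but not f, can be modelled with a small interpreter. This sketch covers only a subset of page commands (set_char_i, push, pop, right, w, x, down, y, z and fnt_num_i); the class and function names are hypothetical, and the `widths` table stands in for the TFM metrics a real driver would read.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DviState:
    """The six DVI registers plus the current font (hypothetical helper class)."""
    h: int = 0
    v: int = 0
    w: int = 0
    x: int = 0
    y: int = 0
    z: int = 0
    f: Optional[int] = None  # current font: deliberately NOT saved by push/pop
    stack: List[Tuple[int, ...]] = field(default_factory=list)

    def push(self) -> None:
        # Save only (h, v, w, x, y, z).
        self.stack.append((self.h, self.v, self.w, self.x, self.y, self.z))

    def pop(self) -> None:
        self.h, self.v, self.w, self.x, self.y, self.z = self.stack.pop()


def run(ops: bytes, widths: Dict[int, int], s: DviState) -> List[tuple]:
    """Interpret a subset of page commands; return (font, char, h, v) placements."""
    placed = []
    pos = 0

    def arg(n: int) -> int:
        # Movement parameters are signed, big-endian, n bytes long.
        nonlocal pos
        if pos + n > len(ops):
            raise ValueError("truncated command parameter")
        value = int.from_bytes(ops[pos:pos + n], "big", signed=True)
        pos += n
        return value

    while pos < len(ops):
        op = ops[pos]
        pos += 1
        if op <= 127:                  # set_char_op: typeset, then advance h
            placed.append((s.f, op, s.h, s.v))
            s.h += widths[op]
        elif op == 141:                # push
            s.push()
        elif op == 142:                # pop
            s.pop()
        elif 143 <= op <= 146:         # right1..right4
            s.h += arg(op - 142)
        elif 147 <= op <= 151:         # w0 reuses w; w1..w4 set it first
            if op > 147:
                s.w = arg(op - 147)
            s.h += s.w
        elif 152 <= op <= 156:         # x0..x4
            if op > 152:
                s.x = arg(op - 152)
            s.h += s.x
        elif 157 <= op <= 160:         # down1..down4
            s.v += arg(op - 156)
        elif 161 <= op <= 165:         # y0..y4
            if op > 161:
                s.y = arg(op - 161)
            s.v += s.y
        elif 166 <= op <= 170:         # z0..z4
            if op > 166:
                s.z = arg(op - 166)
            s.v += s.z
        elif 171 <= op <= 234:         # fnt_num_0..fnt_num_63
            s.f = op - 171
        else:
            raise NotImplementedError(f"opcode {op} is outside this sketch")
    return placed
```

Running `fnt_num_0, push, fnt_num_1, pop` leaves the current font at 1, because `pop` restores only the six position and spacing registers.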

Similarly, the DVI format supports character codes up to four bytes in length, even though only the 0–255 range is commonly seen, as the TFM format is limited to that range. Character codes in DVI files refer to the character encoding of the current font rather than that of the system processing it. This means, for instance, that an EBCDIC-based system can process a DVI file that was generated by an ASCII-based system, so long as it has the same fonts installed.
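Codes 0 to 127 fit in a single set_char_i opcode; larger codes use set1 through set4 (opcodes 128 to 131), whose parameter is one to four bytes long. A minimal encoder, with a hypothetical function name, might choose the shortest form as sketched below. The value written is the code point in the font's own encoding, so the output is the same whether the encoder runs on an ASCII or an EBCDIC host.

```python
def encode_set(code: int) -> bytes:
    """Encode a 'typeset character and advance' command for a font character code."""
    if code < 0:
        raise ValueError("character codes must be non-negative")
    if code <= 127:
        return bytes([code])                          # set_char_<code>
    for n in (1, 2, 3):                              # set1..set3: unsigned parameter
        if code < 1 << (8 * n):
            return bytes([127 + n]) + code.to_bytes(n, "big")
    if code < 1 << 31:                                # set4: 4-byte signed parameter
        return bytes([131]) + code.to_bytes(4, "big")
    raise ValueError("character code too large for DVI")
```

For instance, code 200 becomes the two bytes `80 C8` (set1 followed by 200), while code 65 stays a single byte.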

Graphics as specials

The DVI format has no support for graphics beyond the most basic element, solid black rectangles (called rules). Instead, DVI has a general escape/extension mechanism, known as specials (expressed by the \special command in TeX), which defers graphics (and color) to post-processing filters. There are numerous DVI specials, the most notable of which are PostScript specials, but other programs such as tpic have their own. [4] :6,17
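At the DVI level a special is just an opaque byte string that the driver interprets: `xxx1` (opcode 239) takes a one-byte length, `xxx4` (opcode 242) a four-byte length, and the payload bytes follow. The sketch below uses only these two forms (xxx2 and xxx3 also exist); the function name is hypothetical and the payload in the tests is illustrative only.

```python
XXX1, XXX4 = 239, 242  # special opcodes with 1-byte and 4-byte length fields


def encode_special(payload: bytes) -> bytes:
    """Encode the contents of a \\special as an xxx1 or xxx4 DVI command."""
    k = len(payload)
    if k < 256:
        return bytes([XXX1, k]) + payload             # short form
    return bytes([XXX4]) + k.to_bytes(4, "big") + payload  # long form
```

TeX does not look inside the payload; only the driver that later reads the DVI file gives it meaning.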

DVI versions

DVI files are often converted into PDF, PostScript, or PCL format for reading and printing. They can also be viewed directly with DVI viewers.

The first DVI previewers capable of on-screen previewing and modification of LaTeX documents ran on Amigas. [10] [11]

DVI-to-PDF converters

dvipdf is a tool that translates DVI files (generated by TeX) to PDF files. In Linux distributions such as Ubuntu, it is a thin wrapper around dvips and Ghostscript, and its copyright is held by Artifex Software (the makers of Ghostscript). [12] A possibly different program with the same name, described as a modified version of dvips, was announced in the late 1990s by Sergey Lesenko, [13] [14] [15] but it was apparently never released. [16] [17]
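The two-stage route through dvips and Ghostscript can be sketched as below. This is not the actual dvipdf script; it is a minimal Python equivalent that assumes `dvips` and Ghostscript's `gs` are on the PATH and writes an intermediate PostScript file next to the input. The function names are hypothetical; the options used (`-q` and `-o` for dvips, `-sDEVICE=pdfwrite` and `-sOutputFile=` for Ghostscript) are standard options of those programs.

```python
import subprocess
from pathlib import Path


def dvipdf_commands(dvi: str) -> list:
    """Build the dvips and Ghostscript command lines for DVI -> PS -> PDF."""
    base = Path(dvi)
    ps = str(base.with_suffix(".ps"))
    pdf = str(base.with_suffix(".pdf"))
    return [
        ["dvips", "-q", "-o", ps, dvi],               # DVI -> PostScript
        ["gs", "-q", "-dSAFER", "-dNOPAUSE", "-dBATCH",
         "-sDEVICE=pdfwrite", f"-sOutputFile={pdf}", ps],  # PostScript -> PDF
    ]


def dvipdf(dvi: str) -> None:
    """Run both stages, stopping with an exception if either one fails."""
    for cmd in dvipdf_commands(dvi):
        subprocess.run(cmd, check=True)
```

Separating command construction from execution keeps the pipeline inspectable without requiring either program to be installed.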

dvipdfm is a DVI-to-PDF translator developed by Mark A. Wicks. Its early documentation specifically cites the limited availability of Lesenko's dvipdf as a reason for creating dvipdfm. [18] dvipdfm supports most of the newer special functions of the PDF format, including bookmarks, annotations, thumbnails, and dvips specials, which make it possible to include Encapsulated PostScript (.eps) files such as MetaPost output. It can also include JPEG and PNG images. Other features of dvipdfm include partial font embedding (reducing file size) and balancing the internal PDF document trees to speed up rendering of large documents. [4] :798 Many of these features (except for the direct support for .eps files [19]) are also present in pdfTeX, which typesets TeX directly to PDF. The fourth (2004) edition of Guide to LaTeX compares them as follows: [20]

The dvipdfm program is in the original spirit of TeX, that uses DVI as a universal intermediate format for all outputs. Purists might tend to respect this ideal. After all, no one ever considered rewriting TeX to produce PostScript output directly. That said, one must consider that TeX was invented in the days when no one printer specification dominated the field. Today, PDF is much more than a printer format; it is the means of representing documents electronically. That alone would not justify preferring pdfTeX over a DVI-to-PDF converter, nor would the fact that it saves a processing step; the deciding argument is that pdfTeX has established itself as reliable, robust, and flexible. In the end, it is likely a question of which program one is more comfortable with, and which one has given the better results for the particular user.

dvipdfmx is an extended version of the dvipdfm DVI-to-PDF translator, included in TeX distributions such as TeX Live 2014 [21] and MiKTeX 2.9. [22] The primary goal of the dvipdfmx project is to support multi-byte character encodings and CJK character sets for East Asian languages. [23] dvipdfmx is also included (in a somewhat modified form) in XeTeX. [4] :798

The second (2008) edition of The LaTeX Graphics Companion suggests the following workflow: [4] :803

The route that you should follow depends mostly on the graphics material that you want to include. If most of it is in EPS format, the easiest way is to use latex, followed by dvips and finally ps2pdf. If all of your graphics files are already in PDF format, with some JPEG and PNG images, the more direct route is to run pdflatex. You can also combine both approaches by running latex and the dvipdfmx program. If you make a lot of use of PSTricks, you should look at [...] the pst-pdf package.
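The workflow advice above amounts to a small decision rule keyed on the graphics formats a document includes. The sketch below encodes one reading of it; the function name, the "more than half EPS" threshold for "most of it", the choice of pdflatex when there are no graphics, and the rejection of other formats are all assumptions, and the PSTricks case (for which the quote points to pst-pdf) is left out.

```python
from collections import Counter
from pathlib import PurePath

PDF_FRIENDLY = {".pdf", ".jpg", ".jpeg", ".png"}


def suggest_route(graphics_files: list) -> list:
    """Suggest a tool chain from the graphics files a document includes (heuristic)."""
    kinds = Counter(PurePath(name).suffix.lower() for name in graphics_files)
    unknown = set(kinds) - PDF_FRIENDLY - {".eps"}
    if unknown:
        raise ValueError(f"formats outside the quoted advice: {sorted(unknown)}")
    total = sum(kinds.values())
    if total == 0 or set(kinds) <= PDF_FRIENDLY:
        return ["pdflatex"]                  # all PDF/JPEG/PNG: direct route
    if kinds[".eps"] * 2 > total:
        return ["latex", "dvips", "ps2pdf"]  # mostly EPS
    return ["latex", "dvipdfmx"]             # mixed: combine both approaches
```

A document with two EPS figures and one PNG is classed as "mostly EPS", while an even split of EPS and PDF falls through to the combined latex and dvipdfmx route.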

References and notes

  1. Donald E. Knuth (December 1995). "DVItype" (WEB source code; extract full documentation using WEAVE). Version 3.6. Retrieved 2008-05-07.
  2. TUG DVI Driver Standards Committee. "The DVI Driver Standard, Level 0" (PDF). ctan.org.
  3. TUG DVI Driver Standards Committee (1992). "The DVI Driver Standard, Level 0" (PDF). TUGboat. 13: 54.
  4. Michel Goossens, Frank Mittelbach, Sebastian Rahtz, Denis Roegel, Herbert Voß (2008). The LaTeX Graphics Companion (2nd ed.). Addison-Wesley. ISBN 978-0-321-50892-8.
  5. "Y&Y Inc. -- DVIWindo". www.tug.org.
  6. "Y&Y Inc. -- DVIPSONE". www.tug.org.
  7. "CTAN: /tex-archive/dviware/dvitops". ctan.org.
  8. https://www.tug.org/TUGboat/tb27-2/tb87frischauf.pdf
  9. "CTAN: /tex-archive/dviware". ctan.org.
  10. In 1986 Tomas Rokicki printed his first page with dvisw, an early DVI printer driver for the Amiga, on a QMS SmartWriter using AmigaTeX by Radical Eye Software.
  11. Rokicki, Tomas (April 1988). "The Commodore Amiga: A Magic TeX Machine" (PDF). TUGboat. 9 (1): 40–41. Retrieved 2010-11-19.
  12. "Ubuntu Manpage: Dvipdf - Convert TeX DVI file to PDF using ghostscript and dvips". Archived from the original on 2015-09-09. Retrieved 2014-08-03.
  13. https://www.tug.org/TUGboat/tb17-3/tb52lese.pdf
  14. https://www.tug.org/TUGboat/tb18-3/tb56lese.pdf
  15. "(La)TeX Navigator".
  16. Helmut Kopka; Patrick W. Daly (February 2008) [2004]. Guide to LaTeX (4th ed., 9th printing). Pearson Education. § 13.2.2 The dvipdfm driver. ISBN 978-0-321-17385-0.
  17. "Where art dvipdf? - comp.text.tex". compgroups.net. Archived from the original on August 11, 2014.
  18. Mark A. Wicks. Dvipdfm User's Manual (archived 2015-07-06 at the Wayback Machine). Version 0.12.4, September 19, 1999, page 2.
  19. "texfaq2html redirect emulating cgi-bin lookup on the original site". www.texfaq.org.
  20. Helmut Kopka; Patrick W. Daly (February 2008) [2004]. Guide to LaTeX (4th ed., 9th printing). Pearson Education. § 13.2.3 The pdfTeX program. ISBN 978-0-321-17385-0.
  21. "Debian -- Details of package texlive-base in sid". packages.debian.org.
  22. "MiKTeX Packages A-Z". miktex.org.
  23. "The DVIPDFMx Project". project.ktug.org.
