TeX4ht

Developer(s): Eitan M. Gurari (1947–2009), Karl Berry, Michal Hoftich
Stable release: Rolling releases / December 11, 2023
Operating system: Linux/Windows/Mac OS X
Type: Utility
Licence: LaTeX Project Public License (LPPL)
Website: http://www.tug.org/tex4ht/

TeX4ht is a configurable converter that translates TeX and LaTeX documents to HTML and certain XML formats. Most notably, TeX4ht is used to convert (La)TeX documents to formats used by word processors. It was developed by Eitan M. Gurari. [1]


The program is published under the LaTeX Project Public License (LPPL).

History

TeX4ht was developed in the 1990s to convert (La)TeX to HTML, so that scientific documents written in (La)TeX could be published on the World Wide Web and displayed in a web browser. In particular, hypertext features were supported, which made it possible to include hyperlinks in the web version of a document.

Support for further XML-based formats was added gradually. As of 2023, HTML5, XHTML, MathML, OpenDocument, DocBook, EPUB and TEI are supported. [2] [3]

  1. (*1947, †2009)
  2. "TeX4ht - TeX Users Group".
  3. "5 Output Formats". www.kodymirus.cz. Retrieved 2023-12-13.

JavaHelp can also be generated.

TeX4ht is now included preconfigured with all TeX distributions.

Since Eitan M. Gurari's death, the program has been maintained by Radhakrishnan CV (no longer active), Karl Berry, and Michal Hoftich, with contributions from many others. [1]

Function

TeX4ht does not directly transform TeX or LaTeX markup into the output markup language (HTML etc.). Instead, an ordinary (La)TeX run first compiles the source into a DVI file, which TeX4ht then processes. [2] Other converters, most notably LaTeX2HTML and TtH, operate in a single pass.

TeX4ht can essentially deal with any (La)TeX source that compiles successfully. It can also support publicly available macro packages or user-made (perhaps document-specific) commands for features that go beyond the standard TeX formats, such as bibliography management with BibTeX, because these extensions do not need corresponding implementations in the converter.

Mathematical formulae and other characters or symbols that cannot be displayed as text are converted into graphics. Mathematics can also be converted into MathML or a form suitable for processing with MathJax.

TeX4ht can convert LaTeX documents into Microsoft Word's doc format via the OpenDocument format, ODT.
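
As a rough illustration of how such conversions are typically requested, the following Python sketch assembles command lines for make4ht, a build tool distributed alongside TeX4ht. The article itself does not describe make4ht, so its use here is an assumption, as are the file name paper.tex, the helper name build_make4ht_command, and the format and option names passed to it (-f with html5 or odt, and the mathml and mathjax options); check them against the documentation of the installed version. The sketch only builds the argument list and does not run anything.

```python
import shlex
from pathlib import Path

# Output formats assumed to be accepted by make4ht's -f option.
# This set is an assumption for illustration; verify it against the
# documentation of the installed make4ht version.
FORMATS = {"html5", "xhtml", "odt", "docbook", "tei"}


def build_make4ht_command(source, fmt="html5", options=()):
    """Return the argument list for a make4ht run; nothing is executed.

    `build_make4ht_command` is a hypothetical helper, not part of TeX4ht.
    """
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if Path(source).suffix != ".tex":
        raise ValueError("expected a .tex source file")
    cmd = ["make4ht", "-f", fmt, source]
    if options:
        # TeX4ht options are passed as one comma-separated argument.
        cmd.append(",".join(options))
    return cmd


if __name__ == "__main__":
    # HTML output with mathematics as MathML, displayed through MathJax.
    print(shlex.join(build_make4ht_command("paper.tex", "html5", ["mathml", "mathjax"])))
    # OpenDocument (ODT) output, which a word processor can then open.
    print(shlex.join(build_make4ht_command("paper.tex", "odt")))
```

Keeping the command as a list rather than a single string lets it be passed to subprocess.run without shell quoting concerns, should one choose to execute it.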


References

  1. Karl Berry, posting in mailing list texhax, 17 July 2009; ibid., posting in mailing list texhax, 7 November 2009.
  2. Cf. The LaTeX Web Companion, pp. 169f.
