Analyzed Layout and Text Object

Analyzed Layout and Text Object (ALTO) is an open XML schema originally developed by the EU-funded METAe project. [1] ALTO files describe the placement, size, and style of text in an image of a digitized document, as well as other elements of the document's layout, such as margins, headings, columns, and illustrations.

The text and placement information in ALTO files is usually generated by specialized optical character recognition (OCR) software. ALTO is often used in combination with the Metadata Encoding and Transmission Standard (METS), which describes the larger digitized object (such as a book) and creates references across the individual ALTO files (such as pages), as might be necessary to describe a reading sequence.

From version 1.0 in June 2004 to version 1.4 in 2007, ALTO was developed and maintained by Content Conversion Specialists (CCS) GmbH, Hamburg. In August 2009, maintenance of the schema was transferred to the Library of Congress, and it has since been overseen by an editorial board created for that purpose. [2]

Structure

An ALTO file consists of three major sections as children of the root <alto> element: [3]

<?xml version="1.0"?>
<alto>
  <Description>
    <MeasurementUnit/>
    <sourceImageInformation/>
    <Processing/>
  </Description>
  <Styles>
    <TextStyle/>
    <ParagraphStyle/>
  </Styles>
  <Layout>
    <Page>
      <TopMargin/>
      <LeftMargin/>
      <RightMargin/>
      <BottomMargin/>
      <PrintSpace/>
    </Page>
  </Layout>
</alto>
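As an illustration of how such a file might be read, the following is a minimal sketch using Python's standard xml.etree.ElementTree module. The sample document is hypothetical: it reuses the skeleton above and adds a single TextBlock, TextLine, and two String elements carrying CONTENT, HPOS, VPOS, WIDTH, and HEIGHT attributes, which is how the ALTO schema records words and their positions. The sample is simplified and not schema-valid (for example, it omits the namespace declaration and the ID attributes that real files carry), and the helper names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample (not schema-valid): the skeleton shown above
# plus one text line with two words inside <PrintSpace>.
SAMPLE = """<?xml version="1.0"?>
<alto>
  <Description>
    <MeasurementUnit>pixel</MeasurementUnit>
  </Description>
  <Styles/>
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock>
          <TextLine>
            <String CONTENT="Hello" HPOS="100" VPOS="200" WIDTH="80" HEIGHT="30"/>
            <SP/>
            <String CONTENT="world" HPOS="190" VPOS="200" WIDTH="85" HEIGHT="30"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(element):
    """Return the tag without any '{namespace}' prefix, so the same code
    works for files that declare an ALTO namespace and for those that do not."""
    return element.tag.rsplit("}", 1)[-1]


def number(element, attribute):
    """Read a numeric attribute as float, or None if it is absent."""
    value = element.get(attribute)
    return float(value) if value is not None else None


def section_names(root):
    """Names of the direct children of the root <alto> element."""
    return [local_name(child) for child in root]


def words(root):
    """(content, hpos, vpos, width, height) for every String element."""
    return [
        (
            el.get("CONTENT", ""),
            number(el, "HPOS"),
            number(el, "VPOS"),
            number(el, "WIDTH"),
            number(el, "HEIGHT"),
        )
        for el in root.iter()
        if local_name(el) == "String"
    ]


def line_texts(root):
    """Plain text of each TextLine, joining its words with single spaces.
    SP (space) elements are ignored; hyphenation is not handled."""
    lines = []
    for line in root.iter():
        if local_name(line) != "TextLine":
            continue
        parts = [el.get("CONTENT", "") for el in line if local_name(el) == "String"]
        lines.append(" ".join(parts))
    return lines


root = ET.fromstring(SAMPLE)
print(section_names(root))
for word in words(root):
    print(word)
print(line_texts(root))
```

Matching on local names rather than fully qualified tags is a convenience for a sketch like this; code that targets one known ALTO version could instead pass that version's namespace URI to the ElementTree search functions.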

References

  1. Stehno, Birgit; Egger, Alexander; Retti, Gregor (April 2003). "METAe—Automated Encoding of Digitized Texts". Literary and Linguistic Computing. 18 (1): 77–88. doi:10.1093/llc/18.1.77.
  2. "ALTO News". Library of Congress. Retrieved 8 October 2025.
  3. Structure of ALTO Files