OpenDocument technical specification

This article describes the technical specifications of the OpenDocument office document standard, as developed by the OASIS industry consortium. A variety of organizations developed the standard publicly and make it publicly accessible, meaning it can be implemented by anyone without restriction. The OpenDocument format aims to provide an open alternative to proprietary document formats.

Document representation

The OpenDocument format supports two ways of document representation: as a package (a ZIP archive containing several subdocuments) and as a single XML document.

The recommended filename extensions and MIME types are included in the official standard (OASIS, May 1, 2005, and its later revisions or versions). The MIME types and extensions contained in the ODF specification are applicable only to office documents that are contained in a package. Office documents that conform to the OpenDocument specification but are not contained in a package should use the MIME type text/xml.

The MIME type is also used in the office:mimetype attribute. It is very important to use this attribute in flat XML files/single XML documents, where this is the only way the type of the document can be detected (in a package, the MIME type is also present in a separate file mimetype). Its values are the MIME types that are used for the packaged variant of office documents.

Documents

The most common file extensions used for OpenDocument documents are .odt for text documents, .ods for spreadsheets, .odp for presentations, and .odg for graphics. These are easily remembered by considering ".od" as being short for "OpenDocument", and then noting that the last letter indicates the more specific type (such as t for text). Here is the complete list of document types, showing the type of file, the recommended file extension, and the MIME type:

File type | Extension | MIME Type | ODF specification
Text | .odt | application/vnd.oasis.opendocument.text | 1.0
Spreadsheet | .ods | application/vnd.oasis.opendocument.spreadsheet | 1.0
Presentation | .odp | application/vnd.oasis.opendocument.presentation | 1.0
Drawing | .odg | application/vnd.oasis.opendocument.graphics | 1.0
Chart | .odc | application/vnd.oasis.opendocument.chart | 1.0
Formula | .odf | application/vnd.oasis.opendocument.formula | 1.0
Image | .odi | application/vnd.oasis.opendocument.image | 1.0
Master Document | .odm | application/vnd.oasis.opendocument.text-master | 1.0
Database | .odb | application/vnd.sun.xml.base [2] [3] | not defined in ODF 1.0/1.1 specifications; used in OpenOffice.org 2.x
Database | .odb | application/vnd.oasis.opendocument.base | ODF 1.2; used in OpenOffice.org 3.x
Database | .odb | application/vnd.oasis.opendocument.database | defined in IANA registration
All OpenDocument single/flat XML files | not defined | text/xml | 1.0

Templates

OpenDocument also supports a set of template types. Templates represent formatting information (including styles) for documents, without the content itself. The recommended filename extension begins with ".ot" (interpretable as short for "OpenDocument template"), with the last letter indicating the kind of template (such as "t" for text). The supported set includes:

File type | Extension | MIME Type | ODF specification
Text | .ott | application/vnd.oasis.opendocument.text-template | 1.0
Spreadsheet | .ots | application/vnd.oasis.opendocument.spreadsheet-template | 1.0
Presentation | .otp | application/vnd.oasis.opendocument.presentation-template | 1.0
Drawing | .otg | application/vnd.oasis.opendocument.graphics-template | 1.0
Chart template | .otc | application/vnd.oasis.opendocument.chart-template | 1.0
Formula template | .otf | application/vnd.oasis.opendocument.formula-template | 1.0
Image template | .oti | application/vnd.oasis.opendocument.image-template | 1.0
Web page template | .oth | application/vnd.oasis.opendocument.text-web | 1.0

Capabilities

As noted above, the OpenDocument format can describe text documents (for example, those typically edited by a word processor), spreadsheets, presentations, drawings/graphics, images, charts, mathematical formulas, and "master documents" (which can combine them). It can also represent templates for many of them.

The official OpenDocument standard version 1.0 (OASIS, May 1, 2005) defines OpenDocument's capabilities. The text below provides a brief summary of the format's capabilities.

Metadata

The OpenDocument format supports storing metadata (data about the data) through a set of pre-defined metadata elements, and it also allows user-defined and custom metadata. The predefined fields cover, for example, the creator, the creation and modification dates, the language, and document statistics.

Content

OpenDocument's text content format supports both typical and advanced capabilities. Headings of various levels, lists of various kinds (numbered and not), numbered paragraphs, and change tracking are all supported. Page sequences and section attributes can be used to control how the text is displayed. Hyperlinks, ruby text (which provides annotations and is especially critical for some languages), bookmarks, and references are supported as well. Text fields (for autogenerated content), and mechanisms for automatically generating tables such as tables of contents, indexes, and bibliographies, are included as well.

The OpenDocument format implements spreadsheets as sets of tables. Thus it features extensive capabilities for formatting the display of tables and spreadsheets. OpenDocument also supports database ranges, filters, and "data pilots" (known in Microsoft Excel contexts as "pivot tables"). Change tracking is available for spreadsheets as well.

The graphics format supports a vector graphic representation, in which a set of layers and the contents of each layer are defined. Available drawing shapes include Rectangle, Line, Polyline, Polygon, Regular Polygon, Path, Circle, Ellipse, and Connector. 3D shapes are also available; the format includes information about the Scene, Light, Cube, Sphere, Extrude, and Rotate (it is intended for office data exchange, and is not sufficient to represent videos or other extensive 3D scenes). Custom shapes can also be defined.

Presentations are supported. Users can include animations in presentations, with control over the sound, showing a shape or text, hiding a shape or text, or dimming something, and these can be grouped. Many of the presentation format's capabilities are reused from the text format, which simplifies implementations. However, tables are not supported within OpenDocument as drawing objects, so they may only be included in presentations as embedded tables.

Charts define how to create graphical displays from numerical data. They support titles, subtitles, a footer, and a legend to explain the chart. The format defines the series of data that is to be used for the graphical display, and a number of different kinds of graphical displays (such as line charts, pie charts, and so on).

Forms are specially supported, building on the existing XForms standard.

Objects

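A document in OpenDocument format can contain two types of objects: objects that have an XML representation (such as other OpenDocument documents or MathML formulas), and objects that have only a binary representation (such as OLE objects [4]).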

Use of Microsoft Object Linking and Embedding (OLE) objects (for example, to embed tables or charts from a spreadsheet application in a text document or presentation file) limits interoperability, because these objects are not widely supported in programs for viewing or editing files. [5] [6] [7] [8] [9] If no software that understands an OLE object is available, the object is usually replaced by a picture (a bitmap representation of the object) or not displayed at all. [10] [11] [12]

Formatting

The style and formatting controls are numerous, providing a number of controls over the display of information.

Page layout is controlled by a variety of attributes. These include page size, number format, paper tray, print orientation, margins, border (and its line width), padding, shadow, background, columns, print page order, first page number, scale, table centering, maximum footnote height and separator, and many layout grid properties.

Headers and footers can have defined fixed and minimum heights, margins, border line width, padding, background, shadow, and dynamic spacing.

There are many attributes for specific text, paragraphs, ruby text, sections, tables, columns, lists, and fills. Specific characters can have their fonts, sizes, generic font family names (roman for serif, swiss for sans-serif, modern for monospace, decorative, script, or system), and other properties set. Paragraphs can have their vertical space controlled through attributes on keep together, widow, and orphan, and have other attributes such as "drop caps" to provide special formatting. The list is extremely extensive; see the references (in particular the actual standard) for details.

Spreadsheet formulas

OpenDocument version 1.2 fully describes mathematical formulas displayable on-screen. It is fully capable of exchanging spreadsheet data, formats, pivot tables, and other information typically included in a spreadsheet. OpenDocument exchanges formulas as values of the attribute table:formula.

The allowed syntax of table:formula was not defined in sufficient detail in the OpenDocument version 1.0 specification, which defined spreadsheet formulas using a set of simple examples showing, for example, how to specify ranges and the SUM() function. The OASIS OpenDocument Formula sub group therefore standardized the table:formula in the OpenFormula specification. For more information see the OpenFormula article.

Encryption

When an OpenDocument file is password protected the file structure of the bundle remains the same, but contents of XML files in the package are encrypted using the following algorithm:

  1. The file contents are compressed with the DEFLATE algorithm.
  2. A checksum of a portion of the compressed file is computed (SHA-1 of the file contents, or SHA-1 of the first 1024 bytes of the file, or SHA-256 of the first 1024 bytes of the file) and stored, so password correctness can be verified when decrypting.
  3. A digest (hash) of the user-entered password in UTF-8 encoding is created and passed to the package component. ODF versions 1.0 and 1.1 only mandate support for the SHA-1 digest here, while version 1.2 recommends SHA-256.
  4. This digest is used to produce a derived key by undergoing key stretching with PBKDF2 using HMAC-SHA-1 with a salt of arbitrary length (in ODF 1.2; it's 16 bytes in ODF 1.1 and below) generated by the random number generator for an arbitrary iteration count (1024 by default in ODF 1.2).
  5. The random number generator is used to generate a random initialization vector for each file.
  6. The initialization vector and derived key are used to encrypt the compressed file contents. ODF 1.0 and 1.1 use Blowfish in 8-bit cipher feedback mode, while ODF 1.2 considers it a legacy algorithm and allows Triple DES and AES (with 128, 192 or 256 bits), both in cipher block chaining mode, to be used instead.
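
The compression, checksum, and key-derivation steps above can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not a reference implementation: it uses ODF 1.1-style parameters (SHA-1 digest, 16-byte salt, 1024 iterations), the 16-byte derived key length and 8-byte initialization vector are assumptions not stated above, and the function names are hypothetical. The final cipher step is omitted because the standard library provides no Blowfish, Triple DES, or AES.

```python
import hashlib
import os
import zlib


def raw_deflate(data: bytes) -> bytes:
    """Step 1: compress with raw DEFLATE (no zlib header or trailer)."""
    compressor = zlib.compressobj(wbits=-15)
    return compressor.compress(data) + compressor.flush()


def checksum_first_1k(compressed: bytes) -> bytes:
    """Step 2 (one of the listed variants): SHA-1 of the first 1024 bytes."""
    return hashlib.sha1(compressed[:1024]).digest()


def derive_key(password: str, salt: bytes, iterations: int = 1024,
               key_len: int = 16) -> bytes:
    """Steps 3-4: hash the UTF-8 password, then stretch it with PBKDF2."""
    start_key = hashlib.sha1(password.encode("utf-8")).digest()
    # key_len=16 is an assumption for illustration; the text does not state it.
    return hashlib.pbkdf2_hmac("sha1", start_key, salt, iterations,
                               dklen=key_len)


salt = os.urandom(16)   # 16-byte salt, as in ODF 1.1 and below
iv = os.urandom(8)      # step 5: per-file IV (8 bytes is an assumption)
key = derive_key("example password", salt)
```

The salt, iteration count, and initialization vector have to be stored alongside the encrypted data so that the same key can be derived again when the file is opened.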

Format internals

An OpenDocument file is most commonly a standard ZIP archive (JAR archive [13]) that packages several subdocuments as files and directories; it can also consist of a single XML document, although that form is not widely used. According to the OpenDocument 1.0 specification, the ZIP file specification is defined in Info-ZIP Application Note 970311, 1997. [14] [15] The compression used for a package normally makes OpenDocument files significantly smaller than equivalent Microsoft ".doc" or ".ppt" files. The smaller size matters to organizations that store vast numbers of documents for long periods or that must exchange documents over low-bandwidth connections. Once uncompressed, most data is contained in simple text-based XML files, which are as easy to modify and process as any other XML. The single-XML-document form uses <office:document> as the root element and is intended for use in document processing.

The standard allows the inclusion of directories to store images, non-SMIL animations, and other files that are used by the document but cannot be expressed directly in the XML.

Due to the openly specified compression format used, it is possible for a user to extract the container file to manually edit the contained files. This allows repair of a corrupted file or low-level manipulation of the contents.

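The zipped set of files and directories typically includes content.xml, styles.xml, meta.xml, settings.xml, the mimetype file, and the Thumbnails, META-INF, and Pictures directories.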

The OpenDocument format provides a strong separation between content, layout and metadata. The most notable components of the format are described in the subsections below. The files in XML format are further defined using the RELAX NG language for defining XML schemas. RELAX NG is itself defined by an OASIS specification, as well as by part two of the international standard ISO/IEC 19757: Document Schema Definition Languages (DSDL).

content.xml

content.xml, the most important file, carries the actual content of the document (except for binary data, such as images). The base format is inspired by HTML, and though far more complex, it should be reasonably legible to humans:

    <text:h text:style-name="Heading_2">This is a title</text:h>
    <text:p text:style-name="Text_body"/>
    <text:p text:style-name="Text_body">
    This is a paragraph. The formatting information is
    in the Text_body style. The empty text:p tag above
    is a blank paragraph (an empty line).
    </text:p>
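
A fragment like the one above could be read with Python's built-in XML parser. The sketch below is illustrative only: the wrapping <root> element and the function name list_blocks are hypothetical, added so that the text: prefix is declared and the snippet is well-formed on its own.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hypothetical wrapper element; a real content.xml has a much deeper structure.
FRAGMENT = f"""
<root xmlns:text="{TEXT_NS}">
  <text:h text:style-name="Heading_2">This is a title</text:h>
  <text:p text:style-name="Text_body"/>
  <text:p text:style-name="Text_body">This is a paragraph.</text:p>
</root>
"""


def list_blocks(xml_text: str) -> list[tuple[str, str, str]]:
    """Return (kind, style name, text) for each heading or paragraph."""
    blocks = []
    for elem in ET.fromstring(xml_text):
        kind = elem.tag.split("}")[1]  # local name: "h" or "p"
        style = elem.get(f"{{{TEXT_NS}}}style-name", "")
        blocks.append((kind, style, "".join(elem.itertext())))
    return blocks


for block in list_blocks(FRAGMENT):
    print(block)
```

The empty paragraph comes back with an empty text string, which matches its role as a blank line.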

styles.xml

styles.xml contains style information. OpenDocument makes heavy use of styles for formatting and layout. Most of the style information is here (though some is in content.xml). Style types include, for example, paragraph, page, character, and list styles.

The OpenDocument format is somewhat unusual in that using styles for formatting cannot be avoided. Even "manual" formatting is implemented through styles (the application dynamically makes new styles as needed).

meta.xml

meta.xml contains the file metadata, for example the author, the "Last modified by" name, and the date of last modification. The contents look somewhat like this:

    <meta:creation-date>2003-09-10T15:31:11</meta:creation-date>
    <dc:creator>Daniel Carrera</dc:creator>
    <dc:date>2005-06-29T22:02:06</dc:date>
    <dc:language>es-ES</dc:language>
    <meta:document-statistic meta:table-count="6" meta:object-count="0" meta:page-count="59" meta:paragraph-count="676" meta:image-count="2" meta:word-count="16701" meta:character-count="98757"/>

The names of the <dc:...> tags come from the Dublin Core XML standard.
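
As a rough illustration of how these fields might be read, the following sketch parses a small, made-up meta.xml document with Python's standard library. The sample document is abbreviated, the function name read_meta is hypothetical, and the namespace URIs are the ones commonly used for the office, meta, and Dublin Core vocabularies.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Abbreviated sample for illustration; real files carry more fields.
META_XML = """<office:document-meta
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <office:meta>
    <dc:creator>Daniel Carrera</dc:creator>
    <dc:language>es-ES</dc:language>
    <meta:document-statistic meta:page-count="59" meta:word-count="16701"/>
  </office:meta>
</office:document-meta>"""


def read_meta(xml_text: str) -> dict[str, str]:
    """Collect the creator, language, and document statistics into a dict."""
    root = ET.fromstring(xml_text)
    info = {
        "creator": root.findtext(".//dc:creator", default="", namespaces=NS),
        "language": root.findtext(".//dc:language", default="", namespaces=NS),
    }
    stats = root.find(".//meta:document-statistic", NS)
    if stats is not None:
        # Attribute keys look like "{namespace}local-name"; keep the local name.
        for key, value in stats.attrib.items():
            info[key.split("}")[-1]] = value
    return info


print(read_meta(META_XML))
```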

settings.xml

settings.xml includes settings such as the zoom factor or the cursor position. These are properties that are not content or layout.

mimetype (file)

mimetype is a one-line file containing the MIME type of the document. One implication is that the file extension is immaterial to the format and exists only for the benefit of the user. This special file is always the first entry in the ZIP archive and is stored uncompressed. Because the ZIP header uses fixed-length fields, the different OpenDocument formats can be identified directly, without decompressing the content (e.g. with magic bytes).
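
The role of the mimetype entry can be illustrated with Python's zipfile module. The sketch below builds a tiny in-memory package and then identifies it from its first entry; the helper names and the placeholder content.xml are hypothetical, and a real package contains more files.

```python
import io
import zipfile
from typing import Optional


def build_demo_package(mimetype: str) -> bytes:
    """Build a tiny package: 'mimetype' is written first and stored as-is."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", mimetype, compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.xml", "<placeholder/>",
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def detect_mimetype(package: bytes) -> Optional[str]:
    """Return the declared MIME type, or None if the layout is unexpected."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        first = zf.infolist()[0]
        if first.filename != "mimetype":
            return None
        if first.compress_type != zipfile.ZIP_STORED:
            return None
        return zf.read(first).decode("ascii").strip()


package = build_demo_package("application/vnd.oasis.opendocument.text")
print(detect_mimetype(package))
```

Because the entry is stored rather than compressed, its text also appears verbatim near the start of the archive, which is what makes byte-level identification possible.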

Thumbnails (directory)

Thumbnails is a separate folder for a document thumbnail. The thumbnail must be saved as “thumbnail.png”. A thumbnail representation of a document should be generated by default when the file is saved. It should be a representation of the first page, first sheet, etc. of the document. The required size for the thumbnail is 128x128 pixels. In order to conform to the Thumbnail Managing Standard (TMS) at www.freedesktop.org, thumbnails must be saved as 8-bit, non-interlaced PNG images with full alpha transparency.

META-INF (directory)

META-INF is a separate folder. Information about the files contained in the OpenDocument package is stored in an XML file called the manifest file. The manifest file is always stored at the pathname META-INF/manifest.xml. The main pieces of information stored in the manifest are a list of the files in the package, the media type of each file, and, for encrypted files, the information required to decrypt them.
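
A manifest could be inspected along the following lines. This is a hedged sketch: the element name manifest:file-entry, the attributes manifest:full-path and manifest:media-type, and the namespace URI are taken from the manifest schema as commonly encountered rather than from the text above, and the sample document and function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI of the package manifest vocabulary.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"

# Made-up sample; a real manifest lists every file in the package.
MANIFEST_XML = f"""<manifest:manifest xmlns:manifest="{MANIFEST_NS}">
  <manifest:file-entry manifest:full-path="/"
      manifest:media-type="application/vnd.oasis.opendocument.text"/>
  <manifest:file-entry manifest:full-path="content.xml"
      manifest:media-type="text/xml"/>
</manifest:manifest>"""


def list_entries(xml_text: str) -> dict[str, str]:
    """Map each listed path to its declared media type."""
    root = ET.fromstring(xml_text)
    entries = {}
    for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
        path = entry.get(f"{{{MANIFEST_NS}}}full-path", "")
        entries[path] = entry.get(f"{{{MANIFEST_NS}}}media-type", "")
    return entries


print(list_entries(MANIFEST_XML))
```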

Pictures (directory)

Pictures is a separate folder for images included in the document. This folder is not defined in the OpenDocument specification. Files in this folder can use various image formats, depending on the format of inserted file. While the image data may have an arbitrary format, it is recommended that bitmap graphics are stored in the PNG format and vector graphics in the SVG format.

Reuse of existing formats

By design, OpenDocument reuses existing open XML standards whenever they are available, and it creates new tags only where no existing standard can provide the needed functionality. Thus OpenDocument uses a subset of Dublin Core for metadata, MathML for displayed formulas, SMIL for multimedia, XLink for hyperlinks, and so on.

Although not fully reusing SVG for vector graphics, OpenDocument does use SVG-compatible vector graphics within an ODF-format-specific namespace, but also includes non-SVG graphics.

History

Version detection

To indicate which version of the OpenDocument specification a file complies with, all root elements take an office:version attribute (in the format revision.version, such as office:version="1.1"), which identifies the version of ODF specification that defined the associated element, its schema, its complete content, and its interpretation.

ODF 1.0/1.1

The office:version attribute is not mandatory in ODF 1.0 and ODF 1.1 files, so an element that omits office:version is based on ODF 1.0 or 1.1. If the file has a version known to an XML processor, the processor may validate the document. Otherwise, validation is optional, but the document must be well formed.

ODF 1.2 and newer

The office:version attribute shall be present in each and every <office:document>, <office:document-content>, <office:document-styles>, <office:document-meta>, and <office:document-settings> element in the XML documents that comprise an OpenDocument 1.2 or newer document. The value of the office:version attribute shall reflect the OpenDocument version. [20]
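
The version rules above could be applied roughly as follows. This is a minimal sketch: the function name odf_version and the fallback label are hypothetical, and the sample root elements are stripped down to the attribute of interest.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def odf_version(xml_text: str) -> str:
    """Return office:version from the root element, or a fallback label."""
    root = ET.fromstring(xml_text)
    version = root.get(f"{{{OFFICE_NS}}}version")
    if version is None:
        # Only ODF 1.0 and 1.1 allow the attribute to be omitted.
        return "1.0 or 1.1 (attribute omitted)"
    return version


with_version = (
    f'<office:document-content xmlns:office="{OFFICE_NS}" '
    'office:version="1.2"/>'
)
without_version = f'<office:document-content xmlns:office="{OFFICE_NS}"/>'

print(odf_version(with_version))
print(odf_version(without_version))
```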

Conformance

ODF 1.0/1.1

The OpenDocument specification does not specify which elements and attributes conforming applications must, should, or may support. Even typical office applications may support only a subset of the elements and attributes defined in the specification. The specification contains a non-normative table that gives an overview of which elements and attributes are usually supported by typical office applications.

Documents that conform to the OpenDocument specification may contain elements and attributes not specified within the OpenDocument schema. Such elements and attributes must not be part of a namespace that is defined within the specification and are called foreign elements and attributes.

Conforming applications either shall read documents that are valid against the OpenDocument schema if all foreign elements and attributes are removed before validation takes place, or shall write documents that are valid against the OpenDocument schema if all foreign elements and attributes are removed before validation takes place. Conforming applications that read and write documents may preserve foreign elements and attributes. In addition to this, conforming applications should preserve meta information and the content of styles.

Conforming applications shall read documents containing processing instructions and should preserve them.
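
The idea of removing foreign elements and attributes before validation can be sketched as a namespace filter. The code below is only an illustration under stated assumptions: the namespace allow-list is a small subset chosen for the example (the specification defines many more), the function names are hypothetical, and the validation step itself is not shown.

```python
import xml.etree.ElementTree as ET

# Illustrative subset only; the specification defines many more namespaces.
ODF_NAMESPACES = {
    "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def namespace_of(name: str) -> str:
    """Extract the namespace URI from an ElementTree '{uri}local' name."""
    return name[1:].split("}")[0] if name.startswith("{") else ""


def strip_foreign(elem: ET.Element) -> None:
    """Remove attributes and child elements outside the allow-list, in place."""
    for key in list(elem.attrib):
        if namespace_of(key) not in ODF_NAMESPACES:
            del elem.attrib[key]
    for child in list(elem):
        if namespace_of(child.tag) not in ODF_NAMESPACES:
            # Simplification: text following the removed child is dropped too.
            elem.remove(child)
        else:
            strip_foreign(child)


doc = ET.fromstring(
    '<office:document '
    'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
    'xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" '
    'xmlns:x="urn:example:foreign" x:flag="1">'
    '<text:p>Kept</text:p><x:note>Dropped</x:note>'
    '</office:document>'
)
strip_foreign(doc)
print([child.tag.split("}")[1] for child in doc])
```

An application that wants to preserve foreign content, as the specification permits, would run such a filter on a copy of the tree rather than on the document it later writes back.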

ODF 1.2

ODF 1.2 defines conformance requirements precisely. The specification defines conformance for documents, consumers, and producers, with two conformance classes called conforming and extended conforming. It further defines conforming text, spreadsheet, drawing, presentation, chart, image, formula, and database front end documents. Chapter 2 defines the basic requirements for the individual conformance targets. [21]

Footnotes

  1. "OpenOffice.org Document Version Control With Mercurial". Archived from the original on 2017-11-25. Retrieved 2010-06-07.
  2. MIME types - OpenSolaris Default Applications, archived from the original on 2011-07-16, retrieved 2010-06-06
  3. .odb Extension - List of programs that can open .odb files, retrieved 2010-06-06
  4. According to the OpenDocument 1.0 specification, OLE is defined in Kraig Brockschmidt, Inside OLE, Microsoft Press, 1995, ISBN 1-55615-843-2.
  5. Bruce Byfield (2005-08-23). "FOSS word processors compared: OOo Writer, AbiWord, and KWord". Retrieved 2010-04-06.
  6. "Sharing files between OpenOffice.org and Microsoft Office". 2005-07-28. Archived from the original on 2010-02-04. Retrieved 2010-04-06.
  7. "SoftMaker Office 2008 focuses on compatibility with Microsoft Office". 2008-11-20. Retrieved 2010-04-06.
  8. "SoftMaker Office 2006 beta: Not a killer app". 2006-11-21. Retrieved 2010-04-06.
  9. Philippe Lagadec (2006-11-30), OpenOffice / OpenDocument and Microsoft Office 2007 / Open XML security (PDF), retrieved 2010-04-06
  10. "OLE object - bitmap representation?". Archived from the original on 2011-07-24. Retrieved 2010-04-06.
  11. "A Rich Edit Control That Displays Bitmaps and Other OLE Objects". Retrieved 2010-04-06.
  12. "ACC: Why OLE Objects Cause Databases to Grow". 2007-01-19. Archived from the original on 2009-12-13. Retrieved 2010-04-29.
  13. "Web resources & interesting links - easy and simple introduction to OpenDocument Format (ODF)". Archived from the original on 2008-06-02. Retrieved 2010-06-07.
  14. "NEEDS-DISCUSSION: ZIP reference - N 1309". Retrieved 2010-06-07.
  15. "Zip reference is neither public nor authoritative". 2009-10-11. Retrieved 2010-06-07.
  16. "OASIS Open Document Format for Office Applications (OpenDocument) TC". OASIS website. OASIS. Retrieved 2024-11-26. Open Document Format v1.0 was approved as an OASIS Standard on 1 May 2005.
  17. "OASIS Open Document Format for Office Applications (OpenDocument) TC". OASIS website. OASIS. Retrieved 2024-11-26. Open Document Format v1.1 was approved as an OASIS Standard on 2 February 2007.
  18. "OASIS Open Document Format for Office Applications (OpenDocument) TC". OASIS website. OASIS. Retrieved 2024-11-26. Open Document Format v1.2 was approved as a OASIS Standard on 29 September 2011.
  19. "OASIS Open Document Format for Office Applications (OpenDocument) TC". OASIS website. OASIS. Retrieved 2024-11-26. Open Document Format for Office Applications (OpenDocument) Version 1.3 OASIS Standard was approved by the members of OASIS on 27 April 2021.
  20. "office:version attribute - OpenDocument Version 1.2, Part 1, 29 September 2011". Retrieved 2024-11-26.
  21. "Conformance defined in OpenDocument Version 1.2, Part 1, 29 September 2011". Retrieved 2012-12-05.
