Document file format

A document file format is a text or binary file format for storing documents on a storage medium, especially for use by computers. A multitude of mutually incompatible document file formats currently exists.

Examples of XML-based open standards are DocBook, XHTML, and, more recently, the ISO/IEC standards OpenDocument (ISO 26300:2006) and Office Open XML (ISO 29500:2008).

In 1993, the ITU-T tried to establish a standard for document file formats, known as the Open Document Architecture (ODA), which was intended to replace all competing document file formats. It is described in ITU-T documents T.411 through T.421, which are equivalent to ISO 8613. The effort did not succeed.

Page description languages such as PostScript and PDF have become the de facto standard for documents that a typical user should only be able to create and read, not edit. Beginning in 2001, a series of ISO/IEC standards for PDF was published, including the specification for PDF itself, ISO 32000.

HTML is the most widely used open international standard, and it is also used as a document file format. It has also become an ISO/IEC standard (ISO 15445:2000).

The default binary file format used by Microsoft Word (.doc) has become a widespread de facto standard for office documents, but it is a proprietary format and is not always fully supported by other word processors.
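Several of the formats mentioned above can be told apart by the first few bytes of a file. The following is a minimal sketch of that idea, not a complete detector: the table `SIGNATURES` and the function `sniff_format` are hypothetical names introduced for illustration, and only a handful of well-known signatures are covered. A ZIP signature indicates only the container, so an OpenDocument or Office Open XML file cannot be distinguished at this level.

```python
# Leading-byte signatures for a few formats discussed in this article.
# The short labels are illustrative, not official names.
SIGNATURES = [
    (b"%PDF-", "pdf"),                              # PDF header
    (b"%!PS", "postscript"),                        # PostScript header comment
    (b"{\\rtf", "rtf"),                             # RTF group opening
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # container of legacy .doc
    (b"PK\x03\x04", "zip"),                         # ZIP local file header
]


def sniff_format(data: bytes) -> str:
    """Return a best-guess label based only on the leading bytes."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


if __name__ == "__main__":
    print(sniff_format(b"%PDF-1.7\n"))
    print(sniff_format(b"{\\rtf1\\ansi Hello}"))
    print(sniff_format(b"plain text"))
```

Text-based formats such as HTML have no fixed signature, so a practical tool would need further heuristics beyond this sketch.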

Related Research Articles

PDF: Portable Document Format, a digital file format

Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems. Based on the PostScript language, each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, vector graphics, raster images and other information needed to display it. PDF has its roots in "The Camelot Project" initiated by Adobe co-founder John Warnock in 1991. PDF was standardized as ISO 32000 in 2008. The latest edition, ISO 32000-2:2020, was published in December 2020.

The Rich Text Format (RTF) is a proprietary document file format with a published specification, developed by Microsoft Corporation from 1987 until 2008 for cross-platform document interchange with Microsoft products. Prior to 2008, Microsoft published updated specifications for RTF with major revisions of Microsoft Word and Office versions.

Standard Generalized Markup Language: Markup language

The Standard Generalized Markup Language is a standard for defining generalized markup languages for documents. ISO 8879 Annex A.1 states that generalized markup is "based on two postulates":

XML: Markup language by the W3C for encoding of data

Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications—all of them free open standards—define XML.
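As a small illustration of "both human-readable and machine-readable", the sketch below parses a short XML document with Python's standard `xml.etree.ElementTree` module. The element names (`document`, `title`, `para`) are made up for this example and do not belong to any of the standards discussed here.

```python
import xml.etree.ElementTree as ET

# A tiny document using hypothetical element names.
SAMPLE = """<?xml version="1.0"?>
<document>
  <title>Example</title>
  <para>First paragraph.</para>
  <para>Second paragraph.</para>
</document>"""

# Parse the text into an element tree and read values back out.
root = ET.fromstring(SAMPLE)
title = root.findtext("title")
paragraphs = [p.text for p in root.findall("para")]

if __name__ == "__main__":
    print(title)
    for text in paragraphs:
        print("-", text)
```

The same text can be read directly by a person in any text editor, which is the property that binary formats give up.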

The Document Style Semantics and Specification Language (DSSSL) is an international standard developed to provide stylesheets for SGML documents.

.doc is a filename extension used for word processing documents stored in Microsoft's proprietary Microsoft Word Binary File Format; it was the primary format for Microsoft Word until the 2007 version replaced it with Office Open XML .docx files. Microsoft has used the extension since 1983.

The Open Document Architecture (ODA) and interchange format is a free and open international standard document file format maintained by the ITU-T, intended to replace all proprietary document file formats. ODA is detailed in the standards documents CCITT T.411-T.424, which are equivalent to ISO/IEC 8613.

The Open Document Format for Office Applications (ODF), also known as OpenDocument and standardized as ISO 26300, is an open file format for word processing documents, spreadsheets, presentations and graphics that uses ZIP-compressed XML files. It was developed with the aim of providing an open, XML-based file format specification for office applications.
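The idea of "ZIP-compressed XML files" can be sketched with Python's standard `zipfile` module. The example below is a simplified illustration and does not produce a valid OpenDocument file: a real package also carries a manifest and uses the ODF XML namespaces, whereas the `doc` and `p` elements and the helper names `build_package` and `read_package` are assumptions made for this sketch. It does follow the ODF convention that an uncompressed `mimetype` entry comes first in the archive.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Media type of an OpenDocument text document.
ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def build_package(body_text: str) -> bytes:
    """Build an in-memory ZIP holding a mimetype entry and one XML part."""
    # Placeholder XML; real ODF content uses namespaced elements.
    root = ET.Element("doc")
    ET.SubElement(root, "p").text = body_text
    xml_bytes = ET.tostring(root, encoding="utf-8")

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # First entry, stored without compression.
        zf.writestr("mimetype", ODT_MIMETYPE,
                    compress_type=zipfile.ZIP_STORED)
        # XML part, compressed with DEFLATE.
        zf.writestr("content.xml", xml_bytes,
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def read_package(data: bytes):
    """Return the mimetype string and the paragraph texts of the XML part."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        mimetype = zf.read("mimetype").decode("ascii")
        root = ET.fromstring(zf.read("content.xml"))
    return mimetype, [p.text for p in root.findall("p")]


if __name__ == "__main__":
    package = build_package("Hello, document.")
    print(read_package(package))
```

Because the container is an ordinary ZIP archive, generic archive tools can list and extract the parts, which is one practical benefit of this packaging approach.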

In computing, formatted text, styled text, or rich text, as opposed to plain text, is digital text that carries styling information beyond the minimum of semantic elements, such as colours, styles, sizes, and special features in HTML.

Various binary formats have been proposed as compact representations for XML. Using a binary XML format generally reduces the verbosity of XML documents, and thereby the cost of parsing, but hinders the use of ordinary text editors and third-party tools to view and edit the document. There are several competing formats, but none has yet emerged as a de facto standard, although the World Wide Web Consortium adopted EXI as a Recommendation on 10 March 2011.

Office Open XML is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents. Ecma International standardized the initial version as ECMA-376. ISO and IEC standardized later versions as ISO/IEC 29500.

A proprietary file format is a file format of a company, organization, or individual in which data is ordered and stored according to a particular encoding scheme that its designer keeps secret, so that decoding and interpreting the stored data is easily accomplished only with software or hardware that the company itself has developed. The specification of the data encoding format is either not released or is subject to non-disclosure agreements. A proprietary format can also be a file format whose encoding is in fact published, but is restricted through licences such that only the company itself or licensees may use it. In contrast, an open format is a file format that is published and free to be used by everybody.

The Open Packaging Conventions (OPC) is a container-file technology initially created by Microsoft to store a combination of XML and non-XML files that together form a single entity, such as an Open XML Paper Specification (OpenXPS) document. OPC-based file formats combine two advantages: the independent file entities embedded in the document are left intact, and the resulting files are much smaller than with normal use of XML.

The following is a comparison of e-book formats used to create and publish e-books.

This is a comparison of the Office Open XML document file format with the OpenDocument file format.

The Office Open XML file formats are a set of file formats that can be used to represent electronic office documents. There are formats for word processing documents, spreadsheets and presentations as well as specific formats for material such as mathematical formulas, graphics, bibliographies etc.
