Apache POI

Developer(s): Apache Software Foundation
Stable release: 5.2.5 / November 25, 2023 [1]
Repository: POI Repository
Written in: Java
Operating system: Cross-platform
Type: API to access Microsoft Office formats
License: Apache License 2.0
Website: poi.apache.org

Apache POI is a project of the Apache Software Foundation, and previously a sub-project of the Jakarta Project, that provides pure Java libraries for reading and writing files in Microsoft Office formats, such as Word, PowerPoint and Excel.

History and roadmap

The name was originally an acronym for "Poor Obfuscation Implementation", [2] a humorous reference to the fact that the file formats seemed to be deliberately obfuscated, but poorly, since they were successfully reverse-engineered. This explanation, along with those of the similarly named sub-projects, was removed from the official web pages in order to better market the tools to businesses that would not consider such humor appropriate. The original authors (Andrew C. Oliver and Marc Johnson) also noted the existence of the Hawaiian poi dish, made of mashed taro root, which had similarly derogatory connotations. [3]

Office Open XML support

POI has supported the ISO/IEC 29500:2008 Office Open XML file formats since version 3.5. A significant contribution to OOXML support came from Sourcesense, [4] an open source company that was commissioned by Microsoft to develop it. [5] This link spurred controversy, with some POI contributors questioning the patent protection of POI's OOXML support under Microsoft's Open Specification Promise patent license. [6]
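Because POI handles both the older binary Office formats and the Office Open XML formats, a caller often has to work out which family a file belongs to before choosing a component. The sketch below illustrates one common way to do that using only the JDK (Java 11 or later), not the POI API. It relies on two facts not stated above: the binary formats are stored in an OLE2 compound file, which begins with a fixed eight-byte signature, and OOXML files are ZIP packages, which begin with the bytes "PK" followed by 0x03 0x04. The class name ContainerSniffer and its members are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, not part of Apache POI: guesses the container
// family of an Office file from its leading bytes.
public class ContainerSniffer {

    /** Container families this sketch can tell apart. */
    public enum Container { OLE2, ZIP, UNKNOWN }

    // Signature at the start of an OLE2 compound file (legacy .doc/.xls/.ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature; OOXML packages are ZIP archives.
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 0x03, 0x04 };

    /** Reads up to eight bytes from the stream and classifies them. */
    public static Container sniff(InputStream in) throws IOException {
        byte[] head = in.readNBytes(OLE2_MAGIC.length);
        if (startsWith(head, OLE2_MAGIC)) {
            return Container.OLE2;
        }
        if (startsWith(head, ZIP_MAGIC)) {
            return Container.ZIP;
        }
        return Container.UNKNOWN;
    }

    // True if data is at least as long as prefix and begins with it.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(data, 0, prefix.length, prefix, 0, prefix.length);
    }

    public static void main(String[] args) throws IOException {
        byte[] zipLike = { 'P', 'K', 0x03, 0x04, 0, 0, 0, 0 };
        System.out.println(sniff(new ByteArrayInputStream(zipLike)));
    }
}
```

A ZIP result only means the file might be OOXML, since any ZIP archive (including an OpenDocument file) starts with the same bytes; a real application would go on to inspect the package contents.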

Architecture

The Apache POI project contains the following subcomponents (meaning of acronyms is taken from old documentation):

The HSSF component is the most advanced feature of the library. [11] Other components (HPSF, HWPF, and HSLF) are usable, but less full-featured. [12] [13]

The POI library is also provided as a Ruby [14] or ColdFusion extension.

There are modules for Big Data platforms (such as Apache Hive, Apache Flink and Apache Spark) that provide certain functionality of Apache POI, such as the processing of Excel files. [15] [16]

Version history

Version number    Date of release      Status
5.2.5             25 November 2023     Current stable version
5.2.4             29 September 2023    Old version, no longer maintained
5.2.3             16 September 2022    Old version, no longer maintained
5.2.2             19 March 2022        Old version, no longer maintained
5.2.1             3 March 2022         Old version, no longer maintained
5.2.0             14 January 2022      Old version, no longer maintained
5.1.0             1 November 2021      Old version, no longer maintained
5.0.0             20 January 2021      Old version, no longer maintained
4.1.2             14 February 2020     Old version, no longer maintained
4.1.1             20 October 2019      Old version, no longer maintained
4.1.0             9 April 2019         Old version, no longer maintained
4.0.0             7 September 2018     Old version, no longer maintained
3.17              15 September 2017    Old version, no longer maintained
3.16              19 April 2017        Old version, no longer maintained
3.15              21 September 2016    Old version, no longer maintained
3.14              2 March 2016         Old version, no longer maintained
3.13              29 September 2015    Old version, no longer maintained
3.12              11 May 2015          Old version, no longer maintained
3.11              21 December 2014     Old version, no longer maintained
3.10.1            18 August 2014       Old version, no longer maintained
3.10              8 February 2014      Old version, no longer maintained
3.9               3 December 2012      Old version, no longer maintained
3.8               26 March 2012        Old version, no longer maintained
3.7               29 October 2010      Old version, no longer maintained
3.6               14 December 2009     Old version, no longer maintained
3.5               28 September 2009    Old version, no longer maintained
3.2               19 October 2008      Old version, no longer maintained
3.1               29 June 2008         Old version, no longer maintained
3.0.2             4 February 2008      Old version, no longer maintained
3.0.1             5 July 2007          Old version, no longer maintained
3.0               18 May 2007          Old version, no longer maintained
2.5.1             29 February 2004     Old version, no longer maintained
2.5               29 February 2004     Old version, no longer maintained
2.0               26 January 2004      Old version, no longer maintained
1.5.1             16 June 2002         Old version, no longer maintained
1.5               6 May 2002           Old version, no longer maintained
1.2.0             19 January 2002      Old version, no longer maintained
1.1.0             4 January 2002       Old version, no longer maintained
1.0.2             11 January 2002      Old version, no longer maintained
1.0.1             4 January 2002       Old version, no longer maintained
1.0.0             30 December 2001     Old version, no longer maintained

References

  1. "History of Changes". Retrieved 2022-09-19.
  2. Sundaram, Elango (22 March 2004), "Excelling in Excel with Java", JavaWorld, retrieved 2020-07-21
  3. POI homepage from October 2004, Coyote Song, archived from the original on 2004-10-15, showing original explanations for naming.
  4. SourceSense
  5. McDougall, Paul (26 March 2008). "Microsoft Eyes Open Source Components for Office 2007". InformationWeek. Retrieved 2020-07-21.
  6. Oliver, Andrew C. (27 March 2008), "Rejection of any ENCUMBERED Microsoft Donation to POI", POI development mailing list archives, retrieved 2020-07-21
  7. "POI API Documentation". Poi.apache.org. Retrieved 2019-03-07.
  8. "POI-HPBF - Java API To Access Microsoft Publisher Format Files". Poi.apache.org. Retrieved 2019-03-07.
  9. Codeplex NPOI, Microsoft, archived from the original on 2012-03-28
  10. POI-HSMF, Apache, archived from the original on 2011-08-07, retrieved 2011-07-31
  11. POI-HSSF, Apache
  12. POI-HWPF, Apache
  13. POI-HSLF, Apache
  14. POI-Ruby, Apache
  15. "HadoopOffice for Hive/Flink/Spark". Github.com. 2018-07-19. Retrieved 2019-03-07.
  16. "Spark Excel". Github.com. Retrieved 2019-03-07.