Apache POI

Developer(s): Apache Software Foundation
Stable release: 5.3.0 / July 2, 2024 [1]
Repository: POI Repository
Written in: Java
Operating system: Cross-platform
Type: API to access Microsoft Office formats
License: Apache License 2.0
Website: poi.apache.org

Apache POI, a project run by the Apache Software Foundation, and previously a sub-project of the Jakarta Project, provides pure Java libraries for reading and writing files in Microsoft Office formats, such as Word, PowerPoint and Excel.

History and roadmap

The name was originally an acronym for "Poor Obfuscation Implementation", [2] a humorous reference to the fact that the file formats seemed to be deliberately obfuscated, but poorly, since they were successfully reverse-engineered. This explanation, along with those of the similar names for the various sub-projects, was removed from the official web pages in order to better market the tools to businesses that would not consider such humor appropriate. The original authors (Andrew C. Oliver and Marc Johnson) also noted the existence of the Hawaiian poi dish, made of mashed taro root, which had similarly derogatory connotations. [3]

Office Open XML support

POI has supported the ISO/IEC 29500:2008 Office Open XML file formats since version 3.5. A significant contribution to OOXML support came from Sourcesense, [4] an open source company commissioned by Microsoft to develop it. [5] This connection spurred controversy, with some POI contributors questioning whether Microsoft's Open Specification Promise patent license gave POI's OOXML code adequate patent protection. [6]
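
Because POI covers both the older binary Office formats and, since version 3.5, the OOXML formats, calling code often needs to know which family a file belongs to before choosing a component. The following is a minimal sketch of that check using only the JDK. It is not POI's own API: the class `OfficeFormatSniffer`, its `Family` enum and the `sniff` methods are hypothetical names introduced for illustration. It relies on two well-known signatures: the 8-byte header of the OLE2 compound file container used by binary .xls, .doc and .ppt files, and the `PK` local file header that starts a ZIP archive, which is the container for OOXML packages.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper (not part of Apache POI): guesses the container family of a file. */
final class OfficeFormatSniffer {

    /** Container families this sketch can tell apart. */
    enum Family { OLE2, ZIP, UNKNOWN }

    // Signature of the OLE2 compound file container (binary .xls/.doc/.ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature "PK\003\004"; OOXML files are ZIP packages.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /** Classifies a file from its leading bytes. */
    static Family sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Family.OLE2;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            // Only proves "ZIP archive"; confirming OOXML would need a look inside the package.
            return Family.ZIP;
        }
        return Family.UNKNOWN;
    }

    /** Reads up to 8 bytes from the stream and classifies them. */
    static Family sniff(InputStream in) throws IOException {
        byte[] header = new byte[OLE2_MAGIC.length];
        int filled = 0;
        while (filled < header.length) {
            int n = in.read(header, filled, header.length - filled);
            if (n < 0) {
                break; // stream ended early; classify what was read
            }
            filled += n;
        }
        return sniff(Arrays.copyOf(header, filled));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sniff(new ByteArrayInputStream(OLE2_MAGIC))); // prints OLE2
        System.out.println(sniff(new ByteArrayInputStream(ZIP_MAGIC)));  // prints ZIP
        System.out.println(sniff("plain text".getBytes()));              // prints UNKNOWN
    }
}
```

A caller could use the result to route `OLE2` files to the binary-format components and `ZIP` files to the OOXML ones. A ZIP signature alone does not prove that the archive is an Office document, so the result should be treated as a hint rather than a validation.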

Architecture

The Apache POI project contains the following subcomponents (meaning of acronyms is taken from old documentation):

The HSSF component is the most advanced feature of the library. [11] Other components (HPSF, HWPF, and HSLF) are usable, but less full-featured. [12] [13]

The POI library is also provided as a Ruby [14] or ColdFusion extension.

There are modules for Big Data platforms (e.g. Apache Hive/Apache Flink/Apache Spark), which provide certain functionality of Apache POI, such as the processing of Excel files. [15] [16]

Version history

| Version | Date of release | Status |
| --- | --- | --- |
| 5.3.0 | July 2, 2024 | Current stable version |
| 5.2.5 | November 25, 2023 | Old version, no longer maintained |
| 5.2.4 | September 28, 2023 | Old version, no longer maintained |
| 5.2.3 | September 16, 2022 | Old version, no longer maintained |
| 5.2.2 | March 19, 2022 | Old version, no longer maintained |
| 5.2.1 | March 3, 2022 | Old version, no longer maintained |
| 5.2.0 | January 14, 2022 | Old version, no longer maintained |
| 5.1.0 | November 1, 2021 | Old version, no longer maintained |
| 5.0.0 | January 20, 2021 | Old version, no longer maintained |
| 4.1.2 | February 14, 2020 | Old version, no longer maintained |
| 4.1.1 | October 20, 2019 | Old version, no longer maintained |
| 4.1.0 | April 9, 2019 | Old version, no longer maintained |
| 4.0.0 | September 7, 2018 | Old version, no longer maintained |
| 3.17 | September 15, 2017 | Old version, no longer maintained |
| 3.16 | April 19, 2017 | Old version, no longer maintained |
| 3.15 | September 21, 2016 | Old version, no longer maintained |
| 3.14 | March 2, 2016 | Old version, no longer maintained |
| 3.13 | September 29, 2015 | Old version, no longer maintained |
| 3.12 | May 11, 2015 | Old version, no longer maintained |
| 3.11 | December 21, 2014 | Old version, no longer maintained |
| 3.10.1 | August 18, 2014 | Old version, no longer maintained |
| 3.10 | February 8, 2014 | Old version, no longer maintained |
| 3.9 | December 3, 2012 | Old version, no longer maintained |
| 3.8 | March 26, 2012 | Old version, no longer maintained |
| 3.7 | October 29, 2010 | Old version, no longer maintained |
| 3.6 | December 14, 2009 | Old version, no longer maintained |
| 3.5 | September 28, 2009 | Old version, no longer maintained |
| 3.2 | October 19, 2008 | Old version, no longer maintained |
| 3.1 | June 29, 2008 | Old version, no longer maintained |
| 3.0.2 | February 4, 2008 | Old version, no longer maintained |
| 3.0.1 | July 5, 2007 | Old version, no longer maintained |
| 3.0 | May 18, 2007 | Old version, no longer maintained |
| 2.5.1 | February 29, 2004 | Old version, no longer maintained |
| 2.5 | February 29, 2004 | Old version, no longer maintained |
| 2.0 | January 26, 2004 | Old version, no longer maintained |
| 1.5.1 | June 16, 2002 | Old version, no longer maintained |
| 1.5 | May 6, 2002 | Old version, no longer maintained |
| 1.2.0 | January 19, 2002 | Old version, no longer maintained |
| 1.1.0 | January 4, 2002 | Old version, no longer maintained |
| 1.0.2 | January 11, 2002 | Old version, no longer maintained |
| 1.0.1 | January 4, 2002 | Old version, no longer maintained |
| 1.0.0 | December 30, 2001 | Old version, no longer maintained |

References

  1. "History of Changes". Retrieved October 28, 2024.
  2. Sundaram, Elango (22 March 2004), "Excelling in Excel with Java", JavaWorld, retrieved 2020-07-21
  3. POI homepage from October 2004, Coyote Song, archived from the original on October 15, 2004, showing original explanations for naming.
  4. SourceSense
  5. McDougall, Paul (26 March 2008). "Microsoft Eyes Open Source Components for Office 2007". InformationWeek. Retrieved 2020-07-21.
  6. Oliver, Andrew C. (27 March 2008), "Rejection of any ENCUMBERED Microsoft Donation to POI", POI development mailing list archives, retrieved 2020-07-21
  7. "POI API Documentation". Poi.apache.org. Retrieved March 7, 2019.
  8. "POI-HPBF - Java API To Access Microsoft Publisher Format Files". Poi.apache.org. Retrieved March 7, 2019.
  9. Codeplex NPOI, Microsoft, archived from the original on March 28, 2012
  10. POI-HSMF, Apache, archived from the original on August 7, 2011, retrieved July 31, 2011
  11. POI-HSSF, Apache
  12. POI-HWPF, Apache
  13. POI-HSLF, Apache
  14. POI-Ruby, Apache
  15. "HadoopOffice for Hive/Flink/Spark". Github.com. July 19, 2018. Retrieved March 7, 2019.
  16. "Spark Excel". Github.com. Retrieved March 7, 2019.