Developer(s) | Apache Software Foundation |
---|---|
Stable release | 5.3.0 / July 2, 2024 [1] |
Repository | POI Repository |
Written in | Java |
Operating system | Cross-platform |
Type | API to access Microsoft Office formats |
License | Apache License 2.0 |
Website | poi |
Apache POI, a project run by the Apache Software Foundation, and previously a sub-project of the Jakarta Project, provides pure Java libraries for reading and writing files in Microsoft Office formats, such as Word, PowerPoint and Excel.
The name was originally an acronym for "Poor Obfuscation Implementation", [2] a humorous reference to the fact that the file formats seemed to be deliberately obfuscated, but poorly, since they were successfully reverse-engineered. This explanation, along with those of the similar names for the various sub-projects, was removed from the official web pages in order to better market the tools to businesses that would not consider such humor appropriate. The original authors (Andrew C. Oliver and Marc Johnson) also noted the existence of the Hawaiian poi dish, made of mashed taro root, which had similarly derogatory connotations. [3]
POI has supported the ISO/IEC 29500:2008 Office Open XML (OOXML) file formats since version 3.5. A significant contribution to OOXML support came from Sourcesense, [4] an open source company that Microsoft commissioned to develop it. [5] This connection spurred controversy, with some POI contributors questioning the patent protection of POI's OOXML support under Microsoft's Open Specification Promise patent license. [6]
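Because the library covers both the older binary Office formats and Office Open XML, a program usually has to work out which kind of container it has been given before choosing a component. The sketch below illustrates one way to do that using only the JDK: the older binary formats are stored in OLE2 compound files, which start with a fixed eight-byte signature, while Office Open XML files are ZIP archives, which start with `PK\x03\x04`. The class `OfficeContainerSniffer`, its `Container` enum, and the `sniff` methods are hypothetical names chosen for this illustration; they are not part of Apache POI, and this is not how POI itself is claimed to perform detection.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of Apache POI. */
final class OfficeContainerSniffer {

    /** Container families distinguishable from the first bytes of a file. */
    enum Container { OLE2, ZIP_BASED, UNKNOWN }

    // Signature found at the start of every OLE2 compound file.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature: 'P', 'K', 0x03, 0x04.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /** Classifies a file from its leading bytes. */
    static Container sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Container.OLE2;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            // A ZIP signature alone does not prove the file is OOXML;
            // any ZIP archive starts the same way.
            return Container.ZIP_BASED;
        }
        return Container.UNKNOWN;
    }

    /** Reads just enough of the file to classify it. */
    static Container sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(OLE2_MAGIC.length));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java OfficeContainerSniffer.java report.xls report.xlsx
        for (String arg : args) {
            System.out.println(arg + ": " + sniff(Path.of(arg)));
        }
    }
}
```

A result of `ZIP_BASED` only narrows the possibilities, so a real application would still need to inspect the archive contents (or hand the file to the library) to confirm that it is an Office Open XML document.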
The Apache POI project contains the following subcomponents (meaning of acronyms is taken from old documentation):
The HSSF component is the most advanced feature of the library. [11] Other components (HPSF, HWPF, and HSLF) are usable, but less full-featured. [12] [13]
The POI library is also available as a Ruby [14] or ColdFusion extension.
There are modules for Big Data platforms (e.g. Apache Hive/Apache Flink/Apache Spark), which provide certain functionality of Apache POI, such as the processing of Excel files. [15] [16]
Version number | Date of release |
---|---|
5.3.0 | July 2, 2024 |
5.2.5 | November 25, 2023 |
5.2.4 | September 28, 2023 |
5.2.3 | September 16, 2022 |
5.2.2 | March 19, 2022 |
5.2.1 | March 3, 2022 |
5.2.0 | January 14, 2022 |
5.1.0 | November 1, 2021 |
5.0.0 | January 20, 2021 |
4.1.2 | February 14, 2020 |
4.1.1 | October 20, 2019 |
4.1.0 | April 9, 2019 |
4.0.0 | September 7, 2018 |
3.17 | September 15, 2017 |
3.16 | April 19, 2017 |
3.15 | September 21, 2016 |
3.14 | March 2, 2016 |
3.13 | September 29, 2015 |
3.12 | May 11, 2015 |
3.11 | December 21, 2014 |
3.10.1 | August 18, 2014 |
3.10 | February 8, 2014 |
3.9 | December 3, 2012 |
3.8 | March 26, 2012 |
3.7 | October 29, 2010 |
3.6 | December 14, 2009 |
3.5 | September 28, 2009 |
3.2 | October 19, 2008 |
3.1 | June 29, 2008 |
3.0.2 | February 4, 2008 |
3.0.1 | July 5, 2007 |
3.0 | May 18, 2007 |
2.5.1 | February 29, 2004 |
2.5 | February 29, 2004 |
2.0 | January 26, 2004 |
1.5.1 | June 16, 2002 |
1.5 | May 6, 2002 |
1.2.0 | January 19, 2002 |
1.1.0 | January 4, 2002 |
1.0.2 | January 11, 2002 |
1.0.1 | January 4, 2002 |
1.0.0 | December 30, 2001 |