Expat (software)

Expat
Original author(s): James Clark
Developer(s): Clark Cooper, et al.
Initial release: 1998
Stable release: 2.6.0 [1] / 6 February 2024
Written in: C
Operating system: Portable
Type: XML parser library
License: MIT License [2]
Website: libexpat.github.io

Expat is a stream-oriented XML 1.0 parser library, written in C. As one of the first available open-source XML parsers, Expat has found a place in many open-source projects, including the Apache HTTP Server, Mozilla, Perl, Python and PHP. Bindings also exist for many other languages.

Timeline

Software developer James Clark released version 1.0 in 1998 while serving as technical lead on the XML Working Group at the World Wide Web Consortium.[citation needed] Clark released two more versions, 1.1 and 1.2, before turning the project over to a group led by Clark Cooper and Fred Drake in 2000. The new group released version 1.95.0 in September 2000 and continues to release new versions to incorporate bug fixes and enhancements.

Availability

GitHub hosts the Expat project. Versions exist for most major operating systems.[citation needed]

Deployment

To use the Expat library, programs first register handler functions with Expat. When Expat parses an XML document, it calls the registered handlers as it finds relevant tokens in the input stream. These tokens and their associated handler calls are called events. Typically, programs register handler functions for XML element start or stop events and character events. Expat provides facilities for more sophisticated event handling such as XML Namespace declarations, processing instructions and DTD events.
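The register-then-parse flow described above can be sketched with Python's standard-library `xml.parsers.expat` module, which is a binding to Expat. This is a minimal illustration rather than a reference implementation: the function name `collect_events` and the tuple layout of the recorded events are assumptions made for this example, and whitespace-only character data is skipped to keep the output short.

```python
import xml.parsers.expat


def collect_events(xml_text):
    """Parse xml_text and return the events Expat reported, in order.

    collect_events and the event tuples are hypothetical names/shapes
    chosen for this sketch; only the parser attributes come from the
    xml.parsers.expat module.
    """
    events = []

    def on_start(name, attrs):
        # Called when an element's start tag is found.
        events.append(("start", name, dict(attrs)))

    def on_end(name):
        # Called when an element's end tag is found.
        events.append(("end", name))

    def on_chars(data):
        # Called for character data; ignore whitespace-only text.
        if data.strip():
            events.append(("chars", data))

    parser = xml.parsers.expat.ParserCreate()
    # Ask the binding to coalesce adjacent character data into one call.
    parser.buffer_text = True
    # Register the handlers before parsing starts.
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_chars
    # The second argument marks this as the final piece of input.
    parser.Parse(xml_text, True)
    return events


if __name__ == "__main__":
    for event in collect_events('<note id="1"><to>Ann</to></note>'):
        print(event)
```

Handlers for other event types, such as processing instructions or namespace declarations, are registered the same way through further attributes on the parser object.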

Expat's parsing events resemble the events defined in the Simple API for XML (SAX), but Expat is not a SAX-compliant parser. Projects incorporating the Expat library often build SAX and possibly DOM parsers on top of Expat. While Expat is mainly a stream-based (push) parser, it supports stopping and restarting parsing at arbitrary times, thus making the implementation of a pull parser relatively easy as well.
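One way to layer a pull-style interface over the push parser is sketched below, again using Python's `xml.parsers.expat` binding. Note the swapped-in technique: this sketch does not use Expat's stop and resume facility (`XML_StopParser` and `XML_ResumeParser` in the C API). Instead it feeds the input in chunks and queues the events produced by each chunk, so the caller can consume them one at a time. The name `iter_events` and the `(kind, name)` tuples are assumptions made for this example.

```python
import xml.parsers.expat
from collections import deque


def iter_events(chunks):
    """Yield ("start" | "end", element_name) tuples as chunks are parsed.

    iter_events is a hypothetical helper for this sketch. Handlers push
    events onto a queue; the generator drains the queue after each chunk,
    so the caller pulls events instead of receiving callbacks.
    """
    pending = deque()
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: pending.append(("start", name))
    parser.EndElementHandler = lambda name: pending.append(("end", name))

    for chunk in chunks:
        # False: more input may follow, so incomplete tokens are kept for later.
        parser.Parse(chunk, False)
        while pending:
            yield pending.popleft()

    # An empty final call tells Expat that the input has ended.
    parser.Parse("", True)
    while pending:
        yield pending.popleft()


if __name__ == "__main__":
    # A tag may be split across chunks; Expat completes it when more data arrives.
    for kind, name in iter_events(["<a><b", "/></a>"]):
        print(kind, name)
```

The caller controls the pace by deciding when to request the next event, which is the essence of a pull parser. A C program could get finer-grained control by stopping the parser from inside a handler and resuming it later.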

Related Research Articles

Document Object Model: Convention for representing and interacting with objects in HTML, XHTML, and XML documents

The Document Object Model (DOM) is a cross-platform and language-independent interface that treats an HTML or XML document as a tree structure wherein each node is an object representing a part of the document. The DOM represents a document with a logical tree. Each branch of the tree ends in a node, and each node contains objects. DOM methods allow programmatic access to the tree; with them one can change the structure, style or content of a document. Nodes can have event handlers attached to them. Once an event is triggered, the event handlers get executed.

XML: Markup language by the W3C for encoding of data

Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications—all of them free open standards—define XML.

XSLT is a language originally designed for transforming XML documents into other XML documents, or other formats such as HTML for web pages, plain text or XSL Formatting Objects, which may subsequently be converted to other formats, such as PDF, PostScript and PNG. Support for JSON and plain-text transformation was added in later versions of the XSLT specification.

In computing, the Java API for XML Processing, or JAXP, one of the Java XML application programming interfaces, provides the capability of validating and parsing XML documents. It has three basic parsing interfaces.

GNU Bison, commonly known as Bison, is a parser generator that is part of the GNU Project. Bison reads a specification in Bison syntax, warns about any parsing ambiguities, and generates a parser that reads sequences of tokens and decides whether the sequence conforms to the syntax specified by the grammar.

SAX is an event-driven online algorithm for lexing and parsing XML documents, with an API developed by the XML-DEV mailing list. SAX provides a mechanism for reading data from an XML document that is an alternative to that provided by the Document Object Model (DOM). Where the DOM operates on the document as a whole—building the full abstract syntax tree of an XML document for convenience of the user—SAX parsers operate on each piece of the XML document sequentially, issuing parsing events while making a single pass through the input stream.

Apache Xerces

In computing, Xerces is Apache's collection of software libraries for parsing, validating, serializing and manipulating XML. The library implements a number of standard APIs for XML parsing, including DOM, SAX and SAX2. The implementation is available in the Java, C++ and Perl programming languages.

Lex is a computer program that generates lexical analyzers.

Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part.

YAML is a human-readable data serialization language. It is commonly used for configuration files and in applications where data is being stored or transmitted. YAML targets many of the same communications applications as Extensible Markup Language (XML) but has a minimal syntax that intentionally differs from Standard Generalized Markup Language (SGML). It supports Python-style indentation to indicate nesting as well as a more compact format that uses [...] for lists and {...} for maps. However, because it forbids tab characters for indentation, only some JSON files are valid YAML 1.2.

James Clark is a software engineer and creator of various open-source software including groff, expat and several XML specifications.

The PHP Extension and Application Repository, or PEAR, is a repository of PHP software code. Stig S. Bakken founded the PEAR project in 1999 to promote the re-use of code that performs common functions. The project seeks to provide a structured library of code, maintain a system for distributing code and for managing code packages, and promote a standard coding style. Though community-driven, the PEAR project has a PEAR Group which serves as the governing body and takes care of administrative tasks. Each PEAR code package comprises an independent project under the PEAR umbrella. It has its own development team, versioning-control and documentation.

Streaming API for XML (StAX) is an application programming interface (API) to read and write XML documents, originating from the Java programming language community.

Tea is a high-level scripting language for the Java environment. It combines features of Scheme, Tcl, and Java.

libxml2 is a software library for parsing XML documents. It is also the basis for the libxslt library which processes XSLT-1.0 stylesheets.

jQuery is a JavaScript library designed to simplify HTML DOM tree traversal and manipulation, as well as event handling, CSS animations, and Ajax. It is free, open-source software using the permissive MIT License. As of August 2022, jQuery is used by 77% of the 10 million most popular websites. Web analysis indicates that it is the most widely deployed JavaScript library by a large margin, having at least three to four times more usage than any other JavaScript library.

A loop-switch sequence is a programming antipattern where a clear set of steps is implemented as a switch-within-a-loop. The loop-switch sequence is a specific derivative of spaghetti code.

XML documents typically refer to external entities, for example the public and/or system ID for the Document Type Definition. These external relationships are expressed using URIs, typically as URLs.

Virtual Token Descriptor for eXtensible Markup Language (VTD-XML) refers to a collection of cross-platform XML processing technologies centered on a non-extractive, "document-centric" XML parsing technique called Virtual Token Descriptor (VTD).

LibSBML is an open-source software library that provides an application programming interface (API) for the SBML format. The libSBML library can be embedded in a software application or used in a web servlet as part of the application or servlet's implementation of support for reading, writing, and manipulating SBML documents and data streams. The core of libSBML is written in ISO standard C++; the library provides API for many programming languages via interfaces generated with the help of SWIG.

References

  1. "Release 2.6.0".
  2. "COPYING". GitHub. Retrieved 16 September 2019.