HaXml

Stable release
v1.25.13 [1] / July 13, 2023
Repository https://github.com/HaXml/HaXml
Written in Haskell
Type Computer library
License LGPL-2.1

HaXml is a collection of utilities for parsing, filtering, transforming, and generating XML documents using Haskell. [2]

Overview

HaXml utilities include [2] [3] :

HaXml provides a combinator library: a set of higher-order functions that process XML documents once they are represented as native Haskell data types. [4] The basic data type is Content, which represents the document subset of XML. [5]

HaXml can convert XML to Haskell data and vice versa, and it can also convert XML to XML by transforming or filtering it. A common way of using HaXml's parser is to define how the XML data is traversed by writing a function of type CFilter (content filter), where type CFilter = Content -> [Content]. Such a user-defined function takes a fragment of XML data and returns either a list of fragments or no fragments at all. This approach makes it possible to select XML elements that satisfy certain conditions (e.g. tags with a certain name, or all children of a specified tag). [6] [7]
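The content-filter idea can be illustrated without the library itself. The following is a minimal sketch, assuming a deliberately simplified Content type with only elements and text; it is a toy model written for this illustration, not HaXml's actual definitions. The names tag, txt, children and /> mirror HaXml's combinators, but their bodies here are assumptions.

```haskell
module Main (main) where

-- Toy stand-in for HaXml's Content type (assumption: elements and text only).
data Content
  = Elem String [Content] -- element name and its children
  | Text String           -- character data
  deriving (Eq, Show)

-- A content filter maps one fragment to zero or more fragments.
type CFilter = Content -> [Content]

-- Keep the fragment only if it is an element with the given name.
tag :: String -> CFilter
tag name c@(Elem n _) | n == name = [c]
tag _ _ = []

-- Keep the fragment only if it is plain text.
txt :: CFilter
txt c@(Text _) = [c]
txt _ = []

-- Return the immediate children of an element.
children :: CFilter
children (Elem _ cs) = cs
children (Text _) = []

-- Apply f, then apply g to the children of every result of f.
infixl 5 />
(/>) :: CFilter -> CFilter -> CFilter
f /> g = \c -> concatMap g (concatMap children (f c))

sample :: Content
sample =
  Elem "rss"
    [ Elem "channel"
        [ Elem "title" [Text "Haskell Radio"]
        , Elem "item" [Text "First item"]
        , Elem "item" [Text "Second item"]
        ]
    ]

main :: IO ()
main = do
  -- Selects both item elements below rss/channel.
  print ((tag "rss" /> tag "channel" /> tag "item") sample)
  -- A filter that does not match returns the empty list.
  print (tag "feed" sample)
```

Returning a list lets a single type express both "no match" (the empty list) and "several matches", which is why filters compose with plain list concatenation.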

Example

Chapter 22, "Extended Example: Web Client Programming", of Real World Haskell by Bryan O'Sullivan, Don Stewart, and John Goerzen considers the following example. [6] The XML file looks like this (simplified version):

<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:itunes="http://www.itunes.com/DTDs/Podcast-1.0.dtd" version="2.0">
  <channel>
    <title>Haskell Radio</title>
    <link>http://www.example.com/radio/</link>
    <description>Description of this podcast</description>
    <item>First item</item>
    <item>Second item</item>
  </channel>
</rss>

The following content filter is constructed:

channel :: CFilter
channel = tag "rss" /> tag "channel"

This filter is later used to get the title of the channel:

getTitle :: Content -> String
getTitle doc =
    contentToStringDefault "Untitled Podcast"
        (channel /> tag "title" /> txt $ doc)
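For readers who want to try the filter end to end, here is a hedged sketch against the HaXml library itself. It assumes a HaXml 1.25-style API in which Content and CFilter take a type parameter (so the signatures differ slightly from the book's), and it assumes the module and function names shown in the imports. The helpers rootContent and textOr are hypothetical names introduced here; textOr is a simplified stand-in for the book's contentToStringDefault and does not unescape entities.

```haskell
module Main (main) where

import Text.XML.HaXml.Combinators (CFilter, tag, txt, (/>))
import Text.XML.HaXml.Parse (xmlParse)
import Text.XML.HaXml.Posn (Posn, noPos)
import Text.XML.HaXml.Types (Content (..), Document (..))

-- The simplified feed from the example, embedded as a string.
feed :: String
feed =
  concat
    [ "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
    , "<rss xmlns:itunes=\"http://www.itunes.com/DTDs/Podcast-1.0.dtd\" version=\"2.0\">"
    , "<channel>"
    , "<title>Haskell Radio</title>"
    , "<link>http://www.example.com/radio/</link>"
    , "<description>Description of this podcast</description>"
    , "<item>First item</item>"
    , "<item>Second item</item>"
    , "</channel>"
    , "</rss>"
    ]

-- Filter selecting the channel element below the rss root.
channel :: CFilter i
channel = tag "rss" /> tag "channel"

-- Hypothetical helper: wrap the document's root element as Content.
rootContent :: Document Posn -> Content Posn
rootContent (Document _ _ e _) = CElem e noPos

-- Hypothetical helper: concatenate text fragments, or fall back to a default.
textOr :: String -> [Content i] -> String
textOr def cs =
  case [s | CString _ s _ <- cs] of
    [] -> def
    ss -> concat ss

getTitle :: Content i -> String
getTitle doc = textOr "Untitled Podcast" (channel /> tag "title" /> txt $ doc)

main :: IO ()
main = putStrLn (getTitle (rootContent (xmlParse "feed.xml" feed)))
```

The first argument to xmlParse is only a file name used in error messages; the XML text itself is the second argument.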

References

  1. "Release v1.25.13". GitHub. Retrieved January 10, 2024.
  2. Gajda, Michał J.; Krylov, Dmitry (November 5, 2020). "Fast XML/HTML tools for Haskell: XML TypeLift and improved Xeno". Zenodo. arXiv:2011.03536v1. doi:10.5281/zenodo.3929549. S2CID 226282051.
  3. "README". GitHub. Retrieved January 10, 2024.
  4. Shin-Cheng Mu; Zhenjiang Hu; Masato Takeichi. "Bidirectionalising HaXML" (PDF). Archived from the original (PDF) on January 10, 2024. Retrieved January 10, 2024.
  5. Ohlendorf, Manuel (January 6, 2007). "A Cookbook for the Haskell XML Toolbox with Examples for Processing RDF Documents" (PDF). fhwedel Computer Science Department. p. 78. Archived (PDF) from the original on January 13, 2024. Retrieved January 13, 2024.
  6. O'Sullivan, Bryan; Goerzen, John; Stewart, Don (2008). "Chapter 22. Extended Example: Web Client Programming". Real World Haskell. O'Reilly Media. ISBN 978-0596514983.
  7. Wallace, Malcolm; Runciman, Colin (September 1, 1999). "Haskell and XML: generic combinators or type-based translation?" (PDF). ACM SIGPLAN Notices. 34 (9): 148–159. doi:10.1145/317765.317794.