XMLStarlet

Original author(s) Dagobert Michelsen, Noam Postavsky, Mikhail Grushinskiy
Initial release 8 February 2005
Stable release 1.6.1 [1] / 9 August 2014
Repository
Written in C
Operating system Unix-like, Windows, Cygwin, Mac OS
Type XML parser
License MIT License
Website xmlstar.sourceforge.net

XMLStarlet is a toolkit of command-line utilities for querying, transforming, validating, and editing XML documents and files. It uses a simple set of shell commands, much as UNIX tools such as grep, sed, awk, diff, patch, and join do for plain text.


These utilities suit anyone who wants to test an XPath query or run commands on the fly, to work with many XML documents at once, or to automate XML processing with shell scripts.

To run XMLStarlet, download it from the official site, then type 'xml' on the command line followed by the command or query to execute (see Examples below). Some packaged builds install the executable as 'xmlstarlet' instead, which is the name the examples use.
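Because the executable name can differ between installations, a script may want to detect it before use. A minimal sketch, assuming a POSIX shell; the variable name XMLSTARLET is illustrative and not part of the toolkit:

```shell
# Prefer "xmlstarlet" and fall back to "xml", whichever is found on PATH.
if XMLSTARLET=$(command -v xmlstarlet || command -v xml); then
    "$XMLSTARLET" --version   # print the installed version
else
    echo "XMLStarlet not found on PATH" >&2
fi
```

Later commands in the same script can then call "$XMLSTARLET" instead of hard-coding either name.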

Features

The toolkit's feature set includes the following options:

The XMLStarlet command-line utility is written in C and uses libxml2 and libxslt. Its extensive choice of options was possible only because of the rich feature sets of these two libraries. XMLStarlet is linked statically to both, so generally a single executable file is all you need to process XML documents.
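As an illustration of the libxslt side, the tr subcommand applies an XSLT 1.0 stylesheet to a document. The sketch below is only an example under stated assumptions: sample.xml is a reduced stand-in for a real input file, names.xsl is a stylesheet invented for the illustration, and the executable is looked up under either of its two common names.

```shell
# Reduced sample document, created here so the sketch is self-contained.
cat > sample.xml <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<wikimedia>
  <projects>
    <project name="Wikipedia" launch="2001-01-05"/>
    <project name="Wikiversity" launch="2006-10-04"/>
  </projects>
</wikimedia>
EOF

# Hypothetical stylesheet: emit each project name on its own line as plain text.
cat > names.xsl <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="wikimedia/projects/project">
      <xsl:value-of select="@name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
EOF

if XMLSTARLET=$(command -v xmlstarlet || command -v xml); then
    # tr takes the stylesheet first, then the input document(s).
    names=$("$XMLSTARLET" tr names.xsl sample.xml)
    echo "$names"
fi
```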

XMLStarlet is free and open-source software released under the MIT License, which allows free use and distribution in both commercial and non-commercial projects.

Examples

Consider the following example XML document, 'xmlfile1.xml':

<?xml version="1.0" encoding="utf-8"?>
<wikimedia>
  <projects>
    <project name="Wikipedia" launch="2001-01-05">
      <editions>
        <edition language="English">en.wikipedia.org</edition>
        <edition language="German">de.wikipedia.org</edition>
        <edition language="French">fr.wikipedia.org</edition>
        <edition language="Polish">pl.wikipedia.org</edition>
        <edition language="Spanish">es.wikipedia.org</edition>
      </editions>
    </project>
    <project name="Wiktionary" launch="2002-12-12">
      <editions>
        <edition language="English">en.wiktionary.org</edition>
        <edition language="French">fr.wiktionary.org</edition>
        <edition language="Vietnamese">vi.wiktionary.org</edition>
        <edition language="Turkish">tr.wiktionary.org</edition>
        <edition language="Spanish">es.wiktionary.org</edition>
      </editions>
    </project>
    <project name="Wikiversity" launch="2006-10-04">
      <editions>
        <edition language="English">en.wikiversity.org</edition>
      </editions>
    </project>
  </projects>
</wikimedia>

At a command prompt, the following five XPath queries are run against the XML file 'xmlfile1.xml' above. Each command is followed by its output.

$ xmlstarlet sel -t -v "//wikimedia/projects/project/@name" xmlfile1.xml
Wikipedia
Wiktionary
Wikiversity
$ xmlstarlet sel -t -v "/wikimedia/projects/project[last()]/@*" xmlfile1.xml
Wikiversity
2006-10-04
$ xmlstarlet sel -t -v "/wikimedia/projects/project[@name='Wiktionary']/editions/edition" xmlfile1.xml
en.wiktionary.org
fr.wiktionary.org
vi.wiktionary.org
tr.wiktionary.org
es.wiktionary.org
$ xmlstarlet sel -t -v "/wikimedia/projects/project[@name='Wiktionary']/editions/edition[@language!='Turkish' and @language!='Spanish']" xmlfile1.xml
en.wiktionary.org
fr.wiktionary.org
vi.wiktionary.org
$ xmlstarlet sel -t -v "/wikimedia/projects/project/editions/edition[position() >= 3]/@*" xmlfile1.xml
French
Polish
Spanish
Vietnamese
Turkish
Spanish
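The same XPath expressions can be combined into a small output template, or used to edit a document. The sketch below is illustrative only: sample.xml is a reduced copy of the example document, the replacement date is an arbitrary placeholder, and the executable is looked up under either of its two common names.

```shell
# Reduced copy of the example document, created here so the sketch is self-contained.
cat > sample.xml <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<wikimedia>
  <projects>
    <project name="Wikipedia" launch="2001-01-05">
      <editions>
        <edition language="English">en.wikipedia.org</edition>
        <edition language="German">de.wikipedia.org</edition>
      </editions>
    </project>
    <project name="Wikiversity" launch="2006-10-04">
      <editions>
        <edition language="English">en.wikiversity.org</edition>
      </editions>
    </project>
  </projects>
</wikimedia>
EOF

if XMLSTARLET=$(command -v xmlstarlet || command -v xml); then
    # -m iterates over each matched project; within it, -v prints an XPath value,
    # -o prints a literal string, and -n prints a newline.
    summary=$("$XMLSTARLET" sel -t -m "/wikimedia/projects/project" \
        -v "@name" -o ": " -v "count(editions/edition)" -n sample.xml)
    echo "$summary"

    # ed -u updates the nodes matched by the XPath with the value given by -v.
    # The edited document goes to standard output; sample.xml is left unchanged.
    # "2000-01-01" is an arbitrary placeholder value.
    "$XMLSTARLET" ed -u "/wikimedia/projects/project[@name='Wikiversity']/@launch" \
        -v "2000-01-01" sample.xml
fi
```

The first command prints one line per project with its name and number of editions; the second prints the whole document with the launch attribute of the Wikiversity project replaced.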

An XML document can be validated against an XSD schema saved in file 'xsdfile.xsd' as follows:

$ xmlstarlet val -e -s xsdfile.xsd xmlfile1.xml
xmlfile1.xml - valid
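Validation also lends itself to scripted use over many files. A minimal sketch, assuming a POSIX shell and assuming that val exits with a non-zero status when a file fails the check; good.xml and bad.xml are throwaway files invented for the illustration, and only well-formedness (not a schema) is checked here:

```shell
# Two throwaway files: one well-formed, one with a missing closing tag for <b>.
printf '<a><b/></a>\n' > good.xml
printf '<a><b></a>\n'  > bad.xml

if XMLSTARLET=$(command -v xmlstarlet || command -v xml); then
    for f in good.xml bad.xml; do
        # -w checks well-formedness only; -q suppresses the per-file report,
        # so the script relies on the exit status alone.
        if "$XMLSTARLET" val -q -w "$f"; then
            echo "$f: ok"
        else
            echo "$f: not well-formed"
        fi
    done
fi
```

Replacing -w with -s and a schema file, as in the command above, would check each file against an XSD instead.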


Notes

    1. "XMLStarlet command line XML toolkit - Browse /xmlstarlet/1.6.1 at SourceForge.net".
