PhyloXML

Last updated
PhyloXML
Filename extensions .phyloxml
Internet media type text/x-phyloxml+xml
Developed byMira V Han and Christian M Zmasek
Initial release27 October 2009(14 years ago) (2009-10-27)
Type of format phylogenetic trees
Extended from XML
Open format?Yes
Website phyloxml.org

PhyloXML is an XML language for the analysis, exchange, and storage of phylogenetic trees (or networks) and associated data. [1] The structure of phyloXML is described by XML Schema Definition (XSD) language.

Contents

A shortcoming of current formats for describing phylogenetic trees (such as Nexus and Newick/New Hampshire) is a lack of a standardized means to annotate tree nodes and branches with distinct data fields (which in the case of a basic species tree might be: species names, branch lengths, and possibly multiple support values). Data storage and exchange is even more cumbersome in studies in which trees are the result of a reconciliation of some kind:

To alleviate this, a variety of ad-hoc, special purpose formats have come into use (such as the NHX format, which focuses on the needs of gene-function and phylogenomic studies).

A well defined XML format addresses these problems in a general and extensible manner and allows for interoperability between specialized and general purpose software.

An example of a program for visualizing phyloXML is Archaeopteryx.

Basic phyloXML example

<phyloxmlxmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"xsi:schemaLocation="http://www.phyloxml.org http://www.phyloxml.org/1.10/phyloxml.xsd"xmlns="http://www.phyloxml.org"><phylogenyrooted="true"><name>examplefromProf.JoeFelsenstein'sbook"InferringPhylogenies"</name><description>MrBayesbasedonMAFFTalignment</description><clade><cladebranch_length="0.06"><confidencetype="probability">0.88</confidence><cladebranch_length="0.102"><name>A</name></clade><cladebranch_length="0.23"><name>B</name></clade></clade><cladebranch_length="0.5"><name>C</name></clade></clade></phylogeny></phyloxml>

Related Research Articles

XSD, a recommendation of the World Wide Web Consortium (W3C), specifies how to formally describe the elements in an Extensible Markup Language (XML) document. It can be used by programmers to verify each piece of item content in a document, to assure it adheres to the description of the element it is placed in.

In computing, RELAX NG is a schema language for XML—a RELAX NG schema specifies a pattern for the structure and content of an XML document. A RELAX NG schema is itself an XML document but RELAX NG also offers a popular compact, non-XML syntax. Compared to other XML schema languages RELAX NG is considered relatively simple.

GPS Exchange Format (GPX) is an XML schema designed as a common GPS data format for software applications. It can be used to describe waypoints, tracks, and routes. It is an open format and can be used without the need to pay license fees. Location data is stored in tags and can be interchanged between GPS devices and software. Common software applications for the data include viewing tracks projected onto various map sources, annotating maps, and geotagging photographs based on the time they were taken.

<span class="mw-page-title-main">XBRL</span> Exchange format for business information

XBRL is a freely available and global framework for exchanging business information. XBRL allows the expression of semantics commonly required in business reporting. The standard was originally based on XML, but now additionally supports reports in JSON and CSV formats, as well as the original XML-based syntax. XBRL is also increasingly used in its Inline XBRL variant, which embeds XBRL tags into an HTML document. One common use of XBRL is the exchange of financial information, such as in a company's annual financial report. The XBRL standard is developed and published by XBRL International, Inc. (XII).

Apache Wicket, commonly referred to as Wicket, is a component-based web application framework for the Java programming language conceptually similar to JavaServer Faces and Tapestry. It was originally written by Jonathan Locke in April 2004. Version 1.0 was released in June 2005. It graduated into an Apache top-level project in June 2007.

Speech Recognition Grammar Specification (SRGS) is a W3C standard for how speech recognition grammars are specified. A speech recognition grammar is a set of word patterns, and tells a speech recognition system what to expect a human to say. For instance, if you call an auto-attendant application, it will prompt you for the name of a person. It will then start up a speech recognizer, giving it a speech recognition grammar. This grammar contains the names of the people in the auto attendant's directory and a collection of sentence patterns that are the typical responses from callers to the prompt.

Sitemaps is a protocol in XML format meant for a webmaster to inform search engines about URLs on a website that are available for web crawling. It allows webmasters to include additional information about each URL: when it was last updated, how often it changes, and how important it is in relation to other URLs of the site. This allows search engines to crawl the site more efficiently and to find URLs that may be isolated from the rest of the site's content. The Sitemaps protocol is a URL inclusion protocol and complements robots.txt, a URL exclusion protocol.

GraphML is an XML-based file format for graphs. The GraphML file format results from the joint effort of the graph drawing community to define a common format for exchanging graph structure data. It uses an XML-based syntax and supports the entire range of possible graph structure constellations including directed, undirected, mixed graphs, hypergraphs, and application-specific attributes.

Catalogue Service for the Web (CSW), sometimes seen as Catalogue Service - Web, is a standard for exposing a catalogue of geospatial records in XML on the Internet. The catalogue is made up of records that describe geospatial data, geospatial services, and related resources.

Semantic Interpretation for Speech Recognition (SISR) defines the syntax and semantics of annotations to grammar rules in the Speech Recognition Grammar Specification (SRGS). Since 5 April 2007, it is a World Wide Web Consortium recommendation.

The Pronunciation Lexicon Specification (PLS) is a W3C Recommendation, which is designed to enable interoperable specification of pronunciation information for both speech recognition and speech synthesis engines within voice browsing applications. The language is intended to be easy to use by developers while supporting the accurate specification of pronunciation information for international use.

Liquibase is an open-source database-independent library for tracking, managing and applying database schema changes. It was started in 2006 to allow easier tracking of database changes, especially in an agile software development environment.

Archaeopteryx is an interactive computer software program, written in Java, for viewing, editing, and analyzing phylogenetic trees. This type of program can be used for a variety of analyses of molecular data sets, but is particularly designed for phylogenomics. Besides tree description formats with limited expressiveness, it also implements the phyloXML format. Archaeopteryx is the successor to Java program A Tree Viewer (ATV).

Apache Click is a page and component oriented web application framework for the Java language and is built on top of the Java Servlet API.

The Office Open XML file formats are a set of file formats that can be used to represent electronic office documents. There are formats for word processing documents, spreadsheets and presentations as well as specific formats for material such as mathematical formulas, graphics, bibliographies etc.

Directed Graph Markup Language (DGML) is an XML-based file format for directed graphs.

XHTML+RDFa is an extended version of the XHTML markup language for supporting RDF through a collection of attributes and processing rules in the form of well-formed XML documents. XHTML+RDFa is one of the techniques used to develop Semantic Web content by embedding rich semantic markup. Version 1.1 of the language is a superset of XHTML 1.1, integrating the attributes according to RDFa Core 1.1. In other words, it is an RDFa support through XHTML Modularization.

Data Format Description Language is a modeling language for describing general text and binary data in a standard way. It was published as an Open Grid Forum Recommendation in February 2021, and in April 2024 was published as an ISO standard.

gSOAP is a C and C++ software development toolkit for SOAP/XML web services and generic XML data bindings. Given a set of C/C++ type declarations, the compiler-based gSOAP tools generate serialization routines in source code for efficient XML serialization of the specified C and C++ data structures. Serialization takes zero-copy overhead.

The PROV standard defines a data model, serializations, and definitions to support the interchange of provenance information on the Web. Here provenance includes all "information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness".

References

  1. Han, Mira V.; Zmasek, Christian M. (2009). "phyloXML: XML for evolutionary biology and comparative genomics". BMC Bioinformatics. 10. United Kingdom: BioMed Central: 356. doi: 10.1186/1471-2105-10-356 . PMC   2774328 . PMID   19860910.