Initial release | 21 September 2001 |
---|---|
Written in | Perl |
Type | Bioinformatics |
License | Artistic License and GPL |
Website | biomoby |
BioMOBY is a registry of web services used in bioinformatics. It allows interoperability between biological data hosts and analytical services by annotating services with terms taken from standard ontologies. BioMOBY is released under the Artistic License. [1]
The BioMOBY project began at the Model Organism Bring Your own Database Interface Conference (MOBY-DIC), held in Emma Lake, Saskatchewan on September 21, 2001. It stemmed from a conversation between Mark D Wilkinson and Suzanna Lewis during a Gene Ontology developers meeting at the Carnegie Institute, Stanford, where the functionalities of the Genquire and Apollo genome annotation tools were being discussed and compared. Both systems critically needed a simple standard that would allow them to interact with the many data sources required to annotate a genome accurately, and no such standard existed.
Funding for the BioMOBY project was subsequently provided by Genome Prairie (2002-2005) and Genome Alberta (2005-date), in part through Genome Canada, a not-for-profit institution leading the Canadian X-omic initiatives.
There are two main branches of the BioMOBY project. One is a web-service-based approach, while the other utilizes Semantic Web technologies. This article will refer only to the Web Service specifications. The other branch of the project, Semantic Moby, is described in a separate entry.
The Moby project defines three ontologies that describe biological data-types, biological data-formats, and bioinformatics analysis types: the Namespace, Object, and Service ontologies, respectively. Most of the interoperable behaviours seen in Moby are achieved through the Object (data-format) and Namespace (data-type) ontologies.
The MOBY Namespace Ontology is derived from the Cross-Reference Abbreviations List of the Gene Ontology project. It is simply a list of abbreviations for the different types of identifiers that are used in bioinformatics. For example, GenBank uses "gi" identifiers to enumerate all of its sequence records; this identifier type is defined as "NCBI_gi" in the Namespace Ontology.
The MOBY Object Ontology is an ontology consisting of IS-A, HAS-A, and HAS relationships between data formats. For example, a DNASequence IS-A GenericSequence and HAS-A String representing the text of the sequence. All data in Moby must be represented as some type of MOBY Object. An XML serialization of this ontology is defined in the Moby API such that any given ontology node has a predictable XML structure.
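The IS-A and HAS-A relationships described above can be illustrated with a minimal Python sketch. This is not the Moby API: the `ObjectType` class, the helper functions, the root type `Object`, the member name `SequenceString`, and the choice to attach the String member to GenericSequence (so that DNASequence inherits it) are all assumptions made for illustration. Only DNASequence, GenericSequence, and String come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ObjectType:
    """One node of a toy object ontology (hypothetical structure)."""
    name: str
    is_a: Optional[str] = None  # parent type (IS-A), None for the root
    # HAS-A members as (member name, member type) pairs
    has_a: List[Tuple[str, str]] = field(default_factory=list)


# Hypothetical mini-ontology; placement of members is an assumption.
ONTOLOGY: Dict[str, ObjectType] = {
    t.name: t
    for t in [
        ObjectType("Object"),
        ObjectType("String", is_a="Object"),
        ObjectType("GenericSequence", is_a="Object",
                   has_a=[("SequenceString", "String")]),
        ObjectType("DNASequence", is_a="GenericSequence"),
    ]
}


def ancestors(type_name: str) -> List[str]:
    """Return the type itself followed by every type it inherits from."""
    chain: List[str] = []
    current: Optional[str] = type_name
    while current is not None:
        chain.append(current)
        current = ONTOLOGY[current].is_a
    return chain


def is_a(type_name: str, candidate: str) -> bool:
    """True if type_name is candidate or inherits from it."""
    return candidate in ancestors(type_name)


def members(type_name: str) -> List[Tuple[str, str]]:
    """Collect HAS-A members, inherited ones first."""
    result: List[Tuple[str, str]] = []
    for name in reversed(ancestors(type_name)):
        result.extend(ONTOLOGY[name].has_a)
    return result
```

In this sketch, `is_a("DNASequence", "GenericSequence")` holds, and `members("DNASequence")` lists the inherited String member, which is what lets a consumer predict the structure of any node from its position in the ontology.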
Thus, between these two ontologies, a service provider or a client program can receive a piece of Moby XML and immediately know both its structure and its "intent" (semantics).
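As a rough illustration of this idea, the following Python sketch reads a simplified XML fragment and recovers the object type (structure) together with the namespace and identifier (semantics). The element names, the `namespace`, `id`, and `articleName` attributes, and the identifier value are assumptions chosen for this example; they are not taken from the Moby API specification, which should be consulted for the real serialization.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: tag and attribute names are assumptions,
# and "12345" is a made-up placeholder identifier.
SAMPLE = """
<DNASequence namespace="NCBI_gi" id="12345">
  <String articleName="SequenceString">ATGGCCATTG</String>
</DNASequence>
"""


def describe(xml_text: str) -> dict:
    """Summarise what a consumer could learn from one data object."""
    root = ET.fromstring(xml_text.strip())
    return {
        "object_type": root.tag,             # which ontology node this is
        "namespace": root.get("namespace"),  # what kind of identifier it carries
        "identifier": root.get("id"),
        # Named members (HAS-A style) keyed by their article name
        "members": {child.get("articleName"): child.text for child in root},
    }


info = describe(SAMPLE)
```

Here the object type tells the consumer how the data is laid out, while the namespace tells it what the identifier refers to.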
The final core component of Moby is the MOBY Central web service registry. [2] MOBY Central is aware of the Object, Namespace and Service ontologies, and can therefore match consumers who have Moby data in hand with service providers who claim to consume that data-type (or some compatible ontological data-type) or to perform a particular operation on it. This "semantic matching" helps ensure that only relevant service providers are identified in a registry query and that the in-hand data can be passed to that service provider verbatim. As such, the interaction between a consumer and a service provider can be partially or fully automated, as shown in the Gbrowse Moby [3] and Ahab clients respectively. [4]
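The semantic matching described above can be sketched in Python as a lookup that accepts a service if its declared input type is the consumer's data type or one of that type's ancestors. This is a toy model under stated assumptions, not the MOBY Central interface: the service names, the `AminoAcidSequence` type, the `Service` tuple, and the rule that an empty namespace set means "any namespace" are all hypothetical.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Optional

# Toy IS-A table (child -> parent), standing in for the Object ontology.
PARENT: Dict[str, Optional[str]] = {
    "Object": None,
    "GenericSequence": "Object",
    "DNASequence": "GenericSequence",
    "AminoAcidSequence": "GenericSequence",  # hypothetical sibling type
}


class Service(NamedTuple):
    name: str
    input_type: str
    input_namespaces: FrozenSet[str]  # empty set = accepts any namespace


# Hypothetical registry contents.
SERVICES = [
    Service("getSequenceLength", "GenericSequence", frozenset()),
    Service("translateDNA", "DNASequence", frozenset()),
    Service("fetchGenBankRecord", "Object", frozenset({"NCBI_gi"})),
    Service("predictProteinDomains", "AminoAcidSequence", frozenset()),
]


def lineage(type_name: str) -> List[str]:
    """The type itself plus all of its IS-A ancestors."""
    chain: List[str] = []
    current: Optional[str] = type_name
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def find_services(data_type: str, namespace: str) -> List[str]:
    """Names of services that can consume the given data unchanged."""
    compatible = set(lineage(data_type))
    return [
        s.name
        for s in SERVICES
        if s.input_type in compatible
        and (not s.input_namespaces or namespace in s.input_namespaces)
    ]
```

With these assumptions, a DNASequence in the NCBI_gi namespace matches the services that accept DNASequence, GenericSequence, or a plain Object in that namespace, but not the one that expects an AminoAcidSequence.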
BioMOBY does not, for its core operations, utilize the RDF or OWL standards from the W3C. This is in part because neither of these standards was stable in 2001, when the project began, and in part because library support for these standards was not "commodity" in any of the most common languages (i.e. Perl and Java) at that time.
Nevertheless, the BioMOBY system exhibits what can only be described as Semantic Web-like behaviours. The BioMOBY Object Ontology controls the valid data structures in exactly the same way as an OWL ontology defines an RDF data instance. BioMOBY Web Services consume and generate BioMOBY XML, [5] the structure of which is defined by the BioMOBY Object Ontology. As such, BioMOBY Web Services have been acting as prototypical Semantic Web Services since 2001, despite not using the eventual RDF/OWL standards.
However, BioMOBY does utilize the RDF/OWL standards, as of 2006, for the description of its Objects, Namespaces, Services, and Registry. Increasingly these ontologies are being used to govern the behaviour of all BioMOBY functions using DL reasoners.
There are several client applications that can search and browse the BioMOBY registry of services. One of the most popular is the Taverna workbench built as part of the myGrid project. The first BioMOBY client was Gbrowse Moby, [6] written in 2001 to allow access to the prototype version of BioMOBY Services. Gbrowse Moby, in addition to being a BioMOBY browser, now works in tandem with the Taverna workbench to create SCUFL workflows reflecting the Gbrowse Moby browsing session that can then be run in a high-throughput environment. The Seahawk applet also provides the ability to export a session history as a Taverna workflow, in what constitutes a programming-by-example functionality. [7]
The Ahab client is a fully automated data mining tool. [4] Given a starting point, it will discover and execute every possible BioMOBY service and provide the results in a clickable interface.
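The general idea of exhaustive discover-and-execute can be sketched as a breadth-first crawl: each piece of data is offered to every service that accepts its type, and any new outputs are queued in turn. This is only an illustration of the technique and makes no claim about how Ahab is implemented. The service names, the `GeneID` and `Integer` types, the exact-type matching rule, and the stubbed service functions are all hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Datum = Tuple[str, str]  # (data type, value)

# Hypothetical services: name -> (accepted type, stub returning outputs).
SERVICES: Dict[str, Tuple[str, Callable[[str], List[Datum]]]] = {
    "idToSequence": ("GeneID", lambda v: [("DNASequence", "ATGGCC")]),
    "sequenceLength": ("DNASequence", lambda v: [("Integer", str(len(v)))]),
    "reverseSequence": ("DNASequence", lambda v: [("DNASequence", v[::-1])]),
}


def crawl(start: Datum) -> List[Tuple[str, Datum, Datum]]:
    """Run every applicable service on the start datum and on all outputs.

    Returns (service name, input datum, output datum) records. Each distinct
    datum is expanded once, so the crawl stops when no new data appears.
    """
    seen = {start}
    queue = deque([start])
    results: List[Tuple[str, Datum, Datum]] = []
    while queue:
        datum = queue.popleft()
        dtype, value = datum
        for name, (accepted, run) in SERVICES.items():
            if accepted != dtype:  # exact match only; no ontology lookup here
                continue
            for output in run(value):
                results.append((name, datum, output))
                if output not in seen:
                    seen.add(output)
                    queue.append(output)
    return results


results = crawl(("GeneID", "g1"))
```

Tracking already-seen data is what keeps the crawl finite here; a real client would also have to bound the search, since a live registry may keep producing new data.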