GeoSPARQL


GeoSPARQL is a standard for representation and querying of geospatial linked data for the Semantic Web from the Open Geospatial Consortium (OGC). [1] The definition of a small ontology based on well-understood OGC standards is intended to provide a standardized exchange basis for geospatial RDF data which can support both qualitative and quantitative spatial reasoning and querying with the SPARQL database query language. [2]


The Ordnance Survey Linked Data Platform uses OWL mappings for GeoSPARQL equivalent properties in its vocabulary. [3] [4] The LinkedGeoData data set, a work of the Agile Knowledge Engineering and Semantic Web (AKSW) research group at the University of Leipzig [5] (a group mostly known for DBpedia), uses the GeoSPARQL vocabulary to represent OpenStreetMap data.


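In particular, GeoSPARQL provides for: a small topological ontology in RDFS/OWL for representing geometries as GML and WKT literals; Simple Features, RCC8, and Egenhofer topological relationship vocabularies and ontologies for qualitative reasoning; and a SPARQL query interface using a set of topological SPARQL extension functions for quantitative reasoning.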


Example

The following example SPARQL query could help model the question "What is within the bounding box defined by 38°54′49″N 77°05′20″W (38.913574, -77.089005) and 38°53′11″N 77°01′48″W (38.886321, -77.029953)?" [6]

PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?what
WHERE {
  ?what geo:hasGeometry ?geometry .
  # Keep only geometries inside the bounding box (WKT coordinates are longitude latitude)
  FILTER(geof:sfWithin(?geometry,
    "POLYGON((-77.089005 38.913574,-77.029953 38.913574,-77.029953 38.886321,-77.089005 38.886321,-77.089005 38.913574))"^^geo:wktLiteral))
}
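The polygon literal in the query above is simply the bounding box written as a closed ring in longitude-latitude order. As a minimal sketch of how such a query could be assembled from the two corner coordinates, the following Python builds the same WKT string and query text. The function names `bbox_to_wkt` and `within_bbox_query` are hypothetical helpers introduced for this illustration and are not part of GeoSPARQL or of any library.

```python
GEO = "http://www.opengis.net/ont/geosparql#"
GEOF = "http://www.opengis.net/def/function/geosparql/"


def bbox_to_wkt(north, west, south, east):
    """Return the box as a closed WKT polygon ring (longitude latitude order)."""
    corners = [
        (west, north),
        (east, north),
        (east, south),
        (west, south),
        (west, north),  # repeat the first corner to close the ring
    ]
    ring = ",".join(f"{lon} {lat}" for lon, lat in corners)
    return f"POLYGON(({ring}))"


def within_bbox_query(north, west, south, east):
    """Build the example query text for the given bounding box (hypothetical helper)."""
    wkt = bbox_to_wkt(north, west, south, east)
    return (
        f"PREFIX geo: <{GEO}>\n"
        f"PREFIX geof: <{GEOF}>\n"
        "SELECT ?what\n"
        "WHERE {\n"
        "  ?what geo:hasGeometry ?geometry .\n"
        f'  FILTER(geof:sfWithin(?geometry, "{wkt}"^^geo:wktLiteral))\n'
        "}"
    )


# Corners taken from the question in the text: north-west and south-east.
query = within_bbox_query(38.913574, -77.089005, 38.886321, -77.029953)
print(query)
```

The sketch mirrors the query as given in the article. Some stores may expect the first argument of `geof:sfWithin` to be a geometry literal (for example one bound through `geo:asWKT`) rather than the geometry resource itself, in which case the graph pattern would need one more triple.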

RCC8 use in GeoSPARQL

RCC8 has been implemented in GeoSPARQL as described below:

Figure: A graphical representation of Region Connection Calculus (RCC: Randell, Cui and Cohn, 1992) and the links to the equivalent naming by the Open Geospatial Consortium (OGC) with their equivalent URIs.
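To make the eight RCC8 relations concrete, the following Python sketch classifies a pair of axis-aligned rectangles and returns the matching GeoSPARQL property IRI (`geo:rcc8eq`, `geo:rcc8dc`, `geo:rcc8ec`, `geo:rcc8po`, `geo:rcc8tpp`, `geo:rcc8tppi`, `geo:rcc8ntpp`, `geo:rcc8ntppi`). It is an illustration only: the `Box` type and the functions `rcc8` and `rcc8_property` are hypothetical names, the rectangles are assumed to have positive area, and a real GeoSPARQL store evaluates these relations on arbitrary geometries rather than on boxes.

```python
from typing import NamedTuple

GEO = "http://www.opengis.net/ont/geosparql#"


class Box(NamedTuple):
    """Axis-aligned rectangle with positive area (assumption of this sketch)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def _contains(outer: Box, inner: Box) -> bool:
    """True if inner lies inside outer, boundary contact allowed."""
    return (outer.min_x <= inner.min_x and inner.max_x <= outer.max_x
            and outer.min_y <= inner.min_y and inner.max_y <= outer.max_y)


def _strictly_contains(outer: Box, inner: Box) -> bool:
    """True if inner lies inside outer without touching its boundary."""
    return (outer.min_x < inner.min_x and inner.max_x < outer.max_x
            and outer.min_y < inner.min_y and inner.max_y < outer.max_y)


def rcc8(a: Box, b: Box) -> str:
    """Return the RCC8 relation of a to b as a lower-case abbreviation."""
    if a == b:
        return "eq"
    # Disconnected: the boxes share no point at all.
    if (a.max_x < b.min_x or b.max_x < a.min_x
            or a.max_y < b.min_y or b.max_y < a.min_y):
        return "dc"
    # Externally connected: boundaries meet but interiors do not overlap.
    if (a.max_x <= b.min_x or b.max_x <= a.min_x
            or a.max_y <= b.min_y or b.max_y <= a.min_y):
        return "ec"
    if _contains(b, a):  # a is a proper part of b
        return "ntpp" if _strictly_contains(b, a) else "tpp"
    if _contains(a, b):  # b is a proper part of a (inverse relations)
        return "ntppi" if _strictly_contains(a, b) else "tppi"
    # Interiors overlap and neither box contains the other.
    return "po"


def rcc8_property(a: Box, b: Box) -> str:
    """Return the IRI of the GeoSPARQL RCC8 property relating a to b."""
    return GEO + "rcc8" + rcc8(a, b)


print(rcc8_property(Box(1, 1, 2, 2), Box(0, 0, 3, 3)))
```

The final line prints the IRI for the non-tangential proper part relation, since the first box lies strictly inside the second.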

Implementations

There are almost no complete implementations of GeoSPARQL; there are, however, partial or vendor-specific implementations. Currently the following implementations exist:

Apache Marmotta
GeoSPARQL was implemented on Apache Marmotta in the context of the Google Summer of Code 2015. [7] The implementation uses PostGIS and is available only for PostgreSQL.
Apache Jena
Since version 2.11 Apache Jena has a GeoSPARQL extension. [8]
Parliament
Parliament has an almost complete implementation of GeoSPARQL, built on Jena and a modified ARQ query processor. [9]
Eclipse RDF4J
Eclipse RDF4J is an open-source Java framework for scalable RDF processing, storage, reasoning and SPARQL querying. It offers support for a large subset of GeoSPARQL functionality. [10]
Strabon
Strabon is an open-source semantic spatiotemporal RDF store that supports two popular extensions of SPARQL: stSPARQL and GeoSPARQL. Strabon is built by extending the well-known RDF store Sesame, and it extends Sesame's components to manage thematic, spatial and temporal data stored in the backend RDBMS. It has been fully tested with PostgreSQL (with the PostGIS and PostgreSQL-Temporal [11] extensions) and MonetDB (with the geom [12] module).
OpenSahara uSeekM IndexingSail Sesame Sail plugin
uSeekM IndexingSail uses a PostGIS installation to deliver GeoSPARQL. It provides a partial implementation of GeoSPARQL along with some vendor prefixes. [13] [14]
Oracle Spatial and Graph
GraphDB
GraphDB is an enterprise-ready semantic graph database, compliant with W3C standards. Semantic graph databases (also called RDF triplestores) provide the core infrastructure for solutions where modelling agility, data integration, relationship exploration and cross-enterprise data publishing and consumption are important.
Stardog
Stardog is an enterprise data unification platform built on smart graph technology: query, search, inference, and data virtualization.

Submission

The GeoSPARQL standard was submitted to the OGC by:

With regard to future work, the GeoSPARQL standard states:

Obvious extensions are to define new conformance classes for other standard serializations of geometry data (e.g. KML, GeoJSON). In addition, significant work remains in developing vocabularies for spatial data, and expanding the GeoSPARQL vocabularies with OWL axioms to aid in logical spatial reasoning would be a valuable contribution. There are also large amounts of existing feature data represented in either a GML file (or similar serialization) or in a datastore supporting the general feature model. It would be beneficial to develop standard processes for converting (or virtually converting and exposing) this data to RDF.
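As a rough illustration of the kind of conversion the standard has in mind, the sketch below turns a GeoJSON geometry into a WKT literal and emits two N-Triples statements using the GeoSPARQL properties `geo:hasGeometry` and `geo:asWKT`. This is not a standardized process: the function names, the `/geometry` IRI suffix and the `example.org` feature IRI are assumptions made for the example, and only the `Point`, `LineString` and `Polygon` types are handled. Both GeoJSON and the default WKT reference system used by GeoSPARQL list longitude before latitude, so coordinates are copied in order.

```python
GEO = "http://www.opengis.net/ont/geosparql#"


def _ring(points):
    """Format a list of positions as a parenthesised WKT coordinate list."""
    return "(" + ", ".join(f"{p[0]} {p[1]}" for p in points) + ")"


def geojson_to_wkt(geometry):
    """Convert a GeoJSON Point, LineString or Polygon dict to WKT text."""
    kind = geometry["type"]
    coords = geometry["coordinates"]
    if kind == "Point":
        return f"POINT({coords[0]} {coords[1]})"
    if kind == "LineString":
        return "LINESTRING" + _ring(coords)
    if kind == "Polygon":
        return "POLYGON(" + ", ".join(_ring(ring) for ring in coords) + ")"
    raise ValueError(f"unsupported GeoJSON geometry type: {kind}")


def feature_to_ntriples(feature_iri, geometry):
    """Return N-Triples lines linking a feature to its WKT geometry.

    The geometry IRI is derived by appending '/geometry' to the feature IRI,
    which is a naming choice of this sketch only.
    """
    geom_iri = feature_iri + "/geometry"
    wkt = geojson_to_wkt(geometry)
    return [
        f"<{feature_iri}> <{GEO}hasGeometry> <{geom_iri}> .",
        f'<{geom_iri}> <{GEO}asWKT> "{wkt}"^^<{GEO}wktLiteral> .',
    ]


# Hypothetical feature used only to exercise the helpers.
point = {"type": "Point", "coordinates": [-77.05, 38.9]}
for line in feature_to_ntriples("http://example.org/feature/1", point):
    print(line)
```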


References

  1. Battle & Kolas 2012, p. 355.
  2. Battle & Kolas 2012, p. 358.
  3. Goodwin, John (26 April 2013). "GeoSPARQL and Ordnance Survey Linked Data". John’s Weblog.
  4. Gemma (3 June 2013). "New Linked Data service launches". Ordnance Survey Blog.
  5. "Imprint". AKSW. 2012-05-18.
  6. Battle & Kolas 2012, p. 363.
  7. https://wiki.apache.org/marmotta/GSoC/2015/MARMOTTA-584
  8. https://jena.apache.org/documentation/query/spatial-query.html
  9. http://parliament.semwebcentral.org/
  10. http://docs.rdf4j.org/programming/#_geosparql
  11. https://github.com/jeff-davis/PostgreSQL-Temporal
  12. https://www.monetdb.org/Documentation/Extensions/GIS
  13. https://dev.opensahara.com/projects/useekm/wiki/IndexingSail#GeoSPARQL
  14. https://dev.opensahara.com/projects/useekm/wiki/GeoReference