Semantic query

Last updated December 12, 2024

Semantic queries allow for queries and analytics of associative and contextual nature. Semantic queries enable the retrieval of both explicitly and implicitly derived information based on syntactic, semantic and structural information contained in data. They are designed to deliver precise results (possibly the distinctive selection of one single piece of information) or to answer more fuzzy and wide open questions through pattern matching and digital reasoning.

Semantic queries work on named graphs, linked data or triples. This enables the query to process the actual relationships between information and infer the answers from the network of data. This is in contrast to semantic search, which uses semantics (meaning of language constructs) in unstructured text to produce a better search result. (See natural language processing.)

From a technical point of view, semantic queries are precise relational-type operations much like a database query. They work on structured data and therefore have the possibility to utilize comprehensive features like operators (e.g. >, < and =), namespaces, pattern matching, subclassing, transitive relations, semantic rules and contextual full text search. The semantic web technology stack of the W3C is offering SPARQL ^[1]^[2] to formulate semantic queries in a syntax similar to SQL. Semantic queries are used in triplestores, graph databases, semantic wikis, natural language and artificial intelligence systems.

Background

Relational databases represent all relationships between data in an implicit manner only.^[3]^[4] For example, the relationships between customers and products (stored in two content-tables and connected with an additional link-table) only come into existence in a query statement (SQL in the case of relational databases) written by a developer. Writing the query demands exact knowledge of the database schema.^[5]^[6]

Linked-Data represent all relationships between data in an explicit manner. In the above example, no query code needs to be written. The correct product for each customer can be fetched automatically. Whereas this simple example is trivial, the real power of linked-data comes into play when a network of information is created (customers with their geo-spatial information like city, state and country; products with their categories within sub- and super-categories). Now the system can automatically answer more complex queries and analytics that look for the connection of a particular location with a product category. The development effort for this query is omitted. Executing a semantic query is conducted by walking the network of information and finding matches (also called Data Graph Traversal).

Another important aspect of semantic queries is that the type of the relationship can be used to incorporate intelligence into the system. The relationship between a customer and a product has a fundamentally different nature than the relationship between a neighbourhood and its city. The latter enables the semantic query engine to infer that a customer living in Manhattan is also living in New York City whereas other relationships might have more complicated patterns and "contextual analytics". This process is called inference or reasoning and is the ability of the software to derive new information based on given facts.

Articles

Velez, Golda (2008). "Semantics Help Wall Street Cope With Data Overload". wallstreetandtech.com.
Zhifeng, Xiao (2009). "Spatial information semantic query based on SPARQL". In Liu, Yaolin; Tang, Xinming (eds.). International Symposium on Spatial Analysis, Spatial-Temporal Data Modeling, and Data Mining. Vol. 7492. SPIE. pp. 74921P. Bibcode:2009SPIE.7492E..60X. doi:10.1117/12.838556. S2CID 62191842.
Aquin, Mathieu (2010). "Watson, more than a Semantic Web search engine" (PDF). Semantic Web Journal.
Dworetzky, Tom (2011). "How Siri Works: iPhone's 'Brain' Comes from Natural Language Processing". International Business Times.
Horwitt, Elisabeth (2011). "The semantic Web gets down to business". computerworld.com.
Rodriguez, Marko (2011). "Graph Pattern Matching with Gremlin". Marko A. Rodriguez. markorodriguez.com on Graph Computing.
Sequeda, Juan (2011). "SPARQL Nuts & Bolts". Cambridge Semantics.
Freitas, Andre (2012). "Querying Heterogeneous Datasets on the Linked Data Web" (PDF). IEEE Internet Computing.
Kauppinen, Tomi (2012). "Using the SPARQL Package in R to handle Spatial Linked Data". linkedscience.org.
Lorentz, Alissa (2013). "With Big Data, Context is a Big Issue". Wired.

Related Research Articles

<span class="mw-page-title-main">Semantic Web</span> Extension of the Web to facilitate data exchange

The Semantic Web, sometimes known as Web 3.0, is an extension of the World Wide Web through standards set by the World Wide Web Consortium (W3C). The goal of the Semantic Web is to make Internet data machine-readable.

The Resource Description Framework (RDF) is a method to describe and exchange graph data. It was originally designed as a data model for metadata by the World Wide Web Consortium (W3C). It provides a variety of syntax notations and data serialization formats, of which the most widely used is Turtle.

A query language, also known as data query language or database query language (DQL), is a computer language used to make queries in databases and information systems. In database systems, query languages rely on strict theory to retrieve information. A well known example is the Structured Query Language (SQL).

SPARQL is an RDF query language—that is, a semantic query language for databases—able to retrieve and manipulate data stored in Resource Description Framework (RDF) format. It was made a standard by the RDF Data Access Working Group (DAWG) of the World Wide Web Consortium, and is recognized as one of the key technologies of the semantic web. On 15 January 2008, SPARQL 1.0 was acknowledged by W3C as an official recommendation, and SPARQL 1.1 in March, 2013.

Semantic integration is the process of interrelating information from diverse sources, for example calendars and to do lists, email archives, presence information, documents of all sorts, contacts, search results, and advertising and marketing relevance derived from them. In this regard, semantics focuses on the organization of and action upon information by acting as an intermediary between heterogeneous data sources, which may conflict not only by structure but also context or value.

A spatial database is a general-purpose database that has been enhanced to include spatial data that represents objects defined in a geometric space, along with tools for querying and analyzing such data.

Oracle Spatial and Graph, formerly Oracle Spatial, is a free option component of the Oracle Database. The spatial features in Oracle Spatial and Graph aid users in managing geographic and location-data in a native type within an Oracle database, potentially supporting a wide range of applications — from automated mapping, facilities management, and geographic information systems (AM/FM/GIS), to wireless location services and location-enabled e-business. The graph features in Oracle Spatial and Graph include Oracle Network Data Model (NDM) graphs used in traditional network applications in major transportation, telcos, utilities and energy organizations and RDF semantic graphs used in social networks and social interactions and in linking disparate data sets to address requirements from the research, health sciences, finance, media and intelligence communities.

An RDF query language is a computer language, specifically a query language for databases, able to retrieve and manipulate data stored in Resource Description Framework (RDF) format.

Ontotext is a software company that produces software relating to data management. Its main products are GraphDB, an RDF database; and Ontotext Platform, a general data management platform based on knowledge graphs. It was founded in 2000 in Bulgaria, and now has offices internationally. Together with the BBC, Ontotext developed one of the early large-scale industrial semantic applications, Dynamic Semantic Publishing, starting in 2010.

Apache Jena is an open source Semantic Web framework for Java. It provides an API to extract data from and write to RDF graphs. The graphs are represented as an abstract "model". A model can be sourced with data from files, databases, URLs or a combination of these. A model can also be queried through SPARQL 1.1.

A triplestore or RDF store is a purpose-built database for the storage and retrieval of triples through semantic queries. A triple is a data entity composed of subject–predicate–object, like "Bob is 35" or "Bob knows Fred".

MarkLogic Server is a document-oriented database developed by MarkLogic. It is a NoSQL multi-model database that evolved from an XML database to natively store JSON documents and RDF triples, the data model for semantics. MarkLogic is designed to be a data hub for operational and analytical data.

A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the graph. The graph relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes. The relationships allow data in the store to be linked together directly and, in many cases, retrieved with one operation. Graph databases hold the relationships between data as a priority. Querying relationships is fast because they are perpetually stored in the database. Relationships can be intuitively visualized using graph databases, making them useful for heavily inter-connected data.

AllegroGraph is a closed source triplestore which is designed to store RDF triples, a standard format for Linked Data. It also operates as a document store designed for storing, retrieving and managing document-oriented information, in JSON-LD format. AllegroGraph is currently in use in commercial projects and a US Department of Defense project. It is also the storage component for the TwitLogic project that is bringing the Semantic Web to Twitter data.

The following is provided as an overview of and topical guide to databases:

GeoSPARQL is a model for representing and querying geospatial linked data for the Semantic Web. It is standardized by the Open Geospatial Consortium as OGC GeoSPARQL. The definition of a small ontology based on well-understood OGC standards is intended to provide a standardized exchange basis for geospatial RDF data which can support both qualitative and quantitative spatial reasoning and querying with the SPARQL database query language.

Cypher is a declarative graph query language that allows for expressive and efficient data querying in a property graph.

Schema-agnostic databases or vocabulary-independent databases aim at supporting users to be abstracted from the representation of the data, supporting the automatic semantic matching between queries and databases. Schema-agnosticism is the property of a database of mapping a query issued with the user terminology and structure, automatically mapping it to the dataset vocabulary.

NitrosBase is a Russian high-performance multi-model database system. The database system supports relational, graph and document database models.

GQL is a standardized query language for property graphs first described in ISO/IEC 76120, released in April 2024 by ISO/IEC.

References

↑ "Introducing SPARQL: Querying the Semantic Web". XML.com. 2005.
↑ "SPARQL Query Language for RDF". W3C. 2008.
↑ Semantic queries in databases: problems and challenges. Cikm '09. ACM Digital Library. 2009. pp. 1505–1508. doi:10.1145/1645953.1646157. ISBN 9781605585123. S2CID 1578867.
↑ "Karma: A System for Mapping Structured Sources into the Semantic Web" (PDF). eswc-conferences.org. 2012.
↑ "A Scalable Approach to Learn Semantic Models of Structured Sources" (PDF). 8th IEEE International Conference on Semantic Computing. 2014.
↑ "Semantics for Big Data Integration and Analysis" (PDF). AAAI Fall Symposium on Semantics for Big Data. 2013.

External links

W3C Semantic Web Standards - Query

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[XML.com-1] "Introducing SPARQL: Querying the Semantic Web". XML.com. 2005.

[W3C-2] "SPARQL Query Language for RDF". W3C. 2008.

[ACM-DL-3] Semantic queries in databases: problems and challenges. Cikm '09. ACM Digital Library. 2009. pp. 1505–1508. doi:10.1145/1645953.1646157. ISBN 9781605585123. S2CID 1578867.

[ESWC-4] "Karma: A System for Mapping Structured Sources into the Semantic Web" (PDF). eswc-conferences.org. 2012.

[IEEE-5] "A Scalable Approach to Learn Semantic Models of Structured Sources" (PDF). 8th IEEE International Conference on Semantic Computing. 2014.

[AAAI-6] "Semantics for Big Data Integration and Analysis" (PDF). AAAI Fall Symposium on Semantics for Big Data. 2013.

[1]

[2]

[3]

[4]

[5]

[6]