Blank node


Example of a blank node in an RDF graph

In RDF, a blank node (also called bnode) is a node in an RDF graph representing a resource for which a URI or literal is not given. [1] The resource represented by a blank node is also called an anonymous resource. According to the RDF standard, a blank node can only be used as the subject or object of an RDF triple.
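To make the positional rule concrete, here is a minimal Python sketch with a hand-rolled term model. The classes IRI, Literal and BNode and the function is_valid_triple are hypothetical illustrations, not the API of any RDF library. The check follows the RDF 1.1 abstract syntax, in which a literal cannot be a subject either.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str


@dataclass(frozen=True)
class BNode:
    # Only a local label: the node has no URI and no literal value.
    label: str


Term = Union[IRI, Literal, BNode]


def is_valid_triple(s: Term, p: Term, o: Term) -> bool:
    """Positional rules: subject is an IRI or blank node, predicate is an IRI,
    object is any term. A blank node can therefore never be a predicate."""
    return (
        isinstance(s, (IRI, BNode))
        and isinstance(p, IRI)
        and isinstance(o, (IRI, Literal, BNode))
    )


EX = "http://example.org/data#"
print(is_valid_triple(BNode("b"), IRI(EX + "fullName"), Literal("Alice Carol")))  # True
print(is_valid_triple(IRI("http://example.org/web-data"), BNode("b"), Literal("x")))  # False
```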


Notation in serialization formats

Blank nodes can be denoted through blank node identifiers in the following formats: RDF/XML, RDFa, Turtle, N3 and N-Triples. The following example shows how this works in RDF/XML.

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:ex="http://example.org/data#">
      <rdf:Description rdf:about="http://example.org/web-data" ex:title="Web Data">
        <ex:professor rdf:nodeID="b"/>
      </rdf:Description>
      <rdf:Description rdf:nodeID="b" ex:fullName="Alice Carol">
        <ex:homePage rdf:resource="http://example.net/alice-carol"/>
      </rdf:Description>
    </rdf:RDF>

Blank node identifiers are scoped to a particular serialization of an RDF graph; that is, the node _:b (written rdf:nodeID="b") in the preceding example does not represent the same node as a node named _:b in any other graph.
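Because identifiers are local, combining two graphs requires renaming ("standardizing apart") the blank nodes of one of them. The Python sketch below assumes terms are strings in N-Triples syntax ("<...>" for IRIs, '"..."' for literals, "_:x" for blank nodes); the merge function and the sample data are hypothetical.

```python
import itertools

# Assumed convention: terms are N-Triples strings, and "_:" marks a blank node.
Triple = tuple[str, str, str]


def is_bnode(term: str) -> bool:
    return term.startswith("_:")


def merge(g1: set[Triple], g2: set[Triple]) -> set[Triple]:
    """Merge two graphs, renaming g2's blank nodes apart from g1's.

    Identifiers are local to one serialization, so _:b in g1 and _:b in g2
    must remain two different nodes after the merge.
    """
    used = {t for triple in g1 for t in triple if is_bnode(t)}
    counter = itertools.count()
    renaming: dict[str, str] = {}

    def fresh(label: str) -> str:
        if label not in renaming:
            candidate = f"_:m{next(counter)}"
            while candidate in used:  # skip labels already taken in g1
                candidate = f"_:m{next(counter)}"
            used.add(candidate)
            renaming[label] = candidate
        return renaming[label]

    renamed = {tuple(fresh(t) if is_bnode(t) else t for t in triple) for triple in g2}
    return g1 | renamed


g1 = {("<http://example.org/web-data>", "<http://example.org/data#professor>", "_:b")}
g2 = {("_:b", "<http://example.org/data#fullName>", '"Bob"')}
merged = merge(g1, g2)
# The two _:b nodes stay distinct: g2's node is now labelled _:m0.
```

Without the renaming, the union would wrongly state that the professor in g1 is named "Bob".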

Blank nodes can also be denoted through nested elements (in RDF/XML, RDFa, Turtle and N3). The following RDF/XML expresses the same triples as the first example, using nesting.

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:ex="http://example.org/data#">
      <rdf:Description rdf:about="http://example.org/web-data" ex:title="Web Data">
        <ex:professor>
          <rdf:Description ex:fullName="Alice Carol">
            <ex:homePage rdf:resource="http://example.net/alice-carol"/>
          </rdf:Description>
        </ex:professor>
      </rdf:Description>
    </rdf:RDF>

Below is the same example in RDFa.

    <p about="http://example.org/web-data">
      <span property="ex:title">Web Data</span>
      <span rel="ex:professor">
        <a property="ex:fullName" rel="ex:homePage" href="http://example.net/alice-carol">Alice Carol</a>
      </span>
    </p>

Below is the same example in Turtle.

    @prefix ex: <http://example.org/data#> .

    <http://example.org/web-data> ex:title "Web Data" ;
        ex:professor [ ex:fullName "Alice Carol" ;
                       ex:homePage <http://example.net/alice-carol> ] .
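The Turtle example describes four triples. As a sketch of what a parser does with a nested, unlabelled description, the Python below gives each nested description a fresh blank node label and prints the result as N-Triples, a format in which every blank node must carry an explicit _:label. The helpers flatten and to_ntriples and the label scheme _:genidN are hypothetical; terms are written as N-Triples strings.

```python
import itertools


def flatten(subject: str, description: dict, counter, out: list) -> None:
    """Flatten a nested description into triples.

    A dict value stands for a nested, unlabelled resource (Turtle's [ ... ] or a
    nested rdf:Description) and receives a fresh blank node label. A dict can
    hold each predicate only once, which is enough for this example.
    """
    for predicate, value in description.items():
        if isinstance(value, dict):
            node = f"_:genid{next(counter)}"
            out.append((subject, predicate, node))
            flatten(node, value, counter, out)
        else:
            out.append((subject, predicate, value))


def to_ntriples(triples: list) -> str:
    # One triple per line, terminated by " ."
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


EX = "http://example.org/data#"
description = {
    f"<{EX}title>": '"Web Data"',
    f"<{EX}professor>": {
        f"<{EX}fullName>": '"Alice Carol"',
        f"<{EX}homePage>": "<http://example.net/alice-carol>",
    },
}
triples: list = []
flatten("<http://example.org/web-data>", description, itertools.count(), triples)
print(to_ntriples(triples))
```

Running it prints the four triples, with the professor written as _:genid0. Any other label would denote the same graph, since only the sharing of the label within the document matters.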

Usability

Blank nodes are treated as simply indicating the existence of a thing, without using a URI (Uniform Resource Identifier) to identify any particular thing. This is not the same as assuming that the blank node indicates an 'unknown' URI. [1]

Anonymous resources in RDF

From a technical perspective, blank nodes make it possible to:

  1. describe multi-component structures, like the RDF containers,
  2. describe reification (e.g. for provenance information),
  3. represent complex attributes without having to explicitly name the auxiliary node (e.g. the address of a person, consisting of the street, the number, the postal code and the city) and
  4. offer protection of the inner information (e.g. protecting customers' sensitive information from browsers). [2]

The example below uses blank nodes in the ways just listed. In particular, the blank node with the identifier '_:students' represents an RDF Bag container, the blank node with the identifier '_:address' represents a complex attribute, and those with the identifiers '_:activity1' and '_:activity2' represent events in the lifecycle of a digital object.

    @prefix ex: <http://example.org/data#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

    <http://example.org/web-data> ex:title "Web Data" ;
        ex:professor _:entity ;
        ex:students _:students ;
        ex:generatedBy _:activity1 .

    _:entity ex:fullName "Alice Carol" ;
        ex:homePage <http://example.net/alice-carol> ;
        ex:hasAddress _:address .

    _:address a ex:Address ;
        ex:streetAddress "123 Main St." ;
        ex:postalCode "A1A1A1" ;
        ex:addressLocality "London" .

    _:students a rdf:Bag ;
        ex:hasMember _:s1 ;
        ex:hasMember _:s2 .

    _:activity1 a ex:Event ;
        ex:creator _:entity ;
        ex:atTime "Tuesday 11 February, 06:51:00 CST" .

    _:activity2 a ex:Event, ex:Update ;
        ex:actionOver _:activity1 ;
        ex:creator _:entity2 ;
        ex:atTime "Monday 17 February, 08:12:00 CST" .
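Because a blank node has no global identifier, a consumer typically reaches its description by following triples from a named resource. The Python sketch below (the describe helper is hypothetical, and prefixed names are kept as plain strings) collects the triples about a resource and, recursively, those about every blank node it points to, so the complex attribute _:address comes along with the professor. Triples pointing into the description, such as _:activity1 ex:creator _:entity, are not followed.

```python
from collections import defaultdict


def describe(start: str, triples: list) -> list:
    """Collect the triples about `start`, following objects that are blank
    nodes so that nested anonymous resources are included."""
    by_subject = defaultdict(list)
    for triple in triples:
        by_subject[triple[0]].append(triple)
    result, seen, stack = [], {start}, [start]
    while stack:
        node = stack.pop()
        for s, p, o in by_subject[node]:
            result.append((s, p, o))
            if o.startswith("_:") and o not in seen:  # descend into blank nodes only
                seen.add(o)
                stack.append(o)
    return result


# A subset of the example above.
data = [
    ("<http://example.org/web-data>", "ex:professor", "_:entity"),
    ("_:entity", "ex:fullName", '"Alice Carol"'),
    ("_:entity", "ex:hasAddress", "_:address"),
    ("_:address", "rdf:type", "ex:Address"),
    ("_:address", "ex:addressLocality", '"London"'),
    ("_:activity1", "ex:creator", "_:entity"),
]
for triple in describe("<http://example.org/web-data>", data):
    print(triple)
```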

Anonymous classes in OWL

The ontology language OWL uses blank nodes to represent anonymous classes such as unions or intersections of classes, [3] or classes called restrictions, defined by a constraint on a property. [4]

For example, to express that a person has at most one birth date, one defines the class "Person" as a subclass of an anonymous class of type "owl:Restriction". This anonymous class is defined by two attributes, one specifying the constrained property and the other specifying the constraint itself (cardinality ≤ 1).

    <owl:Class rdf:about="http://example.org/ontology/Person">
      <rdfs:subClassOf>
        <owl:Restriction>
          <owl:maxCardinality>1</owl:maxCardinality>
          <owl:onProperty rdf:resource="http://xmlns.com/foaf/0.1/birthDate"/>
        </owl:Restriction>
      </rdfs:subClassOf>
    </owl:Class>
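Parsed into triples, the nested owl:Restriction element becomes a blank node that is the object of rdfs:subClassOf. The Python sketch below writes those triples out by hand (the label _:r and the helper max_cardinality_restrictions are hypothetical) and reads the restriction back by following the blank node. The literal "1" is left untyped, as in the RDF/XML above.

```python
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
SUBCLASS_OF = "<http://www.w3.org/2000/01/rdf-schema#subClassOf>"
OWL = "http://www.w3.org/2002/07/owl#"
RESTRICTION = f"<{OWL}Restriction>"
MAX_CARD = f"<{OWL}maxCardinality>"
ON_PROPERTY = f"<{OWL}onProperty>"
PERSON = "<http://example.org/ontology/Person>"

# Triples for the RDF/XML above; "_:r" is an arbitrary label for the
# anonymous restriction, which the RDF/XML never names.
graph = {
    (PERSON, RDF_TYPE, f"<{OWL}Class>"),
    (PERSON, SUBCLASS_OF, "_:r"),
    ("_:r", RDF_TYPE, RESTRICTION),
    ("_:r", MAX_CARD, '"1"'),
    ("_:r", ON_PROPERTY, "<http://xmlns.com/foaf/0.1/birthDate>"),
}


def max_cardinality_restrictions(graph: set, cls: str) -> list:
    """Return (property, n) for each anonymous max-cardinality restriction
    that `cls` is declared to be a subclass of."""
    found = []
    for s, p, o in graph:
        if s == cls and p == SUBCLASS_OF and o.startswith("_:"):
            # Gather the blank node's own attributes (one value per predicate).
            props = {pp: oo for ss, pp, oo in graph if ss == o}
            if props.get(RDF_TYPE) == RESTRICTION and MAX_CARD in props:
                found.append((props.get(ON_PROPERTY), int(props[MAX_CARD].strip('"'))))
    return found


print(max_cardinality_restrictions(graph, PERSON))
```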

Blank nodes in published data

Blank node prevalence

According to an empirical survey [5] of Linked Data published on the Web, out of the 783 domains contributing to the corpus, 345 (44.1%) did not publish any blank nodes. The average percentage of unique terms that were blank nodes for each domain was 7.5%, indicating that although a small number of high-volume domains publish many blank nodes, many other domains publish blank nodes less frequently.

Of the 286.3 million unique terms found in data-level positions, 165.4 million (57.8%) were blank nodes, 92.1 million (32.2%) were URIs, and 28.9 million (10%) were literals. Each blank node had on average 5.2 data-level occurrences: it occurred, on average, 0.99 times in the object position of a non-rdf:type triple and 4.2 times in the subject position of a triple.
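The reported percentages are shares of the 286.3 total, and the two positional averages add up to the overall average. A quick arithmetic check in Python (variable names are illustrative) reproduces them from the stated counts; the literals' share comes out at 10.1%, given as 10% in the text.

```python
# Counts of unique data-level terms from the survey [5]; the unit cancels out
# in the percentages.
total = 286.3
counts = {"blank nodes": 165.4, "URIs": 92.1, "literals": 28.9}

# Each share is a count divided by the total, as a percentage.
shares = {kind: round(100 * n / total, 1) for kind, n in counts.items()}
print(shares)  # {'blank nodes': 57.8, 'URIs': 32.2, 'literals': 10.1}

# Share of the 783 domains that published no blank nodes at all.
print(round(100 * 345 / 783, 1))  # 44.1

# Object position (non-rdf:type) plus subject position, per blank node.
print(round(0.99 + 4.2, 1))  # 5.2
```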

Structure of blank nodes

According to the same empirical survey of linked data published on the Web, the majority of documents surveyed contain tree-based blank node structures. A small fraction contain complex blank node structures for which various tasks are potentially very expensive to compute.
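One simple way to test whether a document's blank nodes are tree-shaped is a union-find pass over the triples that link two blank nodes: if a triple connects two blank nodes that are already connected, the structure contains a cycle. The Python sketch below uses this simplified criterion (edge direction, predicates, duplicate edges and self-loops are ignored); the function name is hypothetical, and this is an illustration rather than the exact measure used in the survey.

```python
def blank_nodes_form_forest(triples: list) -> bool:
    """Return True if the blank nodes, linked by triples that connect two of
    them, form a forest (an acyclic undirected graph).

    Simplification assumed here: edge direction, predicates, duplicate edges
    and self-loops are ignored.
    """
    edges = {
        frozenset((s, o))
        for s, _, o in triples
        if s.startswith("_:") and o.startswith("_:") and s != o
    }
    parent: dict = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for edge in edges:
        a, b = tuple(edge)
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False  # a and b are already connected: this edge closes a cycle
        parent[root_a] = root_b
    return True


tree = [("_:a", "ex:p", "_:b"), ("_:b", "ex:p", "_:c")]
print(blank_nodes_form_forest(tree))  # True
print(blank_nodes_form_forest(tree + [("_:c", "ex:p", "_:a")]))  # False
```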

Sensitive tasks

The existence of blank nodes requires special treatment in various tasks, whose complexity can grow exponentially with the number of these nodes.

Comparing RDF graphs

The inability to match blank nodes increases the delta size (the number of triples that need to be deleted and added in order to transform one RDF graph into another) and makes it harder to detect the changes between subsequent versions of a knowledge base. Building a mapping between the blank nodes of two compared knowledge bases that minimizes the delta size is NP-hard in the general case. [6]
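To make the delta size concrete, the Python sketch below compares two made-up versions of a graph (terms as N-Triples strings; the helpers bnodes, rename and min_delta are hypothetical). Comparing blank node labels literally makes every triple containing a blank node look changed; trying every injective mapping of the second version's blank nodes onto the first's finds the smallest delta. This exhaustive search is a naive baseline that is only feasible for tiny graphs, not the matching technique of the cited work.

```python
from itertools import permutations

Triple = tuple[str, str, str]


def bnodes(g: set[Triple]) -> list[str]:
    return sorted({t for triple in g for t in triple if t.startswith("_:")})


def rename(g: set[Triple], mapping: dict[str, str]) -> set[Triple]:
    return {tuple(mapping.get(t, t) for t in triple) for triple in g}


def min_delta(g1: set[Triple], g2: set[Triple]) -> int:
    """Smallest |g1 - g2'| + |g2' - g1| over injective renamings g2' of g2's
    blank nodes onto g1's blank nodes (plus fresh labels if g2 has more).

    Exhaustive search: the number of candidate mappings grows factorially
    with the number of blank nodes.
    """
    b1, b2 = bnodes(g1), bnodes(g2)
    # Assumes no blank node in g1 is already called _:freshN.
    pool = b1 + [f"_:fresh{i}" for i in range(max(0, len(b2) - len(b1)))]
    best = None
    for image in permutations(pool, len(b2)):
        mapped = rename(g2, dict(zip(b2, image)))
        size = len(g1 - mapped) + len(mapped - g1)
        best = size if best is None else min(best, size)
    return best


WEB, EX = "<http://example.org/web-data>", "http://example.org/data#"
v1 = {(WEB, f"<{EX}professor>", "_:x"), ("_:x", f"<{EX}fullName>", '"Alice Carol"')}
v2 = {(WEB, f"<{EX}professor>", "_:y"), ("_:y", f"<{EX}fullName>", '"Alice Carol"'),
      ("_:y", f"<{EX}homePage>", "<http://example.net/alice-carol>")}
print(len(v1 ^ v2))       # 5: without matching, every triple with a blank node differs
print(min_delta(v1, v2))  # 1: mapping _:y to _:x leaves only the added homePage triple
```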

BNodeLand is a framework that addresses this problem and provides tools implementing solutions to it. [7]

Entailment checking

Regarding entailment, it has been proved that (a) deciding simple or RDF/S entailment of RDF graphs is NP-complete, [8] and (b) deciding equivalence of simple RDF graphs is isomorphism-complete.
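The definition behind simple entailment can be sketched directly, and it also illustrates the existential reading of blank nodes described under Usability: a graph g simply entails a graph h when the blank nodes of h can be mapped to terms of g so that every triple of h becomes a triple of g (the interpolation lemma of the RDF 1.1 Semantics). The Python below (hypothetical function name, made-up example IRIs) tries every such mapping, so its running time is exponential in the number of blank nodes of h; it is a naive illustration, not the decision procedure studied in [8].

```python
from itertools import product

Triple = tuple[str, str, str]


def simply_entails(g: set[Triple], h: set[Triple]) -> bool:
    """Brute-force simple entailment: g entails h if some mapping of h's blank
    nodes to terms of g sends every triple of h to a triple of g."""
    h_bnodes = sorted({t for tr in h for t in tr if t.startswith("_:")})
    g_terms = sorted({t for tr in g for t in tr})
    # |g_terms| ** |h_bnodes| candidate mappings: exponential in the blank nodes.
    for image in product(g_terms, repeat=len(h_bnodes)):
        m = dict(zip(h_bnodes, image))
        if all(tuple(m.get(t, t) for t in tr) in g for tr in h):
            return True
    return False


g = {("<http://example.org/alice>", "<http://example.org/knows>", "<http://example.org/bob>")}
# "Alice knows someone": the blank node only asserts that such a thing exists.
h = {("<http://example.org/alice>", "<http://example.org/knows>", "_:someone")}
print(simply_entails(g, h))  # True
print(simply_entails(h, g))  # False
```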


References

  1. "RDF 1.1 Semantics". Retrieved 6 April 2024.
  2. L. Chen, H. Zhang, Y. Chen, and W. Guo. Blank Nodes in RDF. Journal of Software, 2012.
  3. "OWL Web Ontology Language Parsing OWL in RDF/XML".
  4. "OWL Web Ontology Language Reference". Retrieved 6 April 2024.
  5. A. Mallea, M. Arenas, A. Hogan, and A. Polleres. On Blank Nodes. In Procs of the 10th Intern. Semantic Web Conference (ISWC 2011), 2011.
  6. Y. Tzitzikas, C. Lantzaki, and D. Zeginis. Blank Node Matching and RDF/S Comparison Functions. In Procs of the 11th Intern. Semantic Web Conference (ISWC 2012), 2012.
  7. BNodeLand, forth.gr.
  8. H. J. ter Horst. "Completeness, decidability and complexity of entailment for RDF Schema and a semantic extension involving the OWL vocabulary." J. of Web Sem. 3:79-115, 2005.