The Resource Description Framework (RDF) is a method to describe and exchange graph data. It was originally designed as a data model for metadata by the World Wide Web Consortium (W3C). It provides a variety of syntax notations and data serialization formats, of which the most widely used is Turtle (Terse RDF Triple Language).
RDF is a directed graph composed of triple statements. An RDF graph statement is represented by: (1) a node for the subject, (2) an arc from subject to object, representing a predicate, and (3) a node for the object. Each of these parts can be identified by a Uniform Resource Identifier (URI). An object can also be a literal value. This simple, flexible data model has a lot of expressive power to represent complex situations, relationships, and other things of interest, while also being appropriately abstract.
RDF was adopted as a W3C recommendation in 1999. The RDF 1.0 specification was published in 2004, and the RDF 1.1 specification in 2014. SPARQL is a standard query language for RDF graphs. RDF Schema (RDFS), Web Ontology Language (OWL) and SHACL (Shapes Constraint Language) are ontology languages that are used to describe RDF data.
The RDF data model [1] is similar to classical conceptual modeling approaches (such as entity–relationship or class diagrams). It is based on the idea of making statements about resources (in particular web resources) in expressions of the form subject–predicate–object, known as triples. The subject denotes the resource; the predicate denotes traits or aspects of the resource, and expresses a relationship between the subject and the object.
For example, one way to represent the notion "The sky has the color blue" in RDF is as the triple: a subject denoting "the sky", a predicate denoting "has the color", and an object denoting "blue". Therefore, RDF uses subject instead of object (or entity) in contrast to the typical approach of an entity–attribute–value model in object-oriented design: entity (sky), attribute (color), and value (blue).
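As a minimal sketch of the statement above, a triple can be modelled as a three-field record and a graph as a set of such records. The `Triple` type, the `objects` helper and the `http://example.org/` identifiers are assumptions made for illustration; they are not part of any RDF specification or library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers in the reserved example.org domain.
EX = "http://example.org/"

# "The sky has the color blue": the object here is a literal value.
sky_is_blue = Triple(EX + "sky", EX + "hasColor", "blue")

# A graph is simply a set of triples; adding the same statement twice has no effect.
graph = {sky_is_blue, sky_is_blue}


def objects(graph, subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return {t.obj for t in graph if t.subject == subject and t.predicate == predicate}


colors = objects(graph, EX + "sky", EX + "hasColor")
```

In the entity–attribute–value reading, `subject` plays the role of the entity, `predicate` the attribute, and `obj` the value.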
RDF is an abstract model with several serialization formats (essentially specialized file formats). In addition, the particular encoding for resources or triples can vary from format to format.
This mechanism for describing resources is a major component in the W3C's Semantic Web activity: an evolutionary stage of the World Wide Web in which automated software can store, exchange, and use machine-readable information distributed throughout the Web, in turn enabling users to deal with the information with greater efficiency and certainty. RDF's simple data model and ability to model disparate, abstract concepts has also led to its increasing use in knowledge management applications unrelated to Semantic Web activity.
A collection of RDF statements intrinsically represents a labeled, directed multigraph. This makes an RDF data model better suited to certain kinds of knowledge representation than other relational or ontological models.
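The multigraph reading can be illustrated with a short sketch: each triple is a directed edge from subject to object, labelled by its predicate, and two nodes may be joined by several differently labelled edges. The identifiers and the `edge_labels` helper below are hypothetical.

```python
from collections import defaultdict

# Hypothetical identifiers, used only for illustration.
EX = "http://example.org/"

triples = [
    (EX + "alice", EX + "knows", EX + "bob"),
    (EX + "alice", EX + "worksWith", EX + "bob"),  # second edge between the same nodes
    (EX + "bob", EX + "knows", EX + "alice"),      # direction matters
]


def edge_labels(triples):
    """Map each (source, target) node pair to the set of predicates linking them."""
    edges = defaultdict(set)
    for s, p, o in triples:
        edges[(s, o)].add(p)
    return edges


edges = edge_labels(triples)
parallel = edges[(EX + "alice", EX + "bob")]
```

Here `parallel` holds two labels for the same ordered pair of nodes, which a simple (non-multi) graph could not represent.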
As RDFS, OWL and SHACL demonstrate, one can build additional ontology languages upon RDF.
The initial RDF design, intended to "build a vendor-neutral and operating system-independent system of metadata", [2] derived from the W3C's Platform for Internet Content Selection (PICS), an early web content labelling system, [3] but the project was also shaped by ideas from Dublin Core, and from the Meta Content Framework (MCF), [2] which had been developed during 1995 to 1997 by Ramanathan V. Guha at Apple and Tim Bray at Netscape. [4]
A first public draft of RDF appeared in October 1997, [5] [6] issued by a W3C working group that included representatives from IBM, Microsoft, Netscape, Nokia, Reuters, SoftQuad, and the University of Michigan. [3]
In 1999, the W3C published the first recommended RDF specification, the Model and Syntax Specification ("RDF M&S"). [7] This described RDF's data model and an XML serialization. [8]
Two persistent misunderstandings about RDF developed at this time: first, due to the MCF influence and the "Resource Description" in RDF's name, the idea that RDF was specifically for use in representing metadata; second, that RDF was an XML format rather than a data model, when in fact only the RDF/XML serialisation is XML-based. RDF saw little take-up in this period, but significant work was done in Bristol, around ILRT at Bristol University and HP Labs, and in Boston at MIT. RSS 1.0 and FOAF became exemplar applications for RDF in this period.
The recommendation of 1999 was replaced in 2004 by a set of six specifications: [9] "The RDF Primer", [10] "RDF Concepts and Abstract Syntax", [11] "RDF/XML Syntax Specification (revised)", [12] "RDF Semantics", [13] "RDF Vocabulary Description Language 1.0", [14] and "The RDF Test Cases". [15]
This series was superseded in 2014 by the following six "RDF 1.1" documents: "RDF 1.1 Primer", [16] "RDF 1.1 Concepts and Abstract Syntax", [17] "RDF 1.1 XML Syntax", [18] "RDF 1.1 Semantics", [19] "RDF Schema 1.1", [20] and "RDF 1.1 Test Cases". [21]
The vocabulary defined by the RDF specification is as follows: [22]
- rdf:XMLLiteral
- rdf:Property
- rdf:Statement
- rdf:Alt, rdf:Bag, rdf:Seq (rdfs:Container is a super-class of the three)
- rdf:List
- rdf:nil (an instance of rdf:List representing the empty list)
- rdfs:Resource
- rdfs:Literal
- rdfs:Class
- rdfs:Datatype
- rdfs:Container
- rdfs:ContainerMembershipProperty (the class of rdf:_1, rdf:_2, ..., all of which are sub-properties of rdfs:member)
- rdf:type (an instance of rdf:Property used to state that a resource is an instance of a class)
- rdf:first
- rdf:rest (the rest of the list after rdf:first)
- rdf:value
- rdf:subject
- rdf:predicate
- rdf:object

rdf:Statement, rdf:subject, rdf:predicate, rdf:object are used for reification (see below).

- rdfs:subClassOf
- rdfs:subPropertyOf
- rdfs:domain
- rdfs:range
- rdfs:label
- rdfs:comment
- rdfs:member
- rdfs:seeAlso
- rdfs:isDefinedBy
This vocabulary is used as a foundation for RDF Schema, where it is extended.
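To show how three of these terms fit together, the following sketch encodes an ordered sequence as a chain of nodes linked by rdf:first and rdf:rest and terminated by rdf:nil, then reads it back. The function names and the `_:b0`, `_:b1`, ... blank node labels are assumptions for illustration, and terms are kept as plain strings for brevity.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def list_to_triples(items):
    """Encode items as an rdf:first/rdf:rest chain; return (head node, triples)."""
    if not items:
        return RDF + "nil", []
    # One blank node per list cell; the labels are arbitrary.
    nodes = [f"_:b{i}" for i in range(len(items))]
    triples = []
    for i, (node, item) in enumerate(zip(nodes, items)):
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
        triples.append((node, RDF + "first", item))
        triples.append((node, RDF + "rest", rest))
    return nodes[0], triples


def triples_to_list(head, triples):
    """Walk the chain from head until rdf:nil and collect the items in order."""
    first = {s: o for s, p, o in triples if p == RDF + "first"}
    rest = {s: o for s, p, o in triples if p == RDF + "rest"}
    items = []
    while head != RDF + "nil":
        items.append(first[head])
        head = rest[head]
    return items


head, chain = list_to_triples(["a", "b", "c"])
round_trip = triples_to_list(head, chain)
```

Each list cell costs two triples, so a three-item list is stored as six statements.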
Format | Filename extension | Internet media type | Developed by | Standard | Open format? |
---|---|---|---|---|---|
Turtle | .ttl | text/turtle [23] | World Wide Web Consortium | RDF 1.1 Turtle: Terse RDF Triple Language, January 9, 2014 | Yes |
TriG | .trig | application/trig [24] | World Wide Web Consortium | RDF 1.1 TriG: RDF Dataset Language, February 25, 2014 | Yes |
RDF/XML | .rdf | application/rdf+xml [25] | World Wide Web Consortium | Concepts and Abstract Syntax, February 10, 2004 | Yes |
Several common serialization formats are in use, including Turtle, TriG, N-Triples, JSON-LD, Notation3 and RDF/XML.
RDF/XML is sometimes misleadingly called simply RDF because it was introduced among the other W3C specifications defining RDF and it was historically the first W3C standard RDF serialization format. However, it is important to distinguish the RDF/XML format from the abstract RDF model itself. Although the RDF/XML format is still in use, other RDF serializations are now preferred by many RDF users, both because they are more human-friendly, [34] and because some RDF graphs are not representable in RDF/XML due to restrictions on the syntax of XML QNames.
With a little effort, virtually any arbitrary XML may also be interpreted as RDF using GRDDL (pronounced 'griddle'), Gleaning Resource Descriptions from Dialects of Languages.
RDF triples may be stored in a type of database called a triplestore.
The subject of an RDF statement is either a uniform resource identifier (URI) or a blank node, both of which denote resources. Resources indicated by blank nodes are called anonymous resources. They are not directly identifiable from the RDF statement. The predicate is a URI which also indicates a resource, representing a relationship. The object is a URI, a blank node or a Unicode string literal. As of RDF 1.1, resources are identified by Internationalized Resource Identifiers (IRIs); IRIs are a generalization of URIs. [35]
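The position rules just described can be captured in a few lines. This is a sketch under the assumption that terms are modelled as three small classes (`IRI`, `BlankNode`, `Literal`); these class names and the `is_valid_triple` helper are illustrative, not the API of any particular RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str


def is_valid_triple(subject, predicate, obj):
    """Check that each term kind appears only in a position where RDF allows it."""
    return (
        isinstance(subject, (IRI, BlankNode))             # no literal subjects
        and isinstance(predicate, IRI)                    # predicates are always IRIs
        and isinstance(obj, (IRI, BlankNode, Literal))    # objects may be any term
    )


me = IRI("http://www.w3.org/People/EM/contact#me")
full_name = IRI("http://www.w3.org/2000/10/swap/pim/contact#fullName")

ok = is_valid_triple(me, full_name, Literal("Eric Miller"))
literal_subject = is_valid_triple(Literal("Eric Miller"), full_name, me)
blank_predicate = is_valid_triple(me, BlankNode("b0"), Literal("x"))
```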
In Semantic Web applications, and in relatively popular applications of RDF like RSS and FOAF (Friend of a Friend), resources tend to be represented by URIs that intentionally denote, and can be used to access, actual data on the World Wide Web. But RDF, in general, is not limited to the description of Internet-based resources. In fact, the URI that names a resource does not have to be dereferenceable at all. For example, a URI that begins with "http:" and is used as the subject of an RDF statement does not necessarily have to represent a resource that is accessible via HTTP, nor does it need to represent a tangible, network-accessible resource; such a URI could represent absolutely anything. However, there is broad agreement that a bare URI (without a # symbol) which returns a 200-level (success) response when used in an HTTP GET request should be treated as denoting the internet resource that it succeeds in accessing.
Therefore, producers and consumers of RDF statements must agree on the semantics of resource identifiers. Such agreement is not inherent to RDF itself, although there are some controlled vocabularies in common use, such as Dublin Core Metadata, which is partially mapped to a URI space for use in RDF. The intent of publishing RDF-based ontologies on the Web is often to establish, or circumscribe, the intended meanings of the resource identifiers used to express data in RDF. For example, the URI:
http://www.w3.org/TR/2004/REC-owl-guide-20040210/wine#Merlot
is intended by its owners to refer to the class of all Merlot red wines by vintner (i.e., instances of the above URI each represent the class of all wine produced by a single vintner), a definition which is expressed by the OWL ontology — itself an RDF document — in which it occurs. Without careful analysis of the definition, one might erroneously conclude that an instance of the above URI was something physical, instead of a type of wine.
Note that this is not a 'bare' resource identifier, but is rather a URI reference, containing the '#' character and ending with a fragment identifier.
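The distinction between the bare identifier and the fragment can be seen by splitting the URI reference with Python's standard `urllib.parse.urldefrag`; the variable names are only illustrative.

```python
from urllib.parse import urldefrag

uri = "http://www.w3.org/TR/2004/REC-owl-guide-20040210/wine#Merlot"

# urldefrag separates the part before '#' from the fragment identifier after it.
document, fragment = urldefrag(uri)

has_fragment = fragment != ""
```

An HTTP GET request would only ever carry the `document` part; the fragment is resolved by the client against whatever that request returns.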
The body of knowledge modeled by a collection of statements may be subjected to reification, in which each statement (that is each triple subject-predicate-object altogether) is assigned a URI and treated as a resource about which additional statements can be made, as in "Jane says that John is the author of document X". Reification is sometimes important in order to deduce a level of confidence or degree of usefulness for each statement.
In a reified RDF database, each original statement, being itself a resource, most likely has at least three additional statements made about it: one to assert that its subject is some resource, one to assert that its predicate is some resource, and one to assert that its object is some resource or literal. More statements about the original statement may also exist, depending on the application's needs.
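A minimal sketch of this pattern, using the reification terms from the RDF vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object): the three statements described above are generated, plus a conventional fourth one typing the new resource as an rdf:Statement. The `reify` helper and all `http://example.org/` identifiers are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace for the example


def reify(statement_id, triple):
    """Describe a triple as a resource using the RDF reification vocabulary."""
    s, p, o = triple
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", s),
        (statement_id, RDF + "predicate", p),
        (statement_id, RDF + "object", o),
    ]


# "John is the author of document X"
original = (EX + "documentX", EX + "author", EX + "John")
reified = reify(EX + "statement1", original)

# "Jane says that ...": a statement about the statement.
claim = (EX + "Jane", EX + "says", EX + "statement1")
```

Note that the reified description does not by itself assert `original`; an application has to decide whether to store the original triple alongside it.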
Borrowing from concepts available in logic (and as illustrated in graphical notations such as conceptual graphs and topic maps), some RDF model implementations acknowledge that it is sometimes useful to group statements according to different criteria, called situations, contexts, or scopes, as discussed in articles by RDF specification co-editor Graham Klyne. [36] [37] For example, a statement can be associated with a context, named by a URI, in order to assert an "is true in" relationship. As another example, it is sometimes convenient to group statements by their source, which can be identified by a URI, such as the URI of a particular RDF/XML document. Then, when updates are made to the source, corresponding statements can be changed in the model, as well.
Implementation of scopes does not necessarily require fully reified statements. Some implementations allow a single scope identifier to be associated with a statement that has not been assigned a URI, itself. [38] [39] Likewise named graphs in which a set of triples is named by a URI can represent context without the need to reify the triples. [40]
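One way to picture a scope identifier attached to an unreified statement is a quad: a triple plus a fourth element naming its graph or source. The sketch below groups quads by that fourth element and drops everything from one source when it is updated; the identifiers and helper names are assumptions for illustration.

```python
from collections import defaultdict

EX = "http://example.org/"  # hypothetical namespace

# Each quad is (subject, predicate, object, graph name).
quads = [
    (EX + "sky", EX + "hasColor", "blue", EX + "graphs/daytime"),
    (EX + "sky", EX + "hasColor", "black", EX + "graphs/night"),
    (EX + "grass", EX + "hasColor", "green", EX + "graphs/daytime"),
]


def group_by_graph(quads):
    """Collect the triples belonging to each named graph."""
    graphs = defaultdict(set)
    for s, p, o, g in quads:
        graphs[g].add((s, p, o))
    return dict(graphs)


def drop_graph(quads, graph_name):
    """Remove every statement from one source, e.g. before reloading it."""
    return [q for q in quads if q[3] != graph_name]


graphs = group_by_graph(quads)
without_night = drop_graph(quads, EX + "graphs/night")
```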
The predominant query language for RDF graphs is SPARQL. SPARQL is an SQL-like language, and a recommendation of the W3C as of January 15, 2008.
The following is an example of a SPARQL query to show country capitals in Africa, using a fictional ontology:
    PREFIX ex: <http://example.com/exampleOntology#>
    SELECT ?capital ?country
    WHERE {
      ?x ex:cityname ?capital ;
         ex:isCapitalOf ?y .
      ?y ex:countryname ?country ;
         ex:isInContinent ex:Africa .
    }
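To make the matching behind such a query concrete, the following sketch evaluates the same four triple patterns against a small in-memory data set. It is a naive illustration of basic graph pattern matching, not how a SPARQL engine is implemented; the sample data, the `_:` node labels and the helper functions are assumptions, and terms beginning with `?` are treated as variables.

```python
EX = "http://example.com/exampleOntology#"

# Hypothetical sample data for the fictional ontology.
triples = {
    ("_:city1", EX + "cityname", "Nairobi"),
    ("_:city1", EX + "isCapitalOf", "_:country1"),
    ("_:country1", EX + "countryname", "Kenya"),
    ("_:country1", EX + "isInContinent", EX + "Africa"),
    ("_:city2", EX + "cityname", "Paris"),
    ("_:city2", EX + "isCapitalOf", "_:country2"),
    ("_:country2", EX + "countryname", "France"),
    ("_:country2", EX + "isInContinent", EX + "Europe"),
}


def is_var(term):
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None if impossible."""
    binding = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable must keep the value it was bound to earlier.
            if binding.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return binding


def query(patterns, triples):
    """Join the patterns one by one, keeping only consistent variable bindings."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return solutions


patterns = [
    ("?x", EX + "cityname", "?capital"),
    ("?x", EX + "isCapitalOf", "?y"),
    ("?y", EX + "countryname", "?country"),
    ("?y", EX + "isInContinent", EX + "Africa"),
]
results = [(b["?capital"], b["?country"]) for b in query(patterns, triples)]
```

The shared variables `?x` and `?y` are what join the patterns: only bindings that satisfy all four patterns at once survive.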
Other non-standard ways to query RDF graphs include:
The SHACL Advanced Features specification [42] (a W3C Working Group Note), the most recent version of which is maintained by the SHACL Community Group, defines support for SHACL Rules, used for data transformations, inferences and mappings of RDF based on SHACL shapes.
The predominant language for describing and validating RDF graphs is SHACL (Shapes Constraint Language). [43] The SHACL specification is divided into two parts: SHACL Core and SHACL-SPARQL. SHACL Core consists of a list of built-in constraints such as cardinality, range of values and many others. SHACL-SPARQL describes SPARQL-based constraints and an extension mechanism to declare new constraint components.
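The idea of a cardinality constraint can be sketched without any SHACL tooling: for each node being checked, count the distinct values of a property and compare the count with a minimum and maximum. This illustrates the concept only; it is not SHACL syntax or a SHACL processor, and the function, parameter names and data are hypothetical.

```python
EX = "http://example.org/"  # hypothetical namespace

triples = [
    (EX + "alice", EX + "email", "alice@example.org"),
    (EX + "bob", EX + "email", "bob@example.org"),
    (EX + "bob", EX + "email", "robert@example.org"),
]


def check_cardinality(triples, focus_nodes, path, min_count=0, max_count=None):
    """Return (node, value count) for every node violating the count bounds."""
    violations = []
    for node in focus_nodes:
        values = {o for s, p, o in triples if s == node and p == path}
        too_few = len(values) < min_count
        too_many = max_count is not None and len(values) > max_count
        if too_few or too_many:
            violations.append((node, len(values)))
    return violations


people = [EX + "alice", EX + "bob", EX + "carol"]

# Require exactly one email address per person.
violations = check_cardinality(triples, people, EX + "email", min_count=1, max_count=1)
```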
Other, non-standard ways to describe and validate RDF graphs also exist.
The following example is taken from the W3C website [47] describing a resource with statements "there is a Person identified by http://www.w3.org/People/EM/contact#me, whose name is Eric Miller, whose email address is e.miller123(at)example (changed for security purposes), and whose title is Dr."
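The resource "http://www.w3.org/People/EM/contact#me" is the subject.
The objects are:
- "Eric Miller" (with the predicate "whose name is"),
- mailto:e.miller123(at)example (with the predicate "whose email address is"), and
- "Dr." (with the predicate "whose title is").

The subject is a URI.
The predicates also have URIs. The URI for each predicate is:
- "whose name is": http://www.w3.org/2000/10/swap/pim/contact#fullName
- "whose email address is": http://www.w3.org/2000/10/swap/pim/contact#mailbox
- "whose title is": http://www.w3.org/2000/10/swap/pim/contact#personalTitle

In addition, the subject has a type (with URI http://www.w3.org/1999/02/22-rdf-syntax-ns#type), which is person (with URI http://www.w3.org/2000/10/swap/pim/contact#Person).
Therefore, the following "subject, predicate, object" RDF triples can be expressed:
- http://www.w3.org/People/EM/contact#me, http://www.w3.org/2000/10/swap/pim/contact#fullName, "Eric Miller"
- http://www.w3.org/People/EM/contact#me, http://www.w3.org/2000/10/swap/pim/contact#mailbox, mailto:e.miller123(at)example
- http://www.w3.org/People/EM/contact#me, http://www.w3.org/2000/10/swap/pim/contact#personalTitle, "Dr."
- http://www.w3.org/People/EM/contact#me, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://www.w3.org/2000/10/swap/pim/contact#Person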
In standard N-Triples format, this RDF can be written as:
    <http://www.w3.org/People/EM/contact#me> <http://www.w3.org/2000/10/swap/pim/contact#fullName> "Eric Miller" .
    <http://www.w3.org/People/EM/contact#me> <http://www.w3.org/2000/10/swap/pim/contact#mailbox> <mailto:e.miller123(at)example> .
    <http://www.w3.org/People/EM/contact#me> <http://www.w3.org/2000/10/swap/pim/contact#personalTitle> "Dr." .
    <http://www.w3.org/People/EM/contact#me> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/10/swap/pim/contact#Person> .
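Because N-Triples is line-based, producing it from in-memory triples is straightforward. The sketch below handles only IRIs and plain string literals (no blank nodes, datatypes or language tags) and escapes only the most common special characters; the helper names are assumptions, not a library API.

```python
def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Quote a plain string literal, escaping backslash, quote and line breaks."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize already-formatted terms, one statement per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


ME = "http://www.w3.org/People/EM/contact#me"
CONTACT = "http://www.w3.org/2000/10/swap/pim/contact#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

triples = [
    (iri(ME), iri(CONTACT + "fullName"), literal("Eric Miller")),
    (iri(ME), iri(CONTACT + "mailbox"), iri("mailto:e.miller123(at)example")),
    (iri(ME), iri(CONTACT + "personalTitle"), literal("Dr.")),
    (iri(ME), iri(RDF + "type"), iri(CONTACT + "Person")),
]
document = to_ntriples(triples)
```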
Equivalently, it can be written in standard Turtle (syntax) format as:
    @prefix eric: <http://www.w3.org/People/EM/contact#> .
    @prefix contact: <http://www.w3.org/2000/10/swap/pim/contact#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

    eric:me contact:fullName "Eric Miller" .
    eric:me contact:mailbox <mailto:e.miller123(at)example> .
    eric:me contact:personalTitle "Dr." .
    eric:me rdf:type contact:Person .
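The main saving in Turtle comes from prefixed names, which abbreviate a full IRI to a prefix label plus a local part. The following sketch shows that mapping in both directions for the three prefixes declared above. It does not check whether the local part is a legal Turtle local name, and the `compact` and `expand` helpers are hypothetical.

```python
PREFIXES = {
    "eric": "http://www.w3.org/People/EM/contact#",
    "contact": "http://www.w3.org/2000/10/swap/pim/contact#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def compact(iri, prefixes=PREFIXES):
    """Abbreviate an IRI to prefix:local if a declared namespace matches."""
    # Try longer namespaces first so the most specific prefix wins.
    for prefix, namespace in sorted(prefixes.items(), key=lambda kv: -len(kv[1])):
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    return f"<{iri}>"  # no prefix applies: keep the full IRI in angle brackets


def expand(name, prefixes=PREFIXES):
    """Turn prefix:local back into the full IRI."""
    prefix, _, local = name.partition(":")
    return prefixes[prefix] + local


short = compact("http://www.w3.org/2000/10/swap/pim/contact#fullName")
unchanged = compact("mailto:e.miller123(at)example")
full = expand("rdf:type")
```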
Or, it can be written in RDF/XML format as:
    <?xml version="1.0" encoding="utf-8"?>
    <rdf:RDF xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#"
             xmlns:eric="http://www.w3.org/People/EM/contact#"
             xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about="http://www.w3.org/People/EM/contact#me">
        <contact:fullName>Eric Miller</contact:fullName>
      </rdf:Description>
      <rdf:Description rdf:about="http://www.w3.org/People/EM/contact#me">
        <contact:mailbox rdf:resource="mailto:e.miller123(at)example"/>
      </rdf:Description>
      <rdf:Description rdf:about="http://www.w3.org/People/EM/contact#me">
        <contact:personalTitle>Dr.</contact:personalTitle>
      </rdf:Description>
      <rdf:Description rdf:about="http://www.w3.org/People/EM/contact#me">
        <rdf:type rdf:resource="http://www.w3.org/2000/10/swap/pim/contact#Person"/>
      </rdf:Description>
    </rdf:RDF>
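For documents shaped like the one above (top-level rdf:Description elements with an rdf:about attribute and flat property children), the triples can be recovered with Python's standard XML parser. This is a sketch that covers only that restricted shape, not a general RDF/XML parser; the function name and the `("iri", ...)` / `("literal", ...)` tagging of objects are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def parse_simple_rdfxml(text):
    """Extract triples from flat rdf:Description elements (restricted subset only)."""
    triples = []
    root = ET.fromstring(text)
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree spells qualified names as "{namespace}local".
            namespace, local = prop.tag[1:].split("}", 1)
            resource = prop.get(f"{{{RDF_NS}}}resource")
            if resource is not None:
                obj = ("iri", resource)
            else:
                obj = ("literal", prop.text or "")
            triples.append((subject, namespace + local, obj))
    return triples


doc = """<rdf:RDF
    xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://www.w3.org/People/EM/contact#me">
    <contact:fullName>Eric Miller</contact:fullName>
    <contact:mailbox rdf:resource="mailto:e.miller123(at)example"/>
  </rdf:Description>
</rdf:RDF>"""

parsed = parse_simple_rdfxml(doc)
```

The predicate IRI is obtained by concatenating the element's namespace and local name, which is the reverse of how RDF/XML abbreviates predicates as XML qualified names.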
Certain concepts in RDF are taken from logic and linguistics, where subject-predicate and subject-predicate-object structures have meanings similar to, yet distinct from, the uses of those terms in RDF. This example demonstrates:
In the English language statement 'New York has the postal abbreviation NY' , 'New York' would be the subject, 'has the postal abbreviation' the predicate and 'NY' the object.
Encoded as an RDF triple, the subject and predicate would have to be resources named by URIs. The object could be a resource or literal element. For example, in the N-Triples form of RDF, the statement might look like:
    <urn:x-states:New%20York> <http://purl.org/dc/terms/alternative> "NY" .
In this example, "urn:x-states:New%20York" is the URI for a resource that denotes the US state New York, "http://purl.org/dc/terms/alternative" is the URI for a predicate (whose human-readable definition is given in [48]), and "NY" is a literal string. Note that the URIs chosen here are not standard, and do not need to be, as long as their meaning is known to whatever is reading them.
In a like manner, given that "https://en.wikipedia.org/wiki/Tony_Benn" identifies a particular resource (regardless of whether that URI could be traversed as a hyperlink, or whether the resource is actually the Wikipedia article about Tony Benn), to say that the title of this resource is "Tony Benn" and its publisher is "Wikipedia" would be two assertions that could be expressed as valid RDF statements. In the N-Triples form of RDF, these statements might look like the following:
    <https://en.wikipedia.org/wiki/Tony_Benn> <http://purl.org/dc/elements/1.1/title> "Tony Benn" .
    <https://en.wikipedia.org/wiki/Tony_Benn> <http://purl.org/dc/elements/1.1/publisher> "Wikipedia" .
To an English-speaking person, the same information could be represented simply as:
The title of this resource, which is published by Wikipedia, is 'Tony Benn'
However, RDF puts the information in a formal way that a machine can understand. The purpose of RDF is to provide an encoding and interpretation mechanism so that resources can be described in a way that particular software can understand; in other words, so that software can access and use information that it otherwise could not use.
Both versions of the statements above are wordy because one requirement for an RDF resource (as a subject or a predicate) is that it be unique. The subject resource must be unique in an attempt to pinpoint the exact resource being described. The predicate needs to be unique in order to reduce the chance that the idea of Title or Publisher will be ambiguous to software working with the description. If the software recognizes http://purl.org/dc/elements/1.1/title (a specific definition for the concept of a title established by the Dublin Core Metadata Initiative), it will also know that this title is different from a land title or an honorary title or just the letters t-i-t-l-e put together.
The following example, written in Turtle, shows how such simple claims can be elaborated on, by combining multiple RDF vocabularies. Here, we note that the primary topic of the Wikipedia page is a "Person" whose name is "Tony Benn":
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dc: <http://purl.org/dc/elements/1.1/> .

    <https://en.wikipedia.org/wiki/Tony_Benn>
        dc:publisher "Wikipedia" ;
        dc:title "Tony Benn" ;
        foaf:primaryTopic [
            a foaf:Person ;
            foaf:name "Tony Benn"
        ] .
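The square-bracket construct in this Turtle stands for a blank node, so the document expands to five triples, two of which have the anonymous person as their subject. The sketch below writes those triples out and follows the two-step path from the page to the person's name. The blank node label `_:b0` is arbitrary, the `follow` helper is hypothetical, and literals are kept as plain Python strings for brevity.

```python
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

PAGE = "https://en.wikipedia.org/wiki/Tony_Benn"
TOPIC = "_:b0"  # the anonymous resource introduced by [ ... ]

triples = [
    (PAGE, DC + "publisher", "Wikipedia"),
    (PAGE, DC + "title", "Tony Benn"),
    (PAGE, FOAF + "primaryTopic", TOPIC),
    (TOPIC, RDF + "type", FOAF + "Person"),  # "a" is Turtle shorthand for rdf:type
    (TOPIC, FOAF + "name", "Tony Benn"),
]


def follow(triples, start, *path):
    """Follow a chain of predicates from start and return the nodes reached."""
    frontier = {start}
    for predicate in path:
        frontier = {o for s, p, o in triples if s in frontier and p == predicate}
    return frontier


names = follow(triples, PAGE, FOAF + "primaryTopic", FOAF + "name")
kinds = follow(triples, PAGE, FOAF + "primaryTopic", RDF + "type")
```

The page and the person are distinct resources here: the title belongs to the page, while the name belongs to the person the page is about.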
Uses of RDF include research into social networking. It can help people in business fields better understand their relationships with members of industries that could be of use for product placement, [57] and it can help scientists understand how people are connected to one another.
RDF is also being used to gain a better understanding of road traffic patterns. Information about traffic patterns is spread across different websites, and RDF is used to integrate information from these different sources. Previously, the common methodology was keyword searching, which is problematic because it does not consider synonyms; this is why ontologies are useful in this situation. However, to fully understand traffic, concepts related to people, streets, and roads must be well understood. Since these are human concepts, they require the addition of fuzzy logic, because values that are useful when describing roads, like slipperiness, are not precise concepts and cannot be measured. This implies that the best solution would incorporate both fuzzy logic and ontology. [58]
The Semantic Web, sometimes known as Web 3.0, is an extension of the World Wide Web through standards set by the World Wide Web Consortium (W3C). The goal of the Semantic Web is to make Internet data machine-readable.
The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies. Ontologies are a formal way to describe taxonomies and classification networks, essentially defining the structure of knowledge for various domains: the nouns representing classes of objects and the verbs representing relations between the objects.
XML Linking Language, or XLink, is an XML markup language and W3C specification that provides methods for creating internal and external links within XML documents, and associating metadata with those links.
RDF Schema (Resource Description Framework Schema, variously abbreviated as RDFS, RDF(S), RDF-S, or RDF/S) is a set of classes with certain properties using the RDF extensible knowledge representation data model, providing basic elements for the description of ontologies. It uses various forms of RDF vocabularies, intended to structure RDF resources. RDF and RDFS can be saved in a triplestore, then one can extract some knowledge from them using a query language, like SPARQL.
SPARQL is an RDF query language—that is, a semantic query language for databases—able to retrieve and manipulate data stored in Resource Description Framework (RDF) format. It was made a standard by the RDF Data Access Working Group (DAWG) of the World Wide Web Consortium, and is recognized as one of the key technologies of the semantic web. On 15 January 2008, SPARQL 1.0 was acknowledged by W3C as an official recommendation, and SPARQL 1.1 in March, 2013.
FOAF is a machine-readable ontology describing persons, their activities and their relations to other people and objects. Anyone can use FOAF to describe themselves. FOAF allows groups of people to describe social networks without the need for a centralised database.
A web resource is any identifiable resource present on or connected to the World Wide Web. Resources are identified using Uniform Resource Identifiers (URIs). In the Semantic Web, web resources and their semantic properties are described using the Resource Description Framework (RDF).
GRDDL is a markup format for Gleaning Resource Descriptions from Dialects of Languages. It is a W3C Recommendation, and enables users to obtain RDF triples out of XML documents, including XHTML. The GRDDL specification shows examples using XSLT, however it was intended to be abstract enough to allow for other implementations as well. It became a Recommendation on September 11, 2007.
RDFa or Resource Description Framework in Attributes is a W3C Recommendation that adds a set of attribute-level extensions to HTML, XHTML and various XML-based document types for embedding rich metadata within Web documents. The Resource Description Framework (RDF) data-model mapping enables its use for embedding RDF subject-predicate-object expressions within XHTML documents. It also enables the extraction of RDF model triples by compliant user agents.
RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information. This library contains parsers/serializers for almost all of the known RDF serializations, such as RDF/XML, Turtle, N-Triples, & JSON-LD, many of which are now supported in their updated form. The library also contains both in-memory and persistent Graph back-ends for storing RDF information and numerous convenience functions for declaring graph namespaces, lodging SPARQL queries and so on. It is in continuous development with the most recent stable release, rdflib 6.1.1 having been released on 20 December 2021. It was originally created by Daniel Krech with the first release in November, 2002.
In computing, Terse RDF Triple Language (Turtle) is a syntax and file format for expressing data in the Resource Description Framework (RDF) data model. Turtle syntax is similar to that of SPARQL, an RDF query language. It is a common data format for storing RDF data, along with N-Triples, JSON-LD and RDF/XML.
An RDF query language is a computer language, specifically a query language for databases, able to retrieve and manipulate data stored in Resource Description Framework (RDF) format.
In RDF, a blank node is a node in an RDF graph representing a resource for which a URI or literal is not given. The resource represented by a blank node is also called an anonymous resource. According to the RDF standard a blank node can only be used as subject or object of an RDF triple.
In computing, linked data is structured data which is interlinked with other data so it becomes more useful through semantic queries. It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages only for human readers, it extends them to share information in a way that can be read automatically by computers. Part of the vision of linked data is for the Internet to become a global database.
N-Triples is a format for storing and transmitting data. It is a line-based, plain text serialisation format for RDF graphs, and a subset of the Turtle format. N-Triples should not be confused with Notation3 which is a superset of Turtle. N-Triples was primarily developed by Dave Beckett at the University of Bristol and Art Barstow at the World Wide Web Consortium (W3C).
A triplestore or RDF store is a purpose-built database for the storage and retrieval of triples through semantic queries. A triple is a data entity composed of subject–predicate–object, like "Bob is 35" or "Bob knows Fred".
Named graphs are a key concept of Semantic Web architecture in which a set of Resource Description Framework statements are identified using a URI, allowing descriptions to be made of that set of statements such as context, provenance information or other such metadata.
XHTML+RDFa is an extended version of the XHTML markup language for supporting RDF through a collection of attributes and processing rules in the form of well-formed XML documents. XHTML+RDFa is one of the techniques used to develop Semantic Web content by embedding rich semantic markup. Version 1.1 of the language is a superset of XHTML 1.1, integrating the attributes according to RDFa Core 1.1. In other words, it is an RDFa support through XHTML Modularization.
A semantic triple, or RDF triple or simply triple, is the atomic data entity in the Resource Description Framework (RDF) data model. As its name indicates, a triple is a sequence of three entities that codifies a statement about semantic data in the form of subject–predicate–object expressions.
Shapes Constraint Language (SHACL) is a World Wide Web Consortium (W3C) standard language for describing Resource Description Framework (RDF) graphs. SHACL has been designed to enhance the semantic and technical interoperability layers of ontologies expressed as RDF graphs.