Developer(s) | Daniel Krech (creator), Gunnar Grimnes, Joern Hees (past maintainers), Nicholas J. Car (maintainer) |
---|---|
Initial release | June 4, 2002 |
Stable release | 6.2.0 / July 16, 2022 [1] |
Repository | |
Written in | Python |
Operating system | Cross-platform |
Type | Library |
License | BSD |
Website | rdflib |
RDFLib is a Python library for working with RDF, [2] a simple yet powerful language for representing information. The library contains parsers and serializers for almost all of the known RDF serializations, such as RDF/XML, Turtle, N-Triples and JSON-LD, many of which are now supported in their updated form (e.g. Turtle 1.1). It also contains both in-memory and persistent Graph back-ends for storing RDF information, and numerous convenience functions for declaring graph namespaces, posing SPARQL [3] queries and so on. It is in continuous development; the most recent stable release, rdflib 6.2.0, was released on 16 July 2022. It was originally created by Daniel Krech, with the first release in November 2002.
A number of other Python projects use rdflib for RDF manipulation.
RDFLib's use of various Python idioms means it is fairly simple for programmers with only junior Python skills to manipulate RDF. Conversely, the idioms are simple enough that someone familiar with RDF, but not Python, can probably work out how to use rdflib quite easily.
The core class in RDFLib is Graph, a container class used to store collections of RDF triples in memory. It redefines certain built-in Python object methods in order to exhibit simple graph behaviour, such as simple graph merging via addition (i.e. g3 = g1 + g2).
RDFLib graphs emulate container types and are best thought of as a set of 3-item triples:
set([ (subject,predicate,object), (subject1,predicate1,object1), ... (subjectN,predicateN,objectN) ])
RDFLib graphs are not sorted containers; they have ordinary Python set operations, e.g. add(), and methods that search triples and return them in arbitrary order.
RDFLib's term classes, such as URIRef, BNode and Literal, model RDF terms in a graph and inherit from a common Identifier class, which extends Python's text type (unicode in Python 2, str in Python 3). Instances of these are nodes in an RDF graph.
RDFLib provides mechanisms for managing namespaces. In particular, there is a Namespace class which takes (as its only argument) the Base URI of the namespace. Fully qualified URIs in the namespace can be constructed by attribute / dictionary access on Namespace instances:
>>> from rdflib import Namespace
>>> SDO = Namespace("https://schema.org/")
>>> SDO.Person
rdflib.term.URIRef('https://schema.org/Person')
>>> SDO['url']
rdflib.term.URIRef('https://schema.org/url')
RDFLib graphs also override __iter__ in order to support iteration over the contained triples:
for subject, predicate, object_ in someGraph:
    assert (subject, predicate, object_) in someGraph, "Iterator / Container Protocols are Broken!!"
__iadd__ and __isub__ are overridden to support adding and subtracting Graphs to/from each other in place, e.g. G1 += G2 and G1 -= G2.
RDFLib graphs support basic triple pattern matching with a triples((subject, predicate, object)) function. This function is a generator of the triples that match the pattern given as its argument. Each element of the pattern is an RDF term that restricts the triples returned; an element that is None is treated as a wildcard.
for subject, predicate, object_ in someGraph.triples((None, URIRef("https://schema.org/name"), None)):
    print("{} has name {}".format(subject, object_))
# prints the subject and object of every triple whose predicate is https://schema.org/name
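Triples can be added in two ways: by parsing an RDF source with parse(), or directly with a call to add: add((subject, predicate, object))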
Similarly, triples can be removed by a call to remove: remove((subject, predicate, object))
RDFLib Literals essentially behave like Unicode strings with an XML Schema datatype or language attribute. The class provides a mechanism both to convert Python literals (and built-in types such as time, date and datetime) into equivalent RDF Literals and, conversely, to convert Literals to their Python equivalents. There is some support for taking datatypes into account when comparing Literal instances, implemented as an override of __eq__. The mapping to and from Python literals is achieved with the following dictionaries, shown here in a form that uses Python 2 names such as basestring and long:
PythonToXSD = {
    basestring: (None, None),
    float: (None, XSD_NS + u'float'),
    int: (None, XSD_NS + u'int'),
    long: (None, XSD_NS + u'long'),
    bool: (None, XSD_NS + u'boolean'),
    date: (lambda i: i.isoformat(), XSD_NS + u'date'),
    time: (lambda i: i.isoformat(), XSD_NS + u'time'),
    datetime: (lambda i: i.isoformat(), XSD_NS + u'dateTime'),
}
Maps Python instances to WXS (W3C XML Schema) datatyped Literals.
XSDToPython = {
    XSD_NS + u'time': (None, _strToTime),
    XSD_NS + u'date': (None, _strToDate),
    XSD_NS + u'dateTime': (None, _strToDateTime),
    XSD_NS + u'string': (None, None),
    XSD_NS + u'normalizedString': (None, None),
    XSD_NS + u'token': (None, None),
    XSD_NS + u'language': (None, None),
    XSD_NS + u'boolean': (None, lambda i: i.lower() in ['1', 'true']),
    XSD_NS + u'decimal': (float, None),
    XSD_NS + u'integer': (long, None),
    XSD_NS + u'nonPositiveInteger': (int, None),
    XSD_NS + u'long': (long, None),
    XSD_NS + u'nonNegativeInteger': (int, None),
    XSD_NS + u'negativeInteger': (int, None),
    XSD_NS + u'int': (int, None),
    XSD_NS + u'unsignedLong': (long, None),
    XSD_NS + u'positiveInteger': (int, None),
    XSD_NS + u'short': (int, None),
    XSD_NS + u'unsignedInt': (long, None),
    XSD_NS + u'byte': (int, None),
    XSD_NS + u'unsignedShort': (int, None),
    XSD_NS + u'unsignedByte': (int, None),
    XSD_NS + u'float': (float, None),
    XSD_NS + u'double': (float, None),
    XSD_NS + u'base64Binary': (base64.decodestring, None),
    XSD_NS + u'anyURI': (None, None),
}
Maps WXS (W3C XML Schema) datatyped Literals to Python. This mapping is used by the toPython() method defined on all Literal instances.
RDFLib supports a majority of the current SPARQL 1.1 specification and includes a harness for the publicly available RDF DAWG test suite. Support for SPARQL is provided by two methods:
- rdflib.graph.Graph.query() - used to pose SPARQL SELECT or ASK queries to a graph (or Store of Graphs), or to return RDF using CONSTRUCT
- rdflib.graph.Graph.update() - used to change graph content using INSERT and DELETE SPARQL statements
A Universal RDF Store Interface
This document attempts to summarize some fundamental components of an RDF store. The motivation is to outline a standard set of interfaces providing the support needed to persist an RDF Graph in a way that is universal and not tied to any specific implementation. For the most part, the core RDF model is adhered to, as is terminology consistent with the RDF Model specifications. However, this suggested interface also extends an RDF store with additional requirements necessary to facilitate the aspects of Notation 3 that go beyond the RDF model to provide a framework for First Order Predicate Logic processing and persistence.
<urn:uuid:conjunctive-graph-foo> rdf:type :ConjunctiveGraph
<urn:uuid:conjunctive-graph-foo> rdf:type log:Truth
<urn:uuid:conjunctive-graph-foo> :persistedBy :MySQL
Chimezie said "higher-order statements are complicated"
:chimezie :said {:higherOrderStatements rdf:type :complicated}
The following Notation 3 document:
{?x a :N3Programmer} => {?x :has [a :Migraine]}
Could cause the following statements to be asserted in the store:
_:a log:implies _:b
This statement would be asserted in the partition associated with quoted statements (in a formula named _:a)
?x rdf:type :N3Programmer
Finally, these statements would be asserted in the same partition (in a formula named _:b)
?x :has _:c
_:c rdf:type :Migraine
Formulae and Variables as Terms
Formulae and variables are distinguishable from URI references, Literals, and BNodes by the following syntax:
{ .. } - Formula
?x - Variable
They must also be distinguishable in persistence to ensure they can be round-tripped.
An RDF store should provide standard interfaces for the management of database connections. Such interfaces are standard to most database management systems (Oracle, MySQL, Berkeley DB, Postgres, etc.). The following methods are defined to provide this capability: open, close and destroy.
The configuration string is understood by the store implementation and represents all the necessary parameters needed to locate an individual instance of a store. This could be similar to an ODBC string, or in fact be an ODBC string if the connection protocol to the underlying database is ODBC. The open function needs to fail intelligently in order to clearly express that a store (identified by the given configuration string) already exists or that there is no store (at the location specified by the configuration string) depending on the value of create.
An RDF store could provide a standard set of interfaces for the manipulation, management, and/or retrieval of its contained triples (asserted or quoted), including adding, removing and pattern-matching triples and reporting the number of triples held.
The triples function can be thought of as the primary mechanism for producing triples with nodes that match the corresponding terms and term pattern provided. A conjunctive query can be indicated by providing either a NULL, None or empty-string value for context, or the identifier associated with the Conjunctive Graph.
These interfaces work on contexts and formulae (for stores that are formula-aware) interchangeably.
RDFLib defines several kinds of graphs, including Graph, QuotedGraph and ConjunctiveGraph.
A Conjunctive Graph is the most relevant collection of graphs that are considered to be the boundary for closed world assumptions. This boundary is equivalent to that of the store instance (which is itself uniquely identified and distinct from other instances of Store that signify other Conjunctive Graphs). It is equivalent to all the named graphs within it and is associated with a default graph, which is automatically assigned a BNode as its identifier if one isn't given.
RDFLib graphs support an additional extension of RDF semantics for formulae. For the academically inclined, Graham Klyne's 'formal' extension (see external links) is probably a good read.
Formulae are represented formally by the 'QuotedGraph' class and are disjoint from regular RDF graphs in that their statements are quoted.
RDFLib provides an abstracted Store API for persistence of RDF and Notation 3. The Graph class works with instances of this API (as the first argument to its constructor) for triple-based management of an RDF store including: garbage collection, transaction management, update, pattern matching, removal, length, and database management (open / close / destroy). Additional persistence mechanisms can be supported by implementing this API for a different store. Currently supported databases:
Store instances can be created with the plugin function:
from rdflib import plugin
from rdflib.store import Store
plugin.get('.. one of the supported Stores ..', Store)(identifier=.. id of conjunctive graph ..)
There are a few high-level APIs that extend RDFLib graphs into other Pythonic idioms. For a more explicit Python binding, there are Sparta, SuRF and FunOWL.