Cypher (query language)

Cypher is a declarative graph query language that allows for expressive and efficient data querying in a property graph. [1]

Cypher was largely an invention of Andrés Taylor while working for Neo4j, Inc. (formerly Neo Technology) in 2011. [2] Cypher was originally intended to be used with the graph database Neo4j, but was opened up through the openCypher project in October 2015. [3]

The language was designed with the power and capability of SQL (the standard query language for the relational database model) in mind, but it is built around the components and needs of a database based on the concepts of graph theory. In a graph model, data is structured as nodes (vertices in mathematics and network science) and relationships (edges in mathematics and network science), which puts the focus on how entities in the data are connected and related to one another.

Graph model

Cypher is based on the Property Graph Model, which organizes data into nodes and edges (called “relationships” in Cypher). In addition to those standard graph elements of nodes and relationships, the property graph model adds labels and properties for describing finer categories and attributes of the data.

Nodes are the entities in the graph. They can hold any number of attributes (key-value pairs) called properties. Nodes can be tagged with zero or more labels (like tags or categories), representing their different roles in a domain. Relationships provide directed, named, semantically-relevant connections between two node entities. A relationship always has a direction, a start node, an end node, and exactly one relationship type. Like nodes, relationships can also have properties. [4]

Labels group similar nodes together; a node can carry zero or more of them. They work like tags and let a query specify which kinds of entities to look for or create. Properties are key-value pairs that bind a string key to a value from the Cypher type system. Cypher queries are assembled from patterns of nodes and relationships, with optional filtering on labels and properties, to create, read, update, and delete the data matched by the pattern.
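As a rough illustration of these elements, the sketch below creates two nodes and one relationship. The labels Person, Employee, and Company, the relationship type WORKS_AT, and all property keys and values are invented for this example and do not come from any particular dataset.

```cypher
// All labels, the relationship type, and the property keys are hypothetical.
CREATE (alice:Person:Employee {name: 'Alice', born: 1985})  // node with two labels and two properties
CREATE (acme:Company {name: 'Acme'})                         // node with one label and one property
CREATE (alice)-[:WORKS_AT {since: 2015}]->(acme)             // directed relationship: one type, one property
```

The relationship has a start node (alice), an end node (acme), and exactly one type, which matches the description of relationships above.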

Type system

The Cypher type system includes many of the common types used in other programming and query languages. Supported types include scalar value types such as boolean, string, number, integer, and floating-point numbers. It also supports temporal types like datetime, localdatetime, date, time, localtime, and duration. Container types for maps and lists are available, along with graph types for node, relationship, and path, and a void type. [5]
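A few of these types can be seen side by side by returning literal values. This is a minimal sketch that assumes an implementation providing the temporal functions date() and duration(), as Neo4j does; the column aliases are arbitrary.

```cypher
// Literal values of several Cypher types; aliases are illustrative only.
RETURN
  true                        AS booleanValue,
  'graph'                     AS stringValue,
  42                          AS integerValue,
  3.14                        AS floatValue,
  date('2019-11-08')          AS dateValue,      // temporal type
  duration('P1DT12H')         AS durationValue,  // 1 day and 12 hours
  [1, 2, 3]                   AS listValue,      // container type: list
  {name: 'Alice', born: 1985} AS mapValue        // container type: map
```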

Syntax

The Cypher query language depicts patterns of nodes and relationships and filters those patterns based on labels and properties. Cypher’s syntax is based on ASCII art, which is text-based visual art for computers. This makes the language very visual and easy to read because it both visually and structurally represents the data specified in the query. For instance, nodes are represented with parentheses around the attributes and information regarding the entity. Relationships are depicted with an arrow (either directed or undirected) with the relationship type in brackets.

// node
(variable:Label {propertyKey: 'propertyValue'})

// relationship
-[variable:RELATIONSHIP_TYPE]->

// Cypher pattern
(node1:LabelA)-[rel1:RELATIONSHIP_TYPE]->(node2:LabelB)

Keywords

Similar to other query languages, Cypher contains a variety of keywords for specifying patterns, filtering patterns, and returning results. Among the most common are MATCH, WHERE, and RETURN. These operate slightly differently from SELECT and WHERE in SQL, but they serve similar purposes.

MATCH precedes the search pattern used to find nodes, relationships, or combinations of the two. [6] WHERE adds constraints to patterns and filters out unwanted matches. [7] RETURN formats and organizes the output of the query. As in other query languages, results can be returned as specific properties, as lists, in a given order, and more. [8]

Using these keywords with the pattern syntax shown above, the example query below searches for a node (with the Actor label and a name property of 'Nicole Kidman') connected by a relationship (of type ACTED_IN, directed away from the first node) to another node (with the Movie label). The WHERE clause then keeps only those patterns in which the Movie node has a year property less than the value of the parameter passed in. The RETURN clause outputs the movie nodes that satisfy the pattern and the filter.

MATCH (nicole:Actor {name: 'Nicole Kidman'})-[:ACTED_IN]->(movie:Movie)
WHERE movie.year < $yearParameter
RETURN movie
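The same pattern can return specific properties in a chosen order rather than whole nodes. The variation below is a sketch only: it assumes that Movie nodes carry a title property, which the original example does not state, and the limit of five rows is arbitrary.

```cypher
MATCH (nicole:Actor {name: 'Nicole Kidman'})-[:ACTED_IN]->(movie:Movie)
WHERE movie.year < $yearParameter
// Return two properties instead of the full node; "title" is an assumed property.
RETURN movie.title AS title, movie.year AS year
ORDER BY year DESC  // newest matching movies first
LIMIT 5             // keep at most five rows
```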

Cypher also contains keywords for clauses that write, update, and delete data. CREATE and DELETE are used to create and delete nodes and relationships. SET assigns values to properties and adds labels to nodes, while REMOVE removes properties and labels. MERGE matches a pattern if it already exists and creates it otherwise, which avoids duplicates. A node can only be deleted when it has no remaining relationships. For example: [9]

MATCH (startContent:Content)-[relationship:IS_RELATED_TO]->(endContent:Content)
WHERE endContent.source = 'user'
OPTIONAL MATCH (endContent)-[r]-()
DELETE relationship, endContent
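The remaining write clauses can be sketched in a single statement. The property names title and reviewed and the label Archived are hypothetical. DETACH DELETE, which is not used in the example above, deletes a node together with any relationships still attached to it, so the no-remaining-relationships restriction does not apply.

```cypher
// MERGE matches the node if it already exists, otherwise creates it.
MERGE (doc:Content {source: 'user', title: 'Draft'})
// SET assigns a property value and adds a label.
SET doc.reviewed = true, doc:Archived
// REMOVE drops a property and a label.
REMOVE doc.title, doc:Archived
// DETACH DELETE removes the node and every relationship connected to it.
DETACH DELETE doc
```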

Standardization

With the openCypher project, an effort began to standardize Cypher as the query language for graph processing. As part of this process, there have been five face-to-face openCypher Implementers Meetings (oCIMs). The first meeting took place in February 2017 at SAP's headquarters in Walldorf, Germany, coincident with a meeting of the Linked Data Benchmark Council. The most recent oCIM took place in Berlin, [10] coincident with the W3C Workshop on Web Standards for Graph Data Management, in March 2019. [11]

At that meeting, there was a consensus to work towards Cypher becoming a significant input into a wider project for an international standardized Graph Query Language called GQL. In September 2019, a proposal for a GQL standard project was approved by a vote of national standards bodies which are members of ISO/IEC Joint Technical Committee 1 (responsible for information technology standards). [12] The GQL project proposal states the following:

Using graph as a fundamental representation for data modeling is an emerging approach in data management. In this approach, the data set is modeled as a graph, representing each data entity as a vertex (also called a node) of the graph and each relationship between two entities as an edge between corresponding vertices. The graph data model has been drawing attention for its unique advantages. Firstly, the graph model can be a natural fit for data sets that have hierarchical, complex, or even arbitrary structures. Such structures can be easily encoded into the graph model as edges. This can be more convenient than the relational model, which requires the normalization of the data set into a set of tables with fixed row types. Secondly, the graph model enables efficient execution of expensive queries or data analytic functions that need to observe multi-hop relationships among data entities, such as reachability queries, shortest or cheapest path queries, or centrality analysis. There are two graph models in current use: the Resource Description Framework (RDF) model and the Property Graph model. The RDF model has been standardized by W3C in a number of specifications. The Property Graph model, on the other hand, has a multitude of implementations in graph databases, graph algorithms, and graph processing facilities. However, a common, standardized query language for property graphs (like SQL for relational database systems) is missing. GQL is proposed to fill this void. [13]

The GQL standard was published in April 2024 as ISO/IEC 39075:2024. [14] A first open-source implementation of a subset of the language is available, [15] [16] along with a formalization and a syntax description of that subset. [17]

References

  1. "Cypher Introduction". Neo4j. Retrieved 2019-11-08.
  2. "Cypher: An Evolving Query Language for Property Graphs" (PDF). Proceedings of the 2018 International Conference on Management of Data. ACM. Retrieved 2018-06-27.
  3. "Meet openCypher: The SQL for Graphs - Neo4j Graph Database". Neo4j Graph Database. 2015-10-21. Retrieved 2019-11-08.
  4. "Property Graph Model". GitHub. Retrieved 2019-11-08.
  5. "Cypher Type System". GitHub. Retrieved 2019-11-08.
  6. "Cypher manual - MATCH clause". Neo4j. Retrieved 2019-11-08.
  7. "Cypher manual - WHERE clause". Neo4j. Retrieved 2019-11-08.
  8. "Cypher manual - RETURN clause". Neo4j. Retrieved 2019-11-08.
  9. "Cypher manual clauses". Neo4j. Retrieved 2019-11-08.
  10. "Events · openCypher".
  11. "W3C Workshop on Web Standardization for Graph Data. Creating Bridges: RDF, Property Graph and SQL". W3C. Retrieved September 29, 2019.
  12. "ISO/IEC WD 39075 Information Technology — Database Languages — GQL". ISO. Retrieved September 29, 2019.
  13. "ISO/IEC JTC 1/SC 32 N 3007 - ISO/IEC NP 39075 Information Technology -- Database Languages -- GQL". British Standards Institute. Retrieved September 29, 2019.
  14. "ISO/IEC 39075:2024 Information technology - Database languages - GQL". ISO. https://www.iso.org/standard/76120.html
  15. "GQL Parser". GitHub. Retrieved January 18, 2021.
  16. "First GQL research implementation from Olof Morra at TU Eindhoven!". Alastair Green. Retrieved January 18, 2021.
  17. "A Semantics of GQL; a New Query Language for Property Graphs Formalized" (PDF). Olof Morra. Retrieved January 18, 2021.