Graph rewriting

Last updated

In computer science, graph transformation, or graph rewriting, concerns the technique of creating a new graph out of an original graph algorithmically. It has numerous applications, ranging from software engineering (software construction and also software verification) to layout algorithms and picture generation.

Contents

Graph transformations can be used as a computation abstraction. The basic idea is that if the state of a computation can be represented as a graph, further steps in that computation can then be represented as transformation rules on that graph. Such rules consist of an original graph, which is to be matched to a subgraph in the complete state, and a replacing graph, which will replace the matched subgraph.

Formally, a graph rewriting system usually consists of a set of graph rewrite rules of the form , with being called pattern graph (or left-hand side) and being called replacement graph (or right-hand side of the rule). A graph rewrite rule is applied to the host graph by searching for an occurrence of the pattern graph (pattern matching, thus solving the subgraph isomorphism problem) and by replacing the found occurrence by an instance of the replacement graph. Rewrite rules can be further regulated in the case of labeled graphs, such as in string-regulated graph grammars.

Sometimes graph grammar is used as a synonym for graph rewriting system, especially in the context of formal languages; the different wording is used to emphasize the goal of constructions, like the enumeration of all graphs from some starting graph, i.e. the generation of a graph language – instead of simply transforming a given state (host graph) into a new state.

Graph rewriting approaches

Top: Example graph rewrite rule (optimization from compiler construction: multiplication with 2 replaced by addition). Bottom: Application of the rule to optimize "y=x*2" into "y=x+x". GraphRewriteExample color.png
Top: Example graph rewrite rule (optimization from compiler construction: multiplication with 2 replaced by addition). Bottom: Application of the rule to optimize "y=x*2" into "y=x+x".

Algebraic approach

The algebraic approach to graph rewriting is based upon category theory. The algebraic approach is further divided into sub-approaches, the most common of which are the double-pushout (DPO) approach and the single-pushout (SPO) approach . Other sub-approaches include the sesqui-pushout and the pullback approach.

From the perspective of the DPO approach a graph rewriting rule is a pair of morphisms in the category of graphs and graph homomorphisms between them: , also written , where is injective. The graph K is called invariant or sometimes the gluing graph. A rewriting step or application of a rule r to a host graph G is defined by two pushout diagrams both originating in the same morphism , where D is a context graph (this is where the name double-pushout comes from). Another graph morphism models an occurrence of L in G and is called a match . Practical understanding of this is that is a subgraph that is matched from (see subgraph isomorphism problem), and after a match is found, is replaced with in host graph where serves as an interface, containing the nodes and edges which are preserved when applying the rule. The graph is needed to attach the pattern being matched to its context: if it is empty, the match can only designate a whole connected component of the graph .

In contrast a graph rewriting rule of the SPO approach is a single morphism in the category of labeled multigraphs and partial mappings that preserve the multigraph structure: . Thus a rewriting step is defined by a single pushout diagram. Practical understanding of this is similar to the DPO approach. The difference is, that there is no interface between the host graph G and the graph G' being the result of the rewriting step.

From the practical perspective, the key distinction between DPO and SPO is how they deal with the deletion of nodes with adjacent edges, in particular, how they avoid that such deletions may leave behind "dangling edges". The DPO approach only deletes a node when the rule specifies the deletion of all adjacent edges as well (this dangling condition can be checked for a given match), whereas the SPO approach simply disposes the adjacent edges, without requiring an explicit specification.

There is also another algebraic-like approach to graph rewriting, based mainly on Boolean algebra and an algebra of matrices, called matrix graph grammars. [1]

Determinate graph rewriting

Yet another approach to graph rewriting, known as determinate graph rewriting, came out of logic and database theory. [2] In this approach, graphs are treated as database instances, and rewriting operations as a mechanism for defining queries and views; therefore, all rewriting is required to yield unique results (up to isomorphism), and this is achieved by applying any rewriting rule concurrently throughout the graph, wherever it applies, in such a way that the result is indeed uniquely defined.

Term graph rewriting

Another approach to graph rewriting is term graph rewriting, which involves the processing or transformation of term graphs (also known as abstract semantic graphs) by a set of syntactic rewrite rules.

Term graphs are a prominent topic in programming language research since term graph rewriting rules are capable of formally expressing a compiler's operational semantics. Term graphs are also used as abstract machines capable of modelling chemical and biological computations as well as graphical calculi such as concurrency models. Term graphs can perform automated verification and logical programming since they are well-suited to representing quantified statements in first order logic. Symbolic programming software is another application for term graphs, which are capable of representing and performing computation with abstract algebraic structures such as groups, fields and rings.

The TERMGRAPH conference [3] focuses entirely on research into term graph rewriting and its applications.

Classes of graph grammar and graph rewriting system

Graph rewriting systems naturally group into classes according to the kind of representation of graphs that are used and how the rewrites are expressed. The term graph grammar, otherwise equivalent to graph rewriting system or graph replacement system, is most often used in classifications. Some common types are:

Implementations and applications

Graphs are an expressive, visual and mathematically precise formalism for modelling of objects (entities) linked by relations; objects are represented by nodes and relations between them by edges. Nodes and edges are commonly typed and attributed. Computations are described in this model by changes in the relations between the entities or by attribute changes of the graph elements. They are encoded in graph rewrite/graph transformation rules and executed by graph rewrite systems/graph transformation tools.

See also

Related Research Articles

<span class="mw-page-title-main">Chomsky hierarchy</span> Hierarchy of classes of formal grammars

The Chomsky hierarchy in the fields of formal language theory, computer science, and linguistics, is a containment hierarchy of classes of formal grammars. A formal grammar describes how to form strings from a language's vocabulary that are valid according to the language's syntax. Linguist Noam Chomsky theorized that four different classes of formal grammars existed that could generate increasingly complex languages. Each class can also completely generate the language of all inferior classes.

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, in the sense that there are languages that can be described by a CSG but not by a context-free grammar. Context-sensitive grammars are less general than unrestricted grammars. Thus, CSGs are positioned between context-free and unrestricted grammars in the Chomsky hierarchy.

<span class="mw-page-title-main">Context-free grammar</span> Type of formal grammar

In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules can be applied to a nonterminal symbol regardless of its context. In particular, in a context-free grammar, each production rule is of the form

<span class="mw-page-title-main">Graph theory</span> Area of discrete mathematics

In mathematics, graph theory is the study of graphs, which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of vertices which are connected by edges. A distinction is made between undirected graphs, where edges link two vertices symmetrically, and directed graphs, where edges link two vertices asymmetrically. Graphs are one of the principal objects of study in discrete mathematics.

<span class="mw-page-title-main">L-system</span> Rewriting system and type of formal grammar

An L-system or Lindenmayer system is a parallel rewriting system and a type of formal grammar. An L-system consists of an alphabet of symbols that can be used to make strings, a collection of production rules that expand each symbol into some larger string of symbols, an initial "axiom" string from which to begin construction, and a mechanism for translating the generated strings into geometric structures. L-systems were introduced and developed in 1968 by Aristid Lindenmayer, a Hungarian theoretical biologist and botanist at the University of Utrecht. Lindenmayer used L-systems to describe the behaviour of plant cells and to model the growth processes of plant development. L-systems have also been used to model the morphology of a variety of organisms and can be used to generate self-similar fractals.

<span class="mw-page-title-main">Petri net</span> One of several mathematical modeling systems for the description of distributed systems

A Petri net, also known as a place/transition (PT) net, is one of several mathematical modeling languages for the description of distributed systems. It is a class of discrete event dynamic system. A Petri net is a directed bipartite graph that has two types of elements: places and transitions. Place elements are depicted as white circles and transition elements are depicted as rectangles. A place can contain any number of tokens, depicted as black circles. A transition is enabled if all places connected to it as inputs contain at least one token. Some sources state that Petri nets were invented in August 1939 by Carl Adam Petri—at the age of 13—for the purpose of describing chemical processes.

In mathematics, computer science, and logic, rewriting covers a wide range of methods of replacing subterms of a formula with other terms. Such methods may be achieved by rewriting systems. In their most basic form, they consist of a set of objects, plus relations on how to transform those objects.

In theoretical computer science, the subgraph isomorphism problem is a computational task in which two graphs G and H are given as input, and one must determine whether G contains a subgraph that is isomorphic to H. Subgraph isomorphism is a generalization of both the maximum clique problem and the problem of testing whether a graph contains a Hamiltonian cycle, and is therefore NP-complete. However certain other cases of subgraph isomorphism may be solved in polynomial time.

In category theory, a branch of mathematics, a pushout is the colimit of a diagram consisting of two morphisms f : ZX and g : ZY with a common domain. The pushout consists of an object P along with two morphisms XP and YP that complete a commutative square with the two given morphisms f and g. In fact, the defining universal property of the pushout essentially says that the pushout is the "most general" way to complete this commutative square. Common notations for the pushout are and .

In computer science, an abstract semantic graph (ASG) or term graph is a form of abstract syntax in which an expression of a formal or programming language is represented by a graph whose vertices are the expression's subterms. An ASG is at a higher level of abstraction than an abstract syntax tree, which is used to express the syntactic structure of an expression or program.

In theoretical computer science and mathematical logic a string rewriting system (SRS), historically called a semi-Thue system, is a rewriting system over strings from a alphabet. Given a binary relation between fixed strings over the alphabet, called rewrite rules, denoted by , an SRS extends the rewriting relation to all strings in which the left- and right-hand side of the rules appear as substrings, that is , where , , , and are strings.

Regulated rewriting is a specific area of formal languages studying grammatical systems which are able to take some kind of control over the production applied in a derivation step. For this reason, the grammatical systems studied in Regulated Rewriting theory are also called "Grammars with Controlled Derivations". Among such grammars can be noticed:

A Post canonical system, also known as a Post production system, as created by Emil Post, is a string-manipulation system that starts with finitely-many strings and repeatedly transforms them by applying a finite set j of specified rules of a certain form, thus generating a formal language. Today they are mainly of historical relevance because every Post canonical system can be reduced to a string rewriting system, which is a simpler formulation. Both formalisms are Turing complete.

<span class="mw-page-title-main">Formal grammar</span> Structure of a formal language

A formal grammar describes how to form strings from an alphabet of a formal language that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context—only their form. A formal grammar is defined as a set of production rules for such strings in a formal language.

Controlled grammars are a class of grammars that extend, usually, the context-free grammars with additional controls on the derivations of a sentence in the language. A number of different kinds of controlled grammars exist, the four main divisions being Indexed grammars, grammars with prescribed derivation sequences, grammars with contextual conditions on rule application, and grammars with parallelism in rule application. Because indexed grammars are so well established in the field, this article will address only the latter three kinds of controlled grammars.

In computer science, double pushout graph rewriting refers to a mathematical framework for graph rewriting. It was introduced as one of the first algebraic approaches to graph rewriting in the article "Graph-grammars: An algebraic approach" (1973). It has since been generalized to allow rewriting structures which are not graphs, and to handle negative application conditions, among other extensions.

A term graph is a representation of an expression in a formal language as a generalized graph whose vertices are terms. Term graphs are a more powerful form of representation than expression trees because they can represent not only common subexpressions but also cyclic/recursive subexpressions.

In computer science, a single pushout graph rewriting or SPO graph rewriting refers to a mathematical framework for graph rewriting, and is used in contrast to the double-pushout approach of graph rewriting.

In computer science, an attributed graph grammar is a class of graph grammar that associates vertices with a set of attributes and rewrites with functions on attributes. In the algebraic approach to graph grammars, they are usually formulated using the double-pushout approach or the single-pushout approach.

Hartmut Ehrig was a German computer scientist and professor of theoretical computer science and formal specification. He was a pioneer in algebraic specification of abstract data types, and in graph grammars.

References

Citations

  1. Perez 2009 covers this approach in detail.
  2. "A Graph-Oriented Object Model for Database End-User Interfaces" (PDF).
  3. "TERMGRAPH".

Sources