Abstract semantic graph

Last updated

In computer science, an abstract semantic graph (ASG) or term graph is a form of abstract syntax in which an expression of a formal or programming language is represented by a graph whose vertices are the expression's subterms. An ASG is at a higher level of abstraction than an abstract syntax tree (or AST), which is used to express the syntactic structure of an expression or program.

Contents

ASGs are more complex and concise than ASTs because they may contain shared subterms (also known as "common subexpressions"). [1] Abstract semantic graphs are often used as an intermediate representation by compilers to store the results of performing common subexpression elimination upon abstract syntax trees. ASTs are trees and are thus incapable of representing shared terms. ASGs are usually directed acyclic graphs (DAG), although in some applications graphs containing cycles [ clarification needed ] may be permitted. For example, a graph containing a cycle might be used to represent the recursive expressions that are commonly used in functional programming languages as non-looping iteration constructs. The mutability of these types of graphs, is studied in the field of graph rewriting.

The nomenclature term graph is associated with the field of term graph rewriting, [2] which involves the transformation and processing of expressions by the specification of rewriting rules, [3] whereas abstract semantic graph is used when discussing linguistics, programming languages, type systems and compilation.

Abstract syntax trees are not capable of sharing subexpression nodes because it is not possible for a node in a proper tree to have more than one parent. Although this conceptual simplicity is appealing, it may come at the cost of redundant representation and, in turn, possibly inefficiently duplicating the computation of identical terms. For this reason ASGs are often used as an intermediate language at a subsequent compilation stage to abstract syntax tree construction via parsing.

An abstract semantic graph is typically constructed from an abstract syntax tree by a process of enrichment and abstraction. The enrichment can for example be the addition of back-pointers, edges from an identifier node (where a variable is being used) to a node representing the declaration of that variable. The abstraction can entail the removal of details which are relevant only in parsing, not for semantics.

Example: Code Refactoring

For example, consider the case of code refactoring. To represent the implementation of a function that takes an input argument, the received parameter is conventionally given an arbitrary, distinct name in the source code so that it can be referenced. The abstract representation of this conceptual entity, a "function argument" instance, will likely be mentioned in the function signature, and also one or more times within the implementation code body. Since the function as a whole is the parent of both its header or "signature" information as well as its implementation body, an AST would not be able to use the same node to co-identify the multiple uses or appearances of the argument entity. This is solved by the DAG nature of an ASG. A key advantage of having a single, distinct node identity for any given code element is that each element's properties are, by definition, uniquely stored. This simplifies refactoring operations, because there is exactly one existential nexus for any given property instantiation. If the developer decides to change a property value such as the "name" of any code element (the "function argument" in this example), the ASG inherently exposes that value in exactly one place, and it follows that any such property changes are implicitly, trivially, and immediately propagated globally.

See also

Related Research Articles

In computing, a compiler is a computer program that translates computer code written in one programming language into another language. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a low-level programming language to create an executable program.

<span class="mw-page-title-main">Martin Fowler (software engineer)</span> American software developer, author and public speaker

Martin Fowler is a British software developer, author and international public speaker on software development, specialising in object-oriented analysis and design, UML, patterns, and agile software development methodologies, including extreme programming.

<span class="mw-page-title-main">S-expression</span> Data serialization format

In computer programming, an S-expression is an expression in a like-named notation for nested list (tree-structured) data. S-expressions were invented for and popularized by the programming language Lisp, which uses them for source code as well as data.

In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine.

<span class="mw-page-title-main">Abstract syntax tree</span> Tree representation of the abstract syntactic structure of source code

An abstract syntax tree (AST) is a data structure used in computer science to represent the structure of a program or code snippet. It is a tree representation of the abstract syntactic structure of text written in a formal language. Each node of the tree denotes a construct occurring in the text. It is sometimes called just a syntax tree.

<span class="mw-page-title-main">Parse tree</span> Tree in formal language theory

A parse tree or parsing tree is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term parse tree itself is used primarily in computational linguistics; in theoretical syntax, the term syntax tree is more common.

In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually has to be exact: "either it will or will not be a match." The patterns generally have the form of either sequences or tree structures. Uses of pattern matching include outputting the locations of a pattern within a token sequence, to output some component of the matched pattern, and to substitute the matching pattern with some other token sequence.

Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part.

An attribute grammar is a formal way to supplement a formal grammar with semantic information processing. Semantic information is stored in attributes associated with terminal and nonterminal symbols of the grammar. The values of attributes are the result of attribute evaluation rules associated with productions of the grammar. Attributes allow the transfer of information from anywhere in the abstract syntax tree to anywhere else, in a controlled and formal way.

In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 2004 and is closely related to the family of top-down parsing languages introduced in the early 1970s. Syntactically, PEGs also look similar to context-free grammars (CFGs), but they have a different interpretation: the choice operator selects the first match in PEG, while it is ambiguous in CFG. This is closer to how string recognition tends to be done in practice, e.g. by a recursive descent parser.

In computer science, graph transformation, or graph rewriting, concerns the technique of creating a new graph out of an original graph algorithmically. It has numerous applications, ranging from software engineering to layout algorithms and picture generation.

In computer science, extensible programming is a style of computer programming that focuses on mechanisms to extend the programming language, compiler, and runtime system (environment). Extensible programming languages, supporting this style of programming, were an active area of work in the 1960s, but the movement was marginalized in the 1970s. Extensible programming has become a topic of renewed interest in the 21st century.

In computer science, higher-order abstract syntax is a technique for the representation of abstract syntax trees for languages with variable binders.

A program transformation is any operation that takes a computer program and generates another program. In many cases the transformed program is required to be semantically equivalent to the original, relative to a particular formal semantics and in fewer cases the transformations result in programs that semantically differ from the original in predictable ways.

<span class="mw-page-title-main">Syntax (programming languages)</span> Set of rules defining correctly structured programs

In computer science, the syntax of a computer language is the rules that define the combinations of symbols that are considered to be correctly structured statements or expressions in that language. This applies both to programming languages, where the document represents source code, and to markup languages, where the document represents data.

The DMS Software Reengineering Toolkit is a proprietary set of program transformation tools available for automating custom source program analysis, modification, translation or generation of software systems for arbitrary mixtures of source languages for large scale software systems. DMS was originally motivated by a theory for maintaining designs of software called Design Maintenance Systems. DMS and "Design Maintenance System" are registered trademarks of Semantic Designs.

XPath is an expression language designed to support the query or transformation of XML documents. It was defined by the World Wide Web Consortium (W3C) in 1999, and can be used to compute values from the content of an XML document. Support for XPath exists in applications that support XML, such as web browsers, and many programming languages.

The syntax and semantics of Prolog, a programming language, are the sets of rules that define how a Prolog program is written and how it is interpreted, respectively. The rules are laid out in ISO standard ISO/IEC 13211 although there are differences in the Prolog implementations.

In mathematical logic, a term denotes a mathematical object while a formula denotes a mathematical fact. In particular, terms appear as components of a formula. This is analogous to natural language, where a noun phrase refers to an object and a whole sentence refers to a fact.

A term graph is a representation of an expression in a formal language as a generalized graph whose vertices are terms. Term graphs are a more powerful form of representation than expression trees because they can represent not only common subexpressions but also cyclic/recursive subexpressions.

References

  1. Garner, Richard (2011). "An abstract view on syntax with sharing". Journal of Logic and Computation. 22 (6): 1427–1452. arXiv: 1009.3682 . doi:10.1093/logcom/exr021. The notion of term graph encodes a refinement of inductively generated syntax in which regard is paid to the sharing and discard of subterms.
  2. Plump, D. (1999). Ehrig, Hartmut; Engels, G.; Rozenberg, Grzegorz (eds.). Handbook of Graph Grammars and Computing by Graph Transformation: applications, languages and tools. Vol. 2. World Scientific. pp. 9–13. ISBN   9789810228842.
  3. Barendregt, H. P.; Eekelen, M. C. J. D.; Glauert, J. R. W.; Kennaway, J. R.; Plasmeijer, M. J.; Sleep, M. R. (1987). "Term graph rewriting". In Bakker, J. W.; Nijman, A. J.; Treleaven, P. C. (eds.). PARLE Parallel Architectures and Languages Europe (PARLE 1987). Lecture Notes in Computer Science. Vol. 259. Springer. pp. 141–158. doi:10.1007/3-540-17945-3_8. ISBN   978-3-540-17945-0.

Further reading