Abstract syntax

Last updated

In computer science, the abstract syntax of data is its structure described as a data type (possibly, but not necessarily, an abstract data type), independent of any particular representation or encoding. [1] This is particularly used in the representation of text in computer languages, [2] which are generally stored in a tree structure as an abstract syntax tree. Abstract syntax, which only consists of the structure of data, is contrasted with concrete syntax, which also includes information about the representation. For example, concrete syntax includes features like parentheses (for grouping) or commas (for lists), which are not included in the abstract syntax, as they are implicit in the structure.

Contents

Abstract syntaxes are classified as first-order abstract syntax (FOAS), if the structure is abstract but names (identifiers) are still concrete (and thus requires name resolution), and higher-order abstract syntax, if the names themselves are abstract. [3]

Uses

To be implemented either for computation or communications, a mapping from the abstract syntax to specific machine representations and encodings must be defined; these may be called the "concrete syntax" (in language implementation) [4] or the "transfer syntax" (in communications).

A compiler's internal representation of a program will typically be specified by an abstract syntax in terms of categories such as "statement", "expression" and "identifier". This is independent of the source syntax (concrete syntax) of the language being compiled (though it will often be very similar). A parse tree is similar to an abstract syntax tree but it will typically also contain features such as parentheses, which are syntactically significant but which are implicit in the structure of the abstract syntax tree.

Algebraic data types are particularly well-suited to the implementation of abstract syntax. [5]

See also

Related Research Articles

In computer science, an abstract data type (ADT) is a mathematical model for data types, defined by its behavior (semantics) from the point of view of a user of the data, specifically in terms of possible values, possible operations on data of this type, and the behavior of these operations. This mathematical model contrasts with data structures, which are concrete representations of data, and are the point of view of an implementer, not a user. For example, a stack has push/pop operations that follow a Last-In-First-Out rule, and can be concretely implemented using either a list or an array. Another example is a set which stores values, without any particular order, and no repeated values. Values themselves are not retrieved from sets, rather one tests a value for membership to obtain a Boolean "in" or "not in".

In computing, a compiler is a computer program that translates computer code written in one programming language into another language. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a low-level programming language to create an executable program.

CLU is a programming language created at the Massachusetts Institute of Technology (MIT) by Barbara Liskov and her students starting in 1973. While it did not find extensive use, it introduced many features that are used widely now, and is seen as a step in the development of object-oriented programming (OOP).

In computer science, functional programming is a programming paradigm where programs are constructed by applying and composing functions. It is a declarative programming paradigm in which function definitions are trees of expressions that map values to other values, rather than a sequence of imperative statements which update the running state of the program.

<span class="mw-page-title-main">Trie</span> K-ary search tree data structure

In computer science, a trie, also called digital tree or prefix tree, is a type of k-ary search tree, a tree data structure used for locating specific keys from within a set. These keys are most often strings, with links between nodes defined not by the entire key, but by individual characters. In order to access a key, the trie is traversed depth-first, following the links between nodes, which represent each character in the key.

In software engineering and computer science, abstraction is the process of generalizing concrete details, such as attributes, away from the study of objects and systems to focus attention on details of greater importance. Abstraction is a fundamental concept in computer science and software engineering, especially within the object-oriented programming paradigm. Examples of this include:

In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine.

<span class="mw-page-title-main">Abstract syntax tree</span> Tree representation of the abstract syntactic structure of source code

An abstract syntax tree (AST) is a data structure used in computer science to represent the structure of a program or code snippet. It is a tree representation of the abstract syntactic structure of text written in a formal language. Each node of the tree denotes a construct occurring in the text. It is sometimes called just a syntax tree.

<span class="mw-page-title-main">Data type</span> Attribute of data

In computer science and computer programming, a data type is a collection or grouping of data values, usually specified by a set of possible values, a set of allowed operations on these values, and/or a representation of these values as machine types. A data type specification in a program constrains the possible values that an expression, such as a variable or a function call, might take. On literal data, it tells the compiler or interpreter how the programmer intends to use the data. Most programming languages support basic data types of integer numbers, floating-point numbers, characters and Booleans.

In programming language theory, semantics is the rigorous mathematical study of the meaning of programming languages. Semantics assigns computational meaning to valid strings in a programming language syntax. It is closely related to, and often crosses over with, the semantics of mathematical proofs.

Axiom is a free, general-purpose computer algebra system. It consists of an interpreter environment, a compiler and a library, which defines a strongly typed hierarchy.

A domain-specific language (DSL) is a computer language specialized to a particular application domain. This is in contrast to a general-purpose language (GPL), which is broadly applicable across domains. There are a wide variety of DSLs, ranging from widely used languages for common domains, such as HTML for web pages, down to languages used by only one or a few pieces of software, such as MUSH soft code. DSLs can be further subdivided by the kind of language, and include domain-specific markup languages, domain-specific modeling languages, and domain-specific programming languages. Special-purpose computer languages have always existed in the computer age, but the term "domain-specific language" has become more popular due to the rise of domain-specific modeling. Simpler DSLs, particularly ones used by a single application, are sometimes informally called mini-languages.

Datalog is a declarative logic programming language. While it is syntactically a subset of Prolog, Datalog generally uses a bottom-up rather than top-down evaluation model. This difference yields significantly different behavior and properties from Prolog. It is often used as a query language for deductive databases. Datalog has been applied to problems in data integration, networking, program analysis, and more.

In computer science, an abstract semantic graph (ASG) or term graph is a form of abstract syntax in which an expression of a formal or programming language is represented by a graph whose vertices are the expression's subterms. An ASG is at a higher level of abstraction than an abstract syntax tree, which is used to express the syntactic structure of an expression or program.

In computer science, higher-order abstract syntax is a technique for the representation of abstract syntax trees for languages with variable binders.

In computing, a meta-circular evaluator (MCE) or meta-circular interpreter (MCI) is an interpreter which defines each feature of the interpreted language using a similar facility of the interpreter's host language. For example, interpreting a lambda application may be implemented using function application. Meta-circular evaluation is most prominent in the context of Lisp. A self-interpreter is a meta-circular interpreter where the interpreted language is nearly identical to the host language; the two terms are often used synonymously.

In functional programming, a generalized algebraic data type is a generalization of parametric algebraic data types.

A structure editor, also structured editor or projectional editor, is any document editor that is cognizant of the document's underlying structure. Structure editors can be used to edit hierarchical or marked up text, computer programs, diagrams, chemical formulas, and any other type of content with clear and well-defined structure. In contrast, a text editor is any document editor used for editing plain text files.

<span class="mw-page-title-main">History of compiler construction</span>

In computing, a compiler is a computer program that transforms source code written in a programming language or computer language, into another computer language. The most common reason for transforming source code is to create an executable program.

A term graph is a representation of an expression in a formal language as a generalized graph whose vertices are terms. Term graphs are a more powerful form of representation than expression trees because they can represent not only common subexpressions but also cyclic/recursive subexpressions.

References

  1. Fiore, M.; Plotkin, G.; Turi, D. (1999). "Abstract syntax and variable binding". Proceedings. 14th Symposium on Logic in Computer Science (Cat. No. PR00158). pp. 193–202. doi:10.1109/LICS.1999.782615. ISBN   0-7695-0158-3. S2CID   7384052 . Retrieved 2023-11-02.
  2. "ASTLOG: A language for examining abstract syntax trees". DSL'97: Proceedings of the Conference on Domain-Specific Languages on Conference on Domain-Specific Languages (DSL), 1997. USENIX Association. 15 October 1997. p. 18.
  3. Pfenning, F.; Elliott, C. (1988-06-01). "Higher-order abstract syntax". ACM SIGPLAN Notices . 23 (7): 199–208. doi: 10.1145/960116.54010 . ISSN   0362-1340.
  4. Wile, David S. (1997). "Abstract syntax from concrete syntax". Proceedings of the 19th international conference on Software engineering - ICSE '97. ACM Press. pp. 472–480. doi:10.1145/253228.253388. ISBN   978-0-89791-914-2. S2CID   14351497.
  5. Corradini, Andrea; Gadducci, Fabio (2002-09-17). "A functorial semantics for multi-algebras and partial algebras, with applications to syntax". Theoretical Computer Science. 286 (2): 293–322. doi: 10.1016/S0304-3975(01)00319-X . ISSN   0304-3975.