Unparser

Last updated
A parse tree, which generates "John hit the ball" when it is unparsed. ParseTree.svg
A parse tree, which generates "John hit the ball" when it is unparsed.

In computing, an unparser is a system that constructs a set of characters or image components from a given parse tree. [1] [2]

Computing activity requiring, benefiting from, or creating computers

Computing is any activity that uses computers. It includes developing hardware and software, and using computers to manage and process information, communicate and entertain. Computing is a critically important, integral component of modern industrial technology. Major computing disciplines include computer engineering, software engineering, computer science, information systems, and information technology.

Parse tree ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar

A parse tree or parsing tree or derivation tree or concrete syntax tree is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term parse tree itself is used primarily in computational linguistics; in theoretical syntax, the term syntax tree is more common.

An unparser is in effect the reverse of a traditional parser that takes a set of string of characters and produces a parse tree. Unparsing generally involves the application of a specific set of rules to the parse tree as a "tree walk" takes place. [1]

Parsing, syntax analysis, or syntactic analysis is the process of analysing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part.

In computer science, tree traversal is a form of graph traversal and refers to the process of visiting each node in a tree data structure, exactly once. Such traversals are classified by the order in which the nodes are visited. The following algorithms are described for a binary tree, but they may be generalized to other trees as well.

Given that the tree may involve both textual and graphic elements, the unparser may have two separate modules, each of which handles the relevant components. [2] In such cases the "master unparser" looks up the "master unparse table" to determine if a given nested structure should be handled by one module, or the other. [2]

See also

In computer programming, bidirectional transformations (bx) are programs in which a single piece of code can be run in several ways, such that the same data are sometimes considered as input, and sometimes as output. For example, a bx run in the forward direction might transform input I into output O, while the same bx run backward would take as input versions of I and O and produce a new version of I as its output.

In formal language theory, a grammar is a set of production rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context—only their form.

Related Research Articles

A compiler is a computer program that transforms computer code written in one programming language into another programming language. Compilers are a type of translator that support digital devices, primarily computers. The name compiler is primarily used for programs that translate source code from a high-level programming language to a lower level language to create an executable program.

Formal language set of strings of symbols that may be constrained by rules that are specific to it

In mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules.

XML Markup language developed by the W3C for encoding of data

Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The W3C's XML 1.0 Specification and several other related specifications—all of them free open standards—define XML.

L-system

An L-system or Lindenmayer system is a parallel rewriting system and a type of formal grammar. An L-system consists of an alphabet of symbols that can be used to make strings, a collection of production rules that expand each symbol into some larger string of symbols, an initial "axiom" string from which to begin construction, and a mechanism for translating the generated strings into geometric structures. L-systems were introduced and developed in 1968 by Aristid Lindenmayer, a Hungarian theoretical biologist and botanist at the University of Utrecht. Lindenmayer used L-systems to describe the behaviour of plant cells and to model the growth processes of plant development. L-systems have also been used to model the morphology of a variety of organisms and can be used to generate self-similar fractals.

In computer science, the Cocke–Younger–Kasami algorithm is a parsing algorithm for context-free grammars, named after its inventors, John Cocke, Daniel Younger and Tadao Kasami. It employs bottom-up parsing and dynamic programming.

In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine. The input may be a text file containing the grammar written in BNF or EBNF that defines the syntax of a programming language, and whose generated output is some source code of the parser for the programming language, although other definitions exist. Usually, the resulting source code will have to be extended upon before a complete compiler emerges.

In computer science, lexical analysis, lexing or tokenization is the process of converting a sequence of characters into a sequence of tokens. A program that performs lexical analysis may be termed a lexer, tokenizer, or scanner, though scanner is also a term for the first stage of a lexer. A lexer is generally combined with a parser, which together analyze the syntax of programming languages, web pages, and so forth.

In computer science, top-down parsing is a parsing strategy where one first looks at the highest level of the parse tree and works down the parse tree by using the rewriting rules of a formal grammar. LL parsers are a type of parser that uses a top-down parsing strategy.

Visual programming language

In computing, a visual programming language (VPL) is any programming language that lets users create programs by manipulating program elements graphically rather than by specifying them textually. A VPL allows programming with visual expressions, spatial arrangements of text and graphic symbols, used either as elements of syntax or secondary notation. For example, many VPLs are based on the idea of "boxes and arrows", where boxes or other screen objects are treated as entities, connected by arrows, lines or arcs which represent relations.

In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again. Memoization has also been used in other contexts, such as in simple mutually recursive descent parsing. Although related to caching, memoization refers to a specific case of this optimization, distinguishing it from forms of caching such as buffering or page replacement. In the context of some logic programming languages, memoization is also known as tabling; see also lookup table.

In computer science, an abstract semantic graph (ASG) or term graph is a form of abstract syntax in which an expression of a formal or programming language is represented by a graph whose vertices are the expression's subterms. An ASG is at a higher level of abstraction than an abstract syntax tree, which is used to express the syntactic structure of an expression or program.

In computer science, graph transformation, or graph rewriting, concerns the technique of creating a new graph out of an original graph algorithmically. It has numerous applications, ranging from software engineering to layout algorithms and picture generation.

Syntax (programming languages) in programming languages

In computer science, the syntax of a computer language is the set of rules that defines the combinations of symbols that are considered to be a correctly structured document or fragment in that language. This applies both to programming languages, where the document represents source code, and markup languages, where the document represents data. The syntax of a language defines its surface form. Text-based computer languages are based on sequences of characters, while visual programming languages are based on the spatial layout and connections between symbols. Documents that are syntactically invalid are said to have a syntax error.

In computer science, scannerless parsing refers to performing both tokenization and parsing in a single step, rather than breaking it up into a pipeline of a lexer followed by a parser, executing concurrently. It also refers to the associated grammar: using a single formalism to express both the lexical grammar and phrase level grammar used to parse a language.

The DMS Software Reengineering Toolkit is a proprietary set of program transformation tools available for automating custom source program analysis, modification, translation or generation of software systems for arbitrary mixtures of source languages for large scale software systems.

A program structure tree (PST) is a hierarchical diagram that displays the nesting relationship of single-entry single-exit (SESE) fragments/regions, showing the organization of a computer program. Nodes in this tree represent SESE regions of the program, while edges represent nesting regions. The PST is defined for all control flow graphs.

In computing, a compiler is a computer program that transforms source code written in a programming language or computer language, into another computer language. The most common reason for transforming source code is to create an executable program.

Parse Thicket

A Parse Thicket is a graph that represents the syntactic structure of a paragraph of text in natural language processing. A Parse Thicket includes Parse tree for each sentence for this paragraph plus some arcs for other relations between words other than syntactic. Parse thickets can be constructed for both constituency parse trees and dependency parse trees. The relations which link parse trees within a Parse Thicket are:

References

  1. 1 2 Software Science and Engineering edited by Ikuo Nakata 1991 ISBN   981020776X page 168
  2. 1 2 3 Handbook of Graph Grammars and Computing by Graph Transformation: Applications, Languages and Tools by H. Ehrig, G. Engels 1999 ISBN   9810240201 pages 231-232