Semantic dictionary encoding

Last updated

Semantic dictionary encoding (SDE) preserves the full semantic context of source programs while adding further information that can be used for accelerating the speed of code generation. SDE forms a code-generating loader. It is a form of bytecode combined with a JIT compiler. It is code generation at load time.

Contents

In an elementary form, the dictionary entries represent nodes in a directed acyclic graph (DAG), that describes the actions of the program, as an abstract syntax tree (AST) in tabular form.

It uses an intermediate representation (IR), that is based on the encoded abstract syntax tree and symbol table of a program.

See also

Related Research Articles

In computing, a compiler is a computer program that translates computer code written in one programming language into another language. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a low-level programming language to create an executable program.

Prolog is a logic programming language that has its origins in artificial intelligence, automated theorem proving and computational linguistics.

<span class="mw-page-title-main">XML</span> Markup language by the W3C for encoding of data

Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications—all of them free open standards—define XML.

In computer science, a compiler-compiler or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a programming language and machine.

<span class="mw-page-title-main">Abstract syntax tree</span> Tree representation of the abstract syntactic structure of source code

An abstract syntax tree (AST) is a data structure used in computer science to represent the structure of a program or code snippet. It is a tree representation of the abstract syntactic structure of text written in a formal language. Each node of the tree denotes a construct occurring in the text. It is sometimes called just a syntax tree.

Bytecode is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references that encode the result of compiler parsing and performing semantic analysis of things like type, scope, and nesting depths of program objects.

In computing, code generation is part of the process chain of a compiler and converts intermediate representation of source code into a form that can be readily executed by the target system.

Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part.

In computer science, an abstract semantic graph (ASG) or term graph is a form of abstract syntax in which an expression of a formal or programming language is represented by a graph whose vertices are the expression's subterms. An ASG is at a higher level of abstraction than an abstract syntax tree, which is used to express the syntactic structure of an expression or program.

An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" IR must be accurate – capable of representing the source code without loss of information – and independent of any particular source or target language. An IR may take one of several forms: an in-memory data structure, or a special tuple- or stack-based code readable by the program. In the latter case it is also called an intermediate language.

<span class="mw-page-title-main">Ada Semantic Interface Specification</span> Interface

The Ada Semantic Interface Specification (ASIS) is a layered, open architecture providing vendor-independent access to the Ada Library Environment. It allows for the static analysis of Ada programs and libraries. It is an open, published interface library that consists of the Ada environment and their tools and applications.

A multi-pass compiler is a type of compiler that processes the source code or abstract syntax tree of a program several times. This is in contrast to a one-pass compiler, which traverses the program only once. Each pass takes the result of the previous pass as the input, and creates an intermediate output. In this way, the (intermediate) code is improved pass by pass, until the final pass produces the final code.

In computer science, higher-order abstract syntax is a technique for the representation of abstract syntax trees for languages with variable binders.

A program transformation is any operation that takes a computer program and generates another program. In many cases the transformed program is required to be semantically equivalent to the original, relative to a particular formal semantics and in fewer cases the transformations result in programs that semantically differ from the original in predictable ways.

<span class="mw-page-title-main">Syntax (programming languages)</span> Set of rules defining correctly structured programs

In computer science, the syntax of a computer language is the rules that define the combinations of symbols that are considered to be correctly structured statements or expressions in that language. This applies both to programming languages, where the document represents source code, and to markup languages, where the document represents data.

A structure editor, also structured editor or projectional editor, is any document editor that is cognizant of the document's underlying structure. Structure editors can be used to edit hierarchical or marked up text, computer programs, diagrams, chemical formulas, and any other type of content with clear and well-defined structure. In contrast, a text editor is any document editor used for editing plain text files.

Translational Backus–Naur Form refers to Backus–Naur form, which is a formal grammar notation used to define the syntax of computer languages, such as Algol, Ada, C++, COBOL, Fortran, Java, Perl, Python, and many others. TBNF goes beyond BNF and extended BNF (EBNF) grammar notation because it not only defines the syntax of a language, but also defines the structure of the abstract syntax tree (AST) to be created in memory and the output intermediate code to be generated. Thus TBNF defines the complete translation process from input source code to intermediate code. Specification of the output intermediate code is optional, in which case you will still get automatic AST creation and have the ability to define its structure in the grammar.

The DMS Software Reengineering Toolkit is a proprietary set of program transformation tools available for automating custom source program analysis, modification, translation or generation of software systems for arbitrary mixtures of source languages for large scale software systems. DMS was originally motivated by a theory for maintaining designs of software called Design Maintenance Systems. DMS and "Design Maintenance System" are registered trademarks of Semantic Designs.

<span class="mw-page-title-main">Michael Franz</span> American computer scientist

Michael Franz is an American computer scientist best known for his pioneering work on just-in-time compilation and optimisation and on artificial software diversity. He is a Distinguished Professor of Computer Science in the Donald Bren School of Information and Computer Sciences at the University of California, Irvine (UCI), a Professor of Electrical Engineering and Computer Science in the Henry Samueli School of Engineering at UCI, and Director of UCI's Secure Systems and Software Laboratory.

<span class="mw-page-title-main">Semantic parsing</span>

Semantic parsing is the task of converting a natural language utterance to a logical form: a machine-understandable representation of its meaning. Semantic parsing can thus be understood as extracting the precise meaning of an utterance. Applications of semantic parsing include machine translation, question answering, ontology induction, automated reasoning, and code generation. The phrase was first used in the 1970s by Yorick Wilks as the basis for machine translation programs working with only semantic representations. Semantic parsing is one of the important tasks in computational linguistics and natural language processing.

References