Semantics encoding

Last updated January 03, 2024

A semantics encoding is a translation between formal languages. For programmers, the most familiar form of encoding is the compilation of a programming language into machine code or byte-code. Conversions between document formats are also forms of encoding, and the compilation of TeX or LaTeX documents to PostScript is a commonly encountered example. Some high-level preprocessors, such as OCaml's Camlp4, also encode one programming language into another.

Properties

An informal notion of translation is not sufficient to help determine expressivity of languages, as it permits trivial encodings such as mapping all elements of A to the same element of B. Therefore, it is necessary to determine the definition of a "good enough" encoding. This notion varies with the application.

Commonly, an encoding $[\cdot ]:A\longrightarrow B$ is expected to preserve a number of properties.

Preservation of compositions

soundness: For every n-ary operator $op_{A}$ of A, there exists an n-ary operator $op_{B}$ of B such that $\forall T_{A}^{1},T_{A}^{2},\dots ,T_{A}^{n},\ [op_{A}(T_{A}^{1},T_{A}^{2},\dots ,T_{A}^{n})]=op_{B}([T_{A}^{1}],[T_{A}^{2}],\dots ,[T_{A}^{n}])$ .
completeness: For every n-ary operator $op_{A}$ of A, there exists an n-ary operator $op_{B}$ of B such that $\forall T_{B}^{1},T_{B}^{2},\dots ,T_{B}^{n},\ \exists T_{A}^{1},\dots ,T_{A}^{n},\ op_{B}(T_{B}^{1},\dots ,T_{B}^{n})=[op_{A}(T_{A}^{1},T_{A}^{2},\dots ,T_{A}^{n})]$ .

(Note: as far as the author is aware, this criterion of completeness is never used.)

Preservation of compositions is useful insofar as it guarantees that components can be examined either separately or together without "breaking" any interesting property. In particular, in the case of compilation, soundness guarantees that components can be compiled separately, while completeness guarantees that de-compilation is possible.
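As a minimal sketch of soundness for compositions, the following Python example encodes a toy expression language (playing the role of A) into programs for a toy stack machine (playing the role of B). The languages, the names `Num`, `Add`, `op_b_add`, `encode` and `run`, and the instruction set are all assumptions made for illustration; they do not come from any particular compiler.

```python
from dataclasses import dataclass
from typing import Union


# Language A: a tiny expression language (hypothetical, for illustration).
@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Add]


# Language B: stack-machine programs, written as tuples of instructions.
def op_b_add(code_left: tuple, code_right: tuple) -> tuple:
    """The operator of B that corresponds to the operator Add of A."""
    return code_left + code_right + (("ADD",),)


def encode(term: Expr) -> tuple:
    """Compositional encoding [.] : A -> B."""
    if isinstance(term, Num):
        return (("PUSH", term.value),)
    if isinstance(term, Add):
        # [Add(l, r)] = op_b_add([l], [r])
        return op_b_add(encode(term.left), encode(term.right))
    raise TypeError(f"not a term of A: {term!r}")


def run(code: tuple) -> int:
    """Execute a B program and return the value left on top of the stack."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
    return stack[-1]


left, right = Add(Num(1), Num(2)), Num(4)
whole = encode(Add(left, right))                    # encode the composed term
separately = op_b_add(encode(left), encode(right))  # encode parts, then compose in B
```

Here `whole` and `separately` are the same B program, which is the separate-compilation reading of soundness: the two components can be encoded independently and combined afterwards.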

Preservation of reductions

This assumes the existence of a notion of reduction on both language A and language B. Typically, in the case of a programming language, reduction is the relation which models the execution of a program.

We write $\longrightarrow$ for one step of reduction and $\longrightarrow ^{*}$ for any number of steps of reduction.

soundness: For all terms $T_{A}^{1},T_{A}^{2}$ of language A, if $T_{A}^{1}\longrightarrow ^{*}T_{A}^{2}$ then $[T_{A}^{1}]\longrightarrow ^{*}[T_{A}^{2}]$ .
completeness: For every term $T_{A}^{1}$ of language A and every term $T_{B}^{2}$ of language B, if $[T_{A}^{1}]\longrightarrow ^{*}T_{B}^{2}$ then there exists some $T_{A}^{2}$ such that $T_{B}^{2}=[T_{A}^{2}]$ .

This preservation guarantees that both languages behave the same way. Soundness guarantees that all possible behaviours are preserved, while completeness guarantees that no behaviour is added by the encoding. In particular, in the case of compilation of a programming language, soundness and completeness together mean that the compiled program behaves according to the high-level semantics of the programming language.
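The following Python sketch illustrates these definitions on a deliberately small case. Language A is assumed to be Boolean terms built from `not` and `and`, language B is assumed to be Boolean terms built only from `nand`, and both are given a hypothetical leftmost small-step reduction. The check covers only the special case of soundness in which $T_{A}^{2}$ is a value, over a finite set of sample terms, so it is evidence rather than a proof.

```python
def step_a(t):
    """One reduction step in A, or None if t is already a value (a bool)."""
    if isinstance(t, bool):
        return None
    if t[0] == "not":
        inner = step_a(t[1])
        return (not t[1]) if inner is None else ("not", inner)
    _, left, right = t  # t[0] == "and"
    reduced = step_a(left)
    if reduced is not None:
        return ("and", reduced, right)
    reduced = step_a(right)
    if reduced is not None:
        return ("and", left, reduced)
    return left and right


def step_b(t):
    """One reduction step in B, or None if t is already a value (a bool)."""
    if isinstance(t, bool):
        return None
    _, left, right = t  # t[0] == "nand"
    reduced = step_b(left)
    if reduced is not None:
        return ("nand", reduced, right)
    reduced = step_b(right)
    if reduced is not None:
        return ("nand", left, reduced)
    return not (left and right)


def encode(t):
    """Encoding [.] : A -> B, using not x = nand(x, x) and x and y = not nand(x, y)."""
    if isinstance(t, bool):
        return t
    if t[0] == "not":
        e = encode(t[1])
        return ("nand", e, e)
    left, right = encode(t[1]), encode(t[2])
    return ("nand", ("nand", left, right), ("nand", left, right))


def normalize(step, t):
    """Apply `step` until no further reduction is possible."""
    while True:
        nxt = step(t)
        if nxt is None:
            return t
        t = nxt


def terms(depth):
    """Enumerate sample terms of A up to the given nesting depth."""
    if depth == 0:
        yield True
        yield False
        return
    yield from terms(depth - 1)
    for t in terms(depth - 1):
        yield ("not", t)
    for left in terms(depth - 1):
        for right in terms(depth - 1):
            yield ("and", left, right)


# If T reduces to a value v in A, then [T] reduces to [v] in B.
sound_on_values = all(
    normalize(step_b, encode(t)) == encode(normalize(step_a, t))
    for t in terms(2)
)
```

With these definitions, `sound_on_values` is `True`. The same toy encoding does not satisfy completeness as stated above: one step from `encode(("not", ("not", True)))` yields `("nand", False, ("nand", True, True))`, which is not the encoding of any term of A.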

Preservation of termination

This also assumes the existence of a notion of reduction on both language A and language B.

soundness: for any term $T_{A}$ , if all reductions of $T_{A}$ converge, then all reductions of $[T_{A}]$ converge.
completeness: for any term $[T_{A}]$ , if all reductions of $[T_{A}]$ converge, then all reductions of $T_{A}$ converge.

In the case of compilation of a programming language, soundness guarantees that the compilation does not introduce non-termination such as endless loops or endless recursions. The completeness property is useful when language B is used to study or test a program written in language A, possibly by extracting key parts of the code: if this study or test proves that the program terminates in B, then it also terminates in A.
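For finite reduction graphs, "all reductions converge" amounts to "no cycle is reachable", which can be checked mechanically. The sketch below assumes that each language is given as a dictionary from a term to its one-step reducts; the term names, the two candidate target systems and the function `always_terminates` are hypothetical and exist only to illustrate an encoding that breaks soundness by introducing a loop.

```python
def always_terminates(steps, start):
    """True if every reduction sequence from `start` is finite.

    `steps` maps a term to the list of its one-step reducts; the graph is
    assumed to be finite, so non-termination means a reachable cycle.
    """
    on_path, done = set(), set()

    def visit(term):
        if term in done:
            return True
        if term in on_path:
            return False  # we came back to a term on the current path: a loop
        on_path.add(term)
        ok = all(visit(nxt) for nxt in steps.get(term, ()))
        on_path.discard(term)
        if ok:
            done.add(term)
        return ok

    return visit(start)


# Language A: a countdown that always stops.
steps_a = {"a2": ["a1"], "a1": ["a0"], "a0": []}

# Two candidate behaviours for the encoded terms in language B.
steps_b_good = {"b2": ["b1"], "b1": ["b0"], "b0": []}
steps_b_bad = {"b2": ["b1"], "b1": ["b0", "wait"], "wait": ["b1"], "b0": []}


def encode(term):
    """Encoding [.] : A -> B, here a simple renaming of terms."""
    return "b" + term[1:]


terminates_in_a = always_terminates(steps_a, "a2")
good_is_sound = always_terminates(steps_b_good, encode("a2"))
bad_is_sound = always_terminates(steps_b_bad, encode("a2"))
```

In this sketch the second target system adds a busy-wait loop between `b1` and `wait`, so `a2` always terminates in A while its encoding may run forever, which is exactly the situation that soundness rules out.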

Preservation of observations

This assumes the existence of a notion of observation on both language A and language B. In programming languages, typical observables are the results of inputs and outputs, as opposed to pure computation. In a description language such as HTML, a typical observable is the result of page rendering.

soundness: for every observable $obs_{A}$ on terms of A, there exists an observable $obs_{B}$ of terms of B such that for any term $T_{A}$ with observable $obs_{A}$ , $[T_{A}]$ has observable $obs_{B}$ .
completeness: for every observable $obs_{A}$ on terms of A, there exists an observable $obs_{B}$ on terms of B such that for any term $[T_{A}]$ with observable $obs_{B}$ , $T_{A}$ has observable $obs_{A}$ .

Preservation of simulations

This assumes the existence of a notion of simulation on both language A and language B. In programming languages, a program simulates another if it can perform all the same (observable) tasks and possibly some others. Simulations are typically used to describe compile-time optimizations.

soundness: for all terms $T_{A}^{1},T_{A}^{2}$ , if $T_{A}^{2}$ simulates $T_{A}^{1}$ then $[T_{A}^{2}]$ simulates $[T_{A}^{1}]$ .
completeness: for all terms $T_{A}^{1},T_{A}^{2}$ , if $[T_{A}^{2}]$ simulates $[T_{A}^{1}]$ then $T_{A}^{2}$ simulates $T_{A}^{1}$ .

Preservation of simulations is a much stronger property than preservation of observations, which it entails. In turn, it is weaker than a property of preservation of bisimulations. As in previous cases, soundness is important for compilation, while completeness is useful for testing or proving properties.
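As an illustration, the sketch below models terms as states of a small labelled transition system and computes the largest simulation relation by removing violating pairs until nothing changes. The vending-machine example, the renaming used as the encoding, and the names `greatest_simulation`, `encode_state` and `RENAME` are assumptions for illustration only.

```python
def greatest_simulation(lts):
    """Return the set of pairs (p, q) such that state q simulates state p.

    `lts` maps each state to a set of (label, next_state) transitions.
    """
    states = list(lts)
    rel = {(p, q) for p in states for q in states}
    changed = True
    while changed:
        changed = False
        for (p, q) in list(rel):
            for (label, p_next) in lts[p]:
                # q must answer every move of p with an equally labelled move
                # leading to a state that still simulates p's target.
                matched = any(
                    other == label and (p_next, q_next) in rel
                    for (other, q_next) in lts[q]
                )
                if not matched:
                    rel.discard((p, q))
                    changed = True
                    break
    return rel


# Language A: two vending machines; the b-machine also offers coffee.
lts_a = {
    "a0": {("coin", "a1")},
    "a1": {("tea", "a2")},
    "a2": set(),
    "b0": {("coin", "b1")},
    "b1": {("tea", "b2"), ("coffee", "b3")},
    "b2": set(),
    "b3": set(),
}

# Encoding into language B: an injective renaming of states and labels.
RENAME = {"coin": "c", "tea": "t", "coffee": "k"}


def encode_state(state):
    return "enc_" + state


lts_b = {
    encode_state(s): {(RENAME[label], encode_state(t)) for (label, t) in moves}
    for s, moves in lts_a.items()
}

sim_a = greatest_simulation(lts_a)
sim_b = greatest_simulation(lts_b)

# Soundness: simulation in A implies simulation between the encodings.
sound = all((encode_state(p), encode_state(q)) in sim_b for (p, q) in sim_a)
# Completeness: simulation between encodings implies simulation in A.
complete = all(
    (p, q) in sim_a
    for p in lts_a
    for q in lts_a
    if (encode_state(p), encode_state(q)) in sim_b
)
```

In this example `b0` simulates `a0` but not the other way round, and because the encoding only renames states and labels, both `sound` and `complete` hold.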

Preservation of equivalences

This assumes the existence of a notion of equivalence on both language A and language B. Typically, this can be a notion of equality of structured data or a notion of syntactically different yet semantically identical programs, such as structural congruence or structural equivalence.

soundness: if two terms $T_{A}^{1}$ and $T_{A}^{2}$ are equivalent in A, then $[T_{A}^{1}]$ and $[T_{A}^{2}]$ are equivalent in B.
completeness: if two terms $[T_{A}^{1}]$ and $[T_{A}^{2}]$ are equivalent in B, then $T_{A}^{1}$ and $T_{A}^{2}$ are equivalent in A.
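A small Python sketch can make the difference between the two directions concrete. It assumes that terms of A are lists considered equivalent up to reordering, that terms of B are tuples compared by plain equality, and it tests two hypothetical encodings on a finite sample only, so the results are illustrative rather than a proof.

```python
from collections import Counter
from itertools import product


def equiv_a(x, y):
    """Equivalence in A: same items with the same multiplicities, in any order."""
    return Counter(x) == Counter(y)


def equiv_b(x, y):
    """Equivalence in B: plain equality of tuples."""
    return x == y


def encode_sorted(term):
    """Encoding that keeps every item but forgets the order."""
    return tuple(sorted(term))


def encode_length(term):
    """Encoding that keeps only the number of items."""
    return (len(term),)


def is_sound(encode, samples):
    """Equivalent in A implies encodings equivalent in B (on the samples)."""
    return all(
        equiv_b(encode(x), encode(y))
        for x, y in product(samples, repeat=2)
        if equiv_a(x, y)
    )


def is_complete(encode, samples):
    """Encodings equivalent in B implies equivalent in A (on the samples)."""
    return all(
        equiv_a(x, y)
        for x, y in product(samples, repeat=2)
        if equiv_b(encode(x), encode(y))
    )


samples = [[1, 2, 3], [3, 2, 1], [1, 1, 2], [2, 1, 1], [1, 2], [4, 5]]
```

On these samples `encode_sorted` is both sound and complete, whereas `encode_length` is sound but not complete: `[1, 2, 3]` and `[1, 1, 2]` have equal encodings without being equivalent in A.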

Preservation of distribution

This assumes the existence of a notion of distribution on both language A and language B. Typically, for compilation of distributed programs written in Acute, JoCaml or E, this means distribution of processes and data among several computers or CPUs.

soundness: if a term $T_{A}$ is the composition of two agents $T_{A}^{1}~|~T_{A}^{2}$ then $[T_{A}]$ must be the composition of two agents $[T_{A}^{1}]~|~[T_{A}^{2}]$ .
completeness: if a term $[T_{A}]$ is the composition of two agents $T_{B}^{1}~|~T_{B}^{2}$ then $T_{A}$ must be the composition of two agents $T_{A}^{1}~|~T_{A}^{2}$ such that $[T_{A}^{1}]=T_{B}^{1}$ and $[T_{A}^{2}]=T_{B}^{2}$ .

External links

The Program Transformation Wiki

