Anti-unification

Anti-unification is the process of constructing a generalization common to two given symbolic expressions. As in unification, several frameworks are distinguished depending on which expressions (also called terms) are allowed, and which expressions are considered equal. If variables representing functions are allowed in an expression, the process is called "higher-order anti-unification", otherwise "first-order anti-unification". If the generalization is required to have an instance literally equal to each input expression, the process is called "syntactical anti-unification", otherwise "E-anti-unification", or "anti-unification modulo theory".

An anti-unification algorithm should compute for given expressions a complete and minimal generalization set, that is, a set covering all generalizations and containing no redundant members, respectively. Depending on the framework, a complete and minimal generalization set may have one, finitely many, or possibly infinitely many members, or may not exist at all; [note 1] it cannot be empty, since a trivial generalization exists in any case. For first-order syntactical anti-unification, Gordon Plotkin [1] [2] gave an algorithm that computes a complete and minimal singleton generalization set containing the so-called "least general generalization" (lgg).

Anti-unification should not be confused with dis-unification. The latter means the process of solving systems of inequations, that is of finding values for the variables such that all given inequations are satisfied. [note 2] This task is quite different from finding generalizations.

Prerequisites

Formally, an anti-unification approach presupposes an infinite set V of variables, a set T of terms with V ⊆ T, and an equivalence relation ≡ on T indicating which terms are considered equal. For first-order syntactical anti-unification, T is the set of first-order terms defined below and ≡ is syntactic equality.

First-order term

Given a set V of variable symbols, a set C of constant symbols and sets Fₙ of n-ary function symbols, also called operator symbols, for each natural number n ≥ 1, the set T of (unsorted first-order) terms is recursively defined to be the smallest set with the following properties: [3]

  1. every variable symbol is a term: V ⊆ T,
  2. every constant symbol is a term: C ⊆ T,
  3. from every n terms t₁,…,tₙ and every n-ary function symbol f ∈ Fₙ, a larger term f(t₁,…,tₙ) can be built.

For example, if x ∈ V is a variable symbol, 1 ∈ C is a constant symbol, and add ∈ F₂ is a binary function symbol, then x ∈ T, 1 ∈ T, and (hence) add(x,1) ∈ T by the first, second, and third term building rule, respectively. The latter term is usually written as x+1, using infix notation and the more common operator symbol + for convenience.
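This recursive definition maps directly onto a small data structure. The following sketch (in Python, using an illustrative representation not taken from the literature: a term is a string for a variable or constant symbol, or a tuple holding a function symbol followed by its argument terms; variables and constants are both plain strings here, and later sketches distinguish variables only where needed) accepts exactly the terms generated by the three rules:

    def is_term(t):
        """Check the three term building rules recursively."""
        if isinstance(t, str):
            return True                # rule 1 or 2: a variable or a constant symbol
        if isinstance(t, tuple) and len(t) >= 2 and isinstance(t[0], str):
            return all(is_term(arg) for arg in t[1:])      # rule 3
        return False

    assert is_term(("add", "x", "1"))  # the term add(x,1), usually written x+1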

Higher-order term

Substitution

A substitution is a mapping σ : V → T from variables to terms; the notation {x₁ ↦ t₁, …, xₖ ↦ tₖ} refers to a substitution mapping each variable xᵢ to the term tᵢ, for i = 1, …, k, and every other variable to itself. Applying that substitution to a term t is written in postfix notation as t{x₁ ↦ t₁, …, xₖ ↦ tₖ}; it means to (simultaneously) replace every occurrence of each variable xᵢ in the term t by tᵢ. The result tσ of applying a substitution σ to a term t is called an instance of that term t. As a first-order example, applying the substitution {x ↦ h(a,y), z ↦ b} to the term

f(x, a, g(z), y) yields
f(h(a,y), a, g(b), y).
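Under the same illustrative representation as above, applying a substitution is a single recursive traversal. A minimal sketch, not a definitive implementation:

    # A term is a string (variable or constant symbol) or a tuple (f, t1, ..., tn).
    def apply_subst(subst, term):
        """Simultaneously replace every occurrence of each mapped variable."""
        if isinstance(term, str):
            return subst.get(term, term)   # a mapped variable, or a symbol left as-is
        f, *args = term
        return (f, *(apply_subst(subst, a) for a in args))

    # The example from the text: f(x, a, g(z), y){x -> h(a,y), z -> b}.
    t = ("f", "x", "a", ("g", "z"), "y")
    sigma = {"x": ("h", "a", "y"), "z": "b"}
    assert apply_subst(sigma, t) == ("f", ("h", "a", "y"), "a", ("g", "b"), "y")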

Generalization, specialization

If a term t has an instance equivalent to a term u, that is, if tσ ≡ u for some substitution σ, then t is called more general than u, and u is called more special than, or subsumed by, t. For example, x ⊕ a is more general than a ⊕ b if ⊕ is commutative, since then (x ⊕ a){x ↦ b} = b ⊕ a ≡ a ⊕ b.

If ≡ is literal (syntactic) identity of terms, a term may be both more general and more special than another one only if both terms differ just in their variable names, not in their syntactic structure; such terms are called variants, or renamings, of each other. For example, f(x₁, a, g(z₁), y₁) is a variant of f(x₂, a, g(z₂), y₂), since f(x₁, a, g(z₁), y₁) {x₁ ↦ x₂, y₁ ↦ y₂, z₁ ↦ z₂} = f(x₂, a, g(z₂), y₂) and f(x₂, a, g(z₂), y₂) {x₂ ↦ x₁, y₂ ↦ y₁, z₂ ↦ z₁} = f(x₁, a, g(z₁), y₁). However, f(x₁, a, g(z₁), y₁) is not a variant of f(x₂, a, g(x₂), x₂), since no substitution can transform the latter term into the former one, although {x₁ ↦ x₂, y₁ ↦ x₂, z₁ ↦ x₂} achieves the reverse direction. The latter term is hence properly more special than the former one.

A substitution σ is more special than, or subsumed by, a substitution τ if xσ is more special than xτ for each variable x. For example, {x ↦ f(u), y ↦ f(f(u))} is more special than {x ↦ z, y ↦ f(z)}, since f(u) and f(f(u)) are more special than z and f(z), respectively.
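When ≡ is syntactic identity, "more general" can be decided by one-sided matching: t is more general than u exactly when some substitution maps t onto u, and two terms are variants exactly when each matches the other. The sketch below stays with the illustrative representation used above and takes the set of variable symbols as an explicit argument:

    def match(general, special, variables, subst=None):
        """Return subst with apply_subst(subst, general) == special, or None."""
        subst = {} if subst is None else subst
        if isinstance(general, str) and general in variables:
            if general in subst and subst[general] != special:
                return None            # variable already bound to a different term
            subst[general] = special
            return subst
        if isinstance(general, str):
            return subst if general == special else None   # constant symbols
        if (isinstance(special, tuple) and general[0] == special[0]
                and len(general) == len(special)):
            for g, s in zip(general[1:], special[1:]):
                if match(g, s, variables, subst) is None:
                    return None
            return subst
        return None

    # f(x1, a, g(z1), y1) and f(x2, a, g(z2), y2) are variants of each other.
    V = {"x1", "y1", "z1", "x2", "y2", "z2"}
    s = ("f", "x1", "a", ("g", "z1"), "y1")
    t = ("f", "x2", "a", ("g", "z2"), "y2")
    assert match(s, t, V) is not None and match(t, s, V) is not None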

Anti-unification problem, generalization set

An anti-unification problem is a pair ⟨s, t⟩ of terms. A term g is a common generalization, or anti-unifier, of s and t if gσ ≡ s and gτ ≡ t for some substitutions σ, τ. For a given anti-unification problem, a set S of anti-unifiers is called complete if each generalization subsumes some term g ∈ S; the set S is called minimal if none of its members subsumes another one.

First-order syntactical anti-unification

The framework of first-order syntactical anti-unification is based on T being the set of first-order terms (over some given set V of variables, set C of constants and sets Fₙ of n-ary function symbols) and on ≡ being syntactic equality. In this framework, each anti-unification problem ⟨s, t⟩ has a complete, and obviously minimal, singleton solution set {g}. Its member g is called the least general generalization (lgg) of the problem; it has an instance syntactically equal to s and another one syntactically equal to t. Any common generalization of s and t subsumes g. The lgg is unique up to variants: if S₁ and S₂ are both complete and minimal solution sets of the same syntactical anti-unification problem, then S₁ = {g₁} and S₂ = {g₂} for some terms g₁ and g₂ that are renamings of each other.

Plotkin [1] [2] has given an algorithm to compute the lgg of two given terms. It presupposes an injective mapping φ : T × T → V, that is, a mapping assigning each pair s,t of terms its own variable φ(s,t), such that no two pairs share the same variable. [note 4] The algorithm consists of two rules, where s ⊔ t denotes the least general generalization of the terms s and t:

f(s₁,…,sₙ) ⊔ f(t₁,…,tₙ) ⇝ f(s₁ ⊔ t₁, …, sₙ ⊔ tₙ)
s ⊔ t ⇝ φ(s,t) if previous rule not applicable

For example, (0*0) ⊔ (4*4) ⇝ (0 ⊔ 4) * (0 ⊔ 4) ⇝ φ(0,4) * φ(0,4) = x * x, writing x for the variable φ(0,4); this least general generalization reflects the common property of both inputs of being square numbers.
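The two rules translate into a short recursive procedure; as note 4 suggests, the injective mapping φ can be realized by a hash table that hands out one variable per distinct pair of terms. A minimal sketch, reusing the illustrative tuple representation of terms introduced above:

    def lgg(s, t, phi=None):
        """Least general generalization of two first-order terms (Plotkin)."""
        phi = {} if phi is None else phi   # hash table realizing the mapping φ
        # Rule 1: same head symbol and same arity -> recurse over the arguments.
        if (isinstance(s, tuple) and isinstance(t, tuple)
                and s[0] == t[0] and len(s) == len(t)):
            return (s[0], *(lgg(a, b, phi) for a, b in zip(s[1:], t[1:])))
        if s == t:
            return s                       # identical symbols: nothing to generalize
        # Rule 2: otherwise the pair (s, t) gets its own variable φ(s,t),
        # and the same variable whenever the same pair occurs again.
        if (s, t) not in phi:
            phi[(s, t)] = "x" + str(len(phi) + 1)
        return phi[(s, t)]

    # lgg(0*0, 4*4) = x1*x1, reflecting that both inputs are squares.
    assert lgg(("*", "0", "0"), ("*", "4", "4")) == ("*", "x1", "x1")

Because the same pair (0, 4) is mapped to the same variable both times, the shared structure of the inputs survives in the generalization; a mapping that handed out a new variable per occurrence would instead yield the strictly more general term x1*x2.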

Plotkin used his algorithm to compute the "relative least general generalization (rlgg)" of two clause sets in first-order logic, which was the basis of the Golem approach to inductive logic programming.

First-order anti-unification modulo theory

Equational theories

First-order sorted anti-unification

Nominal anti-unification

Applications

Higher-order anti-unification

Notes

  1. Complete generalization sets always exist, but it may be the case that every complete generalization set is non-minimal.
  2. Comon referred in 1986 to inequation-solving as "anti-unification", which nowadays has become quite unusual. Comon, Hubert (1986). "Sufficient Completeness, Term Rewriting Systems and 'Anti-Unification'". Proc. 8th International Conference on Automated Deduction. LNCS. Vol. 230. Springer. pp. 128–140.
  3. E.g.
  4. From a theoretical viewpoint, such a mapping exists, since both T × T and V are countably infinite sets; for practical purposes, φ can be built up as needed, remembering assigned mappings in a hash table.

References

  1. Plotkin, Gordon D. (1970). Meltzer, B.; Michie, D. (eds.). "A Note on Inductive Generalization". Machine Intelligence. 5: 153–163.
  2. Plotkin, Gordon D. (1971). Meltzer, B.; Michie, D. (eds.). "A Further Note on Inductive Generalization". Machine Intelligence. 6: 101–124.
  3. Chang, C.C.; Keisler, H. Jerome (1977). Heyting, A.; Keisler, H.J.; Mostowski, A.; Robinson, A.; Suppes, P. (eds.). Model Theory. Studies in Logic and the Foundations of Mathematics. Vol. 73. North Holland; here: Sect. 1.3.