Occurs check

Last updated January 23, 2025

In computer science, the occurs check is a part of algorithms for syntactic unification. It causes unification of a variable V and a structure S to fail if S contains V.
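The check itself amounts to a traversal of S looking for V. The following is a minimal Python sketch, not taken from any particular implementation; the `Var` class and the tuple encoding of structures `(functor, arg1, ..., argN)` are assumptions made only for illustration. The trivial case of unifying V with V itself is assumed to be handled before the check is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A logic variable, identified by its name (illustrative encoding)."""
    name: str


def occurs(v, term):
    """Return True if variable v appears anywhere inside term.

    Structures are tuples (functor, arg1, ..., argN); anything else
    that is not a Var is treated as a constant.
    """
    if isinstance(term, Var):
        return term == v
    if isinstance(term, tuple):
        # Skip the functor at index 0 and search the arguments.
        return any(occurs(v, arg) for arg in term[1:])
    return False


X = Var("X")
print(occurs(X, ("f", X)))            # True: X = f(X) must fail
print(occurs(X, ("f", Var("Y"), 1)))  # False: binding X is safe
```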

Application in theorem proving

In theorem proving, unification without the occurs check can lead to unsound inference. For example, when the check is omitted, the Prolog goal $X=f(X)$ succeeds, binding X to a cyclic structure which has no counterpart in the Herbrand universe. As another example,^[1] without the occurs check, a resolution proof can be found for the non-theorem^[2] $(\forall x\exists y.p(x,y))\rightarrow (\exists y\forall x.p(x,y))$: the negation of that formula has the conjunctive normal form $p(X,f(X))\land \lnot p(g(Y),Y)$, with $f$ and $g$ denoting the Skolem functions for the first and second existential quantifier, respectively. Without the occurs check, the literals $p(X,f(X))$ and $p(g(Y),Y)$ are unifiable, producing the refuting empty clause.
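The second example can be reproduced with a small unifier. This is a hedged Python sketch rather than the algorithm of any specific prover: the `Var` class, the tuple encoding of terms, the dictionary-based substitution and the `occurs_check` flag are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


def walk(t, subst):
    """Follow variable bindings until an unbound variable or a non-variable."""
    while isinstance(t, Var) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    """True if v occurs in t once the bindings in subst are taken into account."""
    t = walk(t, subst)
    if isinstance(t, Var):
        return t == v
    if isinstance(t, tuple):
        return any(occurs(v, arg, subst) for arg in t[1:])
    return False


def unify(a, b, subst, occurs_check=True):
    """Return an extended substitution, or None if unification fails."""
    a, b = walk(a, subst), walk(b, subst)
    if isinstance(a, Var) and a == b:
        return subst
    if isinstance(a, Var):
        if occurs_check and occurs(a, b, subst):
            return None  # binding a would create a cyclic term
        return {**subst, a: b}
    if isinstance(b, Var):
        return unify(b, a, subst, occurs_check)
    if isinstance(a, tuple) and isinstance(b, tuple):
        if a[0] != b[0] or len(a) != len(b):
            return None  # different functor or arity
        for x, y in zip(a[1:], b[1:]):
            subst = unify(x, y, subst, occurs_check)
            if subst is None:
                return None
        return subst
    return subst if a == b else None


X, Y = Var("X"), Var("Y")
lit1 = ("p", X, ("f", X))   # p(X, f(X))
lit2 = ("p", ("g", Y), Y)   # p(g(Y), Y)

print(unify(lit1, lit2, {}, occurs_check=True))               # None
print(unify(lit1, lit2, {}, occurs_check=False) is not None)  # True
```

With the check enabled, the attempt to bind Y to f(X) is rejected because X is already bound to g(Y); with the check disabled, the two bindings X ↦ g(Y) and Y ↦ f(X) are accepted even though they refer to each other.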

Rational tree unification

Prolog implementations usually omit the occurs check for reasons of efficiency, which can lead to circular data structures and looping. Omitting the occurs check reduces the worst-case complexity of unifying a term $t_{1}$ with a term $t_{2}$ in many cases from $O({\text{size}}(t_{1})+{\text{size}}(t_{2}))$ to $O({\text{min}}({\text{size}}(t_{1}),{\text{size}}(t_{2})))$; in the particular, frequent case of variable-term unifications, the runtime shrinks to $O(1)$.^[nb 1]

Modern implementations, based on Colmerauer's Prolog II,^[4]^[5]^[6]^[7] use rational tree unification to avoid looping. However, it is difficult to keep the time complexity linear in the presence of cyclic terms. Examples where Colmerauer's algorithm becomes quadratic^[8] can be readily constructed, but refinement proposals exist.

See image for an example run of the unification algorithm given in Unification (computer science)#A unification algorithm, trying to solve the goal $cons(x,y){\stackrel {?}{=}}cons(1,cons(x,cons(2,y)))$, but without the occurs check rule (named "check" there); applying the rule "eliminate" instead leads to a cyclic graph (i.e. an infinite term) in the last step.
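The outcome of such a run can be sketched in Python. Decomposing the goal yields the two equations x = 1 and y = cons(x, cons(2, y)); accepting both as bindings, as happens without the check, gives a binding for y that refers back to y. The encoding below (variables as plain strings, structures as tuples, integers as constants) and the helper `is_cyclic` are assumptions chosen for brevity, and the code does not reproduce the individual rule applications of the referenced algorithm.

```python
def is_cyclic(bindings):
    """Return True if some bound variable is reachable from its own binding."""

    def vars_in(term):
        # Variables are plain strings; structures are (functor, arg1, ...).
        if isinstance(term, str):
            yield term
        elif isinstance(term, tuple):
            for arg in term[1:]:
                yield from vars_in(arg)

    def reaches(start, term, seen):
        for v in vars_in(term):
            if v == start:
                return True
            if v in bindings and v not in seen:
                seen.add(v)
                if reaches(start, bindings[v], seen):
                    return True
        return False

    return any(reaches(v, t, set()) for v, t in bindings.items())


# Bindings accepted without the occurs check for
# cons(x, y) =? cons(1, cons(x, cons(2, y)))
bindings = {
    "x": 1,
    "y": ("cons", "x", ("cons", 2, "y")),  # y occurs in its own binding
}
print(is_cyclic(bindings))  # True: y denotes an infinite term

# A variant with a fresh variable z in place of the inner y is finite.
finite = {"x": 1, "y": ("cons", "x", ("cons", 2, "z"))}
print(is_cyclic(finite))    # False
```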

Sound unification

ISO Prolog implementations have the built-in predicate unify_with_occurs_check/2 for sound unification but are free to use unsound or even looping algorithms when unification is invoked otherwise, provided the algorithm works correctly for all cases that are "not subject to occurs-check" (NSTO).^[9] The built-in acyclic_term/1 serves to check the finiteness of terms.

Implementations offering sound unification for all unifications are Qu-Prolog and Strawberry Prolog, as well as (optionally, via a runtime flag) XSB, SWI-Prolog, CxProlog, Tau Prolog, Trealla Prolog and Scryer Prolog. A variety^[10]^[11] of optimizations can render sound unification feasible for common cases.

Notes

[nb 1] Some Prolog manuals state that the complexity of unification without occurs check is $O({\text{min}}({\text{size}}(t_{1}),{\text{size}}(t_{2})))$ (in all cases).^[3] This is incorrect, as it would imply comparing arbitrary ground terms in constant time (by unifying $eq(t_{1},t_{2})$ with $eq(X,X)$).

References

[1] David A. Duffy (1991). Principles of Automated Theorem Proving. Wiley.; here: p.143
[2] Informally, and taking $p(x,y)$ to mean e.g. "x loves y", the formula reads "If everybody loves somebody, then a single person must exist that is loved by everyone."
[3] F. Pereira; D. Warren; D. Bowen; L. Byrd; L. Pereira (1983). C-Prolog's User's Manual Version 1.2 (Technical report). SRI International. Retrieved 21 June 2013.
[4] A. Colmerauer (1982). K.L. Clark; S.-A. Tarnlund (eds.). Prolog and Infinite Trees. Academic Press.
[5] M.H. van Emden; J.W. Lloyd (1984). "A Logical Reconstruction of Prolog II". Journal of Logic Programming. 2: 143–149.
[6] Joxan Jaffar; Peter J. Stuckey (1986). "Semantics of Infinite Tree Logic Programming". Theoretical Computer Science. 46: 141–158. doi:10.1016/0304-3975(86)90027-7.
[7] B. Courcelle (1983). "Fundamental Properties of Infinite Trees". Theoretical Computer Science. 25 (2): 95–169. doi:10.1016/0304-3975(83)90059-2.
[8] Alberto Martelli; Gianfranco Rossi (1984). Efficient Unification with Infinite Terms in Logic Programming (PDF). The International Conference on Fifth Generation Computer Systems.
[9] 7.3.4 Normal unification in Prolog of ISO/IEC 13211-1:1995.
[10] Ritu Chadha; David A. Plaisted (1994). "Correctness of unification without occur check in prolog". The Journal of Logic Programming. 18 (2): 99–122. doi:10.1016/0743-1066(94)90048-5.
[11] Thomas Prokosch; François Bry (2020). Unification on the Run (PDF). The 34th International Workshop on Unification. pp. 13:1–13:5.

