Term indexing

Last updated

In computer science, a term index is a data structure to facilitate fast lookup of terms and clauses in a logic program, [1] deductive database, or automated theorem prover.

Contents

Overview

Many operations in automatic theorem provers require search in huge collections of terms and clauses. Such operations typically fall into the following scheme. Given a collection of terms (clauses) and a query term (clause) , find in some/all terms related to according to a certain retrieval condition. Most interesting retrieval conditions are formulated as existence of a substitution that relates in a special way the query and the retrieved objects . Here is a list of retrieval conditions frequently used in provers:

More often than not, we are actually interested in finding the appropriate substitutions explicitly, together with the retrieved terms , rather than just in establishing existence of such substitutions.

Very often the sizes of term sets to be searched are large, the retrieval calls are frequent and the retrieval condition test is rather complex. In such situations linear search in , when the retrieval condition is tested on every term from , becomes prohibitively costly. To overcome this problem, special data structures, called indexes, are designed in order to support fast retrieval. Such data structures, together with the accompanying algorithms for index maintenance and retrieval, are called term indexing techniques.

Classic indexing techniques

Substitution trees outperform path indexing, discrimination tree indexing, and abstraction trees. [2]

A discrimination tree term index stores its information in a trie data structure. [3]

Indexing techniques used in logic programming

First-argument indexing is the most common strategy where the first argument is used as index. It distinguishes atomic values and the principal functor of compound terms.

Nonfirst argument indexing is a variation of first-argument indexing that uses the same or similar techniques as first-argument indexing on one or more alternative arguments. For instance, if a predicate call uses variables for the first argument, the system may choose to use the second argument as the index instead.

Multiargument indexing creates a combined index over multiple instantiated arguments if there is not a sufficiently selective single argument index.

Deep indexing is used when multiple clauses use the same principal functor for some argument. It recursively uses the same or similar indexing techniques on the arguments of the compound terms.

Trie indexing uses a prefix tree to find applicable clauses. [4]

Related Research Articles

<span class="mw-page-title-main">Trie</span> K-ary search tree data structure

In computer science, a trie, also called digital tree or prefix tree, is a type of k-ary search tree, a tree data structure used for locating specific keys from within a set. These keys are most often strings, with links between nodes defined not by the entire key, but by individual characters. In order to access a key, the trie is traversed depth-first, following the links between nodes, which represent each character in the key.

<span class="mw-page-title-main">Snell's law</span> Formula for refraction angles

Snell's law is a formula used to describe the relationship between the angles of incidence and refraction, when referring to light or other waves passing through a boundary between two different isotropic media, such as water, glass, or air. In optics, the law is used in ray tracing to compute the angles of incidence or refraction, and in experimental optics to find the refractive index of a material. The law is also satisfied in meta-materials, which allow light to be bent "backward" at a negative angle of refraction with a negative refractive index.

In logic and computer science, unification is an algorithmic process of solving equations between symbolic expressions. For example, using x,y,z as variables, the singleton equation set { cons(x,cons(x,nil)) = cons(2,y) } is a syntactic first-order unification problem that has the substitution { x ↦ 2, ycons(2,nil) } as its only solution.

<span class="mw-page-title-main">Winding number</span> Number of times a curve wraps around a point in the plane

In mathematics, the winding number or winding index of a closed curve in the plane around a given point is an integer representing the total number of times that curve travels counterclockwise around the point, i.e., the curve's number of turns. For certain open plane curves, the number of turns may be non-integer. The winding number depends on the orientation of the curve, and it is negative if the curve travels around the point clockwise.

A nonholonomic system in physics and mathematics is a physical system whose state depends on the path taken in order to achieve it. Such a system is described by a set of parameters subject to differential constraints and non-linear constraints, such that when the system evolves along a path in its parameter space but finally returns to the original set of parameter values at the start of the path, the system itself may not have returned to its original state. Nonholonomic mechanics is autonomous division of Newtonian mechanics.

In particle and condensed matter physics, Goldstone bosons or Nambu–Goldstone bosons (NGBs) are bosons that appear necessarily in models exhibiting spontaneous breakdown of continuous symmetries. They were discovered by Yoichiro Nambu in particle physics within the context of the BCS superconductivity mechanism, and subsequently elucidated by Jeffrey Goldstone, and systematically generalized in the context of quantum field theory. In condensed matter physics such bosons are quasiparticles and are known as Anderson–Bogoliubov modes.

Vampire is an automatic theorem prover for first-order classical logic developed in the Department of Computer Science at the University of Manchester. Up to Version 3, it was developed by Andrei Voronkov together with Kryštof Hoder and previously with Alexandre Riazanov. Since Version 4, the development has involved a wider international team including Laura Kovacs, Giles Reger, and Martin Suda. Since 1999 it has won at least 53 trophies in the CADE ATP System Competition, the "world cup for theorem provers", including the most prestigious FOF division and the theory-reasoning TFA division.

In the statistical analysis of time series, autoregressive–moving-average (ARMA) models provide a parsimonious description of a (weakly) stationary stochastic process in terms of two polynomials, one for the autoregression (AR) and the second for the moving average (MA). The general ARMA model was described in the 1951 thesis of Peter Whittle, Hypothesis testing in time series analysis, and it was popularized in the 1970 book by George E. P. Box and Gwilym Jenkins.

<span class="mw-page-title-main">Suffix tree</span> Tree containing all suffixes of a given text

In computer science, a suffix tree is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix trees allow particularly fast implementations of many important string operations.

<span class="mw-page-title-main">Method of analytic tableaux</span>

In proof theory, the semantic tableau is a decision procedure for sentential and related logics, and a proof procedure for formulae of first-order logic. An analytic tableau is a tree structure computed for a logical formula, having at each node a subformula of the original formula to be proved or refuted. Computation constructs this tree and uses it to prove or refute the whole formula. The tableau method can also determine the satisfiability of finite sets of formulas of various logics. It is the most popular proof procedure for modal logics.

In mathematical statistics, the Kullback–Leibler divergence, denoted , is a type of statistical distance: a measure of how one probability distribution P is different from a second, reference probability distribution Q. A simple interpretation of the KL divergence of P from Q is the expected excess surprise from using Q as a model when the actual distribution is P. While it is a measure of how different two distributions are, and in some sense is thus a "distance", it is not actually a metric, which is the most familiar and formal type of distance. In particular, it is not symmetric in the two distributions, and does not satisfy the triangle inequality. Instead, in terms of information geometry, it is a type of divergence, a generalization of squared distance, and for certain classes of distributions, it satisfies a generalized Pythagorean theorem.

In statistics, the Bayesian information criterion (BIC) or Schwarz information criterion is a criterion for model selection among a finite set of models; models with lower BIC are generally preferred. It is based, in part, on the likelihood function and it is closely related to the Akaike information criterion (AIC).

In computer science, run-time algorithm specialization is a methodology for creating efficient algorithms for costly computation tasks of certain kinds. The methodology originates in the field of automated theorem proving and, more specifically, in the Vampire theorem prover project.

A definite clause grammar (DCG) is a way of expressing grammar, either for natural or formal languages, in a logic programming language such as Prolog. It is closely related to the concept of attribute grammars / affix grammars from which Prolog was originally developed. DCGs are usually associated with Prolog, but similar languages such as Mercury also include DCGs. They are called definite clause grammars because they represent a grammar as a set of definite clauses in first-order logic.

In mathematics, the term undefined is often used to refer to an expression which is not assigned an interpretation or a value. The term can take on several different meanings depending on the context. For example:

SLD resolution is the basic inference rule used in logic programming. It is a refinement of resolution, which is both sound and refutation complete for Horn clauses.

<span class="mw-page-title-main">Gravitational lensing formalism</span>

In general relativity, a point mass deflects a light ray with impact parameter by an angle approximately equal to

<span class="mw-page-title-main">Kepler orbit</span> Celestial orbit whose trajectory is a conic section in the orbital plane

In celestial mechanics, a Kepler orbit is the motion of one body relative to another, as an ellipse, parabola, or hyperbola, which forms a two-dimensional orbital plane in three-dimensional space. A Kepler orbit can also form a straight line. It considers only the point-like gravitational attraction of two bodies, neglecting perturbations due to gravitational interactions with other objects, atmospheric drag, solar radiation pressure, a non-spherical central body, and so on. It is thus said to be a solution of a special case of the two-body problem, known as the Kepler problem. As a theory in classical mechanics, it also does not take into account the effects of general relativity. Keplerian orbits can be parametrized into six orbital elements in various ways.

Mechanics of planar particle motion is the analysis of the motion of particles gravitationally attracted to one another observed from non-inertial reference frames and the generalization of this problem to planetary motion. This type of analysis is closely related to centrifugal force, two-body problem, orbit and Kepler's laws of planetary motion. The mechanics of planar particle motion fall in the general field of analytical dynamics, and helps determining orbits from the given force laws. This article is focused more on the kinematic issues surrounding planar motion, which are the determination of the forces necessary to result in a certain trajectory given the particle trajectory.

Vector space model or term vector model is an algebraic model for representing text documents as vectors of identifiers. It is used in information filtering, information retrieval, indexing and relevancy rankings. Its first use was in the SMART Information Retrieval System.

References

  1. Colomb, Robert M. (1991). "Enhancing unification in PROLOG through clause indexing". The Journal of Logic Programming. 10: 23–44. doi:10.1016/0743-1066(91)90004-9.
  2. Peter Graf. "Substitution Tree Indexing". 1994.
  3. John W. Wheeler; Guarionex Jordan. "An Empirical Study of Term Indexing in the Darwin Implementation of the Model Evolution Calculus". 2004. p. 5.
  4. Körner, Philipp; Leuschel, Michael; Barbosa, João; Costa, Vítor Santos; Dahl, Verónica; Hermenegildo, Manuel V.; Morales, Jose F.; Wielemaker, Jan; Diaz, Daniel; Abreu, Salvador; Ciatto, Giovanni (2022). "Fifty Years of Prolog and Beyond". Theory and Practice of Logic Programming. 22 (6): 776–858. doi:10.1017/S1471068422000102. hdl: 10174/33387 . ISSN   1471-0684. Creative Commons by small.svg  This article incorporates textfrom this source, which is available under the CC BY 4.0 license.

Further reading