Thousands of Problems for Theorem Provers

Last updated

TPTP (Thousands of Problems for Theorem Provers) [1] is a freely available collection of problems for automated theorem proving. It is used to evaluate the efficacy of automated reasoning algorithms. [2] [3] [4] Problems are expressed in a simple text-based format for first order logic or higher-order logic. [5] TPTP is used as the source of some problems in CASC.

Related Research Articles

Automated theorem proving is a subfield of automated reasoning and mathematical logic dealing with proving mathematical theorems by computer programs. Automated reasoning over mathematical proof was a major impetus for the development of computer science.

Inductive logic programming (ILP) is a subfield of symbolic artificial intelligence which uses logic programming as a uniform representation for examples, background knowledge and hypotheses. The term "inductive" here refers to philosophical rather than mathematical induction. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesised logic program which entails all the positive and none of the negative examples.

Logic for Computable Functions (LCF) is an interactive automated theorem prover developed at Stanford and Edinburgh by Robin Milner and collaborators in early 1970s, based on the theoretical foundation of logic of computable functions previously proposed by Dana Scott. Work on the LCF system introduced the general-purpose programming language ML to allow users to write theorem-proving tactics, supporting algebraic data types, parametric polymorphism, abstract data types, and exceptions.

<span class="mw-page-title-main">Mizar system</span>

The Mizar system consists of a formal language for writing mathematical definitions and proofs, a proof assistant, which is able to mechanically check proofs written in this language, and a library of formalized mathematics, which can be used in the proof of new theorems. The system is maintained and developed by the Mizar Project, formerly under the direction of its founder Andrzej Trybulec.

E is a high-performance theorem prover for full first-order logic with equality. It is based on the equational superposition calculus and uses a purely equational paradigm. It has been integrated into other theorem provers and it has been among the best-placed systems in several theorem proving competitions. E is developed by Stephan Schulz, originally in the Automated Reasoning Group at TU Munich, now at Baden-Württemberg Cooperative State University Stuttgart.

<span class="mw-page-title-main">Model checking</span> Computer science field

In computer science, model checking or property checking is a method for checking whether a finite-state model of a system meets a given specification. This is typically associated with hardware or software systems, where the specification contains liveness requirements as well as safety requirements.

<span class="mw-page-title-main">Coq (software)</span> Proof assistant

Coq is an interactive theorem prover first released in 1989. It allows for expressing mathematical assertions, mechanically checks proofs of these assertions, helps find formal proofs, and extracts a certified program from the constructive proof of its formal specification. Coq works within the theory of the calculus of inductive constructions, a derivative of the calculus of constructions. Coq is not an automated theorem prover but includes automatic theorem proving tactics (procedures) and various decision procedures.

<span class="mw-page-title-main">Proof assistant</span> Software tool to assist with the development of formal proofs by human-machine collaboration

In computer science and mathematical logic, a proof assistant or interactive theorem prover is a software tool to assist with the development of formal proofs by human-machine collaboration. This involves some sort of interactive proof editor, or other interface, with which a human can guide the search for proofs, the details of which are stored in, and some steps provided by, a computer.

In computer science, in particular in knowledge representation and reasoning and metalogic, the area of automated reasoning is dedicated to understanding different aspects of reasoning. The study of automated reasoning helps produce computer programs that allow computers to reason completely, or nearly completely, automatically. Although automated reasoning is considered a sub-field of artificial intelligence, it also has connections with theoretical computer science and philosophy.

<span class="mw-page-title-main">Lawrence Paulson</span> American computer scientist

Lawrence Charles Paulson is an American computer scientist. He is a Professor of Computational Logic at the University of Cambridge Computer Laboratory and a Fellow of Clare College, Cambridge.

In computer science and mathematical logic, satisfiability modulo theories (SMT) is the problem of determining whether a mathematical formula is satisfiable. It generalizes the Boolean satisfiability problem (SAT) to more complex formulas involving real numbers, integers, and/or various data structures such as lists, arrays, bit vectors, and strings. The name is derived from the fact that these expressions are interpreted within ("modulo") a certain formal theory in first-order logic with equality. SMT solvers are tools that aim to solve the SMT problem for a practical subset of inputs. SMT solvers such as Z3 and cvc5 have been used as a building block for a wide range of applications across computer science, including in automated theorem proving, program analysis, program verification, and software testing.

<span class="mw-page-title-main">Mutilated chessboard problem</span> On domino tiling after removing two corners

The mutilated chessboard problem is a tiling puzzle posed by Max Black in 1946 that asks:

Suppose a standard 8×8 chessboard has two diagonally opposite corners removed, leaving 62 squares. Is it possible to place 31 dominoes of size 2×1 so as to cover all of these squares?

In mathematical logic, monadic second-order logic (MSO) is the fragment of second-order logic where the second-order quantification is limited to quantification over sets. It is particularly important in the logic of graphs, because of Courcelle's theorem, which provides algorithms for evaluating monadic second-order formulas over graphs of bounded treewidth. It is also of fundamental importance in automata theory, where the Büchi–Elgot–Trakhtenbrot theorem gives a logical characterization of the regular languages.

In computer science and formal methods, a SAT solver is a computer program which aims to solve the Boolean satisfiability problem. On input a formula over Boolean variables, such as "(x or y) and (x or not y)", a SAT solver outputs whether the formula is satisfiable, meaning that there are possible values of x and y which make the formula true, or unsatisfiable, meaning that there are no such values of x and y. In this case, the formula is satisfiable when x is true, so the solver should return "satisfiable". Since the introduction of algorithms for SAT in the 1960s, modern SAT solvers have grown into complex software artifacts involving a large number of heuristics and program optimizations to work efficiently.

Donald W. Loveland is a professor emeritus of computer science at Duke University who specializes in artificial intelligence. He is well known for the Davis–Putnam–Logemann–Loveland algorithm.

In the study of graph algorithms, Courcelle's theorem is the statement that every graph property definable in the monadic second-order logic of graphs can be decided in linear time on graphs of bounded treewidth. The result was first proved by Bruno Courcelle in 1990 and independently rediscovered by Borie, Parker & Tovey (1992). It is considered the archetype of algorithmic meta-theorems.

The Mexican International Conference on Artificial Intelligence (MICAI) is the name of an annual conference covering all areas of Artificial Intelligence (AI), held in Mexico. The first MICAI conference was held in 2000. The conference is attended every year by about two hundred of AI researchers and PhD students and 500−1000 local graduate students.

The size-change termination principle (SCT) guarantees termination for a computer program by proving that infinite computations always trigger infinite descent in data values that are well-founded. Size-change termination analysis utilizes this principle in order to solve the universal halting problem for a certain class of programs. When applied to general programs, the principle is intended to be used conservatively, which means that if the analysis determines that a program is terminating, the answer is sound, but a negative answer means "don't know". The decision problem for SCT is PSPACE-complete; however, there exists an algorithm that computes an approximation of the decision problem in polynomial time. Size-change analysis is applicable to both first-order and higher-order functional programs, as well as imperative programs and logic programs. The latter application preceded by four years the general formulation of the principle by Lee et al.

Deepak Kapur is a Distinguished Professor in the Department of Computer Science at the University of New Mexico.

Counterexample-guided abstraction refinement (CEGAR) is a technique for symbolic model checking. It is also applied in modal logic tableau calculi algorithms to optimise their efficiency.

References

  1. "The TPTP Problem Library for Automated Theorem Proving".
  2. Hoder, Kryštof; Voronkov, Andrei (2009). "Comparing Unification Algorithms in First-Order Theorem Proving". KI 2009: Advances in Artificial Intelligence. Lecture Notes in Computer Science. Vol. 5803. pp. 435–443. CiteSeerX   10.1.1.329.1809 . doi:10.1007/978-3-642-04617-9_55. ISBN   978-3-642-04616-2.
  3. Hurd, Joe (2003). "First-Order Proof Tactics in Higher-Order Logic Theorem Provers". Semantic Scholar. S2CID   11201048.
  4. Segre, Alberto Maria; Sturgill, David B. (1994). "Using Hundreds of Workstations to Solve First-Order Logic Problems" (PDF). AAAI-94 Proceedings.
  5. Benzmüller, Christoph; Rabe, Florian; Sutcliffe, Geoff (2008). "THF0 – The Core of the TPTP Language for Higher-Order Logic". Automated Reasoning. Lecture Notes in Computer Science. Vol. 5195. pp. 491–506. doi:10.1007/978-3-540-71070-7_41. ISBN   978-3-540-71069-1.