POPLmark challenge

In programming language theory, the POPLmark challenge (from "Principles of Programming Languages benchmark", formerly Mechanized Metatheory for the Masses!) (Aydemir, 2005) is a set of benchmarks designed to evaluate the state of automated reasoning (or mechanization) in the metatheory of programming languages, and to stimulate discussion and collaboration among a diverse cross-section of the formal methods community. Very loosely speaking, the challenge concerns measuring how well programs can be proven to match a specification of their intended behaviour (and the many complex issues this involves). The challenge was initially proposed by members of the PL Club at the University of Pennsylvania, in association with collaborators around the world. The Workshop on Mechanized Metatheory is the main meeting of researchers participating in the challenge.

The design of the POPLmark benchmark is guided by features common to reasoning about programming languages. The challenge problems do not require the formalisation of large programming languages, but they do require sophistication in reasoning about:

Binding
Most programming languages have some form of binding, ranging in complexity from the simple binders of simply typed lambda calculus to the complex, potentially infinite binders needed in the treatment of record patterns. A sketch of one common mechanization of binding appears after this list.
Induction
Properties such as subject reduction and strong normalisation often require complex induction arguments.
Reuse
Since furthering collaboration is a key aim of the challenge, solutions are expected to contain reusable components that would let researchers share language features and designs without starting from scratch every time.
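
To make the binding issue concrete, here is a minimal sketch, in Lean 4 and not drawn from any particular POPLmark solution, of one common first-order representation: de Bruijn indices. The names Term and shift are chosen purely for illustration.

    -- Untyped lambda terms with de Bruijn indices: a variable is a natural
    -- number counting the binders between its occurrence and its binder.
    inductive Term : Type
      | var : Nat → Term          -- de Bruijn index
      | app : Term → Term → Term  -- application
      | lam : Term → Term         -- the binder needs no name

    -- Shift indices at or above the cutoff c up by one; this is needed
    -- whenever a term is moved underneath an additional binder.
    def shift (c : Nat) : Term → Term
      | .var k   => .var (if k < c then k else k + 1)
      | .app t u => .app (shift c t) (shift c u)
      | .lam t   => .lam (shift (c + 1) t)

Lemmas about shift and substitution are then proved by structural induction over Term, which is where the induction burden listed above begins; the transitivity proof sketched in the next section gives a fuller example.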

The problems

As of 2007, the POPLmark challenge is composed of three parts. Part 1 concerns solely the types of System F<: (System F with subtyping), and poses problems such as:

  1. Checking that the type system admits transitivity of subtyping.
  2. Checking the transitivity of subtyping in the presence of records.
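
To give a feel for the shape of problem 1, the following sketch (again Lean 4; Ty and Sub are illustrative names) cuts the type language down to just Top and arrow types, with no type variables or bounded quantification, so that transitivity follows by induction on the middle type. The genuine F<: problem is substantially harder, because the type-variable rule forces transitivity to be proved simultaneously with a narrowing lemma.

    -- A toy fragment of the F<: type language: only Top and arrows.
    inductive Ty : Type
      | top
      | arrow (dom cod : Ty)

    -- Subtyping: every type is below Top; arrows are contravariant in the
    -- domain and covariant in the codomain.
    inductive Sub : Ty → Ty → Prop
      | top   {s} : Sub s Ty.top
      | arrow {s₁ s₂ t₁ t₂} :
          Sub t₁ s₁ → Sub s₂ t₂ → Sub (Ty.arrow s₁ s₂) (Ty.arrow t₁ t₂)

    -- Transitivity, by induction on the middle type q.
    theorem sub_trans : ∀ (q s t : Ty), Sub s q → Sub q t → Sub s t := by
      intro q
      induction q with
      | top =>
        intro s t _ h₂
        cases h₂
        exact .top
      | arrow q₁ q₂ ih₁ ih₂ =>
        intro s t h₁ h₂
        cases h₂ with
        | top => exact .top
        | arrow ha hb =>
          cases h₁ with
          | arrow hc hd => exact .arrow (ih₁ _ _ ha hc) (ih₂ _ _ hd hb)

Records (problem 2) layer width, depth and permutation subtyping on top of this structure, which is what makes the second problem strictly harder.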

Part 2 concerns the syntax and semantics of System F<:. It asks for proofs of:

  1. Type safety for the pure fragment.
  2. Type safety in the presence of pattern matching.
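
In the standard Wright–Felleisen formulation (this phrasing is conventional rather than quoted from the challenge text), type safety splits into two lemmas:

    \text{Progress:}     \quad \vdash t : T \implies t \text{ is a value} \lor \exists t'.\; t \longrightarrow t'
    \text{Preservation:} \quad \Gamma \vdash t : T \land t \longrightarrow t' \implies \Gamma \vdash t' : T

Progress is typically proved by induction on the typing derivation, and preservation by induction on the typing or evaluation derivation together with a substitution lemma; it is the substitution lemma that makes the choice of binding representation consequential.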

Part 3 concerns the usability of the formalisation of System F<:. In particular, the challenge asks for:

  1. Simulating and animating the operational semantics.
  2. Extracting useful algorithms from the formalisations.
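
As a small illustration of what extracting an algorithm can mean, here is a sketch of an executable subtype checker for the toy Ty type defined earlier. The fuel parameter is an artefact of this sketch: it sidesteps the termination argument a genuine formalization would prove, since the domain/codomain swap at arrow types defeats simple structural recursion.

    -- Executable subtyping for the toy Ty above; fuel bounds the recursion
    -- depth so the definition is structurally recursive on fuel.
    def subtype : Nat → Ty → Ty → Bool
      | 0, _, _ => false
      | _ + 1, _, Ty.top => true
      | fuel + 1, Ty.arrow s₁ s₂, Ty.arrow t₁ t₂ =>
          subtype fuel t₁ s₁ && subtype fuel s₂ t₂
      | _ + 1, Ty.top, Ty.arrow _ _ => false

    #eval subtype 8 (Ty.arrow Ty.top Ty.top) Ty.top  -- true: Top → Top <: Top

Proof assistants with program extraction, such as Coq, can turn verified functions of this kind into standalone OCaml or Haskell programs, which is the kind of usability Part 3 probes.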

Several solutions have been proposed for parts of the POPLmark challenge, using the following tools: Isabelle/HOL, Twelf, Coq, αProlog, ATS, Abella, and Matita.

References

Aydemir, Brian E.; Bohannon, Aaron; Fairbairn, Matthew; Foster, J. Nathan; Pierce, Benjamin C.; Sewell, Peter; Vytiniotis, Dimitrios; Washburn, Geoffrey; Weirich, Stephanie; Zdancewic, Steve (2005). "Mechanized Metatheory for the Masses: The POPLmark Challenge". Theorem Proving in Higher Order Logics (TPHOLs 2005), Lecture Notes in Computer Science 3603. Springer.