Prolog syntax and semantics

Last updated June 12, 2023

The syntax and semantics of Prolog, a programming language, are the sets of rules that define how a Prolog program is written and how it is interpreted, respectively. The rules are laid out in ISO standard ISO/IEC 13211 ^[1] although there are differences in the Prolog implementations.

Data types

Prolog is dynamically typed. It has a single data type, the term, which has several subtypes: atoms, numbers, variables and compound terms.

An atom is a general-purpose name with no inherent meaning. It is composed of a sequence of characters that is parsed by the Prolog reader as a single unit. Atoms are usually bare words in Prolog code, written with no special syntax. However, atoms containing spaces or certain other special characters must be surrounded by single quotes. Atoms beginning with a capital letter must also be quoted, to distinguish them from variables. The empty list, written [], is also an atom. Other examples of atoms include x, blue, 'Taco', and 'some atom'.

Numbers can be floats or integers. Many Prolog implementations also provide unbounded integers and rational numbers.

Variables are denoted by a string consisting of letters, numbers and underscore characters, and beginning with an upper-case letter or underscore. Variables closely resemble variables in logic in that they are placeholders for arbitrary terms. A variable can become instantiated (bound to equal a specific term) via unification. A single underscore (_) denotes an anonymous variable and means "any term". Unlike other variables, the underscore does not represent the same value everywhere it occurs within a predicate definition.

A compound term is composed of an atom called a "functor" and a number of "arguments", which are again terms. Compound terms are ordinarily written as a functor followed by a comma-separated list of argument terms, which is contained in parentheses. The number of arguments is called the term's arity. An atom can be regarded as a compound term with arity zero.

Examples of compound terms are truck_year('Mazda', 1986) and 'Person_Friends'(zelda,[tom,jim]). Compound terms with functors that are declared as operators can be written in prefix or infix notation. For example, the terms -(z), +(a,b) and =(X,Y) can also be written as -z, a+b and X=Y, respectively. Users can declare arbitrary functors as operators with different precedences to allow for domain-specific notations. The notation f/n is commonly used to denote a term with functor f and arity n.

Special cases of compound terms:

Lists are defined inductively: The atom [] is a list. A compound term with functor . (dot) and arity 2, whose second argument is a list, is itself a list. There exists special syntax for denoting lists: .(A, B) is equivalent to [A|B]. For example, the list .(1, .(2, .(3, []))) can also be written as [1 | [2 | [3 | []]]], or even more compactly as [1,2,3].
Strings: A sequence of characters surrounded by quotes is equivalent to a list of (numeric) character codes, generally in the local character encoding or Unicode if the system supports Unicode.

Prolog programs

Prolog programs describe relations, defined by means of clauses. Pure Prolog is restricted to Horn clauses, a Turing-complete subset of first-order predicate logic. There are two types of clauses: Facts and rules. A rule is of the form

Head:-Body.

and is read as "Head is true if Body is true". A rule's body consists of calls to predicates, which are called the rule's goals. The built-in predicate ,/2 (meaning a 2-arity operator with name ,) denotes conjunction of goals, and ;/2 denotes disjunction. Conjunctions and disjunctions can only appear in the body, not in the head of a rule.

Clauses with empty bodies are called facts. An example of a fact is:

cat(tom).

which is equivalent to the rule:

cat(tom):-true.

another example is:

Xis3+2.

and when you run it, the result will be

X=5Yes.

The built-in predicate true/0 is always true.

Evaluation

Execution of a Prolog program is initiated by the user's posting of a single goal, called the query. Logically, the Prolog engine tries to find a resolution refutation of the negated query. The resolution method used by Prolog is called SLD resolution. If the negated query can be refuted, it follows that the query, with the appropriate variable bindings in place, is a logical consequence of the program. In that case, all generated variable bindings are reported to the user, and the query is said to have succeeded. Operationally, Prolog's execution strategy can be thought of as a generalization of function calls in other languages, one difference being that multiple clause heads can match a given call. In that case, the system creates a choice-point, unifies the goal with the clause head of the first alternative, and continues with the goals of that first alternative. If any goal fails in the course of executing the program, all variable bindings that were made since the most recent choice-point was created are undone, and execution continues with the next alternative of that choice-point. This execution strategy is called chronological backtracking.

mother_child(trude,sally).father_child(tom,sally).father_child(tom,erica).father_child(mike,tom).sibling(X,Y):-parent_child(Z,X),parent_child(Z,Y).parent_child(X,Y):-father_child(X,Y).parent_child(X,Y):-mother_child(X,Y).

This results in the following query being evaluated as true:

?-sibling(sally,erica).Yes

This is obtained as follows: Initially, the only matching clause-head for the query sibling(sally, erica) is the first one, so proving the query is equivalent to proving the body of that clause with the appropriate variable bindings in place, i.e., the conjunction (parent_child(Z,sally), parent_child(Z,erica)). The next goal to be proved is the leftmost one of this conjunction, i.e., parent_child(Z, sally). Two clause heads match this goal. The system creates a choice-point and tries the first alternative, whose body is father_child(Z, sally). This goal can be proved using the fact father_child(tom, sally), so the binding Z = tom is generated, and the next goal to be proved is the second part of the above conjunction: parent_child(tom, erica). Again, this can be proved by the corresponding fact. Since all goals could be proved, the query succeeds. Since the query contained no variables, no bindings are reported to the user. A query with variables, like:

?-father_child(Father,Child).

enumerates all valid answers on backtracking.

Notice that with the code as stated above, the query ?- sibling(sally, sally). also succeeds. One would insert additional goals to describe the relevant restrictions, if desired.

Loops and recursion

Iterative algorithms can be implemented by means of recursive predicates. Prolog systems typically implement a well-known optimization technique called tail call optimization (TCO) for deterministic predicates exhibiting tail recursion or, more generally, tail calls: A clause's stack frame is discarded before performing a call in a tail position. Therefore, deterministic tail-recursive predicates are executed with constant stack space, like loops in other languages.

Cuts

A cut (!) inside a rule will prevent Prolog from backtracking any predicates behind the cut:

predicate(X):-one(X),!,two(X).

will fail if the first-found value of X for which one(X) is true leads to two(X) being false.

Anonymous variables

Anonymous variables _ are never bound to a value and can be used multiple times in a predicate.

For instance searching a list for a given value:

contains(V,[V|_]).contains(V,[_|T]):-contains(V,T).

Negation

The built-in Prolog predicate \+/1 provides negation as failure, which allows for non-monotonic reasoning. The goal \+ illegal(X) in the rule

legal(X):-\+illegal(X).

is evaluated as follows: Prolog attempts to prove the illegal(X). If a proof for that goal can be found, the original goal (i.e., \+ illegal(X)) fails. If no proof can be found, the original goal succeeds. Therefore, the \+/1 prefix operator is called the "not provable" operator, since the query ?- \+ Goal. succeeds if Goal is not provable. This kind of negation is sound if its argument is "ground" (i.e. contains no variables). Soundness is lost if the argument contains variables. In particular, the query ?- legal(X). can now not be used to enumerate all things that are legal.

Semantics

Under a declarative reading, the order of rules, and of goals within rules, is irrelevant since logical disjunction and conjunction are commutative. Procedurally, however, it is often important to take into account Prolog's execution strategy, either for efficiency reasons, or due to the semantics of impure built-in predicates for which the order of evaluation matters. Also, as Prolog interpreters try to unify clauses in the order they're provided, failing to give a correct ordering can lead to infinite recursion, as in:

predicate1(X):-predicate2(X,X).predicate2(X,Y):-predicate1(X),X\=Y.

Given this ordering, any query of the form

?-predicate1(atom).

will recur until the stack is exhausted. If, however, the last 3 lines were changed to:

predicate2(X,Y):-X\=Y,predicate1(X).

the same query would lead to a No. outcome in a very short time.

Definite clause grammars

There is a special notation called definite clause grammars (DCGs). A rule defined via -->/2 instead of :-/2 is expanded by the preprocessor (expand_term/2, a facility analogous to macros in other languages) according to a few straightforward rewriting rules, resulting in ordinary Prolog clauses. Most notably, the rewriting equips the predicate with two additional arguments, which can be used to implicitly thread state around, analogous to monads in other languages. DCGs are often used to write parsers or list generators, as they also provide a convenient interface to list differences.

Parser example

A larger example will show the potential of using Prolog in parsing.

Given the sentence expressed in Backus–Naur form:

<sentence>::=<stat_part><stat_part>::=<statement> | <stat_part><statement><statement>::=<id> = <expression> ; <expression>::=<operand> | <expression><operator><operand><operand>::=<id> | <digit><id>::=  a | b <digit>::=  0..9 <operator>::=  + | - | *

This can be written in Prolog using DCGs, corresponding to a predictive parser with one token look-ahead:

sentence(S)-->statement(S0),sentence_r(S0,S).sentence_r(S,S)-->[].sentence_r(S0,seq(S0,S))-->statement(S1),sentence_r(S1,S).statement(assign(Id,E))-->id(Id),[=],expression(E),[;].expression(E)-->term(T),expression_r(T,E).expression_r(E,E)-->[].expression_r(E0,E)-->[+],term(T),expression_r(plus(E0,T),E).expression_r(E0,E)-->[-],term(T),expression_r(minus(E0,T),E).term(T)-->factor(F),term_r(F,T).term_r(T,T)-->[].term_r(T0,T)-->[*],factor(F),term_r(times(T0,F),T).factor(id(ID))-->id(ID).factor(digit(D))-->[D],{(number(D);var(D)),between(0,9,D)}.id(a)-->[a].id(b)-->[b].

This code defines a relation between a sentence (given as a list of tokens) and its abstract syntax tree (AST). Example query:

?-phrase(sentence(AST),[a,=,1,+,3,*,b,;,b,=,0,;]).AST=seq(assign(a,plus(digit(1),times(digit(3),id(b)))),assign(b,digit(0)));

The AST is represented using Prolog terms and can be used to apply optimizations, to compile such expressions to machine-code, or to directly interpret such statements. As is typical for the relational nature of predicates, these definitions can be used both to parse and generate sentences, and also to check whether a given tree corresponds to a given list of tokens. Using iterative deepening for fair enumeration, each arbitrary but fixed sentence and its corresponding AST will be generated eventually:

?-length(Tokens,_),phrase(sentence(AST),Tokens).Tokens=[a,=,a,(;)],AST=assign(a,id(a));Tokens=[a,=,b,(;)],AST=assign(a,id(b))etc.

Related Research Articles

First-order logic—also known as predicate logic, quantificational logic, and first-order predicate calculus—is a collection of formal systems used in mathematics, philosophy, linguistics, and computer science. First-order logic uses quantified variables over non-logical objects, and allows the use of sentences that contain variables, so that rather than propositions such as "Socrates is a man", one can have expressions in the form "there exists x such that x is Socrates and x is a man", where "there exists" is a quantifier, while x is a variable. This distinguishes it from propositional logic, which does not use quantifiers or relations; in this sense, propositional logic is the foundation of first-order logic.

Logic programming is a programming paradigm which is largely based on formal logic. Any program written in a logic programming language is a set of sentences in logical form, expressing facts and rules about some problem domain. Major logic programming language families include Prolog, answer set programming (ASP) and Datalog. In all of these languages, rules are written in the form of clauses:

Prolog is a logic programming language associated with artificial intelligence and computational linguistics.

A proposition is a central concept in semantics, logic, philosophy of language and related fields, often characterized as the primary bearer of truth or falsity. Propositions are also often characterized as being the kind of thing that declarative sentences denote. For instance the sentence "The sky is blue" denotes the proposition that the sky is blue. However, crucially, propositions are not themselves linguistic expressions. For instance, the English sentence "Snow is white" denotes the same proposition as the German sentence "Schnee ist weiß" even though the two sentences are not the same. Similarly, propositions can also be characterized as the objects of belief and other propositional attitudes. For instance if one believes that the sky is blue, what one believes is the proposition that the sky is blue. A proposition can also be thought of as a kind of idea: Collins Dictionary has a definition for proposition as "a statement or an idea that people can consider or discuss whether it is true."

In 1983, David H. D. Warren designed an abstract machine for the execution of Prolog consisting of a memory architecture and an instruction set. This design became known as the Warren Abstract Machine (WAM) and has become the de facto standard target for Prolog compilers.

Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part.

In mathematical logic and logic programming, a Horn clause is a logical formula of a particular rule-like form which gives it useful properties for use in logic programming, formal specification, and model theory. Horn clauses are named for the logician Alfred Horn, who first pointed out their significance in 1951.

Datalog is a declarative logic programming language. While it is syntactically a subset of Prolog, Datalog generally uses a bottom-up rather than top-down evaluation model. This difference yields significantly different behavior and properties from Prolog. It is often used as a query language for deductive databases. Datalog has been applied to problems in data integration, networking, program analysis, and more.

In computer science, Van Wijngaarden grammars are a formalism for defining formal languages invented by Adriaan van Wijngaarden for the purpose of defining the ALGOL 68 programming language. The resulting specification remains its most notable application.

In mathematical logic, a ground term of a formal system is a term that does not contain any variables. Similarly, a ground formula is a formula that does not contain any variables.

In universal algebra and mathematical logic, a term algebra is a freely generated algebraic structure over a given signature. For example, in a signature consisting of a single binary operation, the term algebra over a set X of variables is exactly the free magma generated by X. Other synonyms for the notion include absolutely free algebra and anarchic algebra.

Answer set programming (ASP) is a form of declarative programming oriented towards difficult search problems. It is based on the stable model semantics of logic programming. In ASP, search problems are reduced to computing stable models, and answer set solvers—programs for generating stable models—are used to perform search. The computational process employed in the design of many answer set solvers is an enhancement of the DPLL algorithm and, in principle, it always terminates.

Fril is a programming language for first-order predicate calculus. It includes the semantics of Prolog as a subset, but takes its syntax from the micro-PROLOG of Logic Programming Associates and adds support for fuzzy sets, support logic, and metaprogramming.

In computer science, the syntax of a computer language is the rules that define the combinations of symbols that are considered to be correctly structured statements or expressions in that language. This applies both to programming languages, where the document represents source code, and to markup languages, where the document represents data.

A definite clause grammar (DCG) is a way of expressing grammar, either for natural or formal languages, in a logic programming language such as Prolog. It is closely related to the concept of attribute grammars / affix grammars from which Prolog was originally developed. DCGs are usually associated with Prolog, but similar languages such as Mercury also include DCGs. They are called definite clause grammars because they represent a grammar as a set of definite clauses in first-order logic.

Constraint logic programming is a form of constraint programming, in which logic programming is extended to include concepts from constraint satisfaction. A constraint logic program is a logic program that contains constraints in the body of clauses. An example of a clause including a constraint is A(X,Y):-X+Y>0,B(X),C(Y). In this clause, X+Y>0 is a constraint; A(X,Y), B(X), and C(Y) are literals as in regular logic programming. This clause states one condition under which the statement A(X,Y) holds: X+Y is greater than zero and both B(X) and C(Y) are true.

In mathematical logic, an atomic formula is a formula with no deeper propositional structure, that is, a formula that contains no logical connectives or equivalently a formula that has no strict subformulas. Atoms are thus the simplest well-formed formulas of the logic. Compound formulas are formed by combining the atomic formulas using the logical connectives.

A deductive language is a computer programming language in which the program is a collection of predicates ('facts') and rules that connect them. Such a language is used to create knowledge based systems or expert systems which can deduce answers to problem sets by applying the rules to the facts they have been given. An example of a deductive language is Prolog, or its database-query cousin, Datalog.

B-Prolog was a high-performance implementation of the standard Prolog language with several extended features including matching clauses, action rules for event handling, finite-domain constraint solving, arrays and hash tables, declarative loops, and tabling. First released in 1994, B-Prolog is now a widely used CLP system. The constraint solver of B-Prolog was ranked top in two categories in the Second International Solvers Competition, and it also took the second place in P class in the second ASP solver competition and the second place overall in the third ASP solver competition. B-Prolog underpins the PRISM system, a logic-based probabilistic reasoning and learning system. B-Prolog is a commercial product, but it can be used for learning and non-profit research purposes free of charge. B-Prolog is not anymore actively developed, but it forms the basis for the Picat programming language.

Logic programming is a programming paradigm that includes languages based on formal logic, including Datalog and Prolog. This article describes the syntax and semantics of the purely declarative subset of these languages. Confusingly, the name "logic programming" also refers to a specific programming language that roughly corresponds to the declarative subset of Prolog. Unfortunately, the term must be used in both senses in this article.

References

↑ ISO/IEC 13211: Information technology — Programming languages — Prolog. International Organization for Standardization, Geneva.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] ISO/IEC 13211: Information technology — Programming languages — Prolog. International Organization for Standardization, Geneva.

[1]