Provenance Markup Language

Last updated

The Provenance Markup Language (abbreviated PML; originally called Proof Markup Language) is an interlingua for representing and sharing knowledge about how information published on the Web was asserted from information sources and/or derived from Web information by intelligent agents. The language was initially developed in support of DARPA Agent Markup Language with a goal of explaining how automated theorem provers (ATP) derive conclusions from a set of axioms. Information, inference steps, inference rules, and agents are the three main building blocks of the language. In the context of an inference step, information can play the role of antecedent (also called premise) and conclusion. Information can also play the role of axiom that is basically a conclusion with no antecedents. PML uses the broad philosophical definition of agent as opposed to any other more specific definition of agent.

Contents

The use of PML in subsequent projects evolved the language in new directions broadening its capability to represent provenance knowledge beyond the realm of ATPs and automated reasoning. The original set of requirements were relaxed to include the following: information originally represented as logical sentences in the Knowledge Interchange Format were allowed to be information written in any language including the English language; and inference rules originally defined as patterns over antecedents and conclusions of inference steps were allowed to be underspecified as long as they were identified and named. These relaxations were essential to explain how knowledge is extracted from text through the use of information extraction components. Enhancements were also required to further understand motivation behind the need of automated theorem provers to derive conclusions: new capabilities were added to annotate how information playing the role of axioms were attributes as assertions from information sources; and the notion of questions and answers were introduced to the language to explain to a third-party agent why an automated theorem prover was used to prove a theorem (i.e., an answer) from a given set of axioms.

Development history

The first version of PML (PML1) was developed at Stanford University's Knowledge Systems Laboratory in 2003 and was originally co-authored by Paulo Pinheiro, Deborah McGuinness, and Richard Fikes. [1] The second version of PML (PML2) developed in 2007 modularized PML1 into three modules to reduce maintenance and reuse cost: provenance, justification, and trust relations. [2] A new version of PML (PML3) based on World Wide Web Consortium's PROV is under development. [3]

Related Research Articles

Automated theorem proving is a subfield of automated reasoning and mathematical logic dealing with proving mathematical theorems by computer programs. Automated reasoning over mathematical proof was a major impetus for the development of computer science.

Knowledge representation and reasoning is the field of artificial intelligence (AI) dedicated to representing information about the world in a form that a computer system can use to solve complex tasks such as diagnosing a medical condition or having a dialog in a natural language. Knowledge representation incorporates findings from psychology about how humans solve problems and represent knowledge in order to design formalisms that will make complex systems easier to design and build. Knowledge representation and reasoning also incorporates findings from logic to automate various kinds of reasoning, such as the application of rules or the relations of sets and subsets.

Theorem In mathematics, a statement that has been proved

In mathematics, a theorem is a statement that has been proved, or can be proved. The proof of a theorem is a logical argument that uses the inference rules of a deductive system to establish that the theorem is a logical consequence of the axioms and previously proved theorems.

Planner is a programming language designed by Carl Hewitt at MIT, and first published in 1969. First, subsets such as Micro-Planner and Pico-Planner were implemented, and then essentially the whole language was implemented as Popler by Julian Davies at the University of Edinburgh in the POP-2 programming language. Derivations such as QA4, Conniver, QLISP and Ether were important tools in artificial intelligence research in the 1970s, which influenced commercial developments such as Knowledge Engineering Environment (KEE) and Automated Reasoning Tool (ART).

Gödel's incompleteness theorems are two theorems of mathematical logic that are concerned with the limits of provability in formal axiomatic theories. These results, published by Kurt Gödel in 1931, are important both in mathematical logic and in the philosophy of mathematics. The theorems are widely, but not universally, interpreted as showing that Hilbert's program to find a complete and consistent set of axioms for all mathematics is impossible.

OIL can be regarded as an ontology infrastructure for the Semantic Web. OIL is based on concepts developed in Description Logic (DL) and frame-based systems and is compatible with RDFS.

The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies. Ontologies are a formal way to describe taxonomies and classification networks, essentially defining the structure of knowledge for various domains: the nouns representing classes of objects and the verbs representing relations between the objects.

In the philosophy of logic, a rule of inference, inference rule or transformation rule is a logical form consisting of a function which takes premises, analyzes their syntax, and returns a conclusion. For example, the rule of inference called modus ponens takes two premises, one in the form "If p then q" and another in the form "p", and returns the conclusion "q". The rule is valid with respect to the semantics of classical logic, in the sense that if the premises are true, then so is the conclusion.

Inferences are steps in reasoning, moving from premises to logical consequences; etymologically, the word infer means to "carry forward". Inference is theoretically traditionally divided into deduction and induction, a distinction that in Europe dates at least to Aristotle. Deduction is inference deriving logical conclusions from premises known or assumed to be true, with the laws of valid inference being studied in logic. Induction is inference from particular evidence to a universal conclusion. A third type of inference is sometimes distinguished, notably by Charles Sanders Peirce, contradistinguishing abduction from induction.

In the field of artificial intelligence, an inference engine is a component of the system that applies logical rules to the knowledge base to deduce new information. The first inference engines were components of expert systems. The typical expert system consisted of a knowledge base and an inference engine. The knowledge base stored facts about the world. The inference engine applies logical rules to the knowledge base and deduced new knowledge. This process would iterate as each new fact in the knowledge base could trigger additional rules in the inference engine. Inference engines work primarily in one of two modes either special rule or facts: forward chaining and backward chaining. Forward chaining starts with the known facts and asserts new facts. Backward chaining starts with goals, and works backward to determine what facts must be asserted so that the goals can be achieved.

Backward chaining is an inference method described colloquially as working backward from the goal. It is used in automated theorem provers, inference engines, proof assistants, and other artificial intelligence applications.

Logic in computer science Academic discipline

Logic in computer science covers the overlap between the field of logic and that of computer science. The topic can essentially be divided into three main areas:

Condensed detachment is a method of finding the most general possible conclusion given two formal logical statements. It was developed by the Irish logician Carew Meredith in the 1950s and inspired by the work of Łukasiewicz.

Deborah Louise McGuinness is an American computer scientist and researcher at Rensselaer Polytechnic Institute (RPI). She is a Professor of Computer, Cognitive and Web Sciences, Industrial and Systems Engineering, and an endowed chair in the Tetherless World Constellation, a multidisciplinary research institution within RPI that focuses on the study of theories, methods and applications of the World Wide Web. Her fields of expertise include interdisciplinary data integration, artificial intelligence, specifically in knowledge representation and reasoning, description logics, the semantic web, explanation, and trust.

In computer science, in particular in knowledge representation and reasoning and metalogic, the area of automated reasoning is dedicated to understanding different aspects of reasoning. The study of automated reasoning helps produce computer programs that allow computers to reason completely, or nearly completely, automatically. Although automated reasoning is considered a sub-field of artificial intelligence, it also has connections with theoretical computer science and philosophy.

Paulo Pinheiro is a Brazilian American computer scientist working in the areas of provenance and semantic web in support of sciences. Pinheiro has been a research scientist at the Rensselaer Polytechnic Institute's Tetherless World Constellation since 2013. Between 2011 and 2013, he was a staff scientist at the U.S. Department of Energy's Pacific Northwest National Laboratory. Between 2006 and 2012, he was an associate professor of computer science at the University of Texas at El Paso. Pinheiro is from a long line of scientists and engineers: his father is a retired professor of material sciences at the Universidade Federal de Minas Gerais; his paternal grandfather was the founding Mine Superintendent of Vale S.A., the second largest mining company in the world and Israel Pinheiro da Silva, a great-grand-uncle, was the chief engineer responsible for the construction of Brasilia, the capital of Brazil.

In mathematical logic, a judgment or assertion is a statement or enunciation in the metalanguage. For example, typical judgments in first-order logic would be that a string is a well-formed formula, or that a proposition is true. Similarly, a judgment may assert the occurrence of a free variable in an expression of the object language, or the provability of a proposition. In general, a judgment may be any inductively definable assertion in the metatheory.

In information technology a reasoning system is a software system that generates conclusions from available knowledge using logical techniques such as deduction and induction. Reasoning systems play an important role in the implementation of artificial intelligence and knowledge-based systems.

The U.S. Global Change Research Program (USGCRP) develops and curates the Global Change Information System (GCIS) to establish "data interfaces and interoperable repositories of climate and global change data which can be easily and efficiently accessed, integrated with other data sets, maintained and expanded over time." The initial focus of GCIS is to support the United States Third National Climate Assessment (NCA3), which is to publish reports that enhance the transparency and ability of decision-makers to understand the conclusions and use of the underlying data for their own purposes.

References

  1. Paulo Pinheiro da Silva, Deborah L. McGuinness, Richard Fikes. A Proof Markup Language for Semantic Web Services. Information Systems 31(4-5): 381–395, 2006
  2. Deborah L. McGuinness, Li Ding, Paulo Pinheiro da Silva, Cynthia Chang. PML2: A Modular Explanation Interlingua. In Proceedings of the AAAI 2007 Workshop on Explanation-aware Computing, Vancouver, British Columbia, Canada, July 22–23, 2007
  3. "inference-web.org-main page Resources and Information". inference-web.org.