A discovery system is an artificial intelligence system that attempts to discover new scientific concepts or laws. The aim of discovery systems is to automate scientific data analysis and the scientific discovery process. Ideally, an artificial intelligence system should be able to search systematically through the space of all possible hypotheses and yield the hypothesis - or set of equally likely hypotheses - that best describes the complex patterns in data. [1] [2]
During the era known as the second AI summer (approximately 1978-1987), various systems akin to the era's dominant expert systems were developed to tackle the problem of extracting scientific hypotheses from data, with or without interacting with a human scientist. These systems included Autoclass, [3] Automated Mathematician, [4] [5] Eurisko, [6] which aimed at general-purpose hypothesis discovery, and more specific systems such as Dalton, which uncovers molecular properties from data.
The dream of building systems that discover scientific hypotheses was pushed to the background with the second AI winter and the subsequent resurgence of subsymbolic methods such as neural networks. Subsymbolic methods emphasize prediction over explanation, and yield models which works well but are difficult or impossible to explain which has earned them the name black box AI. A black-box model cannot be considered a scientific hypothesis, and this development has even led some researchers to suggest that the traditional aim of science - to uncover hypotheses and theories about the structure of reality - is obsolete. [7] [8] Other researchers disagree and argue that subsymbolic methods are useful in many cases, just not for generating scientific theories. [9] [10] [11]
After a couple of decades with little interest in discovery systems, the interest in using AI to uncover natural laws and scientific explanations was renewed by the work of Michael Schmidt, then a PhD student in Computational Biology at Cornell University. Schmidt and his advisor, Hod Lipson, invented Eureqa, which they described as a symbolic regression approach to "distilling free-form natural laws from experimental data". [12] This work effectively demonstrated that symbolic regression was a promising way forward for AI-driven scientific discovery.
Since 2009, symbolic regression has matured further, and today, various commercial and open source systems are actively used in scientific research. Notable examples include Eureqa, now a part of DataRobot AI Cloud Platform, AI Feynman, [13] and QLattice. [14]
Artificial intelligence (AI) is intelligence—perceiving, synthesizing, and inferring information—demonstrated by machines, as opposed to intelligence displayed by humans or by other animals. "Intelligence" encompasses the ability to learn and to reason, to generalize, and to infer meaning. Example tasks in which this is done include speech recognition, computer vision, translation between (natural) languages, as well as other mappings of inputs.
Inductive logic programming (ILP) is a subfield of symbolic artificial intelligence which uses logic programming as a uniform representation for examples, background knowledge and hypotheses. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesised logic program which entails all the positive and none of the negative examples.
Douglas Bruce Lenat is the CEO of Cycorp, Inc. of Austin, Texas, and has been a prominent researcher in artificial intelligence. Lenat was awarded the biannual IJCAI Computers and Thought Award in 1976 for creating the machine-learning program AM. He has worked on machine learning, knowledge representation, "cognitive economy", blackboard systems, and what he dubbed in 1984 "ontological engineering". He has also worked in military simulations, and numerous projects for US government, military, intelligence, and scientific organizations. In 1980, he published a critique of conventional random-mutation Darwinism. He authored a series of articles in the Journal of Artificial Intelligence exploring the nature of heuristic rules.
Machine learning (ML) is a field devoted to understanding and building methods that let machines "learn" – that is, methods that leverage data to improve computer performance on some set of tasks.
The Automated Mathematician (AM) is one of the earliest successful discovery systems. It was created by Douglas Lenat in Lisp, and in 1977 led to Lenat being awarded the IJCAI Computers and Thought Award.
In artificial intelligence, symbolic artificial intelligence is the term for the collection of all methods in artificial intelligence research that are based on high-level symbolic (human-readable) representations of problems, logic and search. Symbolic AI used tools such as logic programming, production rules, semantic nets and frames, and it developed applications such as knowledge-based systems, symbolic mathematics, automated theorem provers, ontologies, the semantic web, and automated planning and scheduling systems. The Symbolic AI paradigm led to seminal ideas in search, symbolic programming languages, agents, multi-agent systems, the semantic web, and the strengths and limitations of formal knowledge and reasoning systems.
Eurisko is a discovery system written by Douglas Lenat in RLL-1, a representation language itself written in the Lisp programming language. A sequel to Automated Mathematician, it consists of heuristics, i.e. rules of thumb, including heuristics describing how to use and change its own heuristics. Lenat was frustrated by Automated Mathematician's constraint to a single domain and so developed Eurisko; his frustration with the effort of encoding domain knowledge for Eurisko led to Lenat's subsequent development of Cyc. Lenat envisions ultimately coupling the Cyc knowledgebase with the Eurisko discovery engine.
Logic in computer science covers the overlap between the field of logic and that of computer science. The topic can essentially be divided into three main areas:
Woodrow Wilson "Woody" Bledsoe was an American mathematician, computer scientist, and prominent educator. He is one of the founders of artificial intelligence (AI), making early contributions in pattern recognition and automated theorem proving. He continued to make significant contributions to AI throughout his long career.
Version space learning is a logical approach to machine learning, specifically binary classification. Version space learning algorithms search a predefined space of hypotheses, viewed as a set of logical sentences. Formally, the hypothesis space is a disjunction
Artificial intelligence (AI) has been used in applications to alleviate certain problems throughout industry and academia. AI, like electricity or computers, is a general purpose technology that has a multitude of applications. It has been used in fields of language translation, image recognition, credit scoring, e-commerce and other domains.
An incremental decision tree algorithm is an online machine learning algorithm that outputs a decision tree. Many decision tree methods, such as C4.5, construct a tree using a complete dataset. Incremental decision tree methods allow an existing tree to be updated using only new individual data instances, without having to re-process past instances. This may be useful in situations where the entire dataset is not available when the tree is updated, the original data set is too large to process or the characteristics of the data change over time.
Eric Joel Horvitz is an American computer scientist, and Technical Fellow at Microsoft, where he serves as the company's first Chief Scientific Officer. He was previously the director of Microsoft Research Labs, including research centers in Redmond, WA, Cambridge, MA, New York, NY, Montreal, Canada, Cambridge, UK, and Bangalore, India.
Ross Donald King is a Professor of Machine Intelligence at Chalmers University of Technology.
Inductive programming (IP) is a special area of automatic programming, covering research from artificial intelligence and programming, which addresses learning of typically declarative and often recursive programs from incomplete specifications, such as input/output examples or constraints.
Symbolic regression (SR) is a type of regression analysis that searches the space of mathematical expressions to find the model that best fits a given dataset, both in terms of accuracy and simplicity.
GOFAI is an acronym for "Good Old-Fashioned Artificial Intelligence" invented by the philosopher John Haugeland in his 1985 book Artificial Intelligence: The Very Idea. Technically, GOFAI refers only to a restricted kind of symbolic AI, namely rule-based or logical agents. This approach was popular in the 1980s, especially as an approach to implementing expert systems, but symbolic AI has since been extended in many ways to better handle uncertain reasoning and more open-ended systems. Some of these extensions include probabilistic reasoning, non-monotonic reasoning, multi-agent systems, and neuro-symbolic systems. Significant contributions of symbolic AI, not encompassed by the GOFAI view, include search algorithms; automated planning and scheduling; constraint-based reasoning; the semantic web; ontologies; knowledge graphs; non-monotonic logic; circumscription; automated theorem proving; and symbolic mathematics. For a more complete list, see the main article on symbolic AI.
Explainable AI (XAI), also known as Interpretable AI, or Explainable Machine Learning (XML), is artificial intelligence (AI) in which humans can understand the reasoning behind decisions or predictions made by the AI. It contrasts with the "black box" concept in machine learning, where even the AI's designers cannot explain why it arrived at a specific decision.
Neuro-symbolic AI integrates neural and symbolic AI architectures to address complementary strengths and weaknesses of each, providing a robust AI capable of reasoning, learning, and cognitive modeling. As argued by Valiant and many others, the effective construction of rich computational cognitive models demands the combination of sound symbolic reasoning and efficient machine learning models. Gary Marcus, argues that: "We cannot construct rich cognitive models in an adequate, automated way without the triumvirate of hybrid architecture, rich prior knowledge, and sophisticated techniques for reasoning." Further, "To build a robust, knowledge-driven approach to AI we must have the machinery of symbol manipulation in our toolkit. Too much useful knowledge is abstract to make do without tools that represent and manipulate abstraction, and to date, the only machinery that we know of that can manipulate such abstract knowledge reliably is the apparatus of symbol manipulation."
The QLattice is a software library which provides a framework for symbolic regression in Python. It works on Linux, Windows, and macOS. The QLattice algorithm is developed by the Danish/Spanish AI research company Abzu. Since its creation, the QLattice has attracted significant attention, mainly for the inherent explainability of the models it produces.