Discovery system (AI research)

A discovery system is an artificial intelligence system that attempts to discover new scientific concepts or laws. The aim of discovery systems is to automate scientific data analysis and the scientific discovery process. Ideally, an artificial intelligence system should be able to search systematically through the space of all possible hypotheses and yield the hypothesis, or set of equally likely hypotheses, that best describes the complex patterns in data.[1][2]
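To make the idea concrete, the following minimal sketch scores a handful of candidate laws against observed data and returns the one that balances fit and simplicity. The candidate hypotheses, the complexity scores, and the scoring rule are illustrative assumptions, not taken from any particular discovery system.

```python
# A toy "hypothesis space": a few candidate laws relating x to y,
# each given as (description, function, complexity score).
candidates = [
    ("y = x",       lambda x: x,         1),
    ("y = 2*x",     lambda x: 2 * x,     2),
    ("y = x**2",    lambda x: x ** 2,    2),
    ("y = 2*x + 1", lambda x: 2 * x + 1, 3),
]

# Observations produced by the unknown law y = x**2.
data = [(x, x ** 2) for x in range(1, 6)]

def score(func, complexity, data, alpha=0.1):
    """Mean squared error plus a small complexity penalty (Occam's razor)."""
    mse = sum((func(x) - y) ** 2 for x, y in data) / len(data)
    return mse + alpha * complexity

best = min(candidates, key=lambda c: score(c[1], c[2], data))
print(best[0])  # -> "y = x**2"
```

Real discovery systems differ mainly in how they generate and prune this hypothesis space rather than enumerating it exhaustively.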

Discovery systems from the 1970s and 1980s

During the era known as the second AI summer (approximately 1978–1987), various systems akin to the era's dominant expert systems were developed to tackle the problem of extracting scientific hypotheses from data, with or without interaction with a human scientist. These systems included AutoClass,[3] Automated Mathematician,[4][5] and Eurisko,[6] which aimed at general-purpose hypothesis discovery, as well as more specific systems such as Dalton, which uncovered molecular properties from data.

The dream of building systems that discover scientific hypotheses was pushed into the background by the second AI winter and the subsequent resurgence of subsymbolic methods such as neural networks. Subsymbolic methods emphasize prediction over explanation and yield models that work well but are difficult or impossible to explain, which has earned them the name black-box AI. A black-box model cannot be considered a scientific hypothesis, and this development has even led some researchers to suggest that the traditional aim of science, to uncover hypotheses and theories about the structure of reality, is obsolete.[7][8] Other researchers disagree, arguing that subsymbolic methods are useful in many cases, just not for generating scientific theories.[9][10][11]

Modern discovery systems (2009–present)

After roughly two decades of little interest in discovery systems, interest in using AI to uncover natural laws and scientific explanations was renewed by the work of Michael Schmidt, then a PhD student in Computational Biology at Cornell University. Schmidt and his advisor, Hod Lipson, created Eureqa, which they described as a symbolic regression approach to "distilling free-form natural laws from experimental data".[12] This work demonstrated that symbolic regression was a promising way forward for AI-driven scientific discovery.

Since 2009, symbolic regression has matured further, and various commercial and open-source systems are now actively used in scientific research. Notable examples include Eureqa, now part of the DataRobot AI Cloud Platform, AI Feynman,[13] and the QLattice.[14]
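As a rough illustration of the symbolic-regression workflow such systems support, the sketch below uses the open-source gplearn package to search for an expression that fits data generated from a known formula. gplearn is an assumed, illustrative choice; it is not one of the systems named above.

```python
# Symbolic regression sketch using the open-source gplearn package.
# gplearn is an assumed example tool, not one of the systems discussed above.
import numpy as np
from gplearn.genetic import SymbolicRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, (200, 2))
y = X[:, 0] ** 2 - X[:, 1] + 0.5            # the "law" to be rediscovered

est = SymbolicRegressor(population_size=1000,
                        generations=20,
                        function_set=("add", "sub", "mul"),
                        parsimony_coefficient=0.01,
                        random_state=0)
est.fit(X, y)
print(est._program)                          # best symbolic expression found
```

The parsimony coefficient plays the same role as the simplicity term in the earlier sketch: it biases the search toward short expressions rather than merely accurate ones.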

Related Research Articles

Artificial intelligence: Ability of systems to perceive, synthesize, and infer information

Artificial intelligence (AI) is intelligence—perceiving, synthesizing, and inferring information—demonstrated by machines, as opposed to intelligence displayed by humans or by other animals. "Intelligence" encompasses the ability to learn and to reason, to generalize, and to infer meaning. Example tasks include speech recognition, computer vision, translation between (natural) languages, and other mappings of inputs.

Inductive logic programming (ILP) is a subfield of symbolic artificial intelligence which uses logic programming as a uniform representation for examples, background knowledge and hypotheses. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesised logic program which entails all the positive and none of the negative examples.
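A minimal sketch of this setting, with made-up facts and a hand-enumerated candidate space standing in for a real ILP search, might look like the following.

```python
# Toy illustration of the ILP setting: background facts plus positive and
# negative examples, and a search over a (tiny, hand-enumerated) space of
# candidate rules for one that covers all positives and no negatives.
# Real ILP systems search vastly larger clause spaces.

parents = {("ann", "bob"), ("bob", "cal"), ("bob", "dee")}   # background knowledge

positives = {("ann", "cal"), ("ann", "dee")}   # grandparent(X, Z) should hold
negatives = {("bob", "cal"), ("ann", "bob")}   # grandparent(X, Z) should not hold

people = {p for pair in parents for p in pair}

# Candidate rule bodies, written as Python predicates over (X, Z).
candidates = {
    "parent(X,Z)": lambda x, z: (x, z) in parents,
    "parent(Z,X)": lambda x, z: (z, x) in parents,
    "parent(X,Y), parent(Y,Z)": lambda x, z: any(
        (x, y) in parents and (y, z) in parents for y in people
    ),
}

for body, holds in candidates.items():
    if all(holds(*p) for p in positives) and not any(holds(*n) for n in negatives):
        print("grandparent(X,Z) :-", body)   # prints the two-step parent rule
```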

Douglas Lenat: American entrepreneur and researcher in artificial intelligence

Douglas Bruce Lenat is the CEO of Cycorp, Inc. of Austin, Texas, and has been a prominent researcher in artificial intelligence. Lenat was awarded the biennial IJCAI Computers and Thought Award in 1977 for creating the machine-learning program AM. He has worked on machine learning, knowledge representation, "cognitive economy", blackboard systems, and what he dubbed in 1984 "ontological engineering". He has also worked on military simulations and numerous projects for US government, military, intelligence, and scientific organizations. In 1980, he published a critique of conventional random-mutation Darwinism. He authored a series of articles in the Journal of Artificial Intelligence exploring the nature of heuristic rules.

Machine learning: Study of algorithms that improve automatically through experience

Machine learning (ML) is a field devoted to understanding and building methods that let machines "learn" – that is, methods that leverage data to improve computer performance on some set of tasks.

The Automated Mathematician (AM) is one of the earliest successful discovery systems. It was created by Douglas Lenat in Lisp, and in 1977 led to Lenat being awarded the IJCAI Computers and Thought Award.

Symbolic artificial intelligence: Methods in artificial intelligence research

In artificial intelligence, symbolic artificial intelligence is the term for the collection of all methods in artificial intelligence research that are based on high-level symbolic (human-readable) representations of problems, logic and search. Symbolic AI used tools such as logic programming, production rules, semantic nets and frames, and it developed applications such as knowledge-based systems, symbolic mathematics, automated theorem provers, ontologies, the semantic web, and automated planning and scheduling systems. The Symbolic AI paradigm led to seminal ideas in search, symbolic programming languages, agents, multi-agent systems, the semantic web, and the strengths and limitations of formal knowledge and reasoning systems.

Eurisko is a discovery system written by Douglas Lenat in RLL-1, a representation language itself written in the Lisp programming language. A sequel to Automated Mathematician, it consists of heuristics, i.e. rules of thumb, including heuristics describing how to use and change its own heuristics. Lenat was frustrated by Automated Mathematician's confinement to a single domain and so developed Eurisko; his frustration with the effort of encoding domain knowledge for Eurisko then led him to develop Cyc. Lenat envisions ultimately coupling the Cyc knowledge base with the Eurisko discovery engine.

Logic in computer science: Academic discipline

Logic in computer science covers the overlap between the field of logic and that of computer science. The topic can essentially be divided into three main areas: theoretical foundations and analysis, the use of computer technology to aid logicians, and the use of concepts from logic for computer applications.

Woody Bledsoe: American mathematician and computer scientist

Woodrow Wilson "Woody" Bledsoe was an American mathematician, computer scientist, and prominent educator. He is one of the founders of artificial intelligence (AI), making early contributions in pattern recognition and automated theorem proving. He continued to make significant contributions to AI throughout his long career.

Version space learning

Version space learning is a logical approach to machine learning, specifically binary classification. Version space learning algorithms search a predefined space of hypotheses, viewed as a set of logical sentences. Formally, the hypothesis space is a disjunction of hypotheses, H1 ∨ H2 ∨ ... ∨ Hn.
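A minimal sketch of the idea simply enumerates a small predefined hypothesis space and keeps every hypothesis consistent with the labelled examples. The attribute names, values, and examples below are invented for illustration.

```python
# Version space sketch: enumerate a small space of conjunctive hypotheses and
# retain those consistent with all labelled examples. Practical algorithms
# such as candidate elimination track only the boundary sets of this space.
from itertools import product

skies = ["sunny", "rainy", "?"]          # "?" means "any value"
temps = ["warm", "cold", "?"]
hypotheses = list(product(skies, temps))

def predict(h, x):
    """A hypothesis classifies an example as positive if every constraint matches."""
    return all(hv in ("?", xv) for hv, xv in zip(h, x))

examples = [
    (("sunny", "warm"), True),
    (("rainy", "cold"), False),
]

version_space = [h for h in hypotheses
                 if all(predict(h, x) == y for x, y in examples)]
print(version_space)   # the hypotheses still consistent with the data
```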

Applications of artificial intelligence: Applications of intelligence exhibited by machines

Artificial intelligence (AI) has been used in applications to alleviate certain problems throughout industry and academia. AI, like electricity or computers, is a general-purpose technology that has a multitude of applications. It has been used in language translation, image recognition, credit scoring, e-commerce, and other domains.

An incremental decision tree algorithm is an online machine learning algorithm that outputs a decision tree. Many decision tree methods, such as C4.5, construct a tree using a complete dataset. Incremental decision tree methods allow an existing tree to be updated using only new individual data instances, without having to re-process past instances. This may be useful when the entire dataset is not available at the time the tree is updated, when the original dataset is too large to process, or when the characteristics of the data change over time.
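For a concrete feel for this online setting, the sketch below uses the open-source river library's Hoeffding tree classifier, which updates its tree one instance at a time. river is an assumed, illustrative choice and is not referenced in the text above.

```python
# Incremental decision tree learning, one instance at a time, using the
# river library (an assumed example; not referenced in the article).
from river import tree

model = tree.HoeffdingTreeClassifier()

stream = [
    ({"temp": 30.0, "humidity": 0.2}, True),
    ({"temp": 10.0, "humidity": 0.8}, False),
    ({"temp": 25.0, "humidity": 0.3}, True),
]

for x, y in stream:
    print(model.predict_one(x))   # prediction before seeing the label
    model.learn_one(x, y)         # update the tree with this single instance
```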

Eric Horvitz: American computer scientist and Technical Fellow at Microsoft

Eric Joel Horvitz is an American computer scientist, and Technical Fellow at Microsoft, where he serves as the company's first Chief Scientific Officer. He was previously the director of Microsoft Research Labs, including research centers in Redmond, WA, Cambridge, MA, New York, NY, Montreal, Canada, Cambridge, UK, and Bangalore, India.

Ross D. King

Ross Donald King is a Professor of Machine Intelligence at Chalmers University of Technology.

Inductive programming (IP) is a special area of automatic programming, covering research from artificial intelligence and programming, which addresses learning of typically declarative and often recursive programs from incomplete specifications, such as input/output examples or constraints.

Symbolic regression: Type of regression analysis

Symbolic regression (SR) is a type of regression analysis that searches the space of mathematical expressions to find the model that best fits a given dataset, both in terms of accuracy and simplicity.

GOFAI is an acronym for "Good Old-Fashioned Artificial Intelligence" invented by the philosopher John Haugeland in his 1985 book Artificial Intelligence: The Very Idea. Technically, GOFAI refers only to a restricted kind of symbolic AI, namely rule-based or logical agents. This approach was popular in the 1980s, especially as an approach to implementing expert systems, but symbolic AI has since been extended in many ways to better handle uncertain reasoning and more open-ended systems. Some of these extensions include probabilistic reasoning, non-monotonic reasoning, multi-agent systems, and neuro-symbolic systems. Significant contributions of symbolic AI, not encompassed by the GOFAI view, include search algorithms; automated planning and scheduling; constraint-based reasoning; the semantic web; ontologies; knowledge graphs; non-monotonic logic; circumscription; automated theorem proving; and symbolic mathematics. For a more complete list, see the main article on symbolic AI.

Explainable artificial intelligence: AI in which the results of the solution can be understood by humans

Explainable AI (XAI), also known as Interpretable AI, or Explainable Machine Learning (XML), is artificial intelligence (AI) in which humans can understand the reasoning behind decisions or predictions made by the AI. It contrasts with the "black box" concept in machine learning, where even the AI's designers cannot explain why it arrived at a specific decision.

Neuro-symbolic AI integrates neural and symbolic AI architectures to address complementary strengths and weaknesses of each, providing a robust AI capable of reasoning, learning, and cognitive modeling. As argued by Valiant and many others, the effective construction of rich computational cognitive models demands the combination of sound symbolic reasoning and efficient machine learning models. Gary Marcus argues that "We cannot construct rich cognitive models in an adequate, automated way without the triumvirate of hybrid architecture, rich prior knowledge, and sophisticated techniques for reasoning." Further, "To build a robust, knowledge-driven approach to AI we must have the machinery of symbol manipulation in our toolkit. Too much useful knowledge is abstract to make do without tools that represent and manipulate abstraction, and to date, the only machinery that we know of that can manipulate such abstract knowledge reliably is the apparatus of symbol manipulation."

The QLattice is a software library that provides a framework for symbolic regression in Python. It works on Linux, Windows, and macOS. The QLattice algorithm was developed by the Danish/Spanish AI research company Abzu, and it has drawn attention mainly for the inherent explainability of the models it produces.
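A minimal usage sketch follows; it assumes the feyn Python package and its QLattice.auto_run interface (names recalled from the package documentation and possibly version-dependent), with a made-up toy dataset.

```python
# Minimal QLattice sketch. Assumes the feyn package's QLattice/auto_run
# interface; exact names and parameters may differ between feyn versions.
import feyn
import pandas as pd

df = pd.DataFrame({
    "x1": [0.1, 0.5, 0.9, 1.3, 1.7],
    "x2": [1.0, 0.8, 0.6, 0.4, 0.2],
    "y":  [0.3, 0.9, 1.5, 2.1, 2.7],
})

ql = feyn.QLattice(random_seed=42)
models = ql.auto_run(data=df, output_name="y")   # search for symbolic models
best = models[0]                                  # models are returned ranked
print(best.sympify())                             # inspect the best expression
```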

References

1. Shen, Wei-Min (1990). "Functional transformations in AI discovery systems". Artificial Intelligence. 41 (3): 257–272. doi:10.1016/0004-3702(90)90045-2. S2CID 7219589.
2. Gil, Yolanda; Greaves, Mark; Hendler, James; Hirsh, Haym (2014-10-10). "Amplify scientific discovery with artificial intelligence". Science. 346 (6206): 171–172. Bibcode:2014Sci...346..171G. doi:10.1126/science.1259439. PMID 25301606. S2CID 206561353.
3. Cheeseman, Peter; Kelly, James; Self, Matthew; Stutz, John; Taylor, Will; Freeman, Don (1988-01-01). Laird, John (ed.). AutoClass: A Bayesian Classification System. Machine Learning Proceedings 1988. San Francisco: Morgan Kaufmann. pp. 54–64. doi:10.1016/b978-0-934613-64-4.50011-6. ISBN 978-0-934613-64-4. Retrieved 2022-07-24.
4. Ritchie, G.D.; Hanna, F.K. (August 1984). "AM: A case study in AI methodology". Artificial Intelligence. 23 (3): 249–268. doi:10.1016/0004-3702(84)90015-8.
5. Lenat, Douglas Bruce (1976). AM: An artificial intelligence approach to discovery in mathematics as heuristic search (Thesis).
6. Henderson, Harry (2007). "The Automated Mathematician". Artificial Intelligence: Mirrors for the Mind. Milestones in Discovery and Invention. Infobase Publishing. pp. 93–94. ISBN 9781604130591.
7. Anderson, Chris. "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete". Wired. Retrieved 2022-07-24.
8. Vutha, Amar. "Could machine learning mean the end of understanding in science?". The Conversation. Retrieved 2022-07-24.
9. Canca, Cansu (2018-08-28). "Machine Learning as the Enemy of Science? Not Really". Bill of Health. Retrieved 2022-07-24.
10. Wilstrup, Casper Skern (2022-01-30). "Are we replacing science with an AI oracle?". Medium. Retrieved 2022-07-24.
11. Christiansen, Michael; Wilstrup, Casper; Hedley, Paula L. (2022-06-28). "Explainable "white-box" machine learning is the way forward in preeclampsia screening". American Journal of Obstetrics & Gynecology. 227 (5): 791. doi:10.1016/j.ajog.2022.06.057. PMID 35779588. S2CID 250160871.
12. Schmidt, Michael; Lipson, Hod (2009-04-03). "Distilling Free-Form Natural Laws from Experimental Data". Science. 324 (5923): 81–85. Bibcode:2009Sci...324...81S. doi:10.1126/science.1165893. PMID 19342586. S2CID 7366016.
13. Udrescu, Silviu-Marian; Tegmark, Max (2020-04-17). "AI Feynman: A physics-inspired method for symbolic regression". Science Advances. 6 (16): eaay2631. arXiv:1905.11481. Bibcode:2020SciA....6.2631U. doi:10.1126/sciadv.aay2631. PMC 7159912. PMID 32426452.
14. Broløs, Kevin René; Machado, Meera Vieira; Cave, Chris; Kasak, Jaan; Stentoft-Hansen, Valdemar; Batanero, Victor Galindo; Jelen, Tom; Wilstrup, Casper (2021-04-12). "An Approach to Symbolic Regression Using Feyn". arXiv:2104.05417 [cs.LG].