Dendral

Last updated

Dendral was a project in artificial intelligence (AI) of the 1960s, and the computer software expert system that it produced. Its primary aim was to study hypothesis formation and discovery in science. For that, a specific task in science was chosen: help organic chemists in identifying unknown organic molecules, by analyzing their mass spectra and using knowledge of chemistry. [1] It was done at Stanford University by Edward Feigenbaum, Bruce G. Buchanan, [2] Joshua Lederberg, and Carl Djerassi, along with a team of highly creative research associates and students. [3] It began in 1965 and spans approximately half the history of AI research. [4]

Contents

The software program Dendral is considered the first expert system because it automated the decision-making process and problem-solving behavior of organic chemists. [1] The project consisted of research on two main programs Heuristic Dendral and Meta-Dendral, [4] and several sub-programs. It was written in the Lisp programming language, which was considered the language of AI because of its flexibility. [1]

Many systems were derived from Dendral, including MYCIN, MOLGEN, PROSPECTOR, XCON, and STEAMER. There are many other programs today for solving the mass spectrometry inverse problem, see List of mass spectrometry software, but they are no longer described as 'artificial intelligence', just as structure searchers.

The name Dendral is an acronym of the term "Dendritic Algorithm". [4]

Heuristic Dendral

Heuristic Dendral is a program that uses mass spectra or other experimental data together with a knowledge base of chemistry to produce a set of possible chemical structures that may be responsible for producing the data. [4] A mass spectrum of a compound is produced by a mass spectrometer, and is used to determine its molecular weight, the sum of the masses of its atomic constituents. For example, the compound water (H2O), has a molecular weight of 18 since hydrogen has a mass of 1.01 and oxygen 16.00, and its mass spectrum has a peak at 18 units. Heuristic Dendral would use this input mass and the knowledge of atomic mass numbers and valence rules, to determine the possible combinations of atomic constituents whose mass would add up to 18. [1] As the weight increases and the molecules become more complex, the number of possible compounds increases drastically. Thus, a program that is able to reduce this number of candidate solutions through the process of hypothesis formation is essential.

New graph-theoretic algorithms were invented by Lederberg, Harold Brown, and others that generate all graphs with a specified set of nodes and connection-types (chemical atoms and bonds) -- with or without cycles. Moreover, the team was able to prove mathematically that the generator is complete, in that it produces all graphs with the specified nodes and edges, and that it is non-redundant, in that the output contains no equivalent graphs (e.g., mirror images). The CONGEN program, as it became known, was developed largely by computational chemists Ray Carhart, Jim Nourse, and Dennis Smith. It was useful to chemists as a stand-alone program to generate chemical graphs showing a complete list of structures that satisfy the constraints specified by a user.

Meta-Dendral

Meta-Dendral is a machine learning system that receives the set of possible chemical structures and corresponding mass spectra as input, and proposes a set of rules of mass spectrometry that correlate structural features with processes that produce the mass spectrum. [4] These rules would be fed back to Heuristic Dendral (in the planning and testing programs described below) to test their applicability. [1] Thus, "Heuristic Dendral is a performance system and Meta-Dendral is a learning system". [4] The program is based on two important features: the plan-generate-test paradigm and knowledge engineering. [4]

Plan-generate-test paradigm

The plan-generate-test paradigm is the basic organization of the problem-solving method, and is a common paradigm used by both Heuristic Dendral and Meta-Dendral systems. [4] The generator (later named CONGEN) generates potential solutions for a particular problem, which are then expressed as chemical graphs in Dendral. [4] However, this is feasible only when the number of candidate solutions is minimal. When there are large numbers of possible solutions, Dendral has to find a way to put constraints that rules out large sets of candidate solutions. [4] This is the primary aim of Dendral planner, which is a “hypothesis-formation” program that employs “task-specific knowledge to find constraints for the generator”. [4] Last but not least, the tester analyzes each proposed candidate solution and discards those that fail to fulfill certain criteria. [4] This mechanism of plan-generate-test paradigm is what holds Dendral together. [4]

Knowledge Engineering

The primary aim of knowledge engineering is to attain a productive interaction between the available knowledge base and problem solving techniques. [4] This is possible through development of a procedure in which large amounts of task-specific information is encoded into heuristic programs. [4] Thus, the first essential component of knowledge engineering is a large “knowledge base.” Dendral has specific knowledge about the mass spectrometry technique, a large amount of information that forms the basis of chemistry and graph theory, and information that might be helpful in finding the solution of a particular chemical structure elucidation problem. [4] This “knowledge base” is used both to search for possible chemical structures that match the input data, and to learn new “general rules” that help prune searches. The benefit Dendral provides the end user, even a non-expert, is a minimized set of possible solutions to check manually.

Heuristics

A heuristic is a rule of thumb, an algorithm that does not guarantee a solution, but reduces the number of possible solutions by discarding unlikely and irrelevant solutions. [1] The use of heuristics to solve problems is called "heuristics programming", and was used in Dendral to allow it to replicate in machines the process through which human experts induce the solution to problems via rules of thumb and specific information.

Heuristics programming was a major approach and a giant step forward in artificial intelligence, [4] as it allowed scientists to finally automate certain traits of human intelligence. It became prominent among scientists in the late 1940s through George Polya’s book, How to Solve It: A New Aspect of Mathematical Method. [1] As Herbert A. Simon said in The Sciences of the Artificial, "if you take a heuristic conclusion as certain, you may be fooled and disappointed; but if you neglect heuristic conclusions altogether you will make no progress at all."

History

During the mid 20th century, the question "can machines think?" became intriguing and popular among scientists, primarily to add humanistic characteristics to machine behavior. John McCarthy, who was one of the prime researchers of this field, termed this concept of machine intelligence as "artificial intelligence" (AI) during the Dartmouth summer in 1956. AI is usually defined as the capacity of a machine to perform operations that are analogous to human cognitive capabilities. [5] Much research to create AI was done during the 20th century.

Also around the mid 20th century, science, especially biology, faced a fast-increasing need to develop a "man-computer symbiosis", to aid scientists in solving problems. [6] For example, the structural analysis of myogoblin, hemoglobin, and other proteins relentlessly needed instrumentation development due to its complexity.

In the early 1960s, Joshua Lederberg started working with computers and quickly became tremendously interested in creating interactive computers to help him in his exobiology research. [1] Specifically, he was interested in designing computing systems to help him study alien organic compounds. [1] As he was not an expert in either chemistry or computer programming, he collaborated with Stanford chemist Carl Djerassi to help him with chemistry, and Edward Feigenbaum with programming, to automate the process of determining chemical structures from raw mass spectrometry data. [1] Feigenbaum was an expert in programming languages and heuristics, and helped Lederberg design a system that replicated the way Djerassi solved structure elucidation problems. [1] They devised a system called Dendritic Algorithm (Dendral) that was able to generate possible chemical structures corresponding to the mass spectrometry data as an output. [1]

Dendral then was still very inaccurate in assessing spectra of ketones, alcohols, and isomers of chemical compounds. [1] Thus, Djerassi "taught" general rules to Dendral that could help eliminate most of the "chemically implausible" structures, and produce a set of structures that could now be analyzed by a "non-expert" user to determine the right structure. [1]

The Dendral team recruited Bruce Buchanan to extend the Lisp program initially written by Georgia Sutherland. [1] Buchanan had similar ideas to Feigenbaum and Lederberg, but his special interests were scientific discovery and hypothesis formation. [1] As Joseph November said in Digitizing Life: The Introduction of Computers to Biology and Medicine, "(Buchanan) wanted the system (Dendral) to make discoveries on its own, not just help humans make them". Buchanan, Lederberg and Feigenbaum designed "Meta-Dendral", which was a "hypothesis maker". [1] Heuristic Dendral "would serve as a template for similar knowledge-based systems in other areas" rather than just concentrating in the field of organic chemistry. Meta-Dendral was a model for knowledge-rich learning systems that was later codified in Tom Mitchell's influential Version Space Model of learning. [1]

Notes

Related Research Articles

<span class="mw-page-title-main">Expert system</span> Computer system emulating the decision-making ability of a human expert

In artificial intelligence (AI), an expert system is a computer system emulating the decision-making ability of a human expert. Expert systems are designed to solve complex problems by reasoning through bodies of knowledge, represented mainly as if–then rules rather than through conventional procedural code. The first expert systems were created in the 1970s and then proliferated in the 1980s. Expert systems were among the first truly successful forms of AI software. An expert system is divided into two subsystems: the inference engine and the knowledge base. The knowledge base represents facts and rules. The inference engine applies the rules to the known facts to deduce new facts. Inference engines can also include explanation and debugging abilities.

Knowledge representation and reasoning is the field of artificial intelligence (AI) dedicated to representing information about the world in a form that a computer system can use to solve complex tasks such as diagnosing a medical condition or having a dialog in a natural language. Knowledge representation incorporates findings from psychology about how humans solve problems, and represent knowledge in order to design formalisms that will make complex systems easier to design and build. Knowledge representation and reasoning also incorporates findings from logic to automate various kinds of reasoning.

<span class="mw-page-title-main">Search algorithm</span> Any algorithm which solves the search problem

In computer science, a search algorithm is an algorithm designed to solve a search problem. Search algorithms work to retrieve information stored within particular data structure, or calculated in the search space of a problem domain, with either discrete or continuous values.

<span class="mw-page-title-main">Greedy algorithm</span> Sequence of locally optimal choices

A greedy algorithm is any algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage. In many problems, a greedy strategy does not produce an optimal solution, but a greedy heuristic can yield locally optimal solutions that approximate a globally optimal solution in a reasonable amount of time.

<span class="mw-page-title-main">Douglas Lenat</span> Computer scientist and AI pioneer

Douglas Bruce Lenat was an American computer scientist and researcher in artificial intelligence who was the founder and CEO of Cycorp, Inc. in Austin, Texas.

A* is a graph traversal and pathfinding algorithm, which is used in many fields of computer science due to its completeness, optimality, and optimal efficiency. Given a weighted graph, a source node and a goal node, the algorithm finds the shortest path from source to goal.

<span class="mw-page-title-main">Edward Feigenbaum</span> American computer scientist

Edward Albert Feigenbaum is a computer scientist working in the field of artificial intelligence, and joint winner of the 1994 ACM Turing Award. He is often called the "father of expert systems."

<span class="mw-page-title-main">Symbolic artificial intelligence</span> Methods in artificial intelligence research

In artificial intelligence, symbolic artificial intelligence is the term for the collection of all methods in artificial intelligence research that are based on high-level symbolic (human-readable) representations of problems, logic and search. Symbolic AI used tools such as logic programming, production rules, semantic nets and frames, and it developed applications such as knowledge-based systems, symbolic mathematics, automated theorem provers, ontologies, the semantic web, and automated planning and scheduling systems. The Symbolic AI paradigm led to seminal ideas in search, symbolic programming languages, agents, multi-agent systems, the semantic web, and the strengths and limitations of formal knowledge and reasoning systems.

MYCIN was an early backward chaining expert system that used artificial intelligence to identify bacteria causing severe infections, such as bacteremia and meningitis, and to recommend antibiotics, with the dosage adjusted for patient's body weight — the name derived from the antibiotics themselves, as many antibiotics have the suffix "-mycin". The Mycin system was also used for the diagnosis of blood clotting diseases. MYCIN was developed over five or six years in the early 1970s at Stanford University. It was written in Lisp as the doctoral dissertation of Edward Shortliffe under the direction of Bruce G. Buchanan, Stanley N. Cohen and others.

In computer science and mathematical optimization, a metaheuristic is a higher-level procedure or heuristic designed to find, generate, tune, or select a heuristic that may provide a sufficiently good solution to an optimization problem or a machine learning problem, especially with incomplete or imperfect information or limited computation capacity. Metaheuristics sample a subset of solutions which is otherwise too large to be completely enumerated or otherwise explored. Metaheuristics may make relatively few assumptions about the optimization problem being solved and so may be usable for a variety of problems.

A knowledge-based system (KBS) is a computer program that reasons and uses a knowledge base to solve complex problems. The term is broad and refers to many different kinds of systems. The one common theme that unites all knowledge based systems is an attempt to represent knowledge explicitly and a reasoning system that allows it to derive new knowledge. Thus, a knowledge-based system has two distinguishing features: a knowledge base and an inference engine.

A physical symbol system takes physical patterns (symbols), combining them into structures (expressions) and manipulating them to produce new expressions.

<span class="mw-page-title-main">Outline of thought</span> Overview of and topical guide to thought

The following outline is provided as an overview of and topical guide to thought (thinking):

In mathematical optimization and computer science, heuristic is a technique designed for problem solving more quickly when classic methods are too slow for finding an exact or approximate solution, or when classic methods fail to find any exact solution in a search space. This is achieved by trading optimality, completeness, accuracy, or precision for speed. In a way, it can be considered a shortcut.

A hyper-heuristic is a heuristic search method that seeks to automate, often by the incorporation of machine learning techniques, the process of selecting, combining, generating or adapting several simpler heuristics to efficiently solve computational search problems. One of the motivations for studying hyper-heuristics is to build systems which can handle classes of problems rather than solving just one problem.

Knowledge-based configuration, also referred to as product configuration or product customization, is an activity of customising a product to meet the needs of a particular customer. The product in question may consist of mechanical parts, services, and software. Knowledge-based configuration is a major application area for artificial intelligence (AI), and it is based on modelling of the configurations in a manner that allows the utilisation of AI techniques for searching for a valid configuration to meet the needs of a particular customer.

Computational heuristic intelligence (CHI) refers to specialized programming techniques in computational intelligence. These techniques have the express goal of avoiding complexity issues, also called NP-hard problems, by using human-like techniques. They are best summarized as the use of exemplar-based methods (heuristics), rather than rule-based methods (algorithms). Hence the term is distinct from the more conventional computational algorithmic intelligence, or symbolic AI. An example of a CHI technique is the encoding specificity principle of Tulving and Thompson. In general, CHI principles are problem solving techniques used by people, rather than programmed into machines. It is by drawing attention to this key distinction that the use of this term is justified in a field already replete with confusing neologisms. Note that the legal systems of all modern human societies employ both heuristics from individual trial records as well as legislated statutes (rules) as regulatory guides.

This glossary of artificial intelligence is a list of definitions of terms and concepts relevant to the study of artificial intelligence, its sub-disciplines, and related fields. Related glossaries include Glossary of computer science, Glossary of robotics, and Glossary of machine vision.

Memetic computing is a novel computational paradigm that incorporates the notion of meme(s) as basic units of transferable information encoded in computational representations for boosting the performance of artificial evolutionary systems in the domain of search and optimization.

A chemical graph generator is a software package to generate computer representations of chemical structures adhering to certain boundary conditions. The development of such software packages is a research topic of cheminformatics. Chemical graph generators are used in areas such as virtual library generation in drug design, in molecular design with specified properties, called inverse QSAR/QSPR, as well as in organic synthesis design, retrosynthesis or in systems for computer-assisted structure elucidation (CASE). CASE systems again have regained interest for the structure elucidation of unknowns in computational metabolomics, a current area of computational biology.

References

  1. Berk, A A. LISP: the Language of Artificial Intelligence. New York: Van Nostrand Reinhold Company, 1985. 1-25.
  2. Lederberg, Joshua. An Instrumentation Crisis in Biology. Stanford University Medical School. Palo Alto, 1963.
  3. Lederberg, Joshua. How Dendral Was Conceived and Born. ACM Symposium on the History of Medical Informatics, 5 November 1987, Rockefeller University. New York: National Library of Medicine, 1987.
  4. Lindsay, Robert K., Bruce G. Buchanan, Edward A. Feigenbaum, and Joshua Lederberg. Applications of Artificial Intelligence for Organic Chemistry: The Dendral Project. McGraw-Hill Book Company, 1980.
  5. Lindsay, Robert K., Bruce G. Buchanan, E. A. Feigenbaum, and Joshua Lederberg. DENDRAL: A Case Study of the First Expert System for Scientific Hypothesis Formation. Artificial Intelligence 61, 2 (1993): 209-261.
  6. November, Joseph A. “Digitizing Life: The Introduction of Computers to Biology and Medicine.” Doctoral dissertation, Princeton University, 2006