Decision tree

Last updated
Traditionally, decision trees have been created manually. Manual decision tree.jpg
Traditionally, decision trees have been created manually.

A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements.

Decision support system computer-based information system that supports business or organizational decision-making

A decision support system (DSS) is an information system that supports business or organizational decision-making activities. DSSs serve the management, operations and planning levels of an organization and help people make decisions about problems that may be rapidly changing and not easily specified in advance—i.e. unstructured and semi-structured decision problems. Decision support systems can be either fully computerized or human-powered, or a combination of both.

Tree (graph theory) undirected, connected and acyclic graph

In graph theory, a tree is an undirected graph in which any two vertices are connected by exactly one path, or equivalently a connected acyclic undirected graph. A forest is an undirected graph in which any two vertices are connected by at most one path, or equivalently an acyclic undirected graph, or equivalently a disjoint union of trees.

Causal model abstract model that describes the causal mechanisms of a system. The model must express more than correlation because correlation does not imply causation

In philosophy of science, a causal model is a conceptual model that describes the causal mechanisms of a system. Causal models can improve study designs by providing clear rules for deciding which independent variables need to be included/controlled for.

Contents

Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal, but are also a popular tool in machine learning.

Operations research (OR) is a discipline that deals with the application of advanced analytical methods to help make better decisions. Further, the term operational analysis is used in the British military as an intrinsic part of capability development, management and assurance. In particular, operational analysis forms part of the Combined Operational Effectiveness and Investment Appraisals, which support British defense capability acquisition decision-making.

Decision analysis (DA) is the discipline comprising the philosophy, methodology, and professional practice necessary to address important decisions in a formal manner. Decision analysis includes many procedures, methods, and tools for identifying, clearly representing, and formally assessing important aspects of a decision, for prescribing a recommended course of action by applying the maximum expected utility action axiom to a well-formed representation of the decision, and for translating the formal representation of a decision and its corresponding recommendation into insight for the decision maker and other stakeholders.

Goal desired result or outcome

A goal is an idea of the future or desired result that a person or a group of people envisions, plans and commits to achieve. People endeavor to reach goals within a finite time by setting deadlines.

Overview

A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes). The paths from root to leaf represent classification rules.

Flowchart diagram representing the flow of logic, algorithms, or data through the use of boxes, lines, and arrows

A flowchart is a type of diagram that represents a workflow or process. A flowchart can also be defined as a diagrammatic representation of an algorithm, a step-by-step approach to solving a task.

In decision analysis, a decision tree and the closely related influence diagram are used as a visual and analytical decision support tool, where the expected values (or expected utility) of competing alternatives are calculated.

An influence diagram (ID) is a compact graphical and mathematical representation of a decision situation. It is a generalization of a Bayesian network, in which not only probabilistic inference problems but also decision making problems can be modeled and solved.

In probability theory, the expected value of a random variable, intuitively, is the long-run average value of repetitions of the same experiment it represents. For example, the expected value in rolling a six-sided die is 3.5, because the average of all the numbers that come up is 3.5 as the number of rolls approaches infinity. In other words, the law of large numbers states that the arithmetic mean of the values almost surely converges to the expected value as the number of repetitions approaches infinity. The expected value is also known as the expectation, mathematical expectation, EV, average, mean value, mean, or first moment.

A decision tree consists of three types of nodes: [1]

  1. Decision nodes – typically represented by squares
  2. Chance nodes – typically represented by circles
  3. End nodes – typically represented by triangles

Decision trees are commonly used in operations research and operations management. If, in practice, decisions have to be taken online with no recall under incomplete knowledge, a decision tree should be paralleled by a probability model as a best choice model or online selection model algorithm. Another use of decision trees is as a descriptive means for calculating conditional probabilities.

Operations management An area of management concerned with designing and controlling the process of production and redesigning business operations

Operations management is an area of management concerned with designing and controlling the process of production and redesigning business operations in the production of goods or services. It involves the responsibility of ensuring that business operations are efficient in terms of using as few resources as needed and effective in terms of meeting customer requirements. Operations management is primarily concerned with planning, organizing and supervising in the contexts of production, manufacturing or the provision of services.

Probability measure of the expectation that an event will occur or a statement is true

Probability is a measure quantifying the likelihood that events will occur. See glossary of probability and statistics. Probability quantifies as a number between 0 and 1, where, roughly speaking, 0 indicates impossibility and 1 indicates certainty. The higher the probability of an event, the more likely it is that the event will occur. A simple example is the tossing of a fair (unbiased) coin. Since the coin is fair, the two outcomes are both equally probable; the probability of "heads" equals the probability of "tails"; and since no other outcomes are possible, the probability of either "heads" or "tails" is 1/2.

Algorithm An unambiguous specification of how to solve a class of problems

In mathematics and computer science, an algorithm is a set of instructions, typically to solve a class of problems or perform a computation. Algorithms are unambiguous specifications for performing calculation, data processing, automated reasoning, and other tasks.

Decision trees, influence diagrams, utility functions, and other decision analysis tools and methods are taught to undergraduate students in schools of business, health economics, and public health, and are examples of operations research or management science methods.

Decision tree building blocks

Decision tree elements

Decision-Tree-Elements.png

Drawn from left to right, a decision tree has only burst nodes (splitting paths) but no sink nodes (converging paths). Therefore, used manually, they can grow very big and are then often hard to draw fully by hand. Traditionally, decision trees have been created manually — as the aside example shows — although increasingly, specialized software is employed.

Decision rules

The decision tree can be linearized into decision rules, [2] where the outcome is the contents of the leaf node, and the conditions along the path form a conjunction in the if clause. In general, the rules have the form:

if condition1 and condition2 and condition3 then outcome.

Decision rules can be generated by constructing association rules with the target variable on the right. They can also denote temporal or causal relations. [3]

Decision tree using flowchart symbols

Commonly a decision tree is drawn using flowchart symbols as it is easier for many to read and understand.

DecisionCalcs.jpg

Analysis example

Analysis can take into account the decision maker's (e.g., the company's) preference or utility function, for example:

RiskPrefSensitivity2Threshold.png

The basic interpretation in this situation is that the company prefers B's risk and payoffs under realistic risk preference coefficients (greater than $400K—in that range of risk aversion, the company would need to model a third strategy, "Neither A nor B").

Another example, commonly used in operations research courses, is the distribution of lifeguards on beaches (a.k.a. the "Life's a Beach" example). [4] The example describes two beaches with lifeguards to be distributed on each beach. There is maximum budget B that can be distributed among the two beaches (in total), and using a marginal returns table, analysts can decide how many lifeguards to allocate to each beach.

Lifeguards on each beachDrownings prevented in total, beach #1Drownings prevented in total, beach #2
113
240

In this example, a decision tree can be drawn to illustrate the principles of diminishing returns on beach #2.

Beach Decision Tree Beachdecisiontree.png
Beach Decision Tree

The decision tree illustrates that when sequentially distributing lifeguards, placing a first lifeguard on beach #1 would be optimal if there is only the budget for 1 lifeguard. But if there is a budget for two guards, then placing both on beach #2 would prevent more overall drownings.

Lifeguards Lifeguards.png
Lifeguards

Influence diagram

Much of the information in a decision tree can be represented more compactly as an influence diagram, focusing attention on the issues and relationships between events.

The rectangle on the left represents a decision, the ovals represent actions, and the diamond represents results. Factory2 InfluenceDiagram.png
The rectangle on the left represents a decision, the ovals represent actions, and the diamond represents results.

Association rule induction

Decision trees can also be seen as generative models of induction rules from empirical data. An optimal decision tree is then defined as a tree that accounts for most of the data, while minimizing the number of levels (or "questions"). [5] Several algorithms to generate such optimal trees have been devised, such as ID3/4/5, [6] CLS, ASSISTANT, and CART.

Advantages and disadvantages

Among decision support tools, decision trees (and influence diagrams) have several advantages. Decision trees:

Disadvantages of decision trees:

See also

Related Research Articles

Machine learning Scientific study of algorithms and statistical models that computer systems use to perform tasks without explicit instructions

Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task. Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop a conventional algorithm for effectively performing the task.

Branch and bound is an algorithm design paradigm for discrete and combinatorial optimization problems, as well as mathematical optimization. A branch-and-bound algorithm consists of a systematic enumeration of candidate solutions by means of state space search: the set of candidate solutions is thought of as forming a rooted tree with the full set at the root. The algorithm explores branches of this tree, which represent subsets of the solution set. Before enumerating the candidate solutions of a branch, the branch is checked against upper and lower estimated bounds on the optimal solution, and is discarded if it cannot produce a better solution than the best one found so far by the algorithm.

In computer science, a binary decision diagram (BDD) or branching program is a data structure that is used to represent a Boolean function. On a more abstract level, BDDs can be considered as a compressed representation of sets or relations. Unlike other compressed representations, operations are performed directly on the compressed representation, i.e. without decompression. Other data structures used to represent Boolean functions include negation normal form (NNF), Zhegalkin polynomials, and propositional directed acyclic graphs (PDAG).

Decision tree learning Machine Learning Algorithm

In computer science, Decision tree learning uses a decision tree to go from observations about an item to conclusions about the item's target value. It is one of the predictive modeling approaches used in statistics, data mining and machine learning. Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values are called regression trees.

In computer programming, gene expression programming (GEP) is an evolutionary algorithm that creates computer programs or models. These computer programs are complex tree structures that learn and adapt by changing their sizes, shapes, and composition, much like a living organism. And like living organisms, the computer programs of GEP are also encoded in simple linear chromosomes of fixed length. Thus, GEP is a genotype–phenotype system, benefiting from a simple genome to keep and transmit the genetic information and a complex phenotype to explore the environment and adapt to it.

The expectiminimax algorithm is a variation of the minimax algorithm, for use in artificial intelligence systems that play two-player zero-sum games, such as backgammon, in which the outcome depends on a combination of the player's skill and chance elements such as dice rolls. In addition to "min" and "max" nodes of the traditional minimax tree, this variant has "chance" nodes, which take the expected value of a random event occurring. In game theory terms, an expectiminimax tree is the game tree of an extensive-form game of perfect, but incomplete information.

Statistical classification in supervised learning

In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations whose category membership is known. Examples are assigning a given email to the "spam" or "non-spam" class, and assigning a diagnosis to a given patient based on observed characteristics of the patient. Classification is an example of pattern recognition.

Ben Shneiderman American computer scientist

Ben Shneiderman is an American computer scientist, a Distinguished University Professor in the University of Maryland Department of Computer Science, which is part of the University of Maryland College of Computer, Mathematical, and Natural Sciences at the University of Maryland, College Park, and the founding director (1983-2000) of the University of Maryland Human-Computer Interaction Lab. He conducted fundamental research in the field of human–computer interaction, developing new ideas, methods, and tools such as the direct manipulation interface, and his eight rules of design.

ID3 algorithm decision tree algorithm

In decision tree learning, ID3 is an algorithm invented by Ross Quinlan used to generate a decision tree from a dataset. ID3 is the precursor to the C4.5 algorithm, and is typically used in the machine learning and natural language processing domains.

C4.5 is an algorithm used to generate a decision tree developed by Ross Quinlan. C4.5 is an extension of Quinlan's earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason, C4.5 is often referred to as a statistical classifier. In 2011, authors of the Weka machine learning software described the C4.5 algorithm as "a landmark decision tree program that is probably the machine learning workhorse most widely used in practice to date".

Weka (machine learning) suite of machine learning software written in Java

Waikato Environment for Knowledge Analysis (Weka) is a suite of machine learning software written in Java, developed at the University of Waikato, New Zealand. It is free software licensed under the GNU General Public License.

Predictive analytics encompasses a variety of statistical techniques from data mining, predictive modelling, and machine learning, that analyze current and historical facts to make predictions about future or otherwise unknown events.

DRAKON Algorithm mapping tool

DRAKON is an algorithmic visual programming and modeling language developed within the Buran space project following ergonomic design principles. The language provides a uniform way to represent flowcharts of any complexity that are easy to read and understand.

Decision intelligence

Decision intelligence is an engineering discipline that augments data science with theory from social science, decision theory, and managerial science. Its application provides a framework for best practices in organizational decision-making and processes for applying machine learning at scale. The basic idea is that decisions are based on our understanding of how actions lead to outcomes. Decision intelligence is a discipline for analyzing this chain of cause and effect, and decision modeling is a visual language for representing these chains.

Information fuzzy networks (IFN) is a greedy machine learning algorithm for supervised learning. The data structure produced by the learning algorithm is also called Info Fuzzy Network. IFN construction is quite similar to decision trees' construction. However, IFN constructs a directed graph and not a tree. IFN also uses the conditional mutual information metric in order to choose features during the construction stage while decision trees usually use other metrics like entropy or gini.

Contrast set learning is a form of association rule learning that seeks to identify meaningful differences between separate groups by reverse-engineering the key predictors that identify for each particular group. For example, given a set of attributes for a pool of students, a contrast set learner would identify the contrasting features between students seeking bachelor's degrees and those working toward PhD degrees.

DEX is a qualitative multi-criteria decision analysis (MCDA) method for decision making and is implemented in DEXi software. This method was developed by a research team led by Bohanec, Bratko, and Rajkovič. The method supports decision makers in making complex decisions based on multiple, possibly conflicting, attributes. In DEX, all attributes are qualitative and can take values represented by words, such as “low” or “excellent”. Attributes are generally organized in a hierarchy. The evaluation of decision alternatives is carried out by utility functions, which are represented in the form of decision rules. All attributes are assumed to be discrete. Additionally, they can be preferentially ordered, so that a higher ordinal value represents a better preference.

References

  1. Kamiński, B.; Jakubczyk, M.; Szufel, P. (2017). "A framework for sensitivity analysis of decision trees". Central European Journal of Operations Research. 26 (1): 135–159. doi:10.1007/s10100-017-0479-6. PMC   5767274 . PMID   29375266.
  2. Quinlan, J. R. (1987). "Simplifying decision trees". International Journal of Man-Machine Studies. 27 (3): 221–234. CiteSeerX   10.1.1.18.4267 . doi:10.1016/S0020-7373(87)80053-6.
  3. K. Karimi and H.J. Hamilton (2011), "Generation and Interpretation of Temporal Decision Rules", International Journal of Computer Information Systems and Industrial Management Applications, Volume 3
  4. Wagner, Harvey M. (1 September 1975). Principles of Operations Research: With Applications to Managerial Decisions (2nd ed.). Englewood Cliffs, NJ: Prentice Hall. ISBN   9780137095926.
  5. R. Quinlan, "Learning efficient classification procedures", Machine Learning: an artificial intelligence approach, Michalski, Carbonell & Mitchell (eds.), Morgan Kaufmann, 1983, p. 463-482. doi : 10.1007/978-3-662-12405-5_15
  6. Utgoff, P. E. (1989). Incremental induction of decision trees. Machine learning, 4(2), 161-186. doi : 10.1023/A:1022699900025
  7. Deng,H.; Runger, G.; Tuv, E. (2011). Bias of importance measures for multi-valued attributes and solutions (PDF). Proceedings of the 21st International Conference on Artificial Neural Networks (ICANN).