Fast-and-frugal trees

Last updated

In the study of decision-making, a fast-and-frugal tree is a simple graphical structure that categorizes objects by asking one question at a time. These decision trees are used in a range of fields: psychology, artificial intelligence, and management science. Unlike other decision or classification trees, such as Leo Breiman's CART, [1] fast-and-frugal trees are intentionally simple, both in their construction as well as their execution, and operate speedily with little information. For this reason, fast-and-frugal-trees are potentially attractive when designing resource-constrained tasks. [2]

Contents

Laura Martignon, Vitouch, Takezawa and Forster first introduced both the concept and the term in 2003; [3] similar heuristics for other tasks had been used before, building on the formal models created by Gerd Gigerenzer and Herbert A. Simon.

In categorization tasks with two options and m cues—also known as features or attributes—available for making such a decision, an FFT is defined as follows:

A fast-and-frugal tree is a classification or a decision tree that has m+1 exits, with one exit for each of the first m −1 cues and two exits for the last cue.

Mathematically, fast-and-frugal trees can be viewed as lexicographic heuristics or as linear classification models with non-compensatory weights and a threshold. [MKW] Their formal properties and construction have also been analyzed using signal detection theory by Luan, Schooler and Gigerenzer in 2011. [4] [LSG]

Basic organization

Construction

The basic elements are the cues. The cues are ranked, with one cue at each level of the tree and an exit node at each level (except for two exit nodes for the last cue at the last level of the tree). Whenever a cue is used, a question is asked about the value of the cue. The answers to the questions might immediately lead to an exit, or they might lead to a further question (and eventually to an exit). A characteristic property of fast-and-frugal trees is that, for each question, there is at least one possible answer that leads to an exit.

In the literature on fast-and-frugal trees, many different algorithms have been proposed [3] [MKW] [LSG] [5] for (1) ordering cues and (2) deciding which possible answer to a question about a cue leads immediately to an exit. A fast-and-frugal tree is fully defined if both the following conditions are met. Often, in order to keep construction simple and intuitive, the algorithms use (1) simple measures of cue "goodness" (e.g., correlation between cue and category, considering each cue independently of the other cues) and (2) make simple choices about exits (e.g., decide on each exit independently of the other exits), but more complex algorithms have been proposed as well.

Execution

To use a fast-and-frugal tree, begin at the root and check one cue at a time. At each step, one of the possible outcomes is an exit node which allows for a decision (or action)—if an exit is reached, stop; otherwise, continue until an exit is reached. Take an exit, stop; otherwise, continue and ask more questions until an exit is reached.

Figure 1. A fast-and-frugal tree that helps emergency room doctors decide whether to send a patient to a regular nursing bed or the coronary care unit (Green & Mehr, 1997). FFT-FIG01.png
Figure 1. A fast-and-frugal tree that helps emergency room doctors decide whether to send a patient to a regular nursing bed or the coronary care unit (Green & Mehr, 1997).

Figure 1 illustrates a fast-and-frugal tree for classifying a patient as "high risk" of having a heart stroke and thus having to be sent to the "coronary care unit" or as "low risk" and thus having to be sent to a "regular nursing bed" (Green & Mehr, 1997). [GM]

Consider three patients, John, Mary, and Jack:

Performance

The accuracy and robustness of fast-and-frugal trees has been shown to be comparable to that of Bayesian benchmarks in studies by Laskey and Martignon (2014). [LM] Extensive studies comparing the performance of fast-and-frugal trees to that of classification algorithms used in statistics and machine learning, such as naive Bayes, CART, random forests, and logistic regression, have also been carried out by using dozens of real-world datasets. [WHM] [MKW] [5]

Signal detection analysis

Fast-and-frugal trees are used to perform binary classifications or decisions. In psychology, medicine, and other fields, signal detection theory (or detection theory) has been the classic theory under which such tasks are analyzed.

The theory assumes that there are two categories of events or people (e.g., people with and without heart problems), of which the category more relevant to us is referred as "signal" while the other is referred to as "noise". The two differ in their distribution on an observation scale that we may call "evidence", with the signal distribution having a higher mean. One can make two possible classifications, namely "signal" or "noise", upon gathering the evidence. This leads to four possible outcomes: hit (classify as "signal" when it is indeed a signal), correct rejection (classify as "noise" when it is indeed a noise), miss (classify as "noise" when it is actually a signal), and false alarm (classify as "signal" when it is actually a noise). To maximize overall accuracy or the expected value of a classification, the theory posits that we need to carefully select the classification criterion on the evidence scale, above which we make a "signal" decision and below which "noise". Specially, when the cost of a miss is very high (i.e., classifying a patient with heart problem as normal), a lower, more "liberal" criterion (i.e., toward the left in the evidence scale) needs to be selected, whereas when the cost of a false alarm is very high (e.g., classifying an innocent person as guilty of a murder), a higher, more "conservative" criterion will be better. This implies that a good decision-maker needs to be properly biased in most real-world situations; this is the most critical and relevant insight from signal detection theory on classification and decision making.

Figure 2. The higher section of the figure illustrates the assumptions of signal-detection theory in a binary decision task. The three vertical lines represent three decision criteria the agent and the decision-maker may adopt. The lower section illustrates the four possible FFTs that can be constructed when three features are consulted in a fixed order. Based on the classifications pointed to by the first two exits, the trees are named from left to right FFTss, FFTsn, FFTns, and FFTnn. The arrows connecting the figure parts indicate roughly the locations of the four FFTs' decision criteria when they are used to make a binary s/n (for signal and noise, respectively) classification or decision. Among the four, FFTss has the most liberal decision criterion and FFTnn the most conservative one. The decision criteria of FFTsn and FFTns are less extreme than the other two, with FFTsn being more liberal than FFTns. FIGURE2-FFT-ByCesare.png
Figure 2. The higher section of the figure illustrates the assumptions of signal-detection theory in a binary decision task. The three vertical lines represent three decision criteria the agent and the decision-maker may adopt. The lower section illustrates the four possible FFTs that can be constructed when three features are consulted in a fixed order. Based on the classifications pointed to by the first two exits, the trees are named from left to right FFTss, FFTsn, FFTns, and FFTnn. The arrows connecting the figure parts indicate roughly the locations of the four FFTs' decision criteria when they are used to make a binary s/n (for signal and noise, respectively) classification or decision. Among the four, FFTss has the most liberal decision criterion and FFTnn the most conservative one. The decision criteria of FFTsn and FFTns are less extreme than the other two, with FFTsn being more liberal than FFTns.

In 2011, Luan, Schooler, and Gigerenzer analyzed characteristics of fast-and-frugal trees from the perspective of signal detection theory. There are several key findings from this analysis. First, the choice of the exit structure of a fast-and-frugal tree corresponds to the setting of the decision criterion in signal detection. In a nutshell, the earlier a "signal exit" appears in a fast-and-frugal tree, the more liberally biased is the tree. The relative biases of two fast-and-frugal trees are determined by the first exit in which the two differ, with the one having the "signal exit" – denoted by "s" – always being more liberal as the one having the "noise exit" – denoted by "n" (Figure 2). For example, an FFTsnnn (here again s = "Signal exit", n = "noise exit") is more liberally biased than an FFTnsss. This principle is referred to as the "lexicographic decision bias" of fast-and-frugal trees.

Second, a series of simulations show that fast-and-frugal trees with different exit structures will lead to different—sometimes drastically different—expected value of a decision when the consequences of a miss and a false alarm differ. Therefore, when constructing and applying a fast-and-frugal tree, one needs to choose an exit structure that matches well the decision payoff structure of a task.

Third, the overall sensitivity of a fast-and-frugal tree—that is, how well the tree can discriminate a signal from a noise and which can be measured by d' or A' from signal detection theory—is affected by properties of the cues that make up the tree, such as the mean and variance of the cues' sensitivities and the inter-cue correlations among the cues, but not much by the exit structure of the tree. And finally, the performance of fast-and-frugal trees is robust and comparable to much more sophisticated decision algorithms developed in signal detection theory, including the ideal observer analysis model and the optimal sequential sampling model. In the context of out-of-sample predictions, fast-and-frugal trees perform the best relative to other models when the learning sample size is relatively small (e.g., less than 80 trials).

Figure 3. A fast-and-frugal tree that can help soldiers stationed in Afghanistan distinguish whether a car approaching a check-point is driven by civilians or potential suicide bombers (Keller & Katsikopoulos, 2016). FIG3-FFT.png
Figure 3. A fast-and-frugal tree that can help soldiers stationed in Afghanistan distinguish whether a car approaching a check-point is driven by civilians or potential suicide bombers (Keller & Katsikopoulos, 2016).
Figure 4. Fast-and-frugal trees that describe how a person decides whether to forgive another person for an offense the latter committed during social interactions (left; Tan, Luan, & Katsikopoulos, 2017) and how British judges decide whether to make a punitive bail decision (right. Dhami, 2003). FIG4-FFT.png
Figure 4. Fast-and-frugal trees that describe how a person decides whether to forgive another person for an offense the latter committed during social interactions (left; Tan, Luan, & Katsikopoulos, 2017) and how British judges decide whether to make a punitive bail decision (right. Dhami, 2003).

Computing support

In 2017, Phillips, Neth, Woike and Gaissmaier [PNWG] introduced the R package FFTrees, [6] hosted on CRAN (with an accompanying app [7] ), which constructs, depicts graphically, and evaluates quantitatively fast and frugal trees in user-friendly ways.

More examples

There have been many applications of fast-and-frugal trees in both prescribing how a decision should be made and describing how people actually make decisions. Beyond the medical field, an example of their prescriptive applications is instructing soldiers stationed in Afghanistan how to distinguish whether a car approaching a check-point is driven by civilians or potential suicide bombers; [8] [KK] the tree is illustrated in Figure 3. Two examples of fast-and-frugal trees' descriptive uses are shown in Figure 4. The trees on the left and right describe, respectively, how a person decides whether to forgive another person for an offense the latter committed during social interactions [TLK] and how British judges make a bail-or-jail decision. [D] In general, fast-and-frugal trees can be applied to help or model any binary decision-making processes that involve multiple cues.

Related Research Articles

<span class="mw-page-title-main">Supervised learning</span> A paradigm in machine learning

Supervised learning (SL) is a paradigm in machine learning where input objects and a desired output value train a model. The training data is processed, building a function that maps new data on expected output values. An optimal scenario will allow for the algorithm to correctly determine output values for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way. This statistical quality of an algorithm is measured through the so-called generalization error.

<span class="mw-page-title-main">Cognitive bias</span> Systematic pattern of deviation from norm or rationality in judgment

A cognitive bias is a systematic pattern of deviation from norm or rationality in judgment. Individuals create their own "subjective reality" from their perception of the input. An individual's construction of reality, not the objective input, may dictate their behavior in the world. Thus, cognitive biases may sometimes lead to perceptual distortion, inaccurate judgment, illogical interpretation, and irrationality.

A heuristic, or heuristic technique, is any approach to problem solving or self-discovery that employs a practical method that is not guaranteed to be optimal, perfect, or rational, but is nevertheless sufficient for reaching an immediate, short-term goal or approximation. Where finding an optimal solution is impossible or impractical, heuristic methods can be used to speed up the process of finding a satisfactory solution. Heuristics can be mental shortcuts that ease the cognitive load of making a decision.

Bounded rationality is the idea that rationality is limited when individuals make decisions, and under these limitations, rational individuals will select a decision that is satisfactory rather than optimal.

In machine learning, boosting is an ensemble meta-algorithm for primarily reducing bias, and also variance in supervised learning, and a family of machine learning algorithms that convert weak learners to strong ones. Boosting is based on the question posed by Kearns and Valiant : "Can a set of weak learners create a single strong learner?" A weak learner is defined to be a classifier that is only slightly correlated with the true classification. In contrast, a strong learner is a classifier that is arbitrarily well-correlated with the true classification.

<span class="mw-page-title-main">Machine learning</span> Study of algorithms that improve automatically through experience

Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can effectively generalize and thus perform tasks without explicit instructions. Recently, generative artificial neural networks have been able to surpass many previous approaches in performance. Machine learning approaches have been applied to large language models, computer vision, speech recognition, email filtering, agriculture, and medicine, where it is too costly to develop algorithms to perform the needed tasks.

The recognition heuristic, originally termed the recognition principle, has been used as a model in the psychology of judgment and decision making and as a heuristic in artificial intelligence. The goal is to make inferences about a criterion that is not directly accessible to the decision maker, based on recognition retrieved from memory. This is possible if recognition of alternatives has relevance to the criterion. For two alternatives, the heuristic is defined as:

If one of two objects is recognized and the other is not, then infer that the recognized object has the higher value with respect to the criterion.

<span class="mw-page-title-main">Gerd Gigerenzer</span> German cognitive psychologist

Gerd Gigerenzer is a German psychologist who has studied the use of bounded rationality and heuristics in decision making. Gigerenzer is director emeritus of the Center for Adaptive Behavior and Cognition (ABC) at the Max Planck Institute for Human Development and director of the Harding Center for Risk Literacy, both in Berlin.

Detection theory or signal detection theory is a means to measure the ability to differentiate between information-bearing patterns and random patterns that distract from the information.

In psychology, the take-the-best heuristic is a heuristic which decides between two alternatives by choosing based on the first cue that discriminates them, where cues are ordered by cue validity. In the original formulation, the cues were assumed to have binary values or have an unknown value. The logic of the heuristic is that it bases its choice on the best cue (reason) only and ignores the rest.

Cascading is a particular case of ensemble learning based on the concatenation of several classifiers, using all information collected from the output from a given classifier as additional information for the next classifier in the cascade. Unlike voting or stacking ensembles, which are multiexpert systems, cascading is a multistage one.

Heuristics is the process by which humans use mental shortcuts to arrive at decisions. Heuristics are simple strategies that humans, animals, organizations, and even machines use to quickly form judgments, make decisions, and find solutions to complex problems. Often this involves focusing on the most relevant aspects of a problem or situation to formulate a solution. While heuristic processes are used to find the answers and solutions that are most likely to work or be correct, they are not always right or the most accurate. Judgments and decisions based on heuristics are simply good enough to satisfy a pressing need in situations of uncertainty, where information is incomplete. In that sense they can differ from answers given by logic and probability.

Heuristics are simple decision making strategies used to achieve a specific goal quickly and efficiently, and are commonly implemented in sports. Many sports require the ability to make fast decisions under time pressure, and the proper use of heuristics is essential for many of these decisions.

Social heuristics are simple decision making strategies that guide people's behavior and decisions in the social environment when time, information, or cognitive resources are scarce. Social environments tend to be characterised by complexity and uncertainty, and in order to simplify the decision-making process, people may use heuristics, which are decision making strategies that involve ignoring some information or relying on simple rules of thumb.

Ecological rationality is a particular account of practical rationality, which in turn specifies the norms of rational action – what one ought to do in order to act rationally. The presently dominant account of practical rationality in the social and behavioral sciences such as economics and psychology, rational choice theory, maintains that practical rationality consists in making decisions in accordance with some fixed rules, irrespective of context. Ecological rationality, in contrast, claims that the rationality of a decision depends on the circumstances in which it takes place, so as to achieve one's goals in this particular context. What is considered rational under the rational choice account thus might not always be considered rational under the ecological rationality account. Overall, rational choice theory puts a premium on internal logical consistency whereas ecological rationality targets external performance in the world. The term ecologically rational is only etymologically similar to the biological science of ecology.

In behavioural sciences, social rationality is a type of decision strategy used in social contexts, in which a set of simple rules is applied in complex and uncertain situations.

The less-is-more effect refers to the finding that heuristic decision strategies can yield more accurate judgments than alternative strategies that use more pieces of information. Understanding these effects is part of the study of ecological rationality.

<span class="mw-page-title-main">Outline of machine learning</span> Overview of and topical guide to machine learning

The following outline is provided as an overview of and topical guide to machine learning:

<span class="mw-page-title-main">Laura Martignon</span> Colombian and Italian mathematician; lives in Germany

Laura Martignon is a Colombian and Italian professor and scientist. From 2003 until 2020 she served as a Professor of Mathematics and Mathematical Education at the Ludwigsburg University of Education. Until 2017 she was an Adjunct Scientist of the Max Planck Institute for Human Development in Berlin, where she previously worked as Senior Researcher. She also worked for ten years as a Mathematics Professor at the University of Brasilia and spent a period of one and a half years, as visiting scholar, at the Hebrew University of Jerusalem.

Intuitive statistics, or folk statistics, is the cognitive phenomenon where organisms use data to make generalizations and predictions about the world. This can be a small amount of sample data or training instances, which in turn contribute to inductive inferences about either population-level properties, future data, or both. Inferences can involve revising hypotheses, or beliefs, in light of probabilistic data that inform and motivate future predictions. The informal tendency for cognitive animals to intuitively generate statistical inferences, when formalized with certain axioms of probability theory, constitutes statistics as an academic discipline.

References

  1. Leo Breiman (2017). Classification and Regression Trees. Routledge. doi:10.1201/9781315139470. ISBN   9781315139470. S2CID   129307201 . Retrieved 2019-08-30.
  2. Martignon, Laura F.; Katsikopoulos, Konstantinos V.; Woike, Jan K. (2012), "Naïve, Fast, and Frugal Trees for Classification", Ecological Rationality, Oxford University Press, doi:10.1093/acprof:oso/9780195315448.001.0001, ISBN   978-0-19-531544-8 , retrieved 2022-02-28
  3. 1 2 Martignon, Laura; Vitouch, Oliver; Takezawa, Masanori; Forster, Malcolm. "Naive and Yet Enlightened: From Natural Frequencies to Fast and Frugal Decision Trees", published in Thinking : Psychological perspectives on reasoning, judgement and decision making (David Hardman and Laura Macchi; editors), Chichester: John Wiley & Sons, 2003.
  4. Luan, Schooler and Gigerenzer, 2011 A signal-detection analysis of fast-and-frugal trees.
  5. 1 2 Şimşek, Özgür; Buckmann, Marcus (2015), Cortes, C.; Lawrence, N. D.; Lee, D. D.; Sugiyama, M. (eds.), "Learning From Small Samples: An Analysis of Simple Decision Heuristics" (PDF), Advances in Neural Information Processing Systems 28, Curran Associates, Inc., pp. 3159–3167, retrieved 2019-09-01
  6. https://CRAN.R-project.org/package=FFTrees
  7. https://econpsychbasel.shinyapps.io/shinyfftrees/
  8. Keller, N., & Katsikopoulos, K. V. (2016) – On the role of psychological heuristics in operational research; and a demonstration in military stability operations. European Journal of Operational Research, 249, 1063–1073.