Grafting (decision trees)

Last updated

Grafting is the process of adding nodes to inferred decision trees to improve the predictive accuracy.[ clarification needed ] A decision tree is a graphical model that is used as a support tool for decision process.

Contents

Introduction

Once the decision tree is constructed, then the new branches that can be added productively to the tree are identified. Then they are grafted to the existing tree to improve the decision making process. Pruning and Grafting are complementary methods to improve the decision tree in supporting the decision. Pruning allows cutting parts of decision trees to give more clarity and Grafting adds nodes to the decision trees to increase the predictive accuracy. To achieve grafting new branches can be added in the place of a single leaf or graft within leaves.

Illustration

The information required is given in the form of a chart as,

Information Chart Chart of random data.jpg
Information Chart

The nodes and leaves can be identified from the given information and the decision trees are constructed. One such decision tree is as follows,

Decision Tree branch for the information Decision tree chart from random data.jpg
Decision Tree branch for the information

Here the X-axis is represented as A and Y-axis as B. There are two cuts in the decision trees – nodes at 11 and 5 respective to A.

  A > 11   A <= 11   |  A >= 5   |  A < 5

Using Grafting, new branches are added to the above classification.

Grafting Branches Decision-tree grafting.jpg
Grafting Branches

Here B is also taken into consideration for the nodes and leaves. There are two more cuts at B – 7 and 2.

  A > 11   A <= 11   |  A >= 5   |  A < 5      |  B > 7      |  B <= 7         |  B > 2         |  B <= 2

Thus the branching has increased due to the grafting technique.

This is the simplest form of illustration to represent grafting techniques.

Conclusion

Grafting can identify regions where there are no occupancy and correct the poor class assignments which increases the accuracy. The extension to graft multiple branches at each leaf reduces the number of errors.

However, the potential new branches have to be selected carefully to avoid increasing the error and failure cases.

Future Study

Improving multicast tree construction [1]

Problem of missing value in decision tree grafting [2] Optimal grafting and appropriate selection of branches to be added [3]

See also

Related Research Articles

Minimax is a decision rule used in artificial intelligence, decision theory, game theory, statistics, and philosophy for minimizing the possible loss for a worst case scenario. When dealing with gains, it is referred to as "maximin"—to maximize the minimum gain. Originally formulated for n-player zero-sum game theory, covering both the cases where players take alternate moves and those where they make simultaneous moves, it has also been extended to more complex games and to general decision-making in the presence of uncertainty.

Phylogenetic tree Branching diagram of evolutionary relationships between organisms

A phylogenetic tree is a branching diagram or a tree showing the evolutionary relationships among various biological species or other entities based upon similarities and differences in their physical or genetic characteristics. All life on Earth is part of a single phylogenetic tree, indicating common ancestry.

Alpha–beta pruning is a search algorithm that seeks to decrease the number of nodes that are evaluated by the minimax algorithm in its search tree. It is an adversarial search algorithm used commonly for machine playing of two-player games. It stops evaluating a move when at least one possibility has been found that proves the move to be worse than a previously examined move. Such moves need not be evaluated further. When applied to a standard minimax tree, it returns the same move as minimax would, but prunes away branches that cannot possibly influence the final decision.

Decision tree Decision support tool

A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements.

In the context of Combinatorial game theory, which typically studies sequential games with perfect information, a Game tree is a graph representing all possible game states within such a game. Such games include well-known ones such as chess, checkers, Go, and tic-tac-toe. This can be used to measure the complexity of a game, as it represents all the possible ways a game can pan out. Due to the large game trees of complex games such as chess, algorithms that are designed to play this class of games will use partial game trees, which makes computation feasible on modern computers. Various methods exist to solve game trees. If a complete game tree can be generated, a Deterministic algorithm, such as backward induction or retrograde analysis can be used. Randomized algorithms and Minimax algorithms such as MCTS can be used in cases where a complete game tree is not feasible.

In computer science, a tagged union, also called a variant, variant record, choice type, discriminated union, disjoint union, sum type or coproduct, is a data structure used to hold a value that could take on several different, but fixed, types. Only one of the types can be in use at any one time, and a tag field explicitly indicates which one is in use. It can be thought of as a type that has several "cases", each of which should be handled correctly when that type is manipulated. This is critical in defining recursive datatypes, in which some component of a value may have the same type as the value itself, for example in defining a type for representing trees, where it is necessary to distinguish multi-node subtrees and leaves. Like ordinary unions, tagged unions can save storage by overlapping storage areas for each type, since only one is in use at a time.

Decision tree learning Machine learning algorithm

Decision tree learning or induction of decision trees is one of the predictive modelling approaches used in statistics, data mining and machine learning. It uses a decision tree to go from observations about an item to conclusions about the item's target value. Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values are called regression trees. Decision trees are among the most popular machine learning algorithms given their intelligibility and simplicity.

Adaptive Huffman coding is an adaptive coding technique based on Huffman coding. It permits building the code as the symbols are being transmitted, having no initial knowledge of source distribution, that allows one-pass encoding and adaptation to changing conditions in data.

In computer programming, gene expression programming (GEP) is an evolutionary algorithm that creates computer programs or models. These computer programs are complex tree structures that learn and adapt by changing their sizes, shapes, and composition, much like a living organism. And like living organisms, the computer programs of GEP are also encoded in simple linear chromosomes of fixed length. Thus, GEP is a genotype–phenotype system, benefiting from a simple genome to keep and transmit the genetic information and a complex phenotype to explore the environment and adapt to it.

B+ tree

A B+ tree is an m-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves. The root may be either a leaf or a node with two or more children.

Random forest Binary search tree based ensemble machine learning method

Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time. For classification tasks, the output of the random forest is the class selected by most trees. For regression tasks, the mean or average prediction of the individual trees is returned. Random decision forests correct for decision trees' habit of overfitting to their training set. Random forests generally outperform decision trees, but their accuracy is lower than gradient boosted trees. However, data characteristics can affect their performance.

In data processing R*-trees are a variant of R-trees used for indexing spatial information. R*-trees have slightly higher construction cost than standard R-trees, as the data may need to be reinserted; but the resulting tree will usually have a better query performance. Like the standard R-tree, it can store both point and spatial data. It was proposed by Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger in 1990.

<i>k</i>-d tree Multidimensional search tree for points in k dimensional space

In computer science, a k-d tree is a space-partitioning data structure for organizing points in a k-dimensional space. k-d trees are a useful data structure for several applications, such as searches involving a multidimensional search key and creating point clouds. k-d trees are a special case of binary space partitioning trees.

C4.5 is an algorithm used to generate a decision tree developed by Ross Quinlan. C4.5 is an extension of Quinlan's earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason, C4.5 is often referred to as a statistical classifier. In 2011, authors of the Weka machine learning software described the C4.5 algorithm as "a landmark decision tree program that is probably the machine learning workhorse most widely used in practice to date".

Decision tree pruning

Pruning is a data compression technique in machine learning and search algorithms that reduces the size of decision trees by removing sections of the tree that are non-critical and redundant to classify instances. Pruning reduces the complexity of the final classifier, and hence improves predictive accuracy by the reduction of overfitting.

In computer science, B* is a best-first graph search algorithm that finds the least-cost path from a given initial node to any goal node. First published by Hans Berliner in 1979, it is related to the A* search algorithm.

This glossary of viticultural terms list some of terms and definitions involved in growing grapes for use in winemaking.

Bonsai cultivation and care

Bonsai cultivation and care involves the long-term cultivation of small trees in containers, called bonsai in the Japanese tradition of this art form. Similar practices exist in other Japanese art forms and in other cultures, including saikei (Japanese), penjing (Chinese), and hòn non bộ (Vietnamese). Trees are difficult to cultivate in containers, which restrict root growth, nutrition uptake, and resources for transpiration. In addition to the root constraints of containers, bonsai trunks, branches, and foliage are extensively shaped and manipulated to meet aesthetic goals. Specialized tools and techniques are used to protect the health and vigor of the subject tree. Over time, the artistic manipulation of small trees in containers has led to a number of cultivation and care approaches that successfully meet the practical and the artistic requirements of bonsai and similar traditions.

The fields of marketing and artificial intelligence converge in systems which assist in areas such as market forecasting, and automation of processes and decision making, along with increased efficiency of tasks which would usually be performed by humans. The science behind these systems can be explained through neural networks and expert systems, computer programs that process input and provide valuable output for marketers.

Multicast lightpaths

A multicast session requires a "point-to-multipoint" connection from a source node to multiple destination nodes. The source node is known as the root. The destination nodes are known as leaves. In the modern era, it is important to protect multicast connections in an optical mesh network. Recently, multicast applications have gained popularity as they are important to protecting critical sessions against failures such as fiber cuts, hardware faults, and natural disasters.

References

  1. "" Multicast Trees.
  2. Advanced Topics in Artificial Intelligence by Grigoris Antoniou, John K. Slaney
  3. "" Decision Tree Grafting