Tree traversal


In computer science, tree traversal (also known as tree search and walking the tree) is a form of graph traversal and refers to the process of visiting (e.g. retrieving, updating, or deleting) each node in a tree data structure, exactly once. Such traversals are classified by the order in which the nodes are visited. The following algorithms are described for a binary tree, but they may be generalized to other trees as well.


Types

Unlike linked lists, one-dimensional arrays and other linear data structures, which are canonically traversed in linear order, trees may be traversed in multiple ways. They may be traversed in depth-first or breadth-first order. There are three common ways to traverse them in depth-first order: in-order, pre-order and post-order. [1] Beyond these basic traversals, various more complex or hybrid schemes are possible, such as depth-limited searches like iterative deepening depth-first search. The latter, as well as breadth-first search, can also be used to traverse infinite trees (see below).

Data structures for tree traversal

Traversing a tree involves iterating over all nodes in some manner. Because a given node may have more than one possible next node (a tree is not a linear data structure), some nodes must, assuming sequential (not parallel) computation, be deferred, that is, stored in some way for later visiting. This is often done via a stack (LIFO) or queue (FIFO). As a tree is a self-referential (recursively defined) data structure, traversal can be defined by recursion or, more subtly, corecursion, in a natural and clear fashion; in these cases the deferred nodes are stored implicitly in the call stack.

Depth-first search is easily implemented via a stack, including recursively (via the call stack), while breadth-first search is easily implemented via a queue, including corecursively. [2] :45−61

Depth-first traversal (dotted path) of a binary tree:
  • Pre-order (node visited at position red):
       F, B, A, D, C, E, G, I, H;
  • In-order (node visited at position green):
       A, B, C, D, E, F, G, H, I;
  • Post-order (node visited at position blue):
       A, C, E, D, B, H, I, G, F.

In depth-first search (DFS), the search tree is deepened as much as possible before going to the next sibling.

To traverse binary trees with depth-first search, perform the following operations at each node: [3] [4]

  1. If the current node is empty then return.
  2. Execute the following three operations in a certain order: [5]
    N: Visit the current node.
    L: Recursively traverse the current node's left subtree.
    R: Recursively traverse the current node's right subtree.
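The two steps above can be sketched in Python as a single function that takes the order of the three operations as a string such as "NLR". This is a minimal illustration, not taken from the cited sources: the `Node` class, the `dfs` function and its `order` parameter are hypothetical names, and the sample tree is the one from the figure.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    key: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def dfs(node: Optional[Node], order: str, visit: Callable[[Node], None]) -> None:
    """Depth-first traversal; `order` is a permutation of "N", "L" and "R"."""
    if node is None:          # step 1: empty (sub)tree, nothing to do
        return
    for op in order:          # step 2: the three operations in the chosen order
        if op == "N":
            visit(node)
        elif op == "L":
            dfs(node.left, order, visit)
        else:
            dfs(node.right, order, visit)


def sample_tree() -> Node:
    # The tree from the figure: F(B(A, D(C, E)), G(-, I(H, -)))
    return Node("F",
                Node("B", Node("A"), Node("D", Node("C"), Node("E"))),
                Node("G", None, Node("I", Node("H"))))


def sequence(order: str) -> str:
    out: list = []
    dfs(sample_tree(), order, lambda n: out.append(n.key))
    return "".join(out)


print(sequence("NLR"))  # FBADCEGIH
print(sequence("LNR"))  # ABCDEFGHI
print(sequence("LRN"))  # ACEDBHIGF
```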

The trace of a traversal is called a sequentialisation of the tree. The traversal trace is a list of each visited node. No one sequentialisation according to pre-, in- or post-order describes the underlying tree uniquely. Given a tree with distinct elements, either pre-order or post-order paired with in-order is sufficient to describe the tree uniquely. However, pre-order with post-order leaves some ambiguity in the tree structure. [6]
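As an illustration of why pre-order together with in-order suffices for distinct keys, the following sketch (assuming single-character keys and a nested-tuple tree representation chosen only for this example) rebuilds the figure's tree: the first pre-order element is the root, and its position in the in-order sequence splits the remaining keys into left and right subtrees. The helper names `build` and `postorder` are hypothetical.

```python
from typing import Optional, Tuple

# A tree is represented here as a nested tuple (key, left, right), or None if empty.
Tree = Optional[Tuple[str, "Tree", "Tree"]]


def build(preorder: str, inorder: str) -> Tree:
    """Rebuild a binary tree with distinct keys from its pre-order and in-order sequences."""
    if not preorder:
        return None
    root = preorder[0]              # pre-order lists the root first
    k = inorder.index(root)         # in-order: left subtree | root | right subtree
    left = build(preorder[1:k + 1], inorder[:k])
    right = build(preorder[k + 1:], inorder[k + 1:])
    return (root, left, right)


def postorder(t: Tree) -> str:
    if t is None:
        return ""
    key, left, right = t
    return postorder(left) + postorder(right) + key


tree = build("FBADCEGIH", "ABCDEFGHI")
print(postorder(tree))  # ACEDBHIGF
```

By contrast, a root A with a single child B has pre-order "AB" and post-order "BA" whether B is the left or the right child, which is the kind of ambiguity mentioned above for pre-order combined with post-order.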

There are three positions of the traversal relative to the node (in the figure: red, green, or blue) at which the visit of the node can take place. Choosing exactly one position yields exactly one visit of each node, as described below. Visiting at all three positions results in a threefold visit of each node, yielding the “all-order” sequentialisation:

F-B-A-A-A-B-D-C-C-C-D-E-E-E-D-B-F-G-G-I-H-H-H-I-I-G-F
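The all-order sequence can be reproduced by recording a node at each of the three positions, as in this minimal Python sketch (the `Node` class and `all_order` function are illustrative names). Keeping only the first occurrence of each key gives the pre-order, only the second gives the in-order, and only the third gives the post-order.

```python
from typing import List, Optional


class Node:
    def __init__(self, key: str, left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None) -> None:
        self.key, self.left, self.right = key, left, right


def all_order(node: Optional[Node], out: List[str]) -> None:
    """Record each node at all three positions: red (pre), green (in), blue (post)."""
    if node is None:
        return
    out.append(node.key)          # red: before the left subtree
    all_order(node.left, out)
    out.append(node.key)          # green: between the two subtrees
    all_order(node.right, out)
    out.append(node.key)          # blue: after the right subtree


root = Node("F",
            Node("B", Node("A"), Node("D", Node("C"), Node("E"))),
            Node("G", None, Node("I", Node("H"))))
seq: List[str] = []
all_order(root, seq)
print("-".join(seq))  # F-B-A-A-A-B-D-C-C-C-D-E-E-E-D-B-F-G-G-I-H-H-H-I-I-G-F
```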

Pre-order, NLR

  1. Visit the current node (in the figure: position red).
  2. Recursively traverse the current node's left subtree.
  3. Recursively traverse the current node's right subtree.

The pre-order traversal yields a topologically sorted order, because each parent node is processed before any of its child nodes.

Post-order, LRN

  1. Recursively traverse the current node's left subtree.
  2. Recursively traverse the current node's right subtree.
  3. Visit the current node (in the figure: position blue).

Post-order traversal can be useful to obtain the postfix expression of a binary expression tree.

In-order, LNR

  1. Recursively traverse the current node's left subtree.
  2. Visit the current node (in the figure: position green).
  3. Recursively traverse the current node's right subtree.

In a binary search tree ordered such that in each node the key is greater than all keys in its left subtree and less than all keys in its right subtree, in-order traversal retrieves the keys in ascending sorted order. [7]
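A short Python sketch of this property, assuming a plain unbalanced binary search tree that ignores duplicate keys (the `insert` and `inorder` helpers are illustrative, not from the cited source):

```python
from typing import Iterator, Optional


class Node:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Unbalanced BST insertion: smaller keys go left, larger keys go right."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # a duplicate key leaves the tree unchanged


def inorder(node: Optional[Node]) -> Iterator[int]:
    """LNR traversal as a generator."""
    if node is not None:
        yield from inorder(node.left)
        yield node.key
        yield from inorder(node.right)


root = None
for k in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, k)
print(list(inorder(root)))  # [20, 30, 40, 50, 60, 70, 80]
```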

Reverse pre-order, NRL

  1. Visit the current node.
  2. Recursively traverse the current node's right subtree.
  3. Recursively traverse the current node's left subtree.

Reverse post-order, RLN

  1. Recursively traverse the current node's right subtree.
  2. Recursively traverse the current node's left subtree.
  3. Visit the current node.

Reverse in-order, RNL

  1. Recursively traverse the current node's right subtree.
  2. Visit the current node.
  3. Recursively traverse the current node's left subtree.

In a binary search tree ordered such that in each node the key is greater than all keys in its left subtree and less than all keys in its right subtree, reverse in-order traversal retrieves the keys in descending sorted order.
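Each reversed order is the mirror image of a forward order, so it produces the forward sequence read backwards: RLN reverses NLR, RNL reverses LNR, and NRL reverses LRN. The following Python sketch (with the hypothetical helper `walk`) checks this on the figure's tree, whose keys happen to form a binary search tree, so RNL lists them in descending order.

```python
from typing import List, Optional


class Node:
    def __init__(self, key: str, left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None) -> None:
        self.key, self.left, self.right = key, left, right


def walk(node: Optional[Node], order: str, out: List[str]) -> List[str]:
    """Generic depth-first walk; `order` is a permutation of "N", "L", "R"."""
    if node is not None:
        for op in order:
            if op == "N":
                out.append(node.key)
            else:
                walk(node.left if op == "L" else node.right, order, out)
    return out


root = Node("F",
            Node("B", Node("A"), Node("D", Node("C"), Node("E"))),
            Node("G", None, Node("I", Node("H"))))

for forward, reverse in [("NLR", "RLN"), ("LNR", "RNL"), ("LRN", "NRL")]:
    # Each reverse order yields exactly the forward sequence read backwards.
    assert walk(root, reverse, []) == walk(root, forward, [])[::-1]

print("".join(walk(root, "RNL", [])))  # IHGFEDCBA
```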

Arbitrary trees

To traverse arbitrary trees (not necessarily binary trees) with depth-first search, perform the following operations at each node:

  1. If the current node is empty then return.
  2. Visit the current node for pre-order traversal.
  3. For each i from 1 to the current node's number of subtrees − 1, or from the latter to the former for reverse traversal, do:
    1. Recursively traverse the current node's i-th subtree.
    2. Visit the current node for in-order traversal.
  4. Recursively traverse the current node's last subtree.
  5. Visit the current node for post-order traversal.
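The steps above can be sketched in Python with three callback hooks, one per visit position. The names `TreeNode`, `traverse`, `pre`, `mid`, `post` and `reverse` are illustrative assumptions; passing a no-op for an unneeded hook corresponds to an optional operation. Note that under these steps a node with fewer than two subtrees receives no in-order visit at all.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class TreeNode:
    key: str
    children: List["TreeNode"] = field(default_factory=list)


Visit = Callable[[TreeNode], None]


def traverse(node: Optional[TreeNode], pre: Visit, mid: Visit, post: Visit,
             reverse: bool = False) -> None:
    if node is None:                          # step 1
        return
    pre(node)                                 # step 2: pre-order visit
    kids = node.children[::-1] if reverse else node.children
    for child in kids[:-1]:                   # step 3: every subtree but the last
        traverse(child, pre, mid, post, reverse)
        mid(node)                             # in-order visit after each of them
    if kids:                                  # step 4: the last subtree, if any
        traverse(kids[-1], pre, mid, post, reverse)
    post(node)                                # step 5: post-order visit


# Hypothetical example: R has children A, B, C; A has children D, E.
tree = TreeNode("R", [TreeNode("A", [TreeNode("D"), TreeNode("E")]),
                      TreeNode("B"), TreeNode("C")])
pre_seq: List[str] = []
mid_seq: List[str] = []
post_seq: List[str] = []
traverse(tree,
         lambda n: pre_seq.append(n.key),
         lambda n: mid_seq.append(n.key),
         lambda n: post_seq.append(n.key))
print(pre_seq)   # ['R', 'A', 'D', 'E', 'B', 'C']
print(mid_seq)   # ['A', 'R', 'R']
print(post_seq)  # ['D', 'E', 'A', 'B', 'C', 'R']
```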

Depending on the problem at hand, the pre-order operation, the post-order operation, and especially one of the (number of subtrees − 1) in-order operations may be optional. Also, in practice more than one of the pre-order, post-order, and in-order operations may be required. For example, when inserting into a ternary tree, a pre-order operation is performed by comparing items, and a post-order operation may be needed afterwards to re-balance the tree.

Level-order: F, B, G, A, D, I, C, E, H.

In breadth-first search (BFS) or level-order search, the search tree is broadened as much as possible before going to the next depth.

Other types

There are also tree traversal algorithms that classify as neither depth-first search nor breadth-first search. One such algorithm is Monte Carlo tree search, which concentrates on analyzing the most promising moves, basing the expansion of the search tree on random sampling of the search space.

Applications

Tree representing the arithmetic expression: A * (B − C) + (D + E)

Pre-order traversal can be used to make a prefix expression (Polish notation) from an expression tree: traverse the expression tree in pre-order. For example, traversing the depicted arithmetic expression in pre-order yields "+ * A − B C + D E". In prefix notation, there is no need for any parentheses as long as each operator has a fixed number of operands. Pre-order traversal is also used to create a copy of the tree.

Post-order traversal can generate a postfix representation (Reverse Polish notation) of a binary tree. Traversing the depicted arithmetic expression in post-order yields "A B C − * D E + +"; the latter can easily be transformed into machine code to evaluate the expression by a stack machine. Post-order traversal is also used to delete the tree: each node is freed after freeing its children.
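A minimal Python sketch of both applications on the depicted expression, using ASCII "-" for subtraction. The `Expr` class, the helper names and the variable values in `env` are assumptions made only for this example; `eval_postfix` imitates a stack machine by pushing operands and applying each operator to the two topmost values.

```python
import operator
from typing import Dict, List, Optional


class Expr:
    def __init__(self, label: str, left: "Optional[Expr]" = None,
                 right: "Optional[Expr]" = None) -> None:
        self.label, self.left, self.right = label, left, right


def prefix(e: Optional[Expr]) -> List[str]:
    """Pre-order (NLR) traversal: operator before its operands."""
    if e is None:
        return []
    return [e.label] + prefix(e.left) + prefix(e.right)


def postfix(e: Optional[Expr]) -> List[str]:
    """Post-order (LRN) traversal: operands before their operator."""
    if e is None:
        return []
    return postfix(e.left) + postfix(e.right) + [e.label]


def eval_postfix(tokens: List[str], env: Dict[str, int]) -> int:
    """Evaluate a postfix token list with an explicit stack."""
    ops = {"+": operator.add, "-": operator.sub, "*": operator.mul}
    stack: List[int] = []
    for tok in tokens:
        if tok in ops:
            b = stack.pop()          # right operand is on top
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(env[tok])   # variable: push its value
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


# A * (B - C) + (D + E)
tree = Expr("+",
            Expr("*", Expr("A"), Expr("-", Expr("B"), Expr("C"))),
            Expr("+", Expr("D"), Expr("E")))
print(" ".join(prefix(tree)))   # + * A - B C + D E
print(" ".join(postfix(tree)))  # A B C - * D E + +
env = {"A": 2, "B": 7, "C": 3, "D": 1, "E": 4}
print(eval_postfix(postfix(tree), env))  # 13
```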

In-order traversal is very commonly used on binary search trees because it returns values from the underlying set in order, according to the comparator that set up the binary search tree.

Implementations

Depth-first search implementation

Pre-order implementation

procedure preorder(node)
    if node = null
        return
    visit(node)
    preorder(node.left)
    preorder(node.right)

procedure iterativePreorder(node)
    if node = null
        return
    stack ← empty stack
    stack.push(node)
    while not stack.isEmpty()
        node ← stack.pop()
        visit(node)
        // right child is pushed first so that left is processed first
        if node.right ≠ null
            stack.push(node.right)
        if node.left ≠ null
            stack.push(node.left)
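A direct Python translation of both procedures, assuming a minimal `Node` class and using a Python list as the stack (`visit` is replaced by appending the key to a result list):

```python
from typing import List, Optional


class Node:
    def __init__(self, key: str, left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None) -> None:
        self.key, self.left, self.right = key, left, right


def preorder(node: Optional[Node], out: List[str]) -> List[str]:
    if node is not None:
        out.append(node.key)          # visit
        preorder(node.left, out)
        preorder(node.right, out)
    return out


def iterative_preorder(root: Optional[Node]) -> List[str]:
    if root is None:
        return []
    out: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        out.append(node.key)          # visit
        # push right first so that the left child is popped (visited) first
        if node.right is not None:
            stack.append(node.right)
        if node.left is not None:
            stack.append(node.left)
    return out


root = Node("F",
            Node("B", Node("A"), Node("D", Node("C"), Node("E"))),
            Node("G", None, Node("I", Node("H"))))
print("".join(iterative_preorder(root)))  # FBADCEGIH
```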

Post-order implementation

procedure postorder(node)
    if node = null
        return
    postorder(node.left)
    postorder(node.right)
    visit(node)

procedure iterativePostorder(node)
    stack ← empty stack
    lastNodeVisited ← null
    while not stack.isEmpty() or node ≠ null
        if node ≠ null
            stack.push(node)
            node ← node.left
        else
            peekNode ← stack.peek()
            // if right child exists and traversing node
            // from left child, then move right
            if peekNode.right ≠ null and lastNodeVisited ≠ peekNode.right
                node ← peekNode.right
            else
                visit(peekNode)
                lastNodeVisited ← stack.pop()
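The iterative post-order procedure translates into Python as follows; `stack[-1]` plays the role of `peek`, and identity comparison (`is`) checks whether the right child was the node visited last. The `Node` class and function names are illustrative.

```python
from typing import List, Optional


class Node:
    def __init__(self, key: str, left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None) -> None:
        self.key, self.left, self.right = key, left, right


def iterative_postorder(node: Optional[Node]) -> List[str]:
    out: List[str] = []
    stack: List[Node] = []
    last_visited: Optional[Node] = None
    while stack or node is not None:
        if node is not None:
            stack.append(node)            # descend to the left as far as possible
            node = node.left
        else:
            peek = stack[-1]
            # move right if a right child exists and we came up from the left
            if peek.right is not None and last_visited is not peek.right:
                node = peek.right
            else:
                out.append(peek.key)      # visit
                last_visited = stack.pop()
    return out


root = Node("F",
            Node("B", Node("A"), Node("D", Node("C"), Node("E"))),
            Node("G", None, Node("I", Node("H"))))
print("".join(iterative_postorder(root)))  # ACEDBHIGF
```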

In-order implementation

procedure inorder(node)
    if node = null
        return
    inorder(node.left)
    visit(node)
    inorder(node.right)

procedure iterativeInorder(node)
    stack ← empty stack
    while not stack.isEmpty() or node ≠ null
        if node ≠ null
            stack.push(node)
            node ← node.left
        else
            node ← stack.pop()
            visit(node)
            node ← node.right
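In Python, the iterative in-order procedure can be written as below (a sketch with an illustrative `Node` class; the list `stack` holds the nodes whose left subtree is still being processed):

```python
from typing import List, Optional


class Node:
    def __init__(self, key: str, left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None) -> None:
        self.key, self.left, self.right = key, left, right


def iterative_inorder(node: Optional[Node]) -> List[str]:
    out: List[str] = []
    stack: List[Node] = []
    while stack or node is not None:
        if node is not None:
            stack.append(node)     # defer the node until its left subtree is done
            node = node.left
        else:
            node = stack.pop()
            out.append(node.key)   # visit
            node = node.right
    return out


root = Node("F",
            Node("B", Node("A"), Node("D", Node("C"), Node("E"))),
            Node("G", None, Node("I", Node("H"))))
print("".join(iterative_inorder(root)))  # ABCDEFGHI
```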

Another variant of pre-order

If the tree is a perfect binary tree (every level completely filled) stored in an array in level order (first index is 0, with the children of index i at 2i + 1 and 2i + 2), it is possible to calculate the index of the next element directly: [8]

// all divisions below are integer divisions
procedure bubbleUp(array, i, leaf)
    k ← 1
    i ← (i - 1)/2
    while (leaf + 1) % (k * 2) ≠ k
        i ← (i - 1)/2
        k ← 2 * k
    return i

procedure preorder(array)
    i ← 0
    while i ≠ array.size
        visit(array[i])
        if i = array.size - 1
            i ← array.size
        else if i < array.size/2
            i ← i * 2 + 1
        else
            leaf ← i - array.size/2
            parent ← bubbleUp(array, i, leaf)
            i ← parent * 2 + 2
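A Python sketch of the array-based pre-order, assuming a perfect binary tree stored in level order (the function names are illustrative). After visiting leaf number `leaf`, `bubble_up` climbs while the current node is a right child; the next node is then the right child of the first ancestor reached from the left. The usage example compares the result with an ordinary recursive pre-order over indices.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bubble_up(i: int, leaf: int) -> int:
    """Return the ancestor whose right child follows leaf `leaf` in pre-order."""
    k = 1
    i = (i - 1) // 2                     # parent of the leaf
    while (leaf + 1) % (k * 2) != k:     # current node is a right child: go up
        i = (i - 1) // 2
        k *= 2
    return i


def array_preorder(array: Sequence[T]) -> List[T]:
    size = len(array)
    out: List[T] = []
    i = 0
    while i != size:
        out.append(array[i])             # visit
        if i == size - 1:                # last leaf: done
            i = size
        elif i < size // 2:              # internal node: go to the left child
            i = 2 * i + 1
        else:                            # leaf: jump to the next right subtree
            leaf = i - size // 2
            i = 2 * bubble_up(i, leaf) + 2
    return out


def recursive_preorder(i: int, n: int, out: List[int]) -> List[int]:
    if i < n:
        out.append(i)
        recursive_preorder(2 * i + 1, n, out)
        recursive_preorder(2 * i + 2, n, out)
    return out


print("".join(array_preorder("ABCDEFG")))  # ABDECFG
```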

Advancing to the next or previous node

The node to be started with may have been found in the binary search tree bst by means of a standard search function, which is shown here in an implementation without parent pointers, i.e. it uses a stack for holding the ancestor pointers.

procedure search(bst, key)
    // returns a (node, stack)
    node ← bst.root
    stack ← empty stack
    while node ≠ null
        stack.push(node)
        if key = node.key
            return (node, stack)
        if key < node.key
            node ← node.left
        else
            node ← node.right
    return (null, empty stack)

The function inorderNext [2] :60 returns an in-order-neighbor of node, either the in-order-successor (for dir=1) or the in-order-predecessor (for dir=0), and the updated stack, so that the binary search tree may be sequentially in-order-traversed and searched in the given direction dir further on.

procedure inorderNext(node, dir, stack)
    // on entry and on return, the top of stack is node itself, with its ancestors below it
    newnode ← node.child[dir]
    if newnode ≠ null
        do
            node ← newnode
            stack.push(node)
            newnode ← node.child[1-dir]
        until newnode = null
        return (node, stack)
    // node does not have a dir-child:
    do
        oldnode ← stack.pop()   // on the first pass, this is node itself
        if stack.isEmpty()
            return (null, empty stack)
        node ← stack.peek()     // parent of oldnode
    until oldnode ≠ node.child[dir]
    // now oldnode = node.child[1-dir],
    // i.e. node = ancestor (and predecessor/successor) of original node
    return (node, stack)
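A Python sketch of `search` and `inorderNext`, assuming nodes that store their children in a two-element list `child` (index 0 = left, 1 = right) so that `direction` can select either side, and the stack convention that the current node is on top with its ancestors below. The names are illustrative. The usage example walks the figure's tree forwards from its minimum and backwards from its maximum.

```python
from typing import List, Optional, Tuple


class Node:
    def __init__(self, key: str, left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None) -> None:
        self.key = key
        self.child = [left, right]   # child[0] = left, child[1] = right


def search(root: Optional[Node], key: str) -> Tuple[Optional[Node], List[Node]]:
    node, stack = root, []
    while node is not None:
        stack.append(node)           # the found node ends up on top of the stack
        if key == node.key:
            return node, stack
        node = node.child[0] if key < node.key else node.child[1]
    return None, []


def inorder_next(node: Node, direction: int,
                 stack: List[Node]) -> Tuple[Optional[Node], List[Node]]:
    """Successor (direction=1) or predecessor (direction=0), plus the updated stack."""
    new = node.child[direction]
    if new is not None:
        # one step in `direction`, then as far as possible the other way
        while new is not None:
            node = new
            stack.append(node)
            new = node.child[1 - direction]
        return node, stack
    # no child in `direction`: climb until we arrive from the other side
    while True:
        old = stack.pop()
        if not stack:
            return None, []
        node = stack[-1]             # parent of old
        if node.child[direction] is not old:
            return node, stack


root = Node("F",
            Node("B", Node("A"), Node("D", Node("C"), Node("E"))),
            Node("G", None, Node("I", Node("H"))))


def walk_from(key: str, direction: int) -> str:
    node, stack = search(root, key)
    keys = []
    while node is not None:
        keys.append(node.key)
        node, stack = inorder_next(node, direction, stack)
    return "".join(keys)


print(walk_from("A", 1))  # ABCDEFGHI
print(walk_from("I", 0))  # IHGFEDCBA
```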

Note that the function does not use keys, which means that the sequential structure is completely recorded by the binary search tree’s edges. For traversals without change of direction, the (amortised) average complexity is $\mathcal{O}(1)$, because a full traversal takes $2n-2$ steps for a BST of size $n$: 1 step for each edge up and 1 for each edge down. The worst-case complexity is $\mathcal{O}(h)$, with $h$ the height of the tree.

All the above implementations except the array-based variant require stack space proportional to the height of the tree, which is a call stack for the recursive implementations and a parent (ancestor) stack for the iterative ones. In a poorly balanced tree, this can be considerable. With the iterative implementations, the stack requirement can be removed by maintaining parent pointers in each node, or by threading the tree (next section).

Morris in-order traversal using threading

A binary tree is threaded by making every left child pointer (that would otherwise be null) point to the in-order predecessor of the node (if it exists) and every right child pointer (that would otherwise be null) point to the in-order successor of the node (if it exists).

Advantages:

  1. Avoids recursion, which uses a call stack and consumes memory and time.
  2. The node keeps a record of its parent.

Disadvantages:

  1. The tree is more complex.
  2. We can make only one traversal at a time.
  3. It is more prone to errors when both children are absent, so that both child pointers of the node point to its ancestors.

Morris traversal is an implementation of in-order traversal that uses threading: [9]

  1. Create links to the in-order successor.
  2. Print the data using these links.
  3. Revert the changes to restore the original tree.
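A Python sketch of Morris in-order traversal, assuming a plain `Node` class with `left`/`right` fields (names illustrative). A temporary thread is added from the rightmost node of each left subtree (the in-order predecessor) back to the current node, and removed again when the traversal returns through it, so the tree is unchanged afterwards and no stack is used.

```python
from typing import List, Optional


class Node:
    def __init__(self, key: str, left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None) -> None:
        self.key, self.left, self.right = key, left, right


def morris_inorder(root: Optional[Node]) -> List[str]:
    out: List[str] = []
    cur = root
    while cur is not None:
        if cur.left is None:
            out.append(cur.key)                  # visit
            cur = cur.right                      # real child or thread
        else:
            # find the in-order predecessor: rightmost node of the left subtree
            pred = cur.left
            while pred.right is not None and pred.right is not cur:
                pred = pred.right
            if pred.right is None:
                pred.right = cur                 # create thread, then go left
                cur = cur.left
            else:
                pred.right = None                # remove thread (restore tree)
                out.append(cur.key)              # visit
                cur = cur.right
    return out


root = Node("F",
            Node("B", Node("A"), Node("D", Node("C"), Node("E"))),
            Node("G", None, Node("I", Node("H"))))
print("".join(morris_inorder(root)))  # ABCDEFGHI
```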

Breadth-first search

Listed below is pseudocode for a simple queue-based level-order traversal. It requires space proportional to the maximum number of nodes at a given depth, which can be as much as half the total number of nodes. A more space-efficient approach for this type of traversal can be implemented using an iterative deepening depth-first search.

procedure levelorder(node)
    if node = null
        return
    queue ← empty queue
    queue.enqueue(node)
    while not queue.isEmpty()
        node ← queue.dequeue()
        visit(node)
        if node.left ≠ null
            queue.enqueue(node.left)
        if node.right ≠ null
            queue.enqueue(node.right)
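In Python, `collections.deque` provides an efficient FIFO queue (`append` to enqueue, `popleft` to dequeue). A minimal sketch with an illustrative `Node` class:

```python
from collections import deque
from typing import Deque, List, Optional


class Node:
    def __init__(self, key: str, left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None) -> None:
        self.key, self.left, self.right = key, left, right


def level_order(root: Optional[Node]) -> List[str]:
    out: List[str] = []
    if root is None:
        return out
    queue: Deque[Node] = deque([root])
    while queue:
        node = queue.popleft()          # dequeue the oldest deferred node
        out.append(node.key)            # visit
        if node.left is not None:
            queue.append(node.left)
        if node.right is not None:
            queue.append(node.right)
    return out


root = Node("F",
            Node("B", Node("A"), Node("D", Node("C"), Node("E"))),
            Node("G", None, Node("I", Node("H"))))
print("".join(level_order(root)))  # FBGADICEH
```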

If the tree is represented by an array (first index is 0), it suffices to iterate through all elements in index order:

procedure levelorder(array)
    for i from 0 to array.size - 1
        visit(array[i])

Infinite trees

While traversal is usually done for trees with a finite number of nodes (and hence finite depth and finite branching factor) it can also be done for infinite trees. This is of particular interest in functional programming (particularly with lazy evaluation), as infinite data structures can often be easily defined and worked with, though they are not (strictly) evaluated, as this would take infinite time. Some finite trees are too large to represent explicitly, such as the game tree for chess or go, and so it is useful to analyze them as if they were infinite.

A basic requirement for traversal is to visit every node eventually. For infinite trees, simple algorithms often fail this. For example, given a binary tree of infinite depth, a depth-first search will go down one side (by convention the left side) of the tree, never visiting the rest, and indeed an in-order or post-order traversal will never visit any nodes, as it has not reached a leaf (and in fact never will). By contrast, a breadth-first (level-order) traversal will traverse a binary tree of infinite depth without problem, and indeed will traverse any tree with bounded branching factor.

On the other hand, given a tree of depth 2, where the root has infinitely many children, and each of these children has two children, a depth-first search will visit all nodes, as once it exhausts the grandchildren (children of children of one node), it will move on to the next (assuming it is not post-order, in which case it never reaches the root). By contrast, a breadth-first search will never reach the grandchildren, as it seeks to exhaust the children first.

A more sophisticated analysis of running time can be given via infinite ordinal numbers; for example, the breadth-first search of the depth 2 tree above will take ω·2 steps: ω for the first level, and then another ω for the second level.

Thus, simple depth-first or breadth-first searches do not traverse every infinite tree, and are not efficient on very large trees. However, hybrid methods can traverse any (countably) infinite tree, essentially via a diagonal argument ("diagonal"—a combination of vertical and horizontal—corresponds to a combination of depth and breadth).

Concretely, given the infinitely branching tree of infinite depth, label the root (), the children of the root (1), (2), ..., the grandchildren (1, 1), (1, 2), ..., (2, 1), (2, 2), ..., and so on. The nodes are thus in a one-to-one correspondence with finite (possibly empty) sequences of positive numbers, which are countable and can be placed in order first by sum of entries, and then by lexicographic order within a given sum (only finitely many sequences sum to a given value, so all entries are reached; formally, there are finitely many compositions of a given natural number, specifically $2^{n-1}$ compositions of $n \ge 1$), which gives a traversal. Explicitly:

  1. ()
  2. (1)
  3. (1, 1) (2)
  4. (1, 1, 1) (1, 2) (2, 1) (3)
  5. (1, 1, 1, 1) (1, 1, 2) (1, 2, 1) (1, 3) (2, 1, 1) (2, 2) (3, 1) (4)

etc.
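This enumeration can be generated lazily in Python: for each total 0, 1, 2, ..., list the compositions of that total in lexicographic order. The sketch below uses `itertools.count` for the unbounded outer loop; the function names are illustrative. Every node label appears after finitely many steps, which is the point of the diagonal order.

```python
from itertools import count, islice
from typing import Iterator, Tuple


def compositions(n: int) -> Iterator[Tuple[int, ...]]:
    """Yield the compositions of n (tuples of positive ints summing to n) in lexicographic order."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest


def diagonal_order() -> Iterator[Tuple[int, ...]]:
    """All node labels of the infinitely branching, infinitely deep tree."""
    for total in count(0):
        yield from compositions(total)


print(list(islice(diagonal_order(), 8)))
# [(), (1,), (1, 1), (2,), (1, 1, 1), (1, 2), (2, 1), (3,)]
```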

This can be interpreted as mapping the infinite depth binary tree onto this tree and then applying breadth-first search: replace the "down" edges connecting a parent node to its second and later children with "right" edges from the first child to the second child, from the second child to the third child, etc. Thus at each step one can either go down (append an entry 1 to the end of the sequence) or go right (add one to the last number) (except the root, which is extra and can only go down), which shows the correspondence between the infinite binary tree and the above numbering; the sum of the entries (minus one) corresponds to the distance from the root, which agrees with the $2^{n-1}$ nodes at depth $n-1$ in the infinite binary tree (2 corresponds to binary).


References

  1. "Lecture 8, Tree Traversal". Retrieved 2 May 2015.
  2. Pfaff, Ben (2004). An Introduction to Binary Search Trees and Balanced Trees. Free Software Foundation, Inc.
  3. Binary Tree Traversal Methods
  4. "Preorder Traversal Algorithm". Retrieved 2 May 2015.
  5. L before R means the (standard) counter-clockwise traversal, as in the figure.
    The execution of N before, between, or after L and R determines one of the described methods.
    If the traversal is taken the other way around (clockwise), then the traversal is called reversed. This is described in particular for reverse in-order, when the data are to be retrieved in descending order.
  6. "Algorithms, Which combinations of pre-, post- and in-order sequentialisation are unique?". Computer Science Stack Exchange. Retrieved 2 May 2015.
  7. Wittman, Todd. "Tree Traversal" (PDF). UCLA Math. Archived from the original (PDF) on 13 February 2015. Retrieved 2 January 2016.
  8. "constexpr tree structures". Fekir's Blog. 9 August 2021. Retrieved 15 August 2021.
  9. Morris, Joseph M. (1979). "Traversing binary trees simply and cheaply". Information Processing Letters. 9 (5): 197–200. doi:10.1016/0020-0190(79)90068-1.
