Distributed tree search

Last updated October 20, 2021

Distributed tree search (DTS) algorithm is a class of algorithms for searching values in an efficient and distributed manner. Their purpose is to iterate through a tree by working along multiple branches in parallel and merging the results of each branch into one common solution, in order to minimize time spent searching for a value in a tree-like data structure.

Overview

The Distributed Tree Search Algorithm (also known as Korf–Ferguson algorithm) was created to solve the following problem: "Given a tree with non-uniform branching factor and depth, search it in parallel with an arbitrary number of processors as fast as possible."

The top-level part of this algorithm is general and does not use a particular existing type of tree-search, but it can be easily specialized to fit any type of non-distributed tree-search.

DTS consists of using multiple processes, each with a node and a set of processors attached, with the goal of searching the sub-tree below the said node. Each process then divides itself into multiple coordinated sub-processes which recursively divide themselves again until an optimal way to search the tree has been found based on the number of processors available to each process. Once a process finishes, DTS dynamically reassigns the processors to other processes as to keep the efficiency to a maximum through good load-balancing, especially in irregular trees.

Once a process finishes searching, it recursively sends and merges a resulting signal to its parent-process, until all the different sub-answers have been merged and the entire problem has been solved.^[1]

Applications

DTS is only applicable under two major conditions: the data structure to search through is a tree, and the algorithm can make use of at least one computation unit (Although it cannot be considered as distributed if there is only one).

One major example of the everyday use of DTS is network routing. The Internet can be seen as a tree of IP addresses, and an analogy to a routing protocol could be how post offices work in the real world. Since there are over 4.3 billion IP addresses currently, society heavily depends on the time the data takes to find its way to its destination. As such, IP-routing divides the work into multiple sub-units which each have different scales of calculation capabilities and use each other's result to find the route in a very efficient manner. This is an instance of DTS that affects over 43% of the world's population, for reasons going from entertainment to national security.^[2]

Alternatives

Although DTS is currently one of the most widely used algorithms, many of its applications have alternatives to them which could potentially develop into more efficient, less resource-demanding solutions, were they more researched.

One of the more controversial examples is Big-Data processing. In applications like Google Search Engine, Facebook, YouTube, search needs to be optimized to keep waiting time inside a reasonable window. This could be achieved through the plain use of DTS, but other algorithms are used in place (for example data-hashing in SQL databases), or in conjunction (Facebook's Haystack algorithm groups parallel tree-search, data-hashing and memory-ordering/sorting).^[3]

One of the more important limits of DTS is the fact that it requires a tree as input. Trees are a sub-instance of a data structure known as Graphs, which means every Graph can be converted into a tree. Although there currently exists no better way to search through trees than Korf-Ferguson's algorithm, each task has different particularities and in most cases, there will exist more efficient data structures to represent the problem and solve it than through tree-search. And so there exist instances of tree structures with cycles that cannot possibly be faster than a graph-search on the same structure with the same processing power.

Controversy

There are few controversies around Korf-Ferguson's DTS algorithm, since it is recognized as very complete, but simple. It is very often used as a stepping stone for students to discover the fundamentals and key concepts of distributed problem-solving.

The most important challenge to this algorithmic concept was an article by Kröll B, « Balanced Distributed Search Trees Do Not Exist », which does not attack the veracity or current efficiency of the algorithm, but rather the fact that DTS itself, no matter how many improvements are made to it (for example balancing the input tree before-hand), will never be able to reach optimal resolution-time.^[4] This opens a new view point: are too many resources used into the completion of DTS, which blocks new algorithms with higher efficiency-potential from getting researched and developed? Another limit of DTS is the fact that no matter how efficient the division, coordination and merging of the solutions is, it will always be limited by the material number or processors and their processing power.

Related Research Articles

Distributed computing is a field of computer science that studies distributed systems. A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another from any system. The components interact with one another in order to achieve a common goal. Three significant characteristics of distributed systems are: concurrency of components, lack of a global clock, and independent failure of components. It deals with a central challenge that, when components of a system fails, it doesn't imply the entire system fails. Examples of distributed systems vary from SOA-based systems to massively multiplayer online games to peer-to-peer applications.

In computer science, a priority queue is an abstract data type similar to a regular queue or stack data structure in which each element additionally has a "priority" associated with it. In a priority queue, an element with high priority is served before an element with low priority. In some implementations, if two elements have the same priority, they are served according to the order in which they were enqueued, while in other implementations, ordering of elements with the same priority is undefined.

In computer science, a search algorithm is an algorithm which solves a search problem. Search algorithms work to retrieve information stored within some data structure, or calculated in the search space of a problem domain, either with discrete or continuous values.

A minimum spanning tree (MST) or minimum weight spanning tree is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight. That is, it is a spanning tree whose sum of edge weights is as small as possible. More generally, any edge-weighted undirected graph has a minimum spanning forest, which is a union of the minimum spanning trees for its connected components.

Kruskal's algorithm finds a minimum spanning forest of an undirected edge-weighted graph. If the graph is connected, it finds a minimum spanning tree. It is a greedy algorithm in graph theory as in each step it adds the next lowest-weight edge that will not form a cycle to the minimum spanning forest.

In computer science, Prim's algorithm is a greedy algorithm that finds a minimum spanning tree for a weighted undirected graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. The algorithm operates by building this tree one vertex at a time, from an arbitrary starting vertex, at each step adding the cheapest possible connection from the tree to another vertex.

In computing, load balancing refers to the process of distributing a set of tasks over a set of resources, with the aim of making their overall processing more efficient. Load balancing can optimize the response time and avoid unevenly overloading some compute nodes while other compute nodes are left idle.

Breadth-first search Algorithm for searching the nodes of a graph in order by their hop count from a starting node

Breadth-first search (BFS) is an algorithm for searching a tree data structure for a node that satisfies a given property. It starts at the tree root and explores all nodes at the present depth prior to moving on to the nodes at the next depth level. Extra memory, usually a queue, is needed to keep track of the child nodes that were encountered but not yet explored.

In computer science, divide and conquer is an algorithm design paradigm. A divide-and-conquer algorithm recursively breaks down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved directly. The solutions to the sub-problems are then combined to give a solution to the original problem.

In the mathematical field of graph theory, a spanning treeT of an undirected graph G is a subgraph that is a tree which includes all of the vertices of G. In general, a graph may have several spanning trees, but a graph that is not connected will not contain a spanning tree. If all of the edges of G are also edges of a spanning tree T of G, then G is a tree and is identical to T.

In graph theory, a biconnected component is a maximal biconnected subgraph. Any connected graph decomposes into a tree of biconnected components called the block-cut tree of the graph. The blocks are attached to each other at shared vertices called cut vertices or separating vertices or articulation points. Specifically, a cut vertex is any vertex whose removal increases the number of connected components.

Nearest neighbor search (NNS), as a form of proximity search, is the optimization problem of finding the point in a given set that is closest to a given point. Closeness is typically expressed in terms of a dissimilarity function: the less similar the objects, the larger the function values.

In computer science, fractional cascading is a technique to speed up a sequence of binary searches for the same value in a sequence of related data structures. The first binary search in the sequence takes a logarithmic amount of time, as is standard for binary searches, but successive searches in the sequence are faster. The original version of fractional cascading, introduced in two papers by Chazelle and Guibas in 1986, combined the idea of cascading, originating in range searching data structures of Lueker (1978) and Willard (1978), with the idea of fractional sampling, which originated in Chazelle (1983). Later authors introduced more complex forms of fractional cascading that allow the data structure to be maintained as the data changes by a sequence of discrete insertion and deletion events.

In graph theory, a branch of mathematics, the triconnected components of a biconnected graph are a system of smaller graphs that describe all of the 2-vertex cuts in the graph. An SPQR tree is a tree data structure used in computer science, and more specifically graph algorithms, to represent the triconnected components of a graph. The SPQR tree of a graph may be constructed in linear time and has several applications in dynamic graph algorithms and graph drawing.

In computer science, a Cartesian tree is a binary tree derived from a sequence of numbers; it can be uniquely defined from the properties that it is heap-ordered and that a symmetric (in-order) traversal of the tree returns the original sequence. Introduced by Vuillemin (1980) in the context of geometric range searching data structures, Cartesian trees have also been used in the definition of the treap and randomized binary search tree data structures for binary search problems. The Cartesian tree for a sequence may be constructed in linear time using a stack-based algorithm for finding all nearest smaller values in a sequence.

In computer science, the all nearest smaller values problem is the following task: for each position in a sequence of numbers, search among the previous positions for the last position that contains a smaller value. This problem can be solved efficiently both by parallel and non-parallel algorithms: Berkman, Schieber & Vishkin (1993), who first identified the procedure as a useful subroutine for other parallel programs, developed efficient algorithms to solve it in the Parallel Random Access Machine model; it may also be solved in linear time on a non-parallel computer using a stack-based algorithm. Later researchers have studied algorithms to solve it in other models of parallel computation.

Image segmentation strives to partition a digital image into regions of pixels with similar properties, e.g. homogeneity. The higher-level region representation simplifies image analysis tasks such as counting objects or detecting changes, because region attributes can be compared more readily than raw pixels.

The breadth-first-search algorithm is a way to explore the vertexes of a graph layer by layer. It is a basic algorithm in graph theory which can be used as a part of other graph algorithms. For instance, BFS is used by Dinic's algorithm to find maximum flow in a graph. Moreover, BFS is also one of kernel algorithms in Graph500 benchmark, which is a benchmark for data-intensive supercomputing problems. This article discusses the possibility of speeding up BFS through the use of parallel computing.

References

↑ KORF Richard E., FERGUSON Chris (1988). "Distributed Tree Search and its Application to Alpha-Beta Pruning" (PDF). AAAI. University of California, Los Angeles. Archived from the original (PDF) on 2016-05-31.
↑ "What is IP Routing ?". MetaSwitch. MetaSwitch Networks. 2016.
↑ Vajgel, Peter (April 30, 2009). "Needle in a Haystack : Efficient Storage of Billions of Photos". Facebook. Facebook (company).
↑ Kröll, Brigitte; Widmayer, Peter (1995-08-16). Akl, Selim G.; Dehne, Frank; Sack, Jörg-Rüdiger; Santoro, Nicola (eds.). Balanced distributed search trees do not exist. Lecture Notes in Computer Science. Springer Berlin Heidelberg. pp. 50–61. doi:10.1007/3-540-60220-8_50. hdl:20.500.11850/68782. ISBN 9783540602200.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] KORF Richard E., FERGUSON Chris (1988). "Distributed Tree Search and its Application to Alpha-Beta Pruning" (PDF). AAAI. University of California, Los Angeles. Archived from the original (PDF) on 2016-05-31.

[2] "What is IP Routing ?". MetaSwitch. MetaSwitch Networks. 2016.

[3] Vajgel, Peter (April 30, 2009). "Needle in a Haystack : Efficient Storage of Billions of Photos". Facebook. Facebook (company).

[4] Kröll, Brigitte; Widmayer, Peter (1995-08-16). Akl, Selim G.; Dehne, Frank; Sack, Jörg-Rüdiger; Santoro, Nicola (eds.). Balanced distributed search trees do not exist. Lecture Notes in Computer Science. Springer Berlin Heidelberg. pp. 50–61. doi:10.1007/3-540-60220-8_50. hdl:20.500.11850/68782. ISBN 9783540602200.

[1]

[2]

[3]

[4]