Lowest common ancestor

In this tree, the lowest common ancestor of the nodes x and y is marked in dark green. Other common ancestors are shown in light green.

In graph theory and computer science, the lowest common ancestor (LCA) (also called least common ancestor) of two nodes v and w in a tree or directed acyclic graph (DAG) T is the lowest (i.e. deepest) node that has both v and w as descendants, where we define each node to be a descendant of itself (so if v has a direct connection from w, w is the lowest common ancestor).


The LCA of v and w in T is the shared ancestor of v and w that is located farthest from the root. Computation of lowest common ancestors may be useful, for instance, as part of a procedure for determining the distance between pairs of nodes in a tree: the distance from v to w can be computed as the distance from the root to v, plus the distance from the root to w, minus twice the distance from the root to their lowest common ancestor (Djidjev, Pantziou & Zaroliagis 1991).

In a tree data structure where each node points to its parent, the lowest common ancestor can be easily determined by finding the first intersection of the paths from v and w to the root. In general, the computational time required for this algorithm is O(h) where h is the height of the tree (length of longest path from a leaf to the root). However, there exist several algorithms for processing trees so that lowest common ancestors may be found more quickly. Tarjan's off-line lowest common ancestors algorithm, for example, preprocesses a tree in linear time to provide constant-time LCA queries. In general DAGs, similar algorithms exist, but with super-linear complexity.
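A minimal sketch of this parent-pointer approach in Python (the Node class and field names are illustrative, not taken from any particular library; depths are stored so the two paths can be climbed in lockstep):

    class Node:
        def __init__(self, parent=None):
            self.parent = parent
            self.depth = parent.depth + 1 if parent is not None else 0

    def naive_lca(v, w):
        # Equalize depths, then climb in lockstep until the paths meet: O(h) time.
        while v.depth > w.depth:
            v = v.parent
        while w.depth > v.depth:
            w = w.parent
        while v is not w:
            v = v.parent
            w = w.parent
        return v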

History

The lowest common ancestor problem was defined by Alfred Aho, John Hopcroft, and Jeffrey Ullman (1973), but Dov Harel and Robert Tarjan (1984) were the first to develop an optimally efficient lowest common ancestor data structure. Their algorithm processes any tree in linear time, using a heavy path decomposition, so that subsequent lowest common ancestor queries may be answered in constant time per query. However, their data structure is complex and difficult to implement. Tarjan also found a simpler but less efficient algorithm, based on the union-find data structure, for computing lowest common ancestors of an offline batch of pairs of nodes.

Baruch Schieber and Uzi Vishkin (1988) simplified the data structure of Harel and Tarjan, leading to an implementable structure with the same asymptotic preprocessing and query time bounds. Their simplification is based on the principle that, in two special kinds of trees, lowest common ancestors are easy to determine: if the tree is a path, then the lowest common ancestor can be computed simply from the minimum of the levels of the two queried nodes, while if the tree is a complete binary tree, the nodes may be indexed in such a way that lowest common ancestors reduce to simple binary operations on the indices. The structure of Schieber and Vishkin decomposes any tree into a collection of paths, such that the connections between the paths have the structure of a binary tree, and combines both of these two simpler indexing techniques.

Omer Berkman and Uzi Vishkin (1993) discovered a completely new way to answer lowest common ancestor queries, again achieving linear preprocessing time with constant query time. Their method involves forming an Euler tour of a graph formed from the input tree by doubling every edge, and using this tour to write a sequence of level numbers of the nodes in the order the tour visits them; a lowest common ancestor query can then be transformed into a query that seeks the minimum value occurring within some subinterval of this sequence of numbers. They then handle this range minimum query problem (RMQ) by combining two techniques, one technique based on precomputing the answers to large intervals that have sizes that are powers of two, and the other based on table lookup for small-interval queries. This method was later presented in a simplified form by Michael Bender and Martin Farach-Colton (2000). As had been previously observed by Gabow, Bentley & Tarjan (1984), the range minimum problem can in turn be transformed back into a lowest common ancestor problem using the technique of Cartesian trees.

Further simplifications were made by Alstrup et al. (2004) and Fischer & Heun (2006).

Sleator and Tarjan (1983) proposed the dynamic LCA variant of the problem, in which the data structure should be prepared to handle LCA queries intermixed with operations that change the tree (that is, rearrange the tree by adding and removing edges). This variant can be solved in time logarithmic in the total size of the tree for all modifications and queries. This is done by maintaining the forest using the dynamic trees data structure with partitioning by size; this then maintains a heavy-light decomposition of each tree, and allows LCA queries to be carried out in logarithmic time in the size of the tree.

Linear space and constant search time solution to LCA in trees

As mentioned above, LCA can be reduced to RMQ. An efficient solution to the resulting RMQ problem starts by partitioning the number sequence into blocks. Two different techniques are used for queries across blocks and within blocks.

Reduction from LCA to RMQ

Reduction of LCA to RMQ starts by walking the tree. For each node visited, record in sequence its label and depth. Suppose nodes x and y occur in positions i and j in this sequence, respectively. Then the LCA of x and y will be found in position RMQ(i, j), where the RMQ is taken over the depth values.
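A small sketch of this reduction, assuming the tree is given as a dictionary mapping each node to its list of children; the linear scan in lca_via_rmq stands in for the RMQ structure that the rest of this section makes constant-time:

    def euler_tour(root, children):
        # Walk the tree, recording each visited node's label and depth in sequence.
        labels, depths, first = [], [], {}
        def walk(node, depth):
            first.setdefault(node, len(labels))   # position of the node's first visit
            labels.append(node)
            depths.append(depth)
            for child in children.get(node, []):
                walk(child, depth + 1)
                labels.append(node)               # record the node again after each child
                depths.append(depth)
        walk(root, 0)
        return labels, depths, first

    def lca_via_rmq(x, y, labels, depths, first):
        i, j = sorted((first[x], first[y]))
        # The position of the minimum depth in depths[i..j] holds the LCA.
        k = min(range(i, j + 1), key=depths.__getitem__)
        return labels[k]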

An example showing how LCA is reduced to RMQ.

Linear space and constant search time algorithm for RMQ reduced from LCA

Although there exists a constant-time and linear-space solution for general RMQ, a simplified solution can be applied that makes use of the properties of the RMQ instance produced by the LCA reduction. This simplified solution can only be used for RMQ reduced from LCA.

Similar to the solution mentioned above, we divide the sequence into blocks, where each block has size b = (1/2) log₂ n.

An illustration showing an RMQ problem divided into blocks, each of size b.

By splitting the sequence into blocks, the RMQ(i, j) query can be solved by handling two different cases:

Case 1: if i and j are in different blocks

To answer the query in case 1, three groups of values are precomputed to help reduce the query time.

First, the minimum element with the smallest index in each block is precomputed and denoted y for that block. The set of all block minima takes O(n/b) space.

Second, given the set of block minima y, RMQ queries over this set are precomputed using the solution with constant time and linearithmic space. There are n/b blocks, so the lookup table in that solution takes O((n/b) log(n/b)) space. Because b = (1/2) log₂ n, this is O((2n/log n) · log n) = O(n). Hence, the precomputed RMQ structure on these block minima takes only O(n) space.

Third, in each block, let k be an index within the block, with 0 ≤ k < b. For each k from 0 to b − 1, the block is divided into two intervals [0, k) and [k, b). The minimum element with the smallest index within each of these intervals is then precomputed for each block. Such minimum elements are called the prefix min for the interval [0, k) and the suffix min for the interval [k, b). Each value of k yields one prefix min and one suffix min, so the total number of prefix mins and suffix mins in a block is 2b. Since there are n/b blocks, all prefix min and suffix min arrays together take O(2b · n/b) = O(n) space.

In total, it takes O(n) space to store all three groups of precomputed values mentioned above.
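The constant-time, linearithmic-space RMQ solution referred to in the second group is commonly realized as a sparse table over power-of-two intervals. A sketch (the array a would hold the block-minimum values; ties are broken toward the smaller index):

    def build_sparse_table(a):
        # table[k][i] = index of the minimum of a[i : i + 2**k]; O(n log n) space.
        n = len(a)
        table = [list(range(n))]
        k = 1
        while (1 << k) <= n:
            prev, half, row = table[k - 1], 1 << (k - 1), []
            for i in range(n - (1 << k) + 1):
                left, right = prev[i], prev[i + half]
                row.append(left if a[left] <= a[right] else right)
            table.append(row)
            k += 1
        return table

    def query_sparse_table(a, table, i, j):
        # Index of the minimum of a[i..j] (inclusive) in O(1) time,
        # by overlapping two power-of-two intervals that cover [i, j].
        k = (j - i + 1).bit_length() - 1
        left, right = table[k][i], table[k][j - (1 << k) + 1]
        return left if a[left] <= a[right] else right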

Therefore, answering the query in case 1 is simply taking the minimum of the answers to the following three questions:

Let B_i be the block that contains the element at index i, and B_j the block that contains the element at index j.

  1. The suffix min starting at the position of i within block B_i
  2. The RMQ over the block minima y of the blocks lying strictly between B_i and B_j, answered using the solution with constant time and linearithmic space
  3. The prefix min ending at the position of j within block B_j

All 3 questions can be answered in constant time. Hence, case 1 can be answered in linear space and constant time.
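Putting the three precomputed groups together, a case-1 query might look like the following sketch. It reuses build_sparse_table and query_sparse_table from above; depth is the depth sequence from the reduction, b the block size, yvals the depth values at the block-minimum indices y, and table a sparse table built over yvals. The helper names are illustrative:

    def preprocess_blocks(depth, b):
        # Per-block prefix minima, suffix minima, and block-minimum indices y
        # (all stored as global indices into depth; ties broken toward smaller index).
        nblocks = (len(depth) + b - 1) // b
        prefix_min, suffix_min, y = [], [], []
        for blk in range(nblocks):
            lo, hi = blk * b, min((blk + 1) * b, len(depth))
            pm, best = [], lo
            for i in range(lo, hi):                      # prefix minima
                if depth[i] < depth[best]:
                    best = i
                pm.append(best)
            sm, best = [0] * (hi - lo), hi - 1
            for i in range(hi - 1, lo - 1, -1):          # suffix minima
                if depth[i] <= depth[best]:
                    best = i
                sm[i - lo] = best
            prefix_min.append(pm)
            suffix_min.append(sm)
            y.append(pm[-1])                             # index of this block's minimum
        return prefix_min, suffix_min, y

    def case1_query(depth, b, prefix_min, suffix_min, y, yvals, table, i, j):
        # Index of the minimum of depth[i..j] when i and j lie in different blocks.
        bi, bj = i // b, j // b
        candidates = [suffix_min[bi][i - bi * b],        # from i to the end of its block
                      prefix_min[bj][j - bj * b]]        # from the start of j's block to j
        if bi + 1 <= bj - 1:                             # whole blocks strictly in between
            k = query_sparse_table(yvals, table, bi + 1, bj - 1)
            candidates.append(y[k])                      # map back to a global index
        return min(candidates, key=depth.__getitem__)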

Case 2: if i and j are in the same block

The sequence of depths produced by the reduction from LCA has one property that a general RMQ input does not have: each element differs from the previous one by exactly +1 or −1. For example:

An illustration showing how the example RMQ is encoded as a bitstring.

Therefore, each block can be encoded as a bitstring, with 0 representing a step of −1 in depth and 1 representing a step of +1. This transformation turns a block into a bitstring of size b − 1. A bitstring of size b − 1 has 2^(b−1) possible values. Since b = (1/2) log₂ n, 2^(b−1) ≤ 2^((1/2) log₂ n) = √n.

Hence, every block corresponds to one of the O(√n) possible bitstrings of size b − 1.

Then, for each possible bitstring, we apply the naïve quadratic-space, constant-time solution, which tabulates the answer to every in-block query. This takes O(√n · b²) space, which is O(√n · log² n) = o(n).

Therefore, answering the query in case 2 simply requires finding the bitstring encoding the block that contains i and j and performing a table lookup for that bitstring. Hence, case 2 can be solved using linear space with constant search time.
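A sketch of the case-2 machinery, assuming the depth sequence has been padded so that its length is a multiple of b; the function names are illustrative:

    from itertools import product

    def block_signature(block):
        # Encode the b-1 steps of a block as bits: 1 for a +1 step, 0 for a -1 step.
        return tuple(1 if later - earlier == 1 else 0
                     for earlier, later in zip(block, block[1:]))

    def build_inblock_tables(b):
        # For every possible (b-1)-bit signature, tabulate the argmin of every
        # in-block range; O(2**(b-1) * b**2) entries in total.
        tables = {}
        for bits in product((0, 1), repeat=b - 1):
            profile = [0]                            # representative depth profile
            for bit in bits:
                profile.append(profile[-1] + (1 if bit else -1))
            tbl = {}
            for i in range(b):
                best = i
                for j in range(i, b):
                    if profile[j] < profile[best]:
                        best = j
                    tbl[i, j] = best
            tables[bits] = tbl
        return tables

    def case2_query(depth, b, tables, i, j):
        # i and j lie in the same block: look up the answer for that block's bitstring.
        lo = (i // b) * b
        sig = block_signature(depth[lo:lo + b])
        return lo + tables[sig][i - lo, j - lo]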

Extension to directed acyclic graphs

A DAG with the lowest common ancestors of x and y in dark green and other common ancestors in light green.

While originally studied in the context of trees, the notion of lowest common ancestors can be defined for directed acyclic graphs (DAGs), using either of two possible definitions. In both, the edges of the DAG are assumed to point from parents to children.

In a tree, the lowest common ancestor is unique; in a DAG of n nodes, each pair of nodes may have as many as n − 2 LCAs (Bender et al. 2005), while the existence of an LCA for a pair of nodes is not even guaranteed in arbitrary connected DAGs.

A brute-force algorithm for finding lowest common ancestors is given by Aït-Kaci et al. (1989): find all ancestors of x and y, then return the maximum element of the intersection of the two sets. Better algorithms exist that, analogous to the LCA algorithms on trees, preprocess a graph to enable constant-time LCA queries. The problem of LCA existence can be solved optimally for sparse DAGs by means of an O(|V||E|) algorithm due to Kowaluk & Lingas (2005).
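A sketch of this brute-force idea, with the DAG given as a mapping from each node to its direct parents; since a pair of nodes may have several lowest common ancestors, the sketch returns every common ancestor that has no other common ancestor below it (in a tree this set contains exactly the single LCA):

    def ancestors(node, parents):
        # All ancestors of node, including node itself.
        seen, stack = {node}, [node]
        while stack:
            for p in parents.get(stack.pop(), []):
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    def dag_lcas(x, y, parents):
        common = ancestors(x, parents) & ancestors(y, parents)
        # Keep the common ancestors that are not proper ancestors of another one.
        return {a for a in common
                if not any(a != c and a in ancestors(c, parents) for c in common)}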

Dash et al. (2013) present a unified framework for preprocessing directed acyclic graphs to compute a representative lowest common ancestor in a rooted DAG in constant time. Their framework can achieve near-linear preprocessing times for sparse graphs and is available for public use. [1]

Applications

The problem of computing lowest common ancestors of classes in an inheritance hierarchy arises in the implementation of object-oriented programming systems ( Aït-Kaci et al. 1989 ). The LCA problem also finds applications in models of complex systems found in distributed computing ( Bender et al. 2005 ).

See also

Related Research Articles

Dijkstra's algorithm

Dijkstra's algorithm is an algorithm for finding the shortest paths between nodes in a weighted graph, which may represent, for example, road networks. It was conceived by computer scientist Edsger W. Dijkstra in 1956 and published three years later.

In computer science, Tarjan's off-line lowest common ancestors algorithm is an algorithm for computing lowest common ancestors for pairs of nodes in a tree, based on the union-find data structure. The lowest common ancestor of two nodes d and e in a rooted tree T is the node g that is an ancestor of both d and e and that has the greatest depth in T. It is named after Robert Tarjan, who discovered the technique in 1979. Tarjan's algorithm is an offline algorithm; that is, unlike other lowest common ancestor algorithms, it requires that all pairs of nodes for which the lowest common ancestor is desired must be specified in advance. The simplest version of the algorithm uses the union-find data structure, which unlike other lowest common ancestor data structures can take more than constant time per operation when the number of pairs of nodes is similar in magnitude to the number of nodes. A later refinement by Gabow & Tarjan (1983) speeds the algorithm up to linear time.
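A compact sketch of the union-find approach just described, using a recursive depth-first search; the tree is given as a children dictionary and the query pairs are supplied up front, as the offline setting requires:

    class DSU:
        # Minimal union-find with path halving.
        def __init__(self):
            self.parent = {}
        def find(self, x):
            self.parent.setdefault(x, x)
            while self.parent[x] != x:
                self.parent[x] = self.parent[self.parent[x]]
                x = self.parent[x]
            return x
        def union(self, x, y):
            self.parent[self.find(y)] = self.find(x)

    def tarjan_offline_lca(root, children, queries):
        dsu, ancestor, visited, answers = DSU(), {}, set(), {}
        pending = {}
        for u, v in queries:
            pending.setdefault(u, []).append(v)
            pending.setdefault(v, []).append(u)

        def dfs(u):
            ancestor[dsu.find(u)] = u
            for c in children.get(u, []):
                dfs(c)
                dsu.union(u, c)                      # merge the child's set into u's
                ancestor[dsu.find(u)] = u
            visited.add(u)
            for other in pending.get(u, []):
                if other in visited:
                    answers[u, other] = answers[other, u] = ancestor[dsu.find(other)]

        dfs(root)
        return answers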

Strongly connected component

In the mathematical theory of directed graphs, a graph is said to be strongly connected if every vertex is reachable from every other vertex. The strongly connected components of a directed graph form a partition into subgraphs that are themselves strongly connected. It is possible to test the strong connectivity of a graph, or to find its strongly connected components, in linear time (that is, Θ(V + E )).

In computer science, a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge (u,v) from vertex u to vertex v, u comes before v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks. Precisely, a topological sort is a graph traversal in which each node v is visited only after all its dependencies are visited. A topological ordering is possible if and only if the graph has no directed cycles, that is, if it is a directed acyclic graph (DAG). Any DAG has at least one topological ordering, and algorithms are known for constructing a topological ordering of any DAG in linear time. Topological sorting has many applications, especially in ranking problems such as feedback arc set. Topological sorting is possible even when the DAG has disconnected components.

In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that stores a collection of disjoint (non-overlapping) sets. Equivalently, it stores a partition of a set into disjoint subsets. It provides operations for adding new sets, merging sets, and finding a representative member of a set. The last operation makes it possible to find out efficiently if any two elements are in the same or different sets.

In graph theory, reachability refers to the ability to get from one vertex to another within a graph. A vertex s can reach a vertex t if there exists a sequence of adjacent vertices (that is, a walk) which starts with s and ends with t.

Biconnected component

In graph theory, a biconnected component or block is a maximal biconnected subgraph. Any connected graph decomposes into a tree of biconnected components called the block-cut tree of the graph. The blocks are attached to each other at shared vertices called cut vertices or separating vertices or articulation points. Specifically, a cut vertex is any vertex whose removal increases the number of connected components. A block containing at most one cut vertex is called a leaf block; it corresponds to a leaf vertex in the block-cut tree.

In computer science, the prefix sum, cumulative sum, inclusive scan, or simply scan of a sequence of numbers x0, x1, x2, ... is a second sequence of numbers y0, y1, y2, ..., the sums of prefixes of the input sequence: y0 = x0, y1 = x0 + x1, y2 = x0 + x1 + x2, and so on.

Cartesian tree

In computer science, a Cartesian tree is a binary tree derived from a sequence of distinct numbers. To construct the Cartesian tree, set its root to be the minimum number in the sequence, and recursively construct its left and right subtrees from the subsequences before and after this number. It is uniquely defined as a min-heap whose symmetric (in-order) traversal returns the original sequence.
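A direct, if naive, transcription of that recursive construction (linear-time stack-based constructions also exist):

    def cartesian_tree(seq):
        # Returns nested (value, left_subtree, right_subtree) triples, or None.
        if not seq:
            return None
        k = seq.index(min(seq))
        return (seq[k], cartesian_tree(seq[:k]), cartesian_tree(seq[k + 1:]))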

In graph theory, the planar separator theorem is a form of isoperimetric inequality for planar graphs, stating that any planar graph can be split into smaller pieces by removing a small number of vertices. Specifically, the removal of O(√n) vertices from an n-vertex graph can partition the graph into disjoint subgraphs each of which has at most 2n/3 vertices.

In computer science, a range minimum query (RMQ) solves the problem of finding the minimal value in a sub-array of an array of comparable objects. Range minimum queries have several use cases in computer science, such as the lowest common ancestor problem and the longest common prefix problem (LCP).

Decision tree model

In computational complexity the decision tree model is the model of computation in which an algorithm is considered to be basically a decision tree, i.e., a sequence of queries or tests that are done adaptively, so the outcome of previous tests can influence the tests performed next.

Euler tour technique

The Euler tour technique (ETT), named after Leonhard Euler, is a method in graph theory for representing trees. The tree is viewed as a directed graph that contains two directed edges for each edge in the tree. The tree can then be represented as an Eulerian circuit of the directed graph, known as the Euler tour representation (ETR) of the tree. The ETT allows for efficient, parallel computation of solutions to common problems in algorithmic graph theory. It was introduced by Tarjan and Vishkin in 1984.

Top tree

A top tree is a data structure based on a binary tree for unrooted dynamic trees that is used mainly for various path-related operations. It allows simple divide-and-conquer algorithms. It has since been augmented to maintain dynamically various properties of a tree such as diameter, center and median.

In computer science, the range query problem consists of efficiently answering several queries regarding a given interval of elements within an array. For example, a common task, known as range minimum query, is finding the smallest value inside a given range within a list of numbers.

In graph theory and theoretical computer science, the level ancestor problem is the problem of preprocessing a given rooted tree T into a data structure that can determine the ancestor of a given node at a given distance from the root of the tree.

In computer science, the longest common prefix array is an auxiliary data structure to the suffix array. It stores the lengths of the longest common prefixes (LCPs) between all pairs of consecutive suffixes in a sorted suffix array.

In computer science, one approach to the dynamic optimality problem on online algorithms for binary search trees involves reformulating the problem geometrically, in terms of augmenting a set of points in the plane with as few additional points as possible to avoid rectangles with only two points on their boundary.

In computing and graph theory, a dynamic connectivity structure is a data structure that dynamically maintains information about the connected components of a graph.

In computer science, the list-labeling problem involves maintaining a totally ordered set S supporting the following operations:

References

  1. "Try our source code for free".