Top tree

Last updated

A top tree is a data structure based on a binary tree for unrooted dynamic trees that is used mainly for various path-related operations. It allows simple divide-and-conquer algorithms. It has since been augmented to maintain dynamically various properties of a tree such as diameter, center and median.

Contents

A top tree is defined for an underlying tree and a set of at most two vertices called as External Boundary Vertices

An image depicting a top tree built on an underlying tree (black nodes). A tree divided into edge clusters and the complete top-tree for it. Filled nodes in the top-tree are path-clusters, while small circle nodes are leaf-clusters. The big circle node is the root. Capital letters denote clusters, non-capital letters are nodes. Top tree.jpg
An image depicting a top tree built on an underlying tree (black nodes). A tree divided into edge clusters and the complete top-tree for it. Filled nodes in the top-tree are path-clusters, while small circle nodes are leaf-clusters. The big circle node is the root. Capital letters denote clusters, non-capital letters are nodes.

Glossary

Boundary Node

See Boundary Vertex

Boundary Vertex

A vertex in a connected subtree is a Boundary Vertex if it is connected to a vertex outside the subtree by an edge.

External Boundary Vertices

Up to a pair of vertices in the top tree can be called as External Boundary Vertices, they can be thought of as Boundary Vertices of the cluster which represents the entire top tree.

Cluster

A cluster is a connected subtree with at most two Boundary Vertices. The set of Boundary Vertices of a given cluster is denoted as With each cluster the user may associate some meta information and give methods to maintain it under the various internal operations.

Path Cluster

If contains at least one edge then is called a Path Cluster.

Point Cluster

See Leaf Cluster

Leaf Cluster

If does not contain any edge i.e. has only one Boundary Vertex then is called a Leaf Cluster.

Edge Cluster

A Cluster containing a single edge is called an Edge Cluster.

Leaf Edge Cluster

A Leaf in the original Cluster is represented by a Cluster with just a single Boundary Vertex and is called a Leaf Edge Cluster.

Path Edge Cluster

Edge Clusters with two Boundary Nodes are called Path Edge Cluster.

Internal Node

A node in \ is called an Internal Node of

Cluster Path

The path between the Boundary Vertices of is called the cluster path of and it is denoted by

Mergeable Clusters

Two Clusters and are Mergeable if is a singleton set (they have exactly one node in common) and is a Cluster.

Introduction

Top trees are used for maintaining a Dynamic forest (set of trees) under link and cut operations.

The basic idea is to maintain a balanced Binary tree of logarithmic height in the number of nodes in the original tree ( i.e. in time) ; the top tree essentially represents the recursive subdivision of the original tree into clusters.

In general the tree may have weight on its edges.

There is a one-to-one correspondence with the edges of the original tree and the leaf nodes of the top tree and each internal node of represents a cluster that is formed due to the union of the clusters that are its children.

The top tree data structure can be initialized in time.

Therefore the top tree over () is a binary tree such that

A tree with a single vertex has an empty top tree, and one with just an edge is just a single node.

These trees are freely augmentable allowing the user a wide variety of flexibility and productivity without going into the details of the internal workings of the data structure, something which is also referred to as the Black Box.

Dynamic Operations

The following three are the user allowable Forest Updates.

Internal Operations

The Forest updates are all carried out by a sequence of at most Internal Operations, the sequence of which is computed in further time. It may happen that during a tree update, a leaf cluster may change to a path cluster and the converse. Updates to top tree are done exclusively by these internal operations.

The is updated by calling a user defined function associated with each internal operation.

Split is usually implemented using Clean method which calls user method for updates of and using and updates such that it's known there is no pending update needed in its children. Than the is discarded without calling user defined functions. Clean is often required for queries without need to Split. If Split does not use Clean subroutine, and Clean is required, its effect could be achieved with overhead by combining Merge and Split.

The next two functions are analogous to the above two and are used for base clusters.

User can define Choose operation which for a root (nonleaf) cluster selects one of its child clusters. The top tree blackbox provides Search routine, which organizes Choose queries and reorganization of the top tree (using the Internal operations) such that it locates the only edge in intersection of all selected clusters. Sometimes the search should be limited to a path. There is a variant of nonlocal search for such purposes. If there are two external boundary vertices in the root cluster , the edge is searched only on the path . It is sufficient to do following modification: If only one of root cluster children is path cluster, it is selected by default without calling the Choose operation.

Finding i-th edge on longer path from to could be done by =Expose({v,w}) followed by Search() with appropriate Choose. To implement the Choose we use global variable representing and global variable representing Choose selects the cluster with iff length of is at least . To support the operation the length must be maintained in the .

Similar task could be formulated for graph with edges with nonunit lengths. In that case the distance could address an edge or a vertex between two edges. We could define Choose such that the edge leading to the vertex is returned in the latter case. There could be defined update increasing all edge lengths along a path by a constant. In such scenario these updates are done in constant time just in root cluster. Clean is required to distribute the delayed update to the children. The Clean should be called before the Choose is invoked. To maintain length in would in that case require to maintain unitlength in as well.

Finding center of tree containing vertex could be done by finding either bicenter edge or edge with center as one endpoint. The edge could be found by =Expose({v}) followed by Search() with appropriate Choose. The choose selects between children with the child with higher maxdistance. To support the operation the maximal distance in the cluster subtree from a boundary vertex should be maintained in the . That requires maintenance of the cluster path length as well.

Interesting Results and Applications

A number of interesting applications originally implemented by other methods have been easily implemented using the top tree's interface. Some of them include

Implementation

Top trees have been implemented in a variety of ways, some of them include implementation using a Multilevel Partition (Top-trees and dynamic graph algorithms Jacob Holm and Kristian de Lichtenberg. Technical Report), and even by using Sleator-Tarjan s-t trees (typically with amortized time bounds), Frederickson's Topology Trees (with worst case time bounds) (Alstrup et al. Maintaining Information in Fully Dynamic Trees with Top Trees).

Amortized implementations are more simple, and with small multiplicative factors in time complexity. On the contrary the worst case implementations allow speeding up queries by switching off unneeded info updates during the query (implemented by persistence techniques). After the query is answered the original state of the top tree is used and the query version is discarded.

Using Multilevel Partitioning

Any partitioning of clusters of a tree can be represented by a Cluster Partition Tree CPT by replacing each cluster in the tree by an edge. If we use a strategy P for partitioning then the CPT would be CPTP This is done recursively till only one edge remains.

We would notice that all the nodes of the corresponding top tree are uniquely mapped into the edges of this multilevel partition. There may be some edges in the multilevel partition that do not correspond to any node in the top tree, these are the edges which represent only a single child in the level below it, i.e. a simple cluster. Only the edges that correspond to composite clusters correspond to nodes in the top tree

A partitioning strategy is important while we partition the Tree into clusters. Only a careful strategy ensures that we end up in an height Multilevel Partition ( and therefore the top tree).

The above partitioning strategy ensures the maintenance of the top tree in time.

See also

Related Research Articles

<span class="mw-page-title-main">Minimum spanning tree</span> Least-weight tree connecting graph vertices

A minimum spanning tree (MST) or minimum weight spanning tree is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight. That is, it is a spanning tree whose sum of edge weights is as small as possible. More generally, any edge-weighted undirected graph has a minimum spanning forest, which is a union of the minimum spanning trees for its connected components.

<span class="mw-page-title-main">Dijkstra's algorithm</span> Graph search algorithm

Dijkstra's algorithm is an algorithm for finding the shortest paths between nodes in a weighted graph, which may represent, for example, road networks. It was conceived by computer scientist Edsger W. Dijkstra in 1956 and published three years later.

<span class="mw-page-title-main">Prim's algorithm</span> Method for finding minimum spanning trees

In computer science, Prim's algorithm is a greedy algorithm that finds a minimum spanning tree for a weighted undirected graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. The algorithm operates by building this tree one vertex at a time, from an arbitrary starting vertex, at each step adding the cheapest possible connection from the tree to another vertex.

<span class="mw-page-title-main">Breadth-first search</span> Algorithm to search the nodes of a graph

Breadth-first search (BFS) is an algorithm for searching a tree data structure for a node that satisfies a given property. It starts at the tree root and explores all nodes at the present depth prior to moving on to the nodes at the next depth level. Extra memory, usually a queue, is needed to keep track of the child nodes that were encountered but not yet explored.

<span class="mw-page-title-main">Component (graph theory)</span> Maximal subgraph whose vertices can reach each other

In graph theory, a component of an undirected graph is a connected subgraph that is not part of any larger connected subgraph. The components of any graph partition its vertices into disjoint sets, and are the induced subgraphs of those sets. A graph that is itself connected has exactly one component, consisting of the whole graph. Components are sometimes called connected components.

<span class="mw-page-title-main">Eulerian path</span> Trail in a graph that visits each edge once

In graph theory, an Eulerian trail is a trail in a finite graph that visits every edge exactly once. Similarly, an Eulerian circuit or Eulerian cycle is an Eulerian trail that starts and ends on the same vertex. They were first discussed by Leonhard Euler while solving the famous Seven Bridges of Königsberg problem in 1736. The problem can be stated mathematically like this:

In computer science, a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge (u,v) from vertex u to vertex v, u comes before v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks. Precisely, a topological sort is a graph traversal in which each node v is visited only after all its dependencies are visited. A topological ordering is possible if and only if the graph has no directed cycles, that is, if it is a directed acyclic graph (DAG). Any DAG has at least one topological ordering, and algorithms are known for constructing a topological ordering of any DAG in linear time. Topological sorting has many applications, especially in ranking problems such as feedback arc set. Topological sorting is possible even when the DAG has disconnected components.

In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that stores a collection of disjoint (non-overlapping) sets. Equivalently, it stores a partition of a set into disjoint subsets. It provides operations for adding new sets, merging sets, and finding a representative member of a set. The last operation makes it possible to find out efficiently if any two elements are in the same or different sets.

Johnson's algorithm is a way to find the shortest paths between all pairs of vertices in an edge-weighted directed graph. It allows some of the edge weights to be negative numbers, but no negative-weight cycles may exist. It works by using the Bellman–Ford algorithm to compute a transformation of the input graph that removes all negative weights, allowing Dijkstra's algorithm to be used on the transformed graph. It is named after Donald B. Johnson, who first published the technique in 1977.

<span class="mw-page-title-main">Maximal independent set</span> Independent set which is not a subset of any other independent set

In graph theory, a maximal independent set (MIS) or maximal stable set is an independent set that is not a subset of any other independent set. In other words, there is no vertex outside the independent set that may join it because it is maximal with respect to the independent set property.

In graph theory, reachability refers to the ability to get from one vertex to another within a graph. A vertex can reach a vertex if there exists a sequence of adjacent vertices which starts with and ends with .

In computer science, fractional cascading is a technique to speed up a sequence of binary searches for the same value in a sequence of related data structures. The first binary search in the sequence takes a logarithmic amount of time, as is standard for binary searches, but successive searches in the sequence are faster. The original version of fractional cascading, introduced in two papers by Chazelle and Guibas in 1986, combined the idea of cascading, originating in range searching data structures of Lueker (1978) and Willard (1978), with the idea of fractional sampling, which originated in Chazelle (1983). Later authors introduced more complex forms of fractional cascading that allow the data structure to be maintained as the data changes by a sequence of discrete insertion and deletion events.

The reverse-delete algorithm is an algorithm in graph theory used to obtain a minimum spanning tree from a given connected, edge-weighted graph. It first appeared in Kruskal (1956), but it should not be confused with Kruskal's algorithm which appears in the same paper. If the graph is disconnected, this algorithm will find a minimum spanning tree for each disconnected part of the graph. The set of these minimum spanning trees is called a minimum spanning forest, which contains every vertex in the graph.

In graph theory, the planar separator theorem is a form of isoperimetric inequality for planar graphs, that states that any planar graph can be split into smaller pieces by removing a small number of vertices. Specifically, the removal of vertices from an n-vertex graph can partition the graph into disjoint subgraphs each of which has at most vertices.

In graph theory, the tree-depth of a connected undirected graph is a numerical invariant of , the minimum height of a Trémaux tree for a supergraph of . This invariant and its close relatives have gone under many different names in the literature, including vertex ranking number, ordered chromatic number, and minimum elimination tree height; it is also closely related to the cycle rank of directed graphs and the star height of regular languages. Intuitively, where the treewidth of a graph measures how far it is from being a tree, this parameter measures how far a graph is from being a star.

In computer science, the method of contraction hierarchies is a speed-up technique for finding the shortest-path in a graph. The most intuitive applications are car-navigation systems: a user wants to drive from to using the quickest possible route. The metric optimized here is the travel time. Intersections are represented by vertices, the road sections connecting them by edges. The edge weights represent the time it takes to drive along this segment of the road. A path from to is a sequence of edges ; the shortest path is the one with the minimal sum of edge weights among all possible paths. The shortest path in a graph can be computed using Dijkstra's algorithm but, given that road networks consist of tens of millions of vertices, this is impractical. Contraction hierarchies is a speed-up method optimized to exploit properties of graphs representing road networks. The speed-up is achieved by creating shortcuts in a preprocessing phase which are then used during a shortest-path query to skip over "unimportant" vertices. This is based on the observation that road networks are highly hierarchical. Some intersections, for example highway junctions, are "more important" and higher up in the hierarchy than for example a junction leading into a dead end. Shortcuts can be used to save the precomputed distance between two important junctions such that the algorithm doesn't have to consider the full path between these junctions at query time. Contraction hierarchies do not know about which roads humans consider "important", but they are provided with the graph as input and are able to assign importance to vertices using heuristics.

In computing and graph theory, a dynamic connectivity structure is a data structure that dynamically maintains information about the connected components of a graph.

In computing, a distance oracle (DO) is a data structure for calculating distances between vertices in a graph.

A central problem in algorithmic graph theory is the shortest path problem. One of the generalizations of the shortest path problem is known as the single-source-shortest-paths (SSSP) problem, which consists of finding the shortest paths from a source vertex to all other vertices in the graph. There are classical sequential algorithms which solve this problem, such as Dijkstra's algorithm. In this article, however, we present two parallel algorithms solving this problem.

The highway dimension is a graph parameter modelling transportation networks, such as road networks or public transportation networks. It was first formally defined by Abraham et al. based on the observation by Bast et al. that any road network has a sparse set of "transit nodes", such that driving from a point A to a sufficiently far away point B along the shortest route will always pass through one of these transit nodes. It has also been proposed that the highway dimension captures the properties of public transportation networks well, given that longer routes using busses, trains, or airplanes will typically be serviced by larger transit hubs. This relates to the spoke–hub distribution paradigm in transport topology optimization.

References

  1. Holm, J.; De Lichtenberg, K.; Thorup, M. (2001). "Poly-logarithmic deterministic fully-dynamic algorithms for connectivity, minimum spanning tree, 2-edge, and biconnectivity". Journal of the ACM. 48 (4): 723. doi:10.1145/502090.502095. S2CID   7273552.
  2. Thorup, Mikkel (2000), "Near-optimal fully-dynamic graph connectivity", Proceedings of the Thirty-Second Annual {ACM} Symposium on Theory of Computing
  3. Holm, Jacob; Rotenberg, Eva; Thorup, Mikkel (2018), "Dynamic Bridge-Finding in Amortized Time", Proceedings of the Twenty-Ninth Annual {ACM-SIAM} Symposium on Discrete Algorithms, {SODA} 2018, doi:10.1137/1.9781611975031.3, S2CID   33964042
  4. Holm, J.; De Lichtenberg, K.; Thorup, M. (2001). "Poly-logarithmic deterministic fully-dynamic algorithms for connectivity, minimum spanning tree, 2-edge, and biconnectivity". Journal of the ACM. 48 (4): 723. doi:10.1145/502090.502095. S2CID   7273552.
  5. Bille, Philip; Gørtz, Inge Li; Landau, Gad M.; Weimann, Oren (2015). "Tree Compression with Top Trees". Inf. Comput. 243: 166–177. arXiv: 1304.5702 . doi:10.1016/j.ic.2014.12.012.