Euler tour technique

Last updated
Euler tour of a tree, with edges labeled to show the order in which they are traversed by the tour Stirling permutation Euler tour.svg
Euler tour of a tree, with edges labeled to show the order in which they are traversed by the tour

The Euler tour technique (ETT), named after Leonhard Euler, is a method in graph theory for representing trees. The tree is viewed as a directed graph that contains two directed edges for each edge in the tree. The tree can then be represented as a Eulerian circuit of the directed graph, known as the Euler tour representation (ETR) of the tree. The ETT allows for efficient, parallel computation of solutions to common problems in algorithmic graph theory. It was introduced by Tarjan and Vishkin in 1984. [1]

Contents

Construction

Given an undirected tree presented as a set of edges, the Euler tour representation (ETR) can be constructed in parallel as follows:

Construct an edge list (called succ) in Euler tour order by setting pointers succ(u,v) for all edges (u,v) in parallel according to the following rule:

The resulting list succ will be circular.

The overall construction takes work W(n) = O(sort(n)) (the time it takes to sort n items in parallel) if the tree has n nodes, as in trees the number of edges is one less than the number of nodes.

Roots, advance and retreat edges

If the tree has a root, we can split the circular list succ at that root. In that case, we can speak of advance and retreat edges: given a pair of nodes u,v, the first occurrence of either (u,v) or (v,u) in the ETR is called the advance edge, and the second occurrence is called the retreat edge. This appeals to the intuition that the first time such an edge is traversed the distance to the root is increased, while the second time the distance decreases.

Rerooting the tree can be done in constant time O(1) by splitting the circular list succ at the new root.

Applications

All of the following problems can be solved in O(Prefix sum(n)) (the time it takes to solve the prefix sum problem in parallel for a list of n items):

  1. Classifying advance and retreat edges: Do list ranking on the ETR and save the result in a two-dimensional array A. Then (u,v) is an advance edge iff A(u,v) < A(v,u), and a retreat edge otherwise.
  2. Determine the level of each node: Do a prefix sum on the ETR, where every advance edge counts as 1, and every retreat edge counts as −1. Then the value at the advance edge (u,v) is the level of v.
  3. Number of nodes in a subtree rooted at v: assume the parent of v is u, determine advance edge (u,v), and the retreat edge (v,u) in parallel, and then count the number of advance edges between (u,v) and (v,u) using prefix sum.
  4. The depth-first search index of a node v: count the number of advance edges up to and including (u,v).
  5. Determine the lowest common ancestor of two nodes.

Euler tour trees

Henzinger and King [2] suggest to represent a given tree by keeping its Euler tour in a balanced binary search tree, keyed by the index in the tour. So for example, the unbalanced tree in the example above, having 7 nodes, will be represented by a balanced binary tree with 14 nodes, one for each time each node appears on the tour.

We can represent a forest (an acyclic graph) using a collection of ET trees - one ET tree for one forest tree. This representation allows us to quickly answer the question "what is the root of node v?" by just moving to the first node of the ET tree (since nodes in the ET tree are keyed by their location in the Euler tour, and the root is the first and last node in the tour). When the represented forest is updated (e.g. by connecting two trees to a single tree or by splitting a tree to two trees), the corresponding Euler-tour structure can be updated in time O(log(n)).

Link/cut trees have similar performance guarantees. While LC trees are good for maintaining aggregates on paths of a tree (making it a good choice data structure in network flow algorithms), ET trees are better at keeping aggregate information on subtrees. [3]

Related Research Articles

A splay tree is a binary search tree with the additional property that recently accessed elements are quick to access again. Like self-balancing binary search trees, a splay tree performs basic operations such as insertion, look-up and removal in O(log n) amortized time. For random access patterns drawn from a non-uniform random distribution, their amortized time can be faster than logarithmic, proportional to the entropy of the access pattern. For many patterns of non-random operations, also, splay trees can take better than logarithmic time, without requiring advance knowledge of the pattern. According to the unproven dynamic optimality conjecture, their performance on all access patterns is within a constant factor of the best possible performance that could be achieved by any other self-adjusting binary search tree, even one selected to fit that pattern. The splay tree was invented by Daniel Sleator and Robert Tarjan in 1985.

<span class="mw-page-title-main">Tree decomposition</span> Mapping of a graph into a tree

In graph theory, a tree decomposition is a mapping of a graph into a tree that can be used to define the treewidth of the graph and speed up solving certain computational problems on the graph.

<span class="mw-page-title-main">Eulerian path</span> Trail in a graph that visits each edge once

In graph theory, an Eulerian trail is a trail in a finite graph that visits every edge exactly once. Similarly, an Eulerian circuit or Eulerian cycle is an Eulerian trail that starts and ends on the same vertex. They were first discussed by Leonhard Euler while solving the famous Seven Bridges of Königsberg problem in 1736. The problem can be stated mathematically like this:

<span class="mw-page-title-main">Maximum flow problem</span> Computational problem in graph theory

In optimization theory, maximum flow problems involve finding a feasible flow through a flow network that obtains the maximum possible flow rate.

In computer science, a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge (u,v) from vertex u to vertex v, u comes before v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks. Precisely, a topological sort is a graph traversal in which each node v is visited only after all its dependencies are visited. A topological ordering is possible if and only if the graph has no directed cycles, that is, if it is a directed acyclic graph (DAG). Any DAG has at least one topological ordering, and algorithms are known for constructing a topological ordering of any DAG in linear time. Topological sorting has many applications, especially in ranking problems such as feedback arc set. Topological sorting is possible even when the DAG has disconnected components.

In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that stores a collection of disjoint (non-overlapping) sets. Equivalently, it stores a partition of a set into disjoint subsets. It provides operations for adding new sets, merging sets, and finding a representative member of a set. The last operation makes it possible to find out efficiently if any two elements are in the same or different sets.

In computational phylogenetics, tree alignment is a computational problem concerned with producing multiple sequence alignments, or alignments of three or more sequences of DNA, RNA, or protein. Sequences are arranged into a phylogenetic tree, modeling the evolutionary relationships between species or taxa. The edit distances between sequences are calculated for each of the tree's internal vertices, such that the sum of all edit distances within the tree is minimized. Tree alignment can be accomplished using one of several algorithms with various trade-offs between manageable tree size and computational effort.

<span class="mw-page-title-main">Biconnected component</span> Maximal biconnected subgraph

In graph theory, a biconnected component is a maximal biconnected subgraph. Any connected graph decomposes into a tree of biconnected components called the block-cut tree of the graph. The blocks are attached to each other at shared vertices called cut vertices or separating vertices or articulation points. Specifically, a cut vertex is any vertex whose removal increases the number of connected components.

In parallel algorithms, the list ranking problem involves determining the position, or rank, of each item in a linked list. That is, the first item in the list should be assigned the number 1, the second item in the list should be assigned the number 2, etc. Although it is straightforward to solve this problem efficiently on a sequential computer, by traversing the list in order, it is more complicated to solve in parallel. As Anderson & Miller (1990) wrote, the problem was viewed as important in the parallel algorithms community both for its many applications and because solving it led to many important ideas that could be applied in parallel algorithms more generally.

In computer science, the prefix sum, cumulative sum, inclusive scan, or simply scan of a sequence of numbers x0, x1, x2, ... is a second sequence of numbers y0, y1, y2, ..., the sums of prefixes of the input sequence:

In graph theory and computer science, the lowest common ancestor (LCA) of two nodes v and w in a tree or directed acyclic graph (DAG) T is the lowest node that has both v and w as descendants, where we define each node to be a descendant of itself.

A link/cut tree is a data structure for representing a forest, a set of rooted trees, and offers the following operations:

<span class="mw-page-title-main">Handshaking lemma</span> Every graph has evenly many odd vertices

In graph theory, a branch of mathematics, the handshaking lemma is the statement that, in every finite undirected graph, the number of vertices that touch an odd number of edges is even. For example, if there is a party of people who shake hands, the number of people who shake an odd number of other people's hands is even. The handshaking lemma is a consequence of the degree sum formula, also sometimes called the handshaking lemma, according to which the sum of the degrees equals twice the number of edges in the graph. Both results were proven by Leonhard Euler (1736) in his famous paper on the Seven Bridges of Königsberg that began the study of graph theory.

<span class="mw-page-title-main">SPQR tree</span> Representation of a graphs triconnected components

In graph theory, a branch of mathematics, the triconnected components of a biconnected graph are a system of smaller graphs that describe all of the 2-vertex cuts in the graph. An SPQR tree is a tree data structure used in computer science, and more specifically graph algorithms, to represent the triconnected components of a graph. The SPQR tree of a graph may be constructed in linear time and has several applications in dynamic graph algorithms and graph drawing.

In computer science, a finger tree is a purely functional data structure that can be used to efficiently implement other functional data structures. A finger tree gives amortized constant time access to the "fingers" (leaves) of the tree, which is where data is stored, and concatenation and splitting logarithmic time in the size of the smaller piece. It also stores in each internal node the result of applying some associative operation to its descendants. This "summary" data stored in the internal nodes can be used to provide the functionality of data structures other than trees.

<span class="mw-page-title-main">Cartesian tree</span> Binary tree derived from a sequence of numbers

In computer science, a Cartesian tree is a binary tree derived from a sequence of distinct numbers. To construct the Cartesian tree, set its root to be the minimum number in the sequence, and recursively construct its left and right subtrees from the subsequences before and after this number. It is uniquely defined as a min-heap whose symmetric (in-order) traversal returns the original sequence.

In computer science, the method of contraction hierarchies is a speed-up technique for finding the shortest-path in a graph. The most intuitive applications are car-navigation systems: a user wants to drive from to using the quickest possible route. The metric optimized here is the travel time. Intersections are represented by vertices, the road sections connecting them by edges. The edge weights represent the time it takes to drive along this segment of the road. A path from to is a sequence of edges ; the shortest path is the one with the minimal sum of edge weights among all possible paths. The shortest path in a graph can be computed using Dijkstra's algorithm but, given that road networks consist of tens of millions of vertices, this is impractical. Contraction hierarchies is a speed-up method optimized to exploit properties of graphs representing road networks. The speed-up is achieved by creating shortcuts in a preprocessing phase which are then used during a shortest-path query to skip over "unimportant" vertices. This is based on the observation that road networks are highly hierarchical. Some intersections, for example highway junctions, are "more important" and higher up in the hierarchy than for example a junction leading into a dead end. Shortcuts can be used to save the precomputed distance between two important junctions such that the algorithm doesn't have to consider the full path between these junctions at query time. Contraction hierarchies do not know about which roads humans consider "important", but they are provided with the graph as input and are able to assign importance to vertices using heuristics.

In graph theory and theoretical computer science, the level ancestor problem is the problem of preprocessing a given rooted tree T into a data structure that can determine the ancestor of a given node at a given distance from the root of the tree.

In computing and graph theory, a dynamic connectivity structure is a data structure that dynamically maintains information about the connected components of a graph.

In computer science, parallel tree contraction is a broadly applicable technique for the parallel solution of a large number of tree problems, and is used as an algorithm design technique for the design of a large number of parallel graph algorithms. Parallel tree contraction was introduced by Gary L. Miller and John H. Reif, and has subsequently been modified to improve efficiency by X. He and Y. Yesha, Hillel Gazit, Gary L. Miller and Shang-Hua Teng and many others.

References

  1. Tarjan, R.E.; Vishkin, U. (1984). Finding biconnected components and computing tree functions in logarithmic parallel time. Proceedings of FOCS. pp. 12–20. CiteSeerX   10.1.1.419.3088 . doi:10.1109/SFCS.1984q5896.
  2. Henzinger, M. R.; King, V. (1995). "Randomized dynamic graph algorithms with polylogarithmic time per operation". Proceedings of the twenty-seventh annual ACM symposium on Theory of computing - STOC '95. p. 519. doi:10.1145/225058.225269. ISBN   0897917189.
  3. Euler tour trees - in Lecture Notes in Advanced Data Structures. Prof. Erik Demaine; Scribe: Katherine Lai.