Pointer jumping

Pointer jumping or path doubling is a design technique for parallel algorithms that operate on pointer structures, such as linked lists and directed graphs. Pointer jumping allows an algorithm to follow paths with a time complexity that is logarithmic with respect to the length of the longest path. It does this by "jumping" to the end of the path computed by neighbors.

The basic operation of pointer jumping is to replace each neighbor in a pointer structure with its neighbor's neighbor. In each step of the algorithm, this replacement is done for all nodes in the data structure, which can be done independently in parallel. When a node's new neighbor is then followed in the next step, the path that the old neighbor already traversed in the previous step is covered in a single hop. Thus, each step effectively doubles the distance traversed by the explored paths.
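As a minimal sketch of this core step (in Python, which is an assumption; the function and variable names are illustrative), one synchronous round over an array of successor indices can be written as:

    # One pointer-jumping step (illustrative sketch, not from the source).
    # succ[i] holds the index of node i's successor; a node with no successor
    # points to itself. The snapshot makes the update behave like one
    # synchronous parallel step: all reads happen before any writes.
    def jump_step(succ):
        old = list(succ)                  # read phase: snapshot the old pointers
        for i in range(len(succ)):        # conceptually "for all i in parallel"
            succ[i] = old[old[i]]         # jump to the successor's successor
        return succ

After k such steps, each pointer spans min(2^k, path length) of the original links, which is where the logarithmic bound comes from.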

Pointer jumping is best understood by looking at simple examples such as list ranking and root finding.

List ranking

One of the simpler tasks that can be solved by a pointer jumping algorithm is the list ranking problem. This problem is defined as follows: given a linked list of N nodes, find the distance (measured in the number of nodes) of each node to the end of the list. For nodes n that point to their successor by a pointer called next, the distance d(n) is defined as:

    d(n) = 0                 if n.next = nil
    d(n) = d(n.next) + 1     otherwise

This problem can easily be solved in linear time on a sequential machine, but a parallel algorithm can do better: given N processors, the problem can be solved in logarithmic time, O(log N), by the following pointer jumping algorithm: [1] :693

  • Allocate an array of N integers.
  • Initialize: for each processor/list node n, in parallel:
    • If n.next = nil, set d[n] ← 0.
    • Else, set d[n] ← 1.
  • While any node n has n.next ≠ nil:
    • For each processor/list node n, in parallel:
      • If n.next ≠ nil:
        • Set d[n] ← d[n] + d[n.next].
        • Set n.next ← n.next.next.

The pointer jumping occurs in the last line of the algorithm, where each node's next pointer is reset to skip the node's direct successor. It is assumed, as is common in the PRAM model of computation, that memory accesses are performed in lock-step, so that each n.next.next memory fetch is performed before each n.next memory store; otherwise, processors may clobber each other's data, producing inconsistencies. [1] :694
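The following is a minimal sequential sketch of this algorithm in Python (an assumption; the source gives only pseudocode). The lock-step assumption is imitated by snapshotting d and next before each round:

    # Sequential simulation of the parallel list-ranking algorithm (a sketch,
    # not from the source). Nodes are array indices; nxt[i] is the index of
    # node i's successor, or None at the end of the list. The snapshots stand
    # in for the lock-step PRAM assumption: all reads of the old d and nxt
    # happen before any writes.
    def list_rank(nxt):
        n = len(nxt)
        d = [0 if nxt[i] is None else 1 for i in range(n)]  # initialization
        while any(p is not None for p in nxt):              # termination check
            d_old, nxt_old = list(d), list(nxt)             # read phase
            for i in range(n):                              # "in parallel"
                if nxt_old[i] is not None:
                    d[i] = d_old[i] + d_old[nxt_old[i]]     # absorb successor's rank
                    nxt[i] = nxt_old[nxt_old[i]]            # pointer jump
        return d

    # A 5-node list 0 -> 1 -> 2 -> 3 -> 4 yields the ranks [4, 3, 2, 1, 0].
    print(list_rank([1, 2, 3, 4, None]))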

The following diagram shows how the parallel list-ranking algorithm uses pointer jumping on a linked list with 11 elements. As the algorithm describes, all ranks are initialized to 1, except for the final node, whose next pointer is nil and whose rank is initialized to 0. The first iteration looks at immediate neighbors; each subsequent iteration jumps twice as far as the previous.

[Figure: list ranking by pointer jumping on an 11-element linked list (PointerJumpingListRanking.svg)]

Analyzing the algorithm yields a logarithmic running time. The initialization loop takes constant time, because each of the N processors performs a constant amount of work, all in parallel. The inner loop of the main loop also takes constant time, as does (by assumption) the termination check for the loop, so the running time is determined by how often this inner loop is executed. Since the pointer jumping in each iteration splits the list into two parts, one consisting of the "odd" elements and one of the "even" elements, the length of the list pointed to by each processor's n is halved in each iteration, which can happen at most O(log N) times before each list has a length of at most one. [1] :694–695 For the 11-element list in the diagram, for example, ⌈log₂ 11⌉ = 4 iterations suffice.

Root finding

Following a path in a graph is an inherently serial operation, but pointer jumping reduces the total amount of work by following all paths simultaneously and sharing results among dependent operations. In each iteration, pointer jumping replaces every vertex's successor with a successor closer to the tree root. Because each vertex follows successors already computed for other vertices, the distance traversed along each path can double in every iteration, which means that the tree roots can be found in logarithmic time.

Pointer doubling operates on an array successor with an entry for every vertex in the graph. Each successor[i] is initialized to the index of vertex i's parent, or to i itself if that vertex is a root. At each iteration, each successor is updated to its successor's successor. A vertex has reached its root once its successor's successor points to itself; the algorithm stops when no pointer changes.

The following pseudocode demonstrates the algorithm.

    algorithm
        Input: An array parent representing a forest of trees.
               parent[i] is the parent of vertex i, or i itself for a root.
        Output: An array containing the root ancestor for every vertex.

        for i ← 1 to length(parent) do in parallel
            successor[i] ← parent[i]
        while true
            for i ← 1 to length(successor) do in parallel
                successor_next[i] ← successor[successor[i]]
            if successor_next = successor then
                break
            for i ← 1 to length(successor) do in parallel
                successor[i] ← successor_next[i]
        return successor
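A direct sequential transcription of this pseudocode in Python (an assumption; vertices are 0-indexed here, unlike the 1-indexed pseudocode):

    # Root finding by pointer jumping (a sketch, not from the source).
    # parent[i] is the parent of vertex i, with parent[r] = r for every root r.
    def find_roots(parent):
        successor = list(parent)                   # successor[i] <- parent[i]
        while True:
            # "in parallel": each entry jumps to its successor's successor,
            # computed from the current snapshot
            successor_next = [successor[successor[i]]
                              for i in range(len(successor))]
            if successor_next == successor:        # no pointer moved: done
                return successor
            successor = successor_next

    # A forest with roots 0 and 4: vertices 1 and 2 hang under 0, 3 under 2,
    # and 5 under 4.
    print(find_roots([0, 0, 0, 2, 4, 4]))          # prints [0, 0, 0, 0, 4, 4]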

The following image provides an example of using pointer jumping on a small forest. With each iteration, every successor pointer is replaced by its successor's successor, doubling the distance each pointer spans. After two iterations, every vertex points to its root node.

[Figure: pointer jumping on a small forest (Pointer Jumping Example.svg)]

History and examples

Although the name pointer jumping would come later, JáJá [2] :88 attributes the first uses of the technique to early parallel graph algorithms [3] [4] :43 and to list ranking. [5] The technique has been described under other names such as shortcutting, [6] [7] but by the 1990s textbooks on parallel algorithms consistently used the term pointer jumping. [2] :52–56 [1] :692–701 [8] :34–35 Today, pointer jumping is considered a software design pattern for operating on recursive data types in parallel. [9] :99

As a technique for following linked paths, pointer jumping is a natural fit for graph algorithms, and several parallel graph algorithms utilizing it have been designed. These include algorithms for finding the roots of a forest of rooted trees, [2] :52–53 [6] connected components, [2] :213–221 minimum spanning trees, [2] :222–227 [10] and biconnected components. [2] :227–239 [7] However, pointer jumping has also been shown to be useful in a variety of other problems, including computer vision, [11] image compression, [12] and Bayesian inference. [13]


References

  1. Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001) [1990]. Introduction to Algorithms (2nd ed.). MIT Press and McGraw-Hill. ISBN 0-262-03293-7.
  2. JáJá, Joseph (1992). An Introduction to Parallel Algorithms. Addison Wesley. ISBN 0-201-54856-9.
  3. Hirschberg, D. S. (1976). "Parallel algorithms for the transitive closure and the connected component problems". Proceedings of the eighth annual ACM symposium on Theory of computing - STOC '76. pp. 55–57. doi:10.1145/800113.803631. S2CID 306043.
  4. Savage, Carla Diane (1977). Parallel Algorithms for Graph Theoretic Problems (Thesis). University of Illinois at Urbana-Champaign.
  5. Wyllie, James C. (1979). "Chapter 4: Computational Structures". The Complexity of Parallel Computations (Thesis). Cornell University.
  6. Shiloach, Yossi; Vishkin, Uzi (1982). "An O(log n) Parallel Connectivity Algorithm". Journal of Algorithms. 3 (1): 57–67. doi:10.1016/0196-6774(82)90008-6.
  7. Tarjan, Robert E.; Vishkin, Uzi (1984). "Finding biconnected components and computing tree functions in logarithmic parallel time". 25th Annual Symposium on Foundations of Computer Science, 1984. pp. 12–20. doi:10.1109/SFCS.1984.715896. ISBN 0-8186-0591-X.
  8. Quinn, Michael J. (1994). Parallel Computing: Theory and Practice (2nd ed.). McGraw-Hill. ISBN 0-07-051294-9.
  9. Mattson, Timothy G.; Sanders, Beverly A.; Massingill, Berna L. (2005). Patterns for Parallel Programming. Addison-Wesley. ISBN 0-321-22811-1.
  10. Chung, Sun; Condon, Anne (1996). "Parallel implementation of Borůvka's minimum spanning tree algorithm". Proceedings of International Conference on Parallel Processing. pp. 302–308. doi:10.1109/IPPS.1996.508073. ISBN 0-8186-7255-2. S2CID 12710022.
  11. Little, James J.; Blelloch, Guy E.; Cass, Todd A. (1989). "Algorithmic Techniques for Computer Vision on a Fine-Grained Parallel Machine". IEEE Transactions on Pattern Analysis and Machine Intelligence. 11 (3): 244–257. doi:10.1109/34.21793.
  12. Cook, Gregory W.; Delp, Edward J. (1994). "An investigation of JPEG image and video compression using parallel processing". Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing. pp. 437–440. doi:10.1109/ICASSP.1994.389394. ISBN 0-7803-1775-0. S2CID 8879246.
  13. Namasivayam, Vasanth Krishna; Prasanna, Viktor K. (2006). "Scalable Parallel Implementation of Exact Inference in Bayesian Networks". 12th International Conference on Parallel and Distributed Systems (ICPADS'06). 8 pp. doi:10.1109/ICPADS.2006.96. ISBN 0-7695-2612-8. S2CID 15728730.