In computer science, fractional cascading is a technique to speed up a sequence of binary searches for the same value in a sequence of related data structures. The first binary search in the sequence takes a logarithmic amount of time, as is standard for binary searches, but successive searches in the sequence are faster. The original version of fractional cascading, introduced in two papers by Chazelle and Guibas in 1986 (Chazelle & Guibas 1986a; Chazelle & Guibas 1986b), combined the idea of cascading, originating in range searching data structures of Lueker (1978) and Willard (1978), with the idea of fractional sampling, which originated in Chazelle (1983). Later authors introduced more complex forms of fractional cascading that allow the data structure to be maintained as the data changes by a sequence of discrete insertion and deletion events.
As a simple example of fractional cascading, consider the following problem. We are given as input a collection of ordered lists of numbers, such that the total length of all lists is , and must process them so that we can perform binary searches for a query value in each of the lists. For instance, with and ,
The simplest solution to this searching problem is just to store each list separately. If we do so, the space requirement is , but the time to perform a query is , as we must perform a separate binary search in each of lists. The worst case for querying this structure occurs when each of the lists has equal size , so each of the binary searches involved in a query takes time .
A second solution allows faster queries at the expense of more space: we may merge all the lists into a single big list , and associate with each item of a list of the results of searching for in each of the smaller lists . If we describe an element of this merged list as where is the numerical value and , , , and are the positions (the first number has position 0) of the next element at least as large as in each of the original input lists (or the position after the end of the list if no such element exists), then we would have
This merged solution allows a query in time : simply search for in and then report the results stored at the item found by this search. For instance, if , searching for in finds the item 62[1,3,2,3], from which we return the results , (a flag value indicating that is past the end of ), , and . However, this solution pays a high penalty in space complexity: it uses space as each of the items in must store a list of search results.
Fractional cascading allows this same searching problem to be solved with time and space bounds meeting the best of both worlds: query time , and space . The fractional cascading solution is to store a new sequence of lists . The final list in this sequence, , is equal to ; each earlier list is formed by merging with every second item from . With each item in this merged list, we store two numbers: the position resulting from searching for in and the position resulting from searching for in . For the data above, this would give us the following lists:
Suppose we wish to perform a query in this structure, for . We first do a standard binary search for in , finding the value 64[1,5]. The "1" in 64[1,5], tells us that the search for in should return . The "5" in 64[1,5] tells us that the approximate location of in is position 5. More precisely, binary searching for in would return either the value 79[3,5] at position 5, or the value 62[3,3] one place earlier. By comparing to 62, and observing that it is smaller, we determine that the correct search result in is 62[3,3]. The first "3" in 62[3,3] tells us that the search for in should return , a flag value meaning that is past the end of list . The second "3" in 62[3,3] tells us that the approximate location of in is position 3. More precisely, binary searching for in would return either the value 62[2,3] at position 3, or the value 44[1,2] one place earlier. A comparison of with the smaller value 44 shows us that the correct search result in is 62[2,3]. The "2" in 62[2,3] tells us that the search for in should return , and the "3" in 62[2,3] tells us that the result of searching for in is either 79[3,0] at position 3 or 46[2,0] at position 2; comparing with 46 shows that the correct result is 79[3,0] and that the result of searching for in is . Thus, we have found in each of our four lists, by doing a binary search in the single list followed by a single comparison in each of the successive lists.
More generally, for any data structure of this type, we perform a query by doing a binary search for in , and determining from the resulting value the position of in . Then, for each , we use the known position of in to find its position in . The value associated with the position of in points to a position in that is either the correct result of the binary search for in or is a single step away from that correct result, so stepping from to requires only a single comparison. Thus, the total time for a query is
In our example, the fractionally cascaded lists have a total of 25 elements, less than twice that of the original input. In general, the size of in this data structure is at most as may easily be proven by induction. Therefore, the total size of the data structure is at most as may be seen by regrouping the contributions to the total size coming from the same input list together with each other.
In general, fractional cascading begins with a catalog graph, a directed graph in which each vertex is labeled with an ordered list. A query in this data structure consists of a path in the graph and a query value q; the data structure must determine the position of q in each of the ordered lists associated with the vertices of the path. For the simple example above, the catalog graph is itself a path, with just four nodes. It is possible for later vertices in the path to be determined dynamically as part of a query, in response to the results found by the searches in earlier parts of the path.
To handle queries of this type, for a graph in which each vertex has at most d incoming and at most d outgoing edges for some constant d, the lists associated with each vertex are augmented by a fraction of the items from each outgoing neighbor of the vertex; the fraction must be chosen to be smaller than 1/d, so that the total amount by which all lists are augmented remains linear in the input size. Each item in each augmented list stores with it the position of that item in the unaugmented list stored at the same vertex, and in each of the outgoing neighboring lists. In the simple example above, d = 1, and we augmented each list with a 1/2 fraction of the neighboring items.
A query in this data structure consists of a standard binary search in the augmented list associated with the first vertex of the query path, together with simpler searches at each successive vertex of the path. If a 1/r fraction of items are used to augment the lists from each neighboring item, then each successive query result may be found within at most r steps of the position stored at the query result from the previous path vertex, and therefore may be found in constant time without having to perform a full binary search.
In dynamic fractional cascading, the list stored at each node of the catalog graph may change dynamically, by a sequence of updates in which list items are inserted and deleted. This causes several difficulties for the data structure.
First, when an item is inserted or deleted at a node of the catalog graph, it must be placed within the augmented list associated with that node, and may cause changes to propagate to other nodes of the catalog graph. Instead of storing the augmented lists in arrays, they should be stored as binary search trees, so that these changes can be handled efficiently while still allowing binary searches of the augmented lists.
Second, an insertion or deletion may cause a change to the subset of the list associated with a node that is passed on to neighboring nodes of the catalog graph. It is no longer feasible, in the dynamic setting, for this subset to be chosen as the items at every dth position of the list, for some d, as this subset would change too drastically after every update. Rather, a technique closely related to B-trees allows the selection of a fraction of data that is guaranteed to be smaller than 1/d, with the selected items guaranteed to be spaced a constant number of positions apart in the full list, and such that an insertion or deletion into the augmented list associated with a node causes changes to propagate to other nodes for a fraction of the operations that is less than 1/d. In this way, the distribution of the data among the nodes satisfies the properties needed for the query algorithm to be fast, while guaranteeing that the average number of binary search tree operations per data insertion or deletion is constant.
Third, and most critically, the static fractional cascading data structure maintains, for each element x of the augmented list at each node of the catalog graph, the index of the result that would be obtained when searching for x among the input items from that node and among the augmented lists stored at neighboring nodes. However, this information would be too expensive to maintain in the dynamic setting. Inserting or deleting a single value x could cause the indexes stored at an unbounded number of other values to change. Instead, dynamic versions of fractional cascading maintain several data structures for each node:
These data structures allow dynamic fractional cascading to be performed at a time of O(log n) per insertion or deletion, and a sequence of k binary searches following a path of length k in the catalog graph to be performed in time O(log n + k log log n).
Typical applications of fractional cascading involve range search data structures in computational geometry. For example, consider the problem of half-plane range reporting : that is, intersecting a fixed set of n points with a query half-plane and listing all the points in the intersection. The problem is to structure the points in such a way that a query of this type may be answered efficiently in terms of the intersection size h. One structure that can be used for this purpose is the convex layers of the input point set, a family of nested convex polygons consisting of the convex hull of the point set and the recursively-constructed convex layers of the remaining points. Within a single layer, the points inside the query half-plane may be found by performing a binary search for the half-plane boundary line's slope among the sorted sequence of convex polygon edge slopes, leading to the polygon vertex that is inside the query half-plane and farthest from its boundary, and then sequentially searching along the polygon edges to find all other vertices inside the query half-plane. The whole half-plane range reporting problem may be solved by repeating this search procedure starting from the outermost layer and continuing inwards until reaching a layer that is disjoint from the query halfspace. Fractional cascading speeds up the successive binary searches among the sequences of polygon edge slopes in each layer, leading to a data structure for this problem with space O(n) and query time O(log n + h). The data structure may be constructed in time O(n log n) by an algorithm of Chazelle (1985). As in our example, this application involves binary searches in a linear sequence of lists (the nested sequence of the convex layers), so the catalog graph is just a path.
Another application of fractional cascading in geometric data structures concerns point location in a monotone subdivision, that is, a partition of the plane into polygons such that any vertical line intersects any polygon in at most two points. As Edelsbrunner, Guibas & Stolfi (1986) showed, this problem can be solved by finding a sequence of polygonal paths that stretch from left to right across the subdivision, and binary searching for the lowest of these paths that is above the query point. Testing whether the query point is above or below one of the paths can itself be solved as a binary search problem, searching for the x coordinate of the points among the x coordinates of the path vertices to determine which path edge might be above or below the query point. Thus, each point location query can be solved as an outer layer of binary search among the paths, each step of which itself performs a binary search among x coordinates of vertices. Fractional cascading can be used to speed up the time for the inner binary searches, reducing the total time per query to O(log n) using a data structure with space O(n). In this application the catalog graph is a tree representing the possible search sequences of the outer binary search.
Beyond computational geometry, Lakshman & Stiliadis (1998) and Buddhikot, Suri & Waldvogel (1999) apply fractional cascading in the design of data structures for fast packet filtering in internet routers. Gao et al. (2004) use fractional cascading as a model for data distribution and retrieval in sensor networks.
In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.
In computer science, a binary tree is a tree data structure in which each node has at most two children, referred to as the left child and the right child. That is, it is a k-ary tree with k = 2. A recursive definition using set theory is that a binary tree is a tuple (L, S, R), where L and R are binary trees or the empty set and S is a singleton set containing the root.
A splay tree is a binary search tree with the additional property that recently accessed elements are quick to access again. Like self-balancing binary search trees, a splay tree performs basic operations such as insertion, look-up and removal in O(log n) amortized time. For random access patterns drawn from a non-uniform random distribution, their amortized time can be faster than logarithmic, proportional to the entropy of the access pattern. For many patterns of non-random operations, also, splay trees can take better than logarithmic time, without requiring advance knowledge of the pattern. According to the unproven dynamic optimality conjecture, their performance on all access patterns is within a constant factor of the best possible performance that could be achieved by any other self-adjusting binary search tree, even one selected to fit that pattern. The splay tree was invented by Daniel Sleator and Robert Tarjan in 1985.
Breadth-first search (BFS) is an algorithm for searching a tree data structure for a node that satisfies a given property. It starts at the tree root and explores all nodes at the present depth prior to moving on to the nodes at the next depth level. Extra memory, usually a queue, is needed to keep track of the child nodes that were encountered but not yet explored.
The point location problem is a fundamental topic of computational geometry. It finds applications in areas that deal with processing geometrical data: computer graphics, geographic information systems (GIS), motion planning, and computer aided design (CAD).
In computing, a persistent data structure or not ephemeral data structure is a data structure that always preserves the previous version of itself when it is modified. Such data structures are effectively immutable, as their operations do not (visibly) update the structure in-place, but instead always yield a new updated structure. The term was introduced in Driscoll, Sarnak, Sleator, and Tarjan's 1986 article.
In computer science, a fusion tree is a type of tree data structure that implements an associative array on w-bit integers on a finite universe, where each of the input integers has size less than 2w and is non-negative. When operating on a collection of n key–value pairs, it uses O(n) space and performs searches in O(logwn) time, which is asymptotically faster than a traditional self-balancing binary search tree, and also better than the van Emde Boas tree for large values of w. It achieves this speed by using certain constant-time operations that can be done on a machine word. Fusion trees were invented in 1990 by Michael Fredman and Dan Willard.
In computer science, an interval tree is a tree data structure to hold intervals. Specifically, it allows one to efficiently find all intervals that overlap with any given interval or point. It is often used for windowing queries, for instance, to find all roads on a computerized map inside a rectangular viewport, or to find all visible elements inside a three-dimensional scene. A similar data structure is the segment tree.
In computer science, a k-d tree is a space-partitioning data structure for organizing points in a k-dimensional space. K-dimensional is that which concerns exactly k orthogonal axes or a space of any number of dimensions. k-d trees are a useful data structure for several applications, such as:
In graph theory and computer science, the lowest common ancestor (LCA) of two nodes v and w in a tree or directed acyclic graph (DAG) T is the lowest node that has both v and w as descendants, where we define each node to be a descendant of itself.
Nearest neighbor search (NNS), as a form of proximity search, is the optimization problem of finding the point in a given set that is closest to a given point. Closeness is typically expressed in terms of a dissimilarity function: the less similar the objects, the larger the function values.
In computer science, the range searching problem consists of processing a set S of objects, in order to determine which objects from S intersect with a query object, called the range. For example, if S is a set of points corresponding to the coordinates of several cities, find the subset of cities within a given range of latitudes and longitudes.
In computer science, a succinct data structure is a data structure which uses an amount of space that is "close" to the information-theoretic lower bound, but still allows for efficient query operations. The concept was originally introduced by Jacobson to encode bit vectors, (unlabeled) trees, and planar graphs. Unlike general lossless data compression algorithms, succinct data structures retain the ability to use them in-place, without decompressing them first. A related notion is that of a compressed data structure, insofar as the size of the stored or encoded data similarly depends upon the specific content of the data itself.
In computer science, the segment tree is a data structure used for storing information about intervals or segments. It allows querying which of the stored segments contain a given point. A similar data structure is the interval tree.
In computer science, a range tree is an ordered tree data structure to hold a list of points. It allows all points within a given range to be reported efficiently, and is typically used in two or higher dimensions. Range trees were introduced by Jon Louis Bentley in 1979. Similar data structures were discovered independently by Lueker, Lee and Wong, and Willard. The range tree is an alternative to the k-d tree. Compared to k-d trees, range trees offer faster query times of but worse storage of , where n is the number of points stored in the tree, d is the dimension of each point and k is the number of points reported by a given query.
Skip graphs are a kind of distributed data structure based on skip lists. They were invented in 2003 by James Aspnes and Gauri Shah. A nearly identical data structure called SkipNet was independently invented by Nicholas Harvey, Michael Jones, Stefan Saroiu, Marvin Theimer and Alec Wolman, also in 2003.
In computer science, a ball tree, balltree or metric tree, is a space partitioning data structure for organizing points in a multi-dimensional space. A ball tree partitions data points into a nested set of balls. The resulting data structure has characteristics that make it useful for a number of applications, most notably nearest neighbor search.
In computer science, the range query problem consists of efficiently answering several queries regarding a given interval of elements within an array. For example, a common task, known as range minimum query, is finding the smallest value inside a given range within a list of numbers.
In computer science, a fractal tree index is a tree data structure that keeps data sorted and allows searches and sequential access in the same time as a B-tree but with insertions and deletions that are asymptotically faster than a B-tree. Like a B-tree, a fractal tree index is a generalization of a binary search tree in that a node can have more than two children. Furthermore, unlike a B-tree, a fractal tree index has buffers at each node, which allow insertions, deletions and other changes to be stored in intermediate locations. The goal of the buffers is to schedule disk writes so that each write performs a large amount of useful work, thereby avoiding the worst-case performance of B-trees, in which each disk write may change a small amount of data on disk. Like a B-tree, fractal tree indexes are optimized for systems that read and write large blocks of data. The fractal tree index has been commercialized in databases by Tokutek. Originally, it was implemented as a cache-oblivious lookahead array, but the current implementation is an extension of the Bε tree. The Bε is related to the Buffered Repository Tree. The Buffered Repository Tree has degree 2, whereas the Bε tree has degree Bε. The fractal tree index has also been used in a prototype filesystem. An open source implementation of the fractal tree index is available, which demonstrates the implementation details outlined below.
In computational geometry, the convex layers of a set of points in the Euclidean plane are a sequence of nested convex polygons having the points as their vertices. The outermost one is the convex hull of the points and the rest are formed in the same way recursively. The innermost layer may be degenerate, consisting only of one or two points. The problem of constructing convex layers has also been called onion peeling or onion decomposition.