PH-tree

PH-tree
PH-tree
Type	tree, map
Invented	2014
Operation
Time complexity in big O notation
Operation	Average
Search	O(log n)
Insert	O(log n)
Delete	O(log n)
Space complexity
Space	O(n)

Last updated April 12, 2024

The PH-tree^[1] is a tree data structure used for spatial indexing of multi-dimensional data (keys) such as geographical coordinates, points, feature vectors, rectangles or bounding boxes. The PH-tree is space partitioning index^[2] with a structure similar to that of a quadtree or octree.^[3] However, unlike quadtrees, it uses a splitting policy based on tries and similar to Crit bit trees that is based on the bit-representation of the keys. The bit-based splitting policy, when combined with the use of different internal representations for nodes, provides scalability with high-dimensional data. The bit-representation splitting policy also imposes a maximum depth, thus avoiding degenerated trees and the need for rebalancing.^[1]

Overview

The basic PH-tree is a spatial index that maps keys, which are $d$ -dimensional vectors with integers, to user defined values. The PH-tree is a multi-dimensional generalization of a Crit bit tree in the sense that a Crit bit tree is equivalent to a PH-tree with $1$ -dimensional keys. Like the Crit bit tree, and unlike most other spatial indexes, the PH-tree is a map rather than a multimap .^[1]^[4]

A $d$ -dimensional PH-tree is a tree of nodes where each node partitions space by subdividing it into $2^{d}$ quadrants (see below for how potentially large nodes scales with high dimensional data). Each quadrant contains at most one entry, either a key-value pair (leaf quadrant) or a key-subnode pair. For a key-subnode pair, the key represents the center of the subnode. The key is also the common prefix (bit-representation) of all keys in the subnode and its child subnodes. Each node has at least two entries, otherwise it is merged with the parent node.^[1]

Some other structural properties of PH-trees are:^[1]

They are $2^{n}$ -ary trees.
They are inherently unbalanced but imbalance is limited due to their depth being limited to the bit width of the keys, e.g. to 32 for a $d$ -dimensional key with 32bit integers.
Insertion or removal operations cause exactly one node to be modified and potentially a second node to be added or removed. This can be useful for concurrent implementations. This also means little variation in modification cost.
Their structure is independent from insertion/removal order.

Splitting strategy

Similar to most quadtrees, the PH-tree is a hierarchy of nodes where every node splits the space in all $d$ dimensions.^[1] Thus, a node can have up to $2^{d}$ subnodes, one for each quadrant.

Quadrant numbering

The PH-tree uses the bits of the multi-dimensional keys to determine their position in the tree. All keys that have the same leading bits are stored in the same branch of the tree.^[1]

For example, in a node at level $L$ , to determine the quadrant where a key should be inserted (or removed or looked up), it looks at the $L$ 's bit of each dimension of the key. For a 3D node with 8 quadrants (forming a cube) the $L$ 's bit of the first dimension of the key determines whether the target quadrant is on the left or the right of the cube, the $L$ 's bit of the second dimension determines whether it is at the front or the back, and the $L$ 's bit of the third dimension determines bottom vs top, see picture.

1D example

Example with three 1D keys with 8bit values: $k_{0}=\{1\}_{base\ 10}=\{00000001\}_{base\ 2}$ , $k_{1}=\{4\}_{10}=\{00000100\}_{2}$ and $k_{2}=\{35\}_{10}=\{00100011\}_{2}$ . Adding $k_{0}$ and $k_{1}$ to an empty tree results in a single node. The two keys first differ in their 6th bit so the node has a level $L=5$ (starting with 0). The node has a 5bit prefix representing the common 5 bits of both keys. The node has two quadrants, each key is stored in one quadrant. Adding a third key $k_{3}$ results in one additional node at $L=2$ with one quadrant containing the original node as subnode and the other quadrant containing the new key $k_{2}$ .^{[ citation needed ]}

2D example

With 2D keys every node has $2^{d}=4$ quadrants. The position of the quadrant where a key is stored is extracted from the respective bits of the keys, one bit from each dimension. The four quadrants of the node form a 2D hypercube (quadrants may be empty). The bits that are extracted from the keys form the hypercube address $h$ , for $k_{0}\rightarrow h=\{00\}_{2}$ and for $k_{1}\rightarrow h=\{01\}_{2}$ . $h$ is effectively the position of the quadrant in the node's hypercube.^{[ citation needed ]}

Node structure

The ordering of the entries in a node always follows Z-ordering. Entries in a node can, for example, be stored in fixed size arrays of size $2^{d}$ . $h$ is then effectively the array index of a quadrant. This allows lookup, insert and remove with $O(1)$ and there is no need to store $h$ . Space complexity is however $O(2^{d})$ per node, so it is less suitable for high dimensional data.^[1]

Another solution is to store entries in a sorted collection, such as dynamic arrays and/or B-trees. This slows down lookup operations to $O(\log {n_{node\_entries}})$ but reduces memory consumption to $O(n_{node\_entries})$ .^[1]

The original implementation aimed for minimal memory consumption by switching between fixed and dynamic array representation depending on which uses less memory.^[1] Other implementations do not switch dynamically but use fixed arrays for $d\lesssim 4$ , dynamic arrays for $d\lesssim 8$ and B-trees for high dimensional data.

Operations

Lookup, insertion and removal operations all work very similar: find the correct node, then perform the operation on the node. Window queries and $k$ -nearest-neighbor searches are more complex.

Lookup

The Lookup operation determines whether a key exists in the tree. It walks down the tree and checks every node whether it contains a candidate subnode or a user value that matches the key.^[1]

function lookup(key) is     entry ← get_root_entry()    // if the tree is not empty the root entry contains a root node     while entry != NIL && entry.is_subnode() do          node ← entry.get_node()         entry ← node.get_entry(key)       repeatreturn entry                    // entry can be NIL

function get_entry(key) is     node ← current node     h ← extract_bits_at_depth(key, node.get_depth()}     entry ← node.get_entry_at(h)   return entry                    // entry can be NIL

Insert

The Insert operation inserts a new key-value pair into the tree unless they key already exists. The operation traverses the tree like the Lookup function and then inserts the key into the node. There are several cases to consider:^[1]

The quadrant is empty and we can simply insert a new entry into the quadrant and return.
The quadrant contains a user entry with a key that is identical to the new entry. One way to deal with such a collision is to return a flag that indicates failed insertion. If the tree is implemented as multi-map with a collection as the node's entry, the new value is added to that collection.
The quadrant contains an entry (user entry or subnode entry) with a different key. This case requires replacing the existing entry with a new subnode that holds the old and the new entry.

function insert(node, key, value)     level ← node.get_level()            // Level is 0 for root     h ← extract_bits_at_level(key, level)     entry ← node.get_entry(h)     if entry == NIL then         // Case 1.         entry_new ← create_entry(key, value)         node.set_entry(h, entry_new)            else if !entry.is_subnode() && entry.get_key() == key then        // Case 2. Collision, there is already an entry        return ← failed_insertion             else         // Case 3.         level_diff ← get_level_of_difference(key, entry.get_key())          entry_new ← create_entry(key, value)         // new subnode with existing entry and new entry         subnode_new ← create_node(level_diff, entry, entry_new)          node.set_entry(h, subnode_new)          end ifreturn

Remove

Removal works inversely to insertion, with the additional constraint that any subnode has to be removed if less than two entries remain. The remaining entry is moved to the parent node.^[1]

Window queries

Window queries are queries that return all keys that lie inside a rectangular axis-aligned hyperbox. They can be defined to be two $d$ -dimensional points $min$ and $max$ that represent the "lower left" and "upper right" corners of the query box. A trivial implementation traverses all entries in a node (starting with the root node) and if an entry matches it either adds it to the result list (if it is a user entry) or recursively traverses it (if it is a subnode).^[1]

function query(node, min, max, result_list) isforeach entry ← node.get_entries() doif entry.is_subnode() thenif entry.get_prefix() >= min and entry.get_prefix() <= max then                 query(entry.get_subnode(), min, max, result_list)             end ifelseif entry.get_key() >= min and entry.get_key() <= max then                 result_list.add(entry)             end ifend ifrepeatreturn

In order to accurately estimate query time complexity the analysis needs to include the dimensionality $d$ . Traversing and comparing all $n_{node\_entries}$ entries in a node has a time complexity of $O(d\cdot n_{node\_entries})$ because each comparison of $d$ -dimensional key with $min/max$ takes $O(d)$ time. Since nodes can have up to $2^{d}$ entries, this does not scale well with increasing dimensionality $d$ . There are various ways how this approach can be improved by making use of the hypercube address $h$ .^[4]

Min $h$ & max $h$

The idea is to find minimum and maximum values for the quadrant's addresses $h$ such that the search can avoid some quadrants that do not overlap with the query box. Let $C$ be the center of a node (this is equal to the node's prefix) and $h_{min}$ and $h_{max}$ be two bit strings with $d$ bits each. Also, let subscript $i$ with $0\leq i<d$ indicate the $i$ 's bit of $h_{min}$ and $h_{max}$ and the $i$ 'th dimension of $min$ , $max$ and $C$ .

Let $h_{min,i}=(min_{i}\leq C_{i})$ and $h_{max,i}=(max_{i}\geq C_{i})$ . $h_{min}$ then has a ` $1$ ` for every dimension where the "lower" half of the node and all quadrants in it does not overlap with the query box. Similarly, $h_{min}$ has a ` $0$ ` for every dimension where the "upper" half does not overlap with the query box.

$h_{min}$ and $h_{max}$ then present the lowest and highest $h$ in a node that need to be traversed. Quadrants with $h<h_{min}$ or $h>h_{max}$ do not intersect with the query box. A proof is available in.^[4] With this, the above query function can be improved to:

function query(node, min, max, result_list) is     h_min ← calculate h_min     h_max ← calculate h_max     for each entry ← node.get_entries_range(h_min, h_max) do         [ ... ]     repeatreturn

Calculating $h_{min}$ and $h_{max}$ is $O(2d)=O(d)$ . Depending on the distribution of the occupied quadrants in a node this approach will allow avoiding anywhere from no to almost all key comparisons. This reduces the average traversal time but the resulting complexity is still $O(d+d\cdot n_{node\_entries})$ .^[4]

Check quadrants for overlap with query box

Between $h_{min}$ and $h_{max}$ there can still be quadrants that do not overlap with the query box. Idea: $h_{min}$ and $h_{max}$ each have one bit for every dimensions that indicates whether the query box overlaps with the lower/upper half of a node in that dimension. This can be used to quickly check whether a quadrant $h$ overlaps with the query box without having to compare $d$ -dimensional keys: a quadrant $h$ overlaps with the query box if for every ` $0$ ` bit in $h$ there is a corresponding ` $0$ ` bit in $h_{min}$ and for every ` $1$ ` bit in $h$ there is a corresponding ` $1$ ` bit in $h_{max}$ . On a CPU with 64bit registers it is thus possible to check for overlap of up to $64$ -dimensional keys in $O(1)$ .^[4]

function is_overlap(h, h_min, h_max) isreturn (h | h_min) & h_max == h            // evaluates to 'true' if quadrant and query overlap.

function query(node, min, max, result_list) is     h_min ← calculate h_min     h_max ← calculate h_max     for each entry ← node.get_entries_range(h_min, h_max) do         h ← entry.get_h();         if (h | h_min) & h_max == h then   // evaluates to 'true' if quadrant and query overlap.            [ ... ]         end ifrepeatreturn

The resulting time complexity is $O(d+n_{node\_entries})$ compared to the $O(d\cdot n_{node\_entries})$ of the full iteration.^[4]

Traverse quadrants that overlap with query box

For higher dimensions with larger nodes it is also possible to avoid iterating through all $h$ and instead directly calculate the next higher $h$ that overlaps with the query box. The first step puts ` $1$ `-bits into a given $h_{input}$ for all quadrants that have no overlap with the query box. The second step increments the adapted $h$ and the added ` $1$ `-bits trigger an overflow so that the non-overlapping quadrants are skipped. The last step removes all the undesirable bits used for triggering the overflow. The logic is described in detail in.^[4] The calculation works as follows:

function increment_h(h_input, h_min, h_max) is     h_out = h_input | (~ h_max )        // pre - mask     h_out += 1                          // increment     h_out = ( h_out & h_max ) | h_min   // post - mask return h_out

Again, for $d\leq 64$ this can be done on most CPUs in $O(1)$ . The resulting time complexity for traversing a node is $O(d+n_{overlapping\_quadrants})$ .^[4] This works best if most of the quadrants that overlap with the query box are occupied with an entry.

$k$ -nearest neighbors

$k$ nearest neighbor searches can be implemented using standard algorithms.^[5]

Floating point keys

The PH-tree can only store integer values. Floating point values can trivially be stored as integers casting them as an integer. However, the authors also propose an approach without loss of precision.^[1]^[4]

Lossless conversion

Lossless converting of a floating point value into an integer value (and back) without loss if precision can be achieved by simply interpreting the 32 or 64 bits of the floating point value as an integer (with 32 or 64 bits). Due to the way that IEEE 754 encodes floating point values, the resulting integer values have the same ordering as the original floating point values, at least for positive values. Ordering for negative values can be achieved by inverting the non-sign bits.^[1]^[4]

Example implementations in Java:

longencode(doublevalue){longr=Double.doubleToRawLongBits(value);return(r>=0)?r:r^0x7FFFFFFFFFFFFFFFL;}

Example implementations in C++:

std::int64_tencode(doublevalue){std::int64_tr;memcpy(&r,&value,sizeof(r));returnr>=0?r:r^0x7FFFFFFFFFFFFFFFL;}

Encoding (and the inverse decoding) is lossless for all floating point values. The ordering works well in practice, including $\pm \infty$ and $-0.0$ . However, the integer representation also turns $NaN$ into a normal comparable value (smaller than infinity), infinities are comparable to each other and $0.0$ is larger than $-0.0$ .^[6] That means that, for example, a query range $[0.0,10.0]$ will not match a value of $-0.0$ . In order to match $-0.0$ the query range needs to be $[-0.0,10.0]$ .^{[ citation needed ]}

Hyperbox keys

In order to store volumes (axis-aligned hyper-boxes) as keys, implementations typically use corner representation^[7] which converts the two $d$ -dimensional minimum and maximum corners of a box into a single key with $2d$ dimensions, for example by interleaving them: $k=\{min_{0},max_{0},min_{1},max_{1},...,min_{d-1},max_{d-1}\}$ .

This works trivially for lookup, insert and remove operations. Window queries need to be converted from $d$ -dimensional vectors to $2d$ -dimensional vectors. For example, for a window query that matches all boxes that are completely inside the query box, the query keys are:^[7]^[8]

$k_{min}=\{min_{0},min_{0},min_{1},min_{1},...,min_{d-1},min_{d-1}\}$

$k_{max}=\{max_{0},max_{0},max_{1},max_{1},...,max_{d-1},max_{d-1}\}$

For a window query operation that matches all boxes that intersect with a query box, the query keys are:^[8]

$k_{min}=\{-\infty ,min_{0},-\infty ,min_{1},...,-\infty ,min_{d-1}\}$

$k_{max}=\{max_{0},+\infty ,max_{1},+\infty ,...,max_{d-1},+\infty \}$

Scalability

In high dimensions with less than $2^{d}$ entries, a PH-tree may have only a single node, effectively “degenerating” into a B-Tree with Z-order curve. The add/remove/lookup operations remain $O(\log {n})$ and window queries can use the quadrant filters. However, this cannot avoid the curse of dimensionality, for high dimensional data with $d=50$ or $d=100$ a PH-tree is only marginally better than a full scan.^[9]

Uses

Research has reported fast add/remove/exact-match operations with large and fast changing datasets.^[10] Window queries have been shown to work well especially for small windows^[11] or large dataset^[12]

The PH-tree is mainly suited for in-memory use.^[10]^[13]^[14] The size of the nodes (number of entries) is fixed while persistent storage tends to benefit from indexes with configurable node size to align node size with page size on disk. This is easier with other spatial indexes, such as R-Trees.

Implementations

Java: GitHub repository by original inventor
C++: GitHub repository by original inventor
C++: GitHub repository
C++: GitHub repository

External links

PH-tree website with detailed description, examples and performance comparison

Related Research Articles

In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.

In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for nodes with more than two children. Unlike other self-balancing binary search trees, the B-tree is well suited for storage systems that read and write relatively large blocks of data, such as databases and file systems.

In computing, Chord is a protocol and algorithm for a peer-to-peer distributed hash table. A distributed hash table stores key-value pairs by assigning keys to different computers ; a node will store the values for all the keys for which it is responsible. Chord specifies how keys are assigned to nodes, and how a node can discover the value for a given key by first locating the node responsible for that key.

Kademlia is a distributed hash table for decentralized peer-to-peer computer networks designed by Petar Maymounkov and David Mazières in 2002. It specifies the structure of the network and the exchange of information through node lookups. Kademlia nodes communicate among themselves using UDP. A virtual or overlay network is formed by the participant nodes. Each node is identified by a number or node ID. The node ID serves not only as identification, but the Kademlia algorithm uses the node ID to locate values.

A quadtree is a tree data structure in which each internal node has exactly four children. Quadtrees are the two-dimensional analog of octrees and are most often used to partition a two-dimensional space by recursively subdividing it into four quadrants or regions. The data associated with a leaf cell varies by application, but the leaf cell represents a "unit of interesting spatial information".

In computing, a persistent data structure or not ephemeral data structure is a data structure that always preserves the previous version of itself when it is modified. Such data structures are effectively immutable, as their operations do not (visibly) update the structure in-place, but instead always yield a new updated structure. The term was introduced in Driscoll, Sarnak, Sleator, and Tarjan's 1986 article.

<span class="mw-page-title-main">R-tree</span> Data structures used in spatial indexing

R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons. The R-tree was proposed by Antonin Guttman in 1984 and has found significant use in both theoretical and applied contexts. A common real-world usage for an R-tree might be to store spatial objects such as restaurant locations or the polygons that typical maps are made of: streets, buildings, outlines of lakes, coastlines, etc. and then find answers quickly to queries such as "Find all museums within 2 km of my current location", "retrieve all road segments within 2 km of my location" or "find the nearest gas station". The R-tree can also accelerate nearest neighbor search for various distance metrics, including great-circle distance.

In computer science, a fusion tree is a type of tree data structure that implements an associative array on $w$ -bit integers on a finite universe, where each of the input integers has size less than 2^w and is non-negative. When operating on a collection of $n$ key–value pairs, it uses $O (n)$ space and performs searches in $O (log w n)$ time, which is asymptotically faster than a traditional self-balancing binary search tree, and also better than the van Emde Boas tree for large values of $w$ . It achieves this speed by using certain constant-time operations that can be done on a machine word. Fusion trees were invented in 1990 by Michael Fredman and Dan Willard.

A van Emde Boas tree, also known as a vEB tree or van Emde Boas priority queue, is a tree data structure which implements an associative array with $m$ -bit integer keys. It was invented by a team led by Dutch computer scientist Peter van Emde Boas in 1975. It performs all operations in $O (log m)$ time, or equivalently in $O (log log M)$ time, where $M = 2 m$ is the largest element that can be stored in the tree. The parameter $M$ is not to be confused with the actual number of elements stored in the tree, by which the performance of other tree data-structures is often measured.

A B+ tree is an m-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves. The root may be either a leaf or a node with two or more children.

In computer science, an interval tree is a tree data structure to hold intervals. Specifically, it allows one to efficiently find all intervals that overlap with any given interval or point. It is often used for windowing queries, for instance, to find all roads on a computerized map inside a rectangular viewport, or to find all visible elements inside a three-dimensional scene. A similar data structure is the segment tree.

<i>k</i>-d tree Multidimensional search tree for points in k dimensional space

In computer science, a k-d tree is a space-partitioning data structure for organizing points in a k-dimensional space. K-dimensional is that which concerns exactly k orthogonal axes or a space of any number of dimensions. k-d trees are a useful data structure for several applications, such as:

In mathematical analysis and computer science, functions which are Z-order, Lebesgue curve, Morton space-filling curve, Morton order or Morton code map multidimensional data to one dimension while preserving locality of the data points. It is named in France after Henri Lebesgue, who studied it in 1904, and named in the United States after Guy Macdonald Morton, who first applied the order to file sequencing in 1966. The z-value of a point in multidimensions is simply calculated by interleaving the binary representations of its coordinate values. Once the data are sorted into this ordering, any one-dimensional data structure can be used, such as simple one dimensional arrays, binary search trees, B-trees, skip lists or hash tables. The resulting ordering can equivalently be described as the order one would get from a depth-first traversal of a quadtree or octree.

An implicit k-d tree is a k-d tree defined implicitly above a rectilinear grid. Its split planes' positions and orientations are not given explicitly but implicitly by some recursive splitting-function defined on the hyperrectangles belonging to the tree's nodes. Each inner node's split plane is positioned on a grid plane of the underlying grid, partitioning the node's grid into two subgrids.

In peer-to-peer networks, Koorde is a distributed hash table (DHT) system based on the Chord DHT and the De Bruijn graph. Inheriting the simplicity of Chord, Koorde meets $O (log n)$ hops per node, and hops per lookup request with $O (log n)$ neighbors per node.

In computer science, M-trees are tree data structures that are similar to R-trees and B-trees. It is constructed using a metric and relies on the triangle inequality for efficient range and k-nearest neighbor (k-NN) queries. While M-trees can perform well in many conditions, the tree can also have large overlap and there is no clear strategy on how to best avoid overlap. In addition, it can only be used for distance functions that satisfy the triangle inequality, while many advanced dissimilarity functions used in information retrieval do not satisfy this.

In computer science, an x-fast trie is a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time O(log log M), using O(n log M) space, where n is the number of stored values and M is the maximum value in the domain. The structure was proposed by Dan Willard in 1982, along with the more complicated y-fast trie, as a way to improve the space usage of van Emde Boas trees, while retaining the O(log log M) query time.

The Priority R-tree is a worst-case asymptotically optimal alternative to the spatial tree R-tree. It was first proposed by Arge, De Berg, Haverkort and Yi, K. in an article from 2004. The prioritized R-tree is essentially a hybrid between a k-dimensional tree and a r-tree in that it defines a given object's N-dimensional bounding volume as a point in N-dimensions, represented by the ordered pair of the rectangles. The term prioritized arrives from the introduction of four priority-leaves that represents the most extreme values of each dimensions, included in every branch of the tree. Before answering a window-query by traversing the sub-branches, the prioritized R-tree first checks for overlap in its priority nodes. The sub-branches are traversed by checking whether the least value of the first dimension of the query is above the value of the sub-branches. This gives access to a quick indexation by the value of the first dimension of the bounding box.

In computer science, the range query problem consists of efficiently answering several queries regarding a given interval of elements within an array. For example, a common task, known as range minimum query, is finding the smallest value inside a given range within a list of numbers.

A relaxed K-d tree or relaxed K-dimensional tree is a data structure which is a variant of K-d trees. Like K-dimensional trees, a relaxed K-dimensional tree stores a set of n-multidimensional records, each one having a unique K-dimensional key x=(x₀,... ,x_K−1). Unlike K-d trees, in a relaxed K-d tree, the discriminants in each node are arbitrary. Relaxed K-d trees were introduced in 1998.

References

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Zäschke, Tilmann; Zimmerli, Christoph; Norrie, Moira C. (June 2014). "The PH-tree". Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. pp. 397–408. doi:10.1145/2588555.2588564. ISBN 9781450323765. S2CID 6862850 . Retrieved 10 February 2022.
↑ Kouahla, Z.; Benrazek, A.-E.; Ferrag, M. A.; Farou, B.; Seridi, H.; Kurulay, M.; Anjum, A.; Asheralieva, A. (2022). "Survey on Big IoT Data Indexing: Potential Solutions, Recent Advancements, and Open Issues". Future Internet. 14 (1): 19. doi: 10.3390/fi14010019 .
↑ Mahmood, A. R.; Punni, S.; Aref, W. G. (2018). "Spatio-temporal access methods: a survey (2010 – 2017)". Geoinformatica. 23 (1): 1–36. doi:10.1007/s10707-018-0329-2. S2CID 106407322.
1 2 3 4 5 6 7 8 9 10 Zäschke, Tilmann; Norrie, Moira (2017). "Efficient Z-Ordered Traversal of Hypercube Indexes". Datenbanksysteme für Business, Technologie und Web (BTW 2017). Lecture Notes in Informatics. Vol. P-265. Bonn: Gesellschaft für Informatik. pp. 465–484. doi:10.3929/ethz-a-010802003. ISBN 9783885796596.
↑ Hjaltason, Gísli R.; Samet, Hanan (June 1999). "Distance browsing in spatial databases". ACM Transactions on Database Systems. 24 (2): 265–318. doi:10.1145/320248.320255. S2CID 10881319 . Retrieved 12 February 2022.
↑ IEEE 754 2019
1 2 Seeger, B.; Kriegel, H. P. (1988). "Techniques for Design and Implementation of Efficient Spatial Access Methods". Proceedings 1988 VLDB Conference: 14th International Conference on Very Large Data Bases. 14: 360.
1 2 Samet, Hanan (2006). Foundations of multidimensional and metric data structures. San Francisco: Elsevier/Morgan-Kaufmann. pp. 440–441, 453–457. ISBN 0-12-369446-9.
↑ Li, Yan; Ge, Tingjian; Chen, Cindy (2020). "Online Indices for Predictive Top-k Entity and Aggregate Queries on Knowledge Graphs". 2020 IEEE 36th International Conference on Data Engineering (ICDE). pp. 1057–1068. doi:10.1109/ICDE48307.2020.00096. ISBN 978-1-7281-2903-7. S2CID 218907333.
1 2 Sprenger, Stefan (2019). Efficient Processing of Range Queries in Main Memory (doctoralThesis). Humboldt-Universität zu Berlin. doi:10.18452/19786.
↑ Khatibi, A.; Porto, F.; Rittmeyer, J. G.; Ogasawara, E.; Valduriez, P.; Shasha, D. (August 2017). "Pre-processing and Indexing Techniques for Constellation Queries in Big Data". Big Data Analytics and Knowledge Discovery (PDF). Lecture Notes in Computer Science. Vol. 10440. pp. 164–172. doi:10.1007/978-3-319-64283-3_12. ISBN 978-3-319-64282-6. S2CID 3857469.
↑ Winter, C.; Kipf, A.; Anneser, C.; Zacharatou, E. T.; Neumann, T.; Kemper, A. (2020). "Database Technology". GeoBlocks: A Query-Cache Accelerated Data Structure for Spatial Aggregation over Polygons. Vol. 23. OpenProceedings.org. pp. 169–180. doi:10.5441/002/edbt.2021.16.
↑ Wang, S.; Maier, D.; Ooi, B. (2016). "Fast and Adaptive Indexing of Multi-Dimensional Observational Data". Proceedings of the VLDB Endowment. 9 (14): 1683. doi:10.14778/3007328.3007334.
↑ Herrera, Stiw; da Silva, Larissa Miguez; Reis, Paulo Ricardo; Silva, Anderson; Porto, Fabio (2021). "Managing Sparse Spatio-Temporal Data in SAVIME: an Evaluation of the PH-tree Index". Anais do XXXVI Simpósio Brasileiro de Bancos de Dados: 337–342. doi: 10.5753/sbbd.2021.17895 . S2CID 245185935.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[PH-tree-2014-1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Zäschke, Tilmann; Zimmerli, Christoph; Norrie, Moira C. (June 2014). "The PH-tree". Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. pp. 397–408. doi:10.1145/2588555.2588564. ISBN 9781450323765. S2CID 6862850 . Retrieved 10 February 2022.

[Kouahla-2022-2] Kouahla, Z.; Benrazek, A.-E.; Ferrag, M. A.; Farou, B.; Seridi, H.; Kurulay, M.; Anjum, A.; Asheralieva, A. (2022). "Survey on Big IoT Data Indexing: Potential Solutions, Recent Advancements, and Open Issues". Future Internet. 14 (1): 19. doi: 10.3390/fi14010019 .

[Mahmood-2018-3] Mahmood, A. R.; Punni, S.; Aref, W. G. (2018). "Spatio-temporal access methods: a survey (2010 – 2017)". Geoinformatica. 23 (1): 1–36. doi:10.1007/s10707-018-0329-2. S2CID 106407322.

[z-ordered-traversal-2017-4] 1 2 3 4 5 6 7 8 9 10 Zäschke, Tilmann; Norrie, Moira (2017). "Efficient Z-Ordered Traversal of Hypercube Indexes". Datenbanksysteme für Business, Technologie und Web (BTW 2017). Lecture Notes in Informatics. Vol. P-265. Bonn: Gesellschaft für Informatik. pp. 465–484. doi:10.3929/ethz-a-010802003. ISBN 9783885796596.

[kNN-5] Hjaltason, Gísli R.; Samet, Hanan (June 1999). "Distance browsing in spatial databases". ACM Transactions on Database Systems. 24 (2): 265–318. doi:10.1145/320248.320255. S2CID 10881319 . Retrieved 12 February 2022.

[IEEE-754-2019-6] IEEE 754 2019

[Seeger-1988-corner-representation-7] 1 2 Seeger, B.; Kriegel, H. P. (1988). "Techniques for Design and Implementation of Efficient Spatial Access Methods". Proceedings 1988 VLDB Conference: 14th International Conference on Very Large Data Bases. 14: 360.

[Samet-foundations-2006-8] 1 2 Samet, Hanan (2006). Foundations of multidimensional and metric data structures. San Francisco: Elsevier/Morgan-Kaufmann. pp. 440–441, 453–457. ISBN 0-12-369446-9.

[Li-2020-9] Li, Yan; Ge, Tingjian; Chen, Cindy (2020). "Online Indices for Predictive Top-k Entity and Aggregate Queries on Knowledge Graphs". 2020 IEEE 36th International Conference on Data Engineering (ICDE). pp. 1057–1068. doi:10.1109/ICDE48307.2020.00096. ISBN 978-1-7281-2903-7. S2CID 218907333.

[Spengler-thesis-10] 1 2 Sprenger, Stefan (2019). Efficient Processing of Range Queries in Main Memory (doctoralThesis). Humboldt-Universität zu Berlin. doi:10.18452/19786.

[Khatibi-2017-11] Khatibi, A.; Porto, F.; Rittmeyer, J. G.; Ogasawara, E.; Valduriez, P.; Shasha, D. (August 2017). "Pre-processing and Indexing Techniques for Constellation Queries in Big Data". Big Data Analytics and Knowledge Discovery (PDF). Lecture Notes in Computer Science. Vol. 10440. pp. 164–172. doi:10.1007/978-3-319-64283-3_12. ISBN 978-3-319-64282-6. S2CID 3857469.

[Kipf-2020-12] Winter, C.; Kipf, A.; Anneser, C.; Zacharatou, E. T.; Neumann, T.; Kemper, A. (2020). "Database Technology". GeoBlocks: A Query-Cache Accelerated Data Structure for Spatial Aggregation over Polygons. Vol. 23. OpenProceedings.org. pp. 169–180. doi:10.5441/002/edbt.2021.16.

[Wang-2016-13] Wang, S.; Maier, D.; Ooi, B. (2016). "Fast and Adaptive Indexing of Multi-Dimensional Observational Data". Proceedings of the VLDB Endowment. 9 (14): 1683. doi:10.14778/3007328.3007334.

[Herrera-2021-14] Herrera, Stiw; da Silva, Larissa Miguez; Reis, Paulo Ricardo; Silva, Anderson; Porto, Fabio (2021). "Managing Sparse Spatio-Temporal Data in SAVIME: an Evaluation of the PH-tree Index". Anais do XXXVI Simpósio Brasileiro de Bancos de Dados: 337–342. doi: 10.5753/sbbd.2021.17895 . S2CID 245185935.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

v t e Tree data structures
Search trees (dynamic sets/associative arrays)	2–3 2–3–4 AA (a,b) AVL B B+ B* B^x (Optimal) Binary search Dancing HTree Interval Order statistic (Left-leaning) Red–black Scapegoat Splay T Treap UB Weight-balanced
Heaps	Binary Binomial Brodal Fibonacci Leftist Pairing Skew van Emde Boas Weak
Tries	Ctrie C-trie (compressed ADT) Hash Radix Suffix Ternary search X-fast Y-fast
Spatial data partitioning trees	Ball BK BSP Cartesian Hilbert R k-d (implicit k-d) M Metric MVP Octree PH Priority R Quad R R+ R* Segment VP X
Other trees	Cover Exponential Fenwick Finger Fractal tree index Fusion Hash calendar iDistance K-ary Left-child right-sibling Link/cut Log-structured merge Merkle PQ Range SPQR Top

PH-tree

Contents

Overview

Splitting strategy

Quadrant numbering

1D example

2D example

Node structure

Operations

Lookup

Insert

Remove

Window queries

Min $h$ & max $h$

Check quadrants for overlap with query box

Traverse quadrants that overlap with query box

$k$ -nearest neighbors

Floating point keys

Lossless conversion

Hyperbox keys

Scalability

Uses

Implementations

See also

External links

Related Research Articles

References

PH-tree

Contents

Overview

Splitting strategy

Quadrant numbering

1D example

2D example

Node structure

Operations

Lookup

Insert

Remove

Window queries

Min h & max h

Check quadrants for overlap with query box

Traverse quadrants that overlap with query box

k-nearest neighbors

Floating point keys

Lossless conversion

Hyperbox keys

Scalability

Uses

Implementations

See also

External links

Related Research Articles

References

Min $h$ & max $h$

$k$ -nearest neighbors