Oblivious data structure

In computer science, an oblivious data structure is a data structure that gives no information about the sequence or pattern of the operations that have been applied except for the final result of the operations.[1]

In most settings, even if the data is encrypted, the access pattern can still be observed, and this pattern can leak important information such as encryption keys. In the outsourcing of cloud data, this leakage of the access pattern is a particularly serious concern. An access pattern is a specification of an access mode for every attribute of a relation schema; for example, the sequence in which a user reads or writes data in the cloud is an access pattern.

A machine is said to be oblivious if the sequence of locations it accesses is the same for any two inputs with the same running time, so that the data access pattern is independent of the input.
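To make the definition concrete, here is a minimal Python sketch (not from the cited paper; the function names and the small table are illustrative assumptions) contrasting an ordinary lookup, whose access pattern reveals the requested index, with a trivially oblivious lookup that touches every location in the same order regardless of the input:

    def lookup_direct(table, i):
        # Access pattern reveals i: exactly one memory location is touched.
        return table[i]

    def lookup_oblivious(table, i):
        # Access pattern is identical for every i: all locations are read,
        # in the same order, and the selection happens in a local variable.
        result = None
        for j, value in enumerate(table):
            if j == i:
                result = value
        return result

    table = [10, 20, 30, 40]
    assert lookup_direct(table, 2) == lookup_oblivious(table, 2) == 30

The oblivious version pays for its independence from the input with a full scan on every lookup; the constructions below aim for much lower overhead.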

Applications:

Oblivious data structures

Oblivious RAM

Goldreich and Ostrovsky introduced this term in the context of software protection.

The memory access pattern of an oblivious RAM is probabilistic, and the probability distribution is independent of the input. In their paper, Goldreich and Ostrovsky give the following theorem for oblivious RAM: let RAM(m) denote a RAM with m memory locations and access to a random oracle machine; then t steps of an arbitrary RAM(m) program can be simulated by fewer than O(t · (log t)^3) steps of an oblivious RAM(m · (log m)^2), and every oblivious simulation of RAM(m) must make at least max(m, (t − 1) · log m) accesses in order to simulate t steps.
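To get a sense of the magnitudes involved, the bounds can be evaluated for concrete parameters (the values of m and t below are assumptions of this example, not figures from the source):

    import math

    m = 2 ** 20                                 # memory size (illustrative)
    t = 2 ** 20                                 # number of simulated steps (illustrative)

    upper = t * math.log2(t) ** 3               # O(t * (log t)^3) oblivious simulation cost
    lower = max(m, (t - 1) * math.log2(m))      # minimum number of accesses required
    print(f"upper bound ~ {upper:.3g} steps, lower bound ~ {lower:.3g} accesses")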

The square-root algorithm is one way to realize such an oblivious simulation.

  1. For every √m accesses, randomly permute the first m + √m memory words.
  2. To access a word, check the √m shelter words first.
  3. If the word is in the shelter, access one of the dummy words instead; if it is not there, find its permuted location and access it.

To simulate t steps of the original RAM, the oblivious RAM needs on the order of t · √m steps, and the amortized cost of each access is O(√m · log m).
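The following is a rough Python sketch of the square-root construction following the three steps above; it is illustrative rather than faithful. In particular, the permutation is kept in a plain client-side list instead of being derived from a random oracle, the shelter is modelled as a dictionary (a real scheme would scan all of the shelter words on each access), and the periodic re-permutation copies memory directly rather than using an oblivious sort. The class and method names are invented for this sketch.

    import math
    import random

    class SquareRootORAM:
        def __init__(self, data):
            self.m = len(data)
            self.k = max(1, math.isqrt(self.m))        # sqrt(m): number of dummies and shelter slots
            self.contents = list(data) + [0] * self.k  # m real words followed by sqrt(m) dummy words
            self.shelter = {}                          # logical address -> value
            self.accesses = 0
            self._repermute()

        def _repermute(self):
            # Step 1: randomly permute the first m + sqrt(m) words.
            self.pos = list(range(self.m + self.k))    # logical word i is stored at slot pos[i]
            random.shuffle(self.pos)
            self.server = [0] * (self.m + self.k)
            for i, value in enumerate(self.contents):
                self.server[self.pos[i]] = value
            self.next_dummy = self.m                   # first unused dummy word

        def read(self, addr):
            # Step 2: consult the shelter first.
            hit = addr in self.shelter
            # Step 3: on a hit, touch a fresh dummy slot; on a miss, touch the real slot.
            target = self.next_dummy if hit else addr
            fetched = self.server[self.pos[target]]
            if hit:
                self.next_dummy += 1
            value = self.shelter.get(addr, fetched)
            self.shelter[addr] = value
            # After sqrt(m) accesses, write the shelter back and re-permute everything.
            self.accesses += 1
            if self.accesses % self.k == 0:
                for a, v in self.shelter.items():
                    self.contents[a] = v
                self.shelter.clear()
                self._repermute()
            return value

    oram = SquareRootORAM(list(range(100)))
    assert oram.read(42) == 42
    assert oram.read(42) == 42   # the repeated request is served from the shelter, a dummy slot is probed

In this sketch the server sees one probe into a random-looking slot per request plus a full rewrite every √m requests, independent of which addresses were asked for.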

Another way to simulate an oblivious RAM is the hierarchical algorithm. The basic idea is to treat the shelter memory as a buffer and extend it to multiple levels of buffers. At level i there are 4^i buckets, and each bucket holds log t items. Each level also uses a randomly selected hash function.

The operation works as follows. Initially, the program is loaded into the last level, which is large enough to hold all t items. To read a virtual address V, one bucket is checked at each level: if the pair (V, X) has already been found at an earlier level, a bucket is picked at random to access; if it has not been found, the bucket h_i(V) given by that level's hash function is checked, where at most one entry is a real match and the remaining entries are dummies. To write, (V, X) is put into the first level; when the first i levels are full, their contents are moved down into level i + 1 and the first i levels are emptied.

Scanning the bucket at one level costs O(log t); since there are O(log t) levels, each access costs O((log t)^2); and the amortized cost of the oblivious re-hashing between levels is O((log t)^3) per access, i.e. O(t · (log t)^3) in total for t accesses.
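As a rough illustration of these costs, the snippet below prints the level sizes and the number of words scanned by a single read; the value t = 1024 is an arbitrary assumption, not a figure from the source.

    import math

    t = 1024                                  # illustrative number of accesses
    log_t = max(1, math.ceil(math.log2(t)))   # bucket size: about log t items
    levels = math.ceil(log_t / 2)             # smallest L with 4^L >= t, since 4^L = 2^(2L)

    for i in range(1, levels + 1):
        print(f"level {i}: {4 ** i} buckets of {log_t} slots each")

    # One bucket is scanned per level, so a read touches about levels * log_t
    # = O((log t)^2) words; the amortized cost of the oblivious re-hashing
    # between levels raises the total to O((log t)^3) per access.
    print("words scanned per read:", levels * log_t)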

Oblivious tree

An oblivious tree is a rooted tree with the following properties:

  1. All leaves are at the same level.
  2. All internal nodes have degree at most 3.
  3. Only the nodes along the rightmost path may have degree one, which helps to describe the update algorithms.

The oblivious tree is thus a data structure similar to a 2–3 tree, but with the additional property of being oblivious. It uses randomization to achieve O(log n) expected running time for the update operations, and any two sequences of operations M and N that leave the tree with the same contents produce the same probability distribution over trees. The tree supports three operations:

CREATE (L)
build a new tree storing the sequence of values L at its leaves.
INSERT (b, i,T)
insert a new leaf node storing the value b as the ith leaf of the tree T.
DELETE (i, T)
remove the ith leaf from T.

Steps of CREATE: the list of nodes at level i is obtained by traversing the list of nodes at level i + 1 from left to right and repeatedly doing the following:

  1. Choose d ∈ {2, 3} uniformly at random.
  2. If there are fewer than d nodes left at level i + 1, set d equal to the number of nodes left.
  3. Create a new node n at level i with the next d nodes at level i + 1 as children, and compute the size of n as the sum of the sizes of its children.
(Figure OBT.png: an oblivious tree storing the string "OBLIVION".)

For example, if the coin tosses of d ∈ {2, 3} have the outcome 2, 3, 2, 2, 2, 2, 3, the string "OBLIVION" is stored in the oblivious tree shown in the figure.
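A small Python sketch of CREATE following the three steps above (the Node class and the function name are invented for illustration and are not the paper's interface):

    import random

    class Node:
        def __init__(self, children=None, value=None):
            self.children = children or []
            self.value = value                          # set only for leaves
            self.size = 1 if not self.children else sum(c.size for c in self.children)

    def create(values):
        level = [Node(value=v) for v in values]         # the leaves, left to right
        while len(level) > 1:
            parents, remaining = [], level
            while remaining:
                d = random.choice((2, 3))               # step 1: d uniform over {2, 3}
                d = min(d, len(remaining))              # step 2: cap d by the nodes left
                parents.append(Node(children=remaining[:d]))   # step 3: group the next d nodes
                remaining = remaining[d:]
            level = parents
        return level[0]

    root = create(list("OBLIVION"))
    print(root.size)                                    # 8 leaves

Because d is drawn fresh on every grouping, two runs of create on the same sequence generally produce different trees; it is this randomization that lets the distribution of the final tree depend only on the stored sequence, not on the operations that produced it.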

Both INSERT(b, i, T) and DELETE(i, T) have O(log n) expected running time, and for INSERT and DELETE we have:

INSERT(b, i, CREATE(L)) = CREATE(L[1], …, L[i], b, L[i+1], …)
DELETE(i, CREATE(L)) = CREATE(L[1], …, L[i−1], L[i+1], …)

For example, running CREATE(ABCDEFG) and running INSERT(C, 2, CREATE(ABDEFG)) yield trees with the same probability distribution.

Related Research Articles

<span class="mw-page-title-main">Binary search algorithm</span> Search algorithm finding the position of a target value within a sorted array

In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.

<span class="mw-page-title-main">Hash table</span> Associative array for storing key-value pairs

In computing, a hash table, also known as a hash map or a hash set, is a data structure that implements an associative array, also called a dictionary, which is an abstract data type that maps keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored.

<span class="mw-page-title-main">Heap (data structure)</span> Computer science data structure

In computer science, a heap is a tree-based data structure that satisfies the heap property: In a max heap, for any given node C, if P is a parent node of C, then the key of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap is called the root node.

In computer science, a priority queue is an abstract data-type similar to a regular queue or stack data structure. Each element in a priority queue has an associated priority. In a priority queue, elements with high priority are served before elements with low priority. In some implementations, if two elements have the same priority, they are served in the same order in which they were enqueued. In other implementations, the order of elements with the same priority is undefined.

A splay tree is a binary search tree with the additional property that recently accessed elements are quick to access again. Like self-balancing binary search trees, a splay tree performs basic operations such as insertion, look-up and removal in O(log n) amortized time. For random access patterns drawn from a non-uniform random distribution, their amortized time can be faster than logarithmic, proportional to the entropy of the access pattern. For many patterns of non-random operations, also, splay trees can take better than logarithmic time, without requiring advance knowledge of the pattern. According to the unproven dynamic optimality conjecture, their performance on all access patterns is within a constant factor of the best possible performance that could be achieved by any other self-adjusting binary search tree, even one selected to fit that pattern. The splay tree was invented by Daniel Sleator and Robert Tarjan in 1985.

<span class="mw-page-title-main">Binary heap</span> Variant of heap data structure

A binary heap is a heap data structure that takes the form of a binary tree. Binary heaps are a common way of implementing priority queues. The binary heap was introduced by J. W. J. Williams in 1964, as a data structure for heapsort.

<span class="mw-page-title-main">Treap</span>

In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys. After any sequence of insertions and deletions of keys, the shape of the tree is a random variable with the same probability distribution as a random binary tree; in particular, with high probability its height is proportional to the logarithm of the number of keys, so that each search, insertion, or deletion operation takes logarithmic time to perform.

In computing, a persistent (non-ephemeral) data structure is a data structure that always preserves the previous version of itself when it is modified. Such data structures are effectively immutable, as their operations do not (visibly) update the structure in-place, but instead always yield a new updated structure. The term was introduced in Driscoll, Sarnak, Sleator, and Tarjan's 1986 article.

In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that stores a collection of disjoint (non-overlapping) sets. Equivalently, it stores a partition of a set into disjoint subsets. It provides operations for adding new sets, merging sets, and finding a representative member of a set. The last operation makes it possible to find out efficiently if any two elements are in the same or different sets.

A van Emde Boas tree, also known as a vEB tree or van Emde Boas priority queue, is a tree data structure which implements an associative array with m-bit integer keys. It was invented by a team led by Dutch computer scientist Peter van Emde Boas in 1975. It performs all operations in O(log m) time, or equivalently in O(log log M) time, where M = 2^m is the largest element that can be stored in the tree. The parameter M is not to be confused with the actual number of elements stored in the tree, by which the performance of other tree data-structures is often measured.

A B+ tree is an m-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves. The root may be either a leaf or a node with two or more children.

In computing, a cache-oblivious algorithm is an algorithm designed to take advantage of a processor cache without having the size of the cache as an explicit parameter. An optimal cache-oblivious algorithm is a cache-oblivious algorithm that uses the cache optimally. Thus, a cache-oblivious algorithm is designed to perform well, without modification, on multiple machines with different cache sizes, or for a memory hierarchy with different levels of cache having different sizes. Cache-oblivious algorithms are contrasted with explicit loop tiling, which explicitly breaks a problem into blocks that are optimally sized for a given cache.

In computer science, a longest common substring of two or more strings is a longest string that is a substring of all of them. There may be more than one longest common substring. Applications include data deduplication and plagiarism detection.

In graph theory and computer science, the lowest common ancestor (LCA) of two nodes v and w in a tree or directed acyclic graph (DAG) T is the lowest node that has both v and w as descendants, where we define each node to be a descendant of itself.

<span class="mw-page-title-main">Random binary tree</span> Binary tree selected at random

In computer science and probability theory, a random binary tree is a binary tree selected at random from some probability distribution on binary trees. Different distributions have been used, leading to different properties for these trees.

In computer science, a fractal tree index is a tree data structure that keeps data sorted and allows searches and sequential access in the same time as a B-tree but with insertions and deletions that are asymptotically faster than a B-tree. Like a B-tree, a fractal tree index is a generalization of a binary search tree in that a node can have more than two children. Furthermore, unlike a B-tree, a fractal tree index has buffers at each node, which allow insertions, deletions and other changes to be stored in intermediate locations. The goal of the buffers is to schedule disk writes so that each write performs a large amount of useful work, thereby avoiding the worst-case performance of B-trees, in which each disk write may change a small amount of data on disk. Like a B-tree, fractal tree indexes are optimized for systems that read and write large blocks of data. The fractal tree index has been commercialized in databases by Tokutek. Originally, it was implemented as a cache-oblivious lookahead array, but the current implementation is an extension of the Bε tree. The Bε tree is related to the Buffered Repository Tree. The Buffered Repository Tree has degree 2, whereas the Bε tree has degree B^ε. The fractal tree index has also been used in a prototype filesystem. An open source implementation of the fractal tree index is available.

An Oblivious RAM (ORAM) simulator is a compiler that transforms an algorithm in such a way that the resulting algorithm preserves the input-output behavior of the original algorithm but the distribution of the memory access patterns of the transformed algorithm is independent of the memory access pattern of the original algorithm.

In computer science, the list-labeling problem involves maintaining a totally ordered set S supporting the following operations:

<span class="mw-page-title-main">Parallel external memory</span>

In computer science, a parallel external memory (PEM) model is a cache-aware, external-memory abstract machine. It is the parallel-computing analogy to the single-processor external memory (EM) model. In a similar way, it is the cache-aware analogy to the parallel random-access machine (PRAM). The PEM model consists of a number of processors, together with their respective private caches and a shared main memory.

External memory graph traversal is a type of graph traversal optimized for accessing externally stored memory.

References

  1. Wang, Xiao; Nayak, Kartik; Liu, Chang; Chan, Hubert; Shi, Elaine; Stefanov, Emil; Huang, Yan (November 2014). "Oblivious Data Structures". CCS '14: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. Scottsdale, Arizona. pp. 215–226. doi:10.1145/2660267.2660314.