In cryptography, a private information retrieval (PIR) protocol is a protocol that allows a user to retrieve an item from a server in possession of a database without revealing which item is retrieved. PIR is a weaker version of 1-out-of-n oblivious transfer, where it is also required that the user should not get information about other database items.
One trivial but very inefficient way to achieve PIR is for the server to send an entire copy of the database to the user. In fact, this is the only possible protocol (in the classical or the quantum setting [1]) that gives the user information-theoretic privacy for their query in a single-server setting. [2] There are two ways to address this problem: make the server computationally bounded, or assume that there are multiple non-cooperating servers, each having a copy of the database.
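The multi-server idea can be illustrated with the simplest two-server scheme, in which each server sees only a uniformly random subset of indices. The following is a minimal sketch, assuming a database of single bits, two servers that do not talk to each other, and hypothetical function names (`make_queries`, `answer`, `retrieve`) chosen for illustration. Its queries are as long as the database, so it demonstrates the privacy argument rather than the low communication of the schemes in the literature.

```python
import secrets

def make_queries(n, i):
    """Client: two selection vectors over n positions that differ only at index i."""
    q1 = [secrets.randbelow(2) for _ in range(n)]  # uniformly random subset
    q2 = q1.copy()
    q2[i] ^= 1                                     # flip membership of index i
    return q1, q2

def answer(db, query):
    """Server: XOR of the database bits selected by the query vector."""
    acc = 0
    for bit, selected in zip(db, query):
        acc ^= bit & selected
    return acc

def retrieve(db, i):
    """Run the protocol; each call to answer() stands in for one server."""
    q1, q2 = make_queries(len(db), i)
    # All positions except i cancel out in the XOR of the two answers.
    return answer(db, q1) ^ answer(db, q2)

database = [1, 0, 1, 1, 0, 0, 1, 0]
recovered = [retrieve(database, i) for i in range(len(database))]
```

Each query vector on its own is uniformly distributed regardless of i, which is why a single server learns nothing about the index. Privacy is lost if the two servers compare the queries they received.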
The problem was introduced in 1995 by Chor, Goldreich, Kushilevitz and Sudan [2] in the information-theoretic setting and in 1997 by Kushilevitz and Ostrovsky in the computational setting. [3] Since then, very efficient solutions have been discovered. Single-database (computationally private) PIR can be achieved with constant (amortized) communication, and k-database (information-theoretic) PIR can be done with n^O((log log k)/(k log k)) communication.
The first single-database computational PIR scheme to achieve communication complexity less than n was created in 1997 by Kushilevitz and Ostrovsky [3] and achieved communication complexity of n^ε for any ε > 0, where n is the number of bits in the database. The security of their scheme was based on the well-studied quadratic residuosity problem. In 1999, Christian Cachin, Silvio Micali and Markus Stadler [4] achieved poly-logarithmic communication complexity. The security of their system is based on the Phi-hiding assumption. In 2004, Helger Lipmaa [5] achieved log-squared communication complexity O(ℓ log n + k log² n), where ℓ is the length of the strings and k is the security parameter. The security of his system reduces to the semantic security of a length-flexible additively homomorphic cryptosystem like the Damgård–Jurik cryptosystem. In 2005, Craig Gentry and Zulfikar Ramzan [6] achieved log-squared communication complexity with a scheme that retrieves log-squared (consecutive) bits of the database. The security of their scheme is also based on a variant of the Phi-hiding assumption. The communication rate was finally brought down to 1 by Aggelos Kiayias, Nikos Leonardos, Helger Lipmaa, Kateryna Pavlyk and Qiang Tang in 2015. [7]
All previous sublinear-communication computational PIR protocols required linear computational complexity, measured in public-key operations. In 2009, Helger Lipmaa [8] designed a computational PIR protocol with communication complexity O(ℓ log n + k log² n) and worst-case computation of O(n / log n) public-key operations. Amortization techniques that retrieve non-consecutive bits have been considered by Yuval Ishai, Eyal Kushilevitz, Rafail Ostrovsky and Amit Sahai. [9]
As shown by Ostrovsky and Skeith, [10] the schemes by Kushilevitz and Ostrovsky [3] and Lipmaa [5] use similar ideas based on homomorphic encryption. The Kushilevitz and Ostrovsky protocol is based on the Goldwasser–Micali cryptosystem while the protocol by Lipmaa is based on the Damgård–Jurik cryptosystem.
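The quadratic-residuosity idea behind the Kushilevitz and Ostrovsky protocol can be sketched with a toy example. The code below is an illustration under stated assumptions, not the published protocol: it uses the basic single-level variant with the database arranged as a square bit matrix (the n^ε bound comes from applying the idea recursively), the primes `P` and `Q` are far too small to be secure, and all function names are hypothetical.

```python
import math
import secrets

# Toy parameters (assumption): a real modulus needs primes of 1024+ bits.
P, Q = 1019, 1031          # secret factorization, known only to the client
N = P * Q                  # public modulus, sent to the server

def legendre(a, p):
    """Euler's criterion: 1 for quadratic residues mod p, p - 1 for non-residues."""
    return pow(a, (p - 1) // 2, p)

def random_unit():
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return r

def random_qr():
    return pow(random_unit(), 2, N)

def random_qnr():
    """Non-residue mod both P and Q, so its Jacobi symbol is +1 like a residue's."""
    while True:
        y = random_unit()
        if legendre(y, P) == P - 1 and legendre(y, Q) == Q - 1:
            return y

def client_query(num_cols, target_col):
    """One group element per column: a non-residue marks the wanted column."""
    return [random_qnr() if j == target_col else random_qr() for j in range(num_cols)]

def server_answer(matrix, query):
    """Per row: multiply q_j for 1-bits and q_j^2 for 0-bits."""
    answers = []
    for row in matrix:
        z = 1
        for bit, q in zip(row, query):
            z = (z * (q if bit else pow(q, 2, N))) % N
        answers.append(z)
    return answers

def client_decode(answers, target_row):
    z = answers[target_row]
    is_residue = legendre(z, P) == 1 and legendre(z, Q) == 1
    return 0 if is_residue else 1

def retrieve(matrix, row, col):
    query = client_query(len(matrix[0]), col)
    return client_decode(server_answer(matrix, query), row)

matrix = [[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
recovered = [[retrieve(matrix, r, c) for c in range(4)] for r in range(4)]
```

The row product is a non-residue exactly when the bit in the marked column is 1, because every other factor is a residue. The server, lacking the factorization of N, is assumed unable to tell which query element is the non-residue; the client sends one element per column and receives one per row, which is roughly 2√n group elements for an n-bit database.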
Achieving information-theoretic security requires the assumption that there are multiple non-cooperating servers, each having a copy of the database. Without this assumption, any information-theoretically secure PIR protocol requires an amount of communication that is at least the size of the database n. Multi-server PIR protocols tolerant of non-responsive or malicious/colluding servers are called robust or Byzantine robust respectively. These issues were first considered by Beimel and Stahl (2002). An ℓ-server system that can operate where only k of the servers respond, ν of the servers respond incorrectly, and which can withstand up to t colluding servers without revealing the client's query is called "t-private ν-Byzantine robust k-out-of-ℓ PIR" [DGH 2012]. In 2012, C. Devet, I. Goldberg, and N. Heninger (DGH 2012) proposed an optimally robust scheme that is Byzantine-robust to ν < k − t − 1, which is the theoretical maximum value. It is based on an earlier protocol of Goldberg that uses Shamir's Secret Sharing to hide the query. Goldberg has released a C++ implementation on SourceForge. [11]
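The way Shamir's Secret Sharing can hide a query may be sketched as follows. This is a simplified illustration of the general idea and not Goldberg's protocol or the API of his implementation: it assumes records are integers below a prime `PRIME`, handles non-responsive servers only, and omits the error correction needed for Byzantine robustness. All names are hypothetical.

```python
import secrets

PRIME = 2**61 - 1  # field modulus (assumption: every record is smaller than this)

def evaluate(coeffs, x):
    """Horner evaluation of a polynomial with coefficients in increasing degree."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % PRIME
    return acc

def share_query(n, index, num_servers, t):
    """Shamir-share the unit vector e_index; the server at point x gets one share vector."""
    # One random degree-t polynomial per coordinate, with constant term e_index[j].
    polys = [[1 if j == index else 0] + [secrets.randbelow(PRIME) for _ in range(t)]
             for j in range(n)]
    return {x: [evaluate(p, x) for p in polys] for x in range(1, num_servers + 1)}

def server_answer(db, share):
    """Server: inner product of the database with its share vector."""
    return sum(record * s for record, s in zip(db, share)) % PRIME

def reconstruct(points):
    """Client: Lagrange interpolation at x = 0 from a dict {x: y}."""
    total = 0
    for xi, yi in points.items():
        num, den = 1, 1
        for xj in points:
            if xj != xi:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total

db = [11, 22, 33, 44, 55]
shares = share_query(len(db), index=3, num_servers=5, t=2)
responding = [1, 3, 5]  # servers 2 and 4 never reply
answers = {x: server_answer(db, shares[x]) for x in responding}
record = reconstruct(answers)
```

The servers' answers are evaluations of a single polynomial of degree at most t whose value at 0 is the requested record, so any t + 1 replies suffice, and any t servers pooling their share vectors see only uniformly random values. The modular inverse via `pow(den, -1, PRIME)` requires Python 3.8 or later.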
One-way functions are necessary, but not known to be sufficient, for nontrivial (i.e., with sublinear communication) single database computationally private information retrieval. In fact, such a protocol was proved by Giovanni Di Crescenzo, Tal Malkin and Rafail Ostrovsky to imply oblivious transfer (see below). [12]
Oblivious transfer, also called symmetric PIR, is PIR with the additional restriction that the user may not learn any item other than the one she requested. It is termed symmetric because both the user and the database have a privacy requirement.
Collision-resistant cryptographic hash functions are implied by any one-round computational PIR scheme, as shown by Ishai, Kushilevitz and Ostrovsky. [13]
The basic motivation for private information retrieval is a family of two-party protocols in which one of the parties (the sender) owns a database, and the other party (the receiver) wants to query it with certain privacy restrictions and guarantees. As a result of the protocol, a receiver who wants the i-th value in the database must learn the i-th entry, but the sender must learn nothing about i. In a general PIR protocol, even a computationally unbounded sender can learn nothing about i, so privacy is preserved in the information-theoretic sense. Since the PIR problem was posed, different approaches to its solution have been pursued and some variations have been proposed.
A CPIR (Computationally Private Information Retrieval) protocol is similar to a PIR protocol: the receiver retrieves an element chosen by him from the sender's database, so that the sender obtains no knowledge about which element was transferred. [8] The only difference is that privacy is safeguarded against a polynomially bounded sender. [14]
A CSPIR (Computationally Symmetric Private Information Retrieval) protocol is used in the same scenario as a CPIR protocol, with an added privacy requirement for the sender. If the sender owns a database and the receiver wants to get the i-th value in this database, then at the end of the execution of a CSPIR protocol the receiver should have learned nothing about values in the database other than the i-th one. [14]
Numerous computational PIR and information-theoretic PIR schemes in the literature have been implemented.