Reconstruction attack

Last updated

A reconstruction attack is any method for partially reconstructing a private dataset from public aggregate information. Typically, the dataset contains sensitive information about individuals, whose privacy needs to be protected. The attacker has no or only partial access to the dataset, but has access to public aggregate statistics about the datasets, which could be exact or distorted, for example by adding noise. If the public statistics are not sufficiently distorted, the attacker is able to accurately reconstruct a large portion of the original private data. Reconstruction attacks are relevant to the analysis of private data, as they show that, in order to preserve even a very weak notion of individual privacy, any published statistics need to be sufficiently distorted. This phenomenon was called the Fundamental Law of Information Recovery by Dwork and Roth, and formulated as "overly accurate answers to too many questions will destroy privacy in a spectacular way." [1]

The Dinur-Nissim Attack

In 2003, Irit Dinur and Kobbi Nissim proposed a reconstruction attack based on noisy answers to multiple statistical queries. [2] Their work was recognized by the 2013 ACM PODS Alberto O. Mendelzon Test-of-Time Award in part for being the seed for the development of differential privacy. [3]

Dinur and Nissim model a private database as a sequence of bits , where each bit is the private information of a single individual. A database query is specified by a subset , and is defined to equal . They show that, given approximate answers to queries specified by sets , such that for all , if is sufficiently small and is sufficiently large, then an attacker can reconstruct most of the private bits in . Here the error bound can be a function of and . Nissim and Dinur's attack works in two regimes: in one regime, is exponential in , and the error can be linear in ; in the other regime, is polynomial in , and the error is on the order of .

Related Research Articles

Vapnik–Chervonenkis theory was developed during 1960–1990 by Vladimir Vapnik and Alexey Chervonenkis. The theory is a form of computational learning theory, which attempts to explain the learning process from a statistical point of view.

In abstract algebra and multilinear algebra, a multilinear form on a vector space over a field is a map

The set cover problem is a classical question in combinatorics, computer science, operations research, and complexity theory. It is one of Karp's 21 NP-complete problems shown to be NP-complete in 1972.

In universal algebra and in model theory, a structure consists of a set along with a collection of finitary operations and relations that are defined on it.

Homomorphic encryption A form of encryption that allows computation on ciphertexts

Homomorphic encryption is a form of encryption allowing one to perform calculations on encrypted data without decrypting it first. The result of the computation is in an encrypted form, when decrypted the output is the same as if the operations had been performed on the unencrypted data.

In computer science, locality-sensitive hashing (LSH) is an algorithmic technique that hashes similar input items into the same "buckets" with high probability. Since similar items end up in the same buckets, this technique can be used for data clustering and nearest neighbor search. It differs from conventional hashing techniques in that hash collisions are maximized, not minimized. Alternatively, the technique can be seen as a way to reduce the dimensionality of high-dimensional data; high-dimensional input items can be reduced to low-dimensional versions while preserving relative distances between items.

In computer science, lattice problems are a class of optimization problems related to mathematical objects called lattices. The conjectured intractability of such problems is central to the construction of secure lattice-based cryptosystems: Lattice problems are an example of NP-hard problems which have been shown to be average-case hard, providing a test case for the security of cryptographic algorithms. In addition, some lattice problems which are worst-case hard can be used as a basis for extremely secure cryptographic schemes. The use of worst-case hardness in such schemes makes them among the very few schemes that are very likely secure even against quantum computers. For applications in such cryptosystems, lattices over vector space or free modules are generally considered.

The exponential mechanism is a technique for designing differentially private algorithms. It was developed by Frank McSherry and Kunal Talwar in 2007. Their work was recognized as a co-winner of the 2009 PET Award for Outstanding Research in Privacy Enhancing Technologies.

Cynthia Dwork American computer scientist

Cynthia Dwork is an American computer scientist at Harvard University, where she is Gordon McKay Professor of Computer Science, Radcliffe Alumnae Professor at the Radcliffe Institute for Advanced Study, and Affiliated Professor, Harvard Law School and Harvard's Department of Statistics. She is a distinguished scientist at Microsoft Research.

Differential privacy is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset. The idea behind differential privacy is that if the effect of making an arbitrary single substitution in the database is small enough, the query result cannot be used to infer much about any single individual, and therefore provides privacy. Another way to describe differential privacy is as a constraint on the algorithms used to publish aggregate information about a statistical database which limits the disclosure of private information of records whose information is in the database. For example, differentially private algorithms are used by some government agencies to publish demographic information or other statistical aggregates while ensuring confidentiality of survey responses, and by companies to collect information about user behavior while controlling what is visible even to internal analysts.

Learning with errors (LWE) is the computational problem of inferring a linear -ary function over a finite ring from given samples some of which may be erroneous. The LWE problem is conjectured to be hard to solve, and thus be useful in cryptography.

Uncertainty theory is a branch of mathematics based on normality, monotonicity, self-duality, countable subadditivity, and product measure axioms.

In discrete mathematics, ideal lattices are a special class of lattices and a generalization of cyclic lattices. Ideal lattices naturally occur in many parts of number theory, but also in other areas. In particular, they have a significant place in cryptography. Micciancio defined a generalization of cyclic lattices as ideal lattices. They can be used in cryptosystems to decrease by a square root the number of parameters necessary to describe a lattice, making them more efficient. Ideal lattices are a new concept, but similar lattice classes have been used for a long time. For example cyclic lattices, a special case of ideal lattices, are used in NTRUEncrypt and NTRUSign.

Local differential privacy is a model of differential privacy with the added restriction that even if an adversary has access to the personal responses of an individual in the database, that adversary will still be unable to learn too much about the user's personal data. This is contrasted with global differential privacy, a model of differential privacy that incorporates a central aggregator with access to the raw data.

Differentially private analysis of graphs studies algorithms for computing accurate graph statistics while preserving differential privacy. Such algorithms are used for data represented in the form of a graph where nodes correspond to individuals and edges correspond to relationships between them. For examples, edges could correspond to friendships, sexual relationships, or communication patterns. A party that collected sensitive graph data can process it using a differentially private algorithm and publish the output of the algorithm. The goal differentially private analysis of graphs is to design algorithms that compute accurate global information about graphs while preserving privacy of individuals whose data is stored in the graph.

Adding controlled noise from predetermined distributions is a way of designing differentially private mechanisms. This technique is useful for designing private mechanisms for real-valued functions on sensitive data. Some commonly used distributions for adding noise include Laplace and Gaussian distributions.

Since the advent of differential privacy, a number of systems supporting differentially private data analyses have been implemented and deployed.

Kobbi Nissim is a computer scientist at Georgetown University, where he is the McDevitt Chair of Computer Science. His areas of research include cryptography and data privacy. He is known for the introduction of differential privacy.

LSH is a cryptographic hash function designed in 2014 by Republic of Korea to provide integrity in general-purpose software environments such as PCs and smart devices. LSH is one of the cryptographic algorithms approved by the Korean Cryptographic Module Validation Program (KCMVP). And it is the national standard of Republic of Korea.

References

  1. The Algorithmic Foundations of Differential Privacy by Cynthia Dwork and Aaron Roth. Foundations and Trends in Theoretical Computer Science. Vol. 9, no. 3–4, pp. 211‐407, Aug. 2014. DOI:10.1561/0400000042
  2. Irit Dinur and Kobbi Nissim. 2003. Revealing information while preserving privacy. In Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS '03). ACM, New York, NY, USA, 202–210. DOI:10.1145/773153.773173
  3. "ACM PODS Alberto O. Mendelzon Test-of-Time Award".