Differential privacy

Last updated
An informal definition of differential privacy Differential privacy informal definition.png
An informal definition of differential privacy

Differential privacy (DP) is a mathematically rigorous framework for releasing statistical information about datasets while protecting the privacy of individual data subjects. It enables a data holder to share aggregate patterns of the group while limiting information that is leaked about specific individuals. [1] [2] This is done by injecting carefully calibrated noise into statistical computations such that the utility of the statistic is preserved while provably limiting what can be inferred about any individual in the dataset.

Contents

Another way to describe differential privacy is as a constraint on the algorithms used to publish aggregate information about a statistical database which limits the disclosure of private information of records in the database. For example, differentially private algorithms are used by some government agencies to publish demographic information or other statistical aggregates while ensuring confidentiality of survey responses, and by companies to collect information about user behavior while controlling what is visible even to internal analysts.

Roughly, an algorithm is differentially private if an observer seeing its output cannot tell whether a particular individual's information was used in the computation. Differential privacy is often discussed in the context of identifying individuals whose information may be in a database. Although it does not directly refer to identification and reidentification attacks, differentially private algorithms provably resist such attacks. [3]

Historical background

Official statistics organizations are charged with collecting information from individuals or establishments, and publishing aggregate data to serve the public interest. For example, the 1790 United States Census collected information about individuals living in the United States and published tabulations based on sex, age, race, and condition of servitude. [4] Census records were originally posted, but starting with the 1840 Census they were collected under a promise of confidentiality that the information provided will be used for statistical purposes, and that the publications will not produce information that can be traced back to a specific individual or establishment.

To accomplish the goal of confidentiality, statistical organizations have long suppressed information in their publications. For example, in a table presenting the sales of each business in a town grouped by business category, a cell that has information from only one company might be suppressed, in order to maintain the confidentiality of that company's specific sales.

The adoption of electronic information processing systems by statistical agencies in the 1950s and 1960s dramatically increased the number of tables that a statistical organization could produce and, in so doing, significantly increased the potential for an improper disclosure of confidential information. For example, if a business that had its sales numbers suppressed also had those numbers appear in the total sales of a region, then it might be possible to determine the suppressed value by subtracting the other sales from that total. But there might also be combinations of additions and subtractions that might cause the private information to be revealed. The number of combinations that needed to be checked increases exponentially with the number of publications, and it is potentially unbounded if data users are able to make queries of the statistical database using an interactive query system.

ε-differential privacy

A formal definition of e-differential privacy.
D
1
{\displaystyle D_{1}}
is a dataset without the private data, and
D
2
{\displaystyle D_{2}}
is one with it. This is "pure e-differential privacy", meaning d=0. Differential privacy formal definition.png
A formal definition of ε-differential privacy. is a dataset without the private data, and is one with it. This is "pure ε-differential privacy", meaning δ=0.

The 2006 Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith article [3] introduced the concept of ε-differential privacy, a mathematical definition for the privacy loss associated with any data release drawn from a statistical database. [5] (Here, the term statistical database means a set of data that are collected under the pledge of confidentiality for the purpose of producing statistics that, by their production, do not compromise the privacy of those individuals who provided the data.)

The definition of ε-differential privacy requires that a change to one entry in a database only creates a small change in the probability distribution of the outputs of measurements, as seen by the by the attacker. [3] The intuition for the definition of ε-differential privacy is that a person's privacy cannot be compromised by a statistical release if their data are not in the database. [6] In differential privacy, each individual is given roughly the same privacy that would result from having their data removed. [6] That is, the statistical functions run on the database should be substantially affected by the removal, addition, or change of any individual in the data. [6]

How much any individual contributes to the result of a database query depends in part on how many people's data are involved in the query. If the database contains data from a single person, that person's data contributes 100%. If the database contains data from a hundred people, each person's data contributes just 1%. The key insight of differential privacy is that as the query is made on the data of fewer and fewer people, more noise needs to be added to the query result to produce the same amount of privacy. Hence the name of the 2006 paper, "Calibrating noise to sensitivity in private data analysis."[ citation needed ]

Definition

Let ε be a positive real number and be a randomized algorithm that takes a dataset as input (representing the actions of the trusted party holding the data). Let denote the image of .

The algorithm is said to provide (ε, δ)-differential privacy if, for all datasets and that differ on a single element (i.e., the data of one person), and all subsets of :

where the probability is taken over the randomness used by the algorithm. [7] This definition is sometimes called "approximate differential privacy", with "pure differential privacy" being a special case when . In the latter case, the algorithm is commonly said to satisfy ε-differential privacy (i.e., omitting ).[ citation needed ]

Differential privacy offers strong and robust guarantees that facilitate modular design and analysis of differentially private mechanisms due to its composability, robustness to post-processing, and graceful degradation in the presence of correlated data.[ citation needed ]

Example

According to this definition, differential privacy is a condition on the release mechanism (i.e., the trusted party releasing information about the dataset) and not on the dataset itself. Intuitively, this means that for any two datasets that are similar, a given differentially private algorithm will behave approximately the same on both datasets. The definition gives a strong guarantee that presence or absence of an individual will not affect the final output of the algorithm significantly.

For example, assume we have a database of medical records where each record is a pair (Name, X), where is a Boolean denoting whether a person has diabetes or not. For example:

NameRossMonicaJoeyPhoebeChandlerRachel
Has Diabetes (X)110010

Now suppose a malicious user (often termed an adversary) wants to find whether Chandler has diabetes or not. Suppose he also knows in which row of the database Chandler resides. Now suppose the adversary is only allowed to use a particular form of query that returns the partial sum of the first rows of column in the database. In order to find Chandler's diabetes status the adversary executes and , then computes their difference. In this example, and , so their difference is 1. This indicates that the "Has Diabetes" field in Chandler's row must be 1. This example highlights how individual information can be compromised even without explicitly querying for the information of a specific individual.

Continuing this example, if we construct by replacing (Chandler, 1) with (Chandler, 0) then this malicious adversary will be able to distinguish from by computing for each dataset. If the adversary were required to receive the values via an -differentially private algorithm, for a sufficiently small , then he or she would be unable to distinguish between the two datasets.

Composability and robustness to post processing

Composability refers to the fact that the joint distribution of the outputs of (possibly adaptively chosen) differentially private mechanisms satisfies differential privacy. [3]

The other important property for modular use of differential privacy is robustness to post processing. This is defined to mean that for any deterministic or randomized function defined over the image of the mechanism , if satisfies ε-differential privacy, so does . [3]

The property of composition permits modular construction and analysis of differentially private mechanisms [3] and motivates the concept of the privacy loss budget.[ citation needed ] If all elements that access sensitive data of a complex mechanisms are separately differentially private, so will be their combination, followed by arbitrary post-processing. [3]

Group privacy

In general, ε-differential privacy is designed to protect the privacy between neighboring databases which differ only in one row. This means that no adversary with arbitrary auxiliary information can know if one particular participant submitted his information. However this is also extendable. [3] We may want to protect databases differing in rows, which amounts to an adversary with arbitrary auxiliary information knowing if particular participants submitted their information. This can be achieved because if items change, the probability dilation is bounded by instead of , [9] i.e., for D1 and D2 differing on items:Thus setting ε instead to achieves the desired result (protection of items). [3] In other words, instead of having each item ε-differentially private protected, now every group of items is ε-differentially private protected (and each item is -differentially private protected). [3]

Hypothesis testing interpretation

One can think of differential privacy as bounding the error rates in a hypothesis test. Consider two hypotheses:

Then, there are two error rates:

Ideal protection would imply that both error rates are equal, but for a fixed (ε, δ) setting, an attacker can achieve the following rates: [10]

ε-differentially private mechanisms

Since differential privacy is a probabilistic concept, any differentially private mechanism is necessarily randomized. Some of these, like the Laplace mechanism, described below, rely on adding controlled noise to the function that we want to compute. Others, like the exponential mechanism [11] and posterior sampling [12] sample from a problem-dependent family of distributions instead.

An important definition with respect to ε-differentially private mechanisms is sensitivity. [3] Let be a positive integer, be a collection of datasets, and be a function. One definition of the sensitivity of a function, denoted , can be defined by: [3] where the maximum is over all pairs of datasets and in differing in at most one element and denotes the L1 norm. [3] In the example of the medical database below, if we consider to be the function , then the sensitivity of the function is one, since changing any one of the entries in the database causes the output of the function to change by either zero or one. This can be generalized to other metric spaces (measures of distance), and must be to make certain differentially private algorithms work, including adding noise from the Gaussian distribution (which requires the L2 norm) instead of the Laplace distribution. [3]

There are techniques (which are described below) using which we can create a differentially private algorithm for functions, with parameters that vary depending on their sensitivity. [3]

Laplace mechanism

Laplace mechanism offering .5-differential privacy for a function with sensitivity 1. Laplace mechanism.png
Laplace mechanism offering .5-differential privacy for a function with sensitivity 1.

The Laplace mechanism adds Laplace noise (i.e. noise from the Laplace distribution, which can be expressed by probability density function , which has mean zero and standard deviation ). Now in our case we define the output function of as a real valued function (called as the transcript output by ) as where and is the original real valued query/function we planned to execute on the database. Now clearly can be considered to be a continuous random variable, where

which is at most . We can consider to be the privacy factor . Thus follows a differentially private mechanism (as can be seen from the definition above). If we try to use this concept in our diabetes example then it follows from the above derived fact that in order to have as the -differential private algorithm we need to have . Though we have used Laplace noise here, other forms of noise, such as the Gaussian Noise, can be employed, but they may require a slight relaxation of the definition of differential privacy. [9]

Randomized response

A simple example, especially developed in the social sciences, [13] is to ask a person to answer the question "Do you own the attribute A?", according to the following procedure:

  1. Toss a coin.
  2. If heads, then toss the coin again (ignoring the outcome), and answer the question honestly.
  3. If tails, then toss the coin again and answer "Yes" if heads, "No" if tails.

(The seemingly redundant extra toss in the first case is needed in situations where just the act of tossing a coin may be observed by others, even if the actual result stays hidden.) The confidentiality then arises from the refutability of the individual responses.

But, overall, these data with many responses are significant, since positive responses are given to a quarter by people who do not have the attribute A and three-quarters by people who actually possess it. Thus, if p is the true proportion of people with A, then we expect to obtain (1/4)(1-p) + (3/4)p = (1/4) + p/2 positive responses. Hence it is possible to estimate p.

In particular, if the attribute A is synonymous with illegal behavior, then answering "Yes" is not incriminating, insofar as the person has a probability of a "Yes" response, whatever it may be.

Although this example, inspired by randomized response, might be applicable to microdata (i.e., releasing datasets with each individual response), by definition differential privacy excludes microdata releases and is only applicable to queries (i.e., aggregating individual responses into one result) as this would violate the requirements, more specifically the plausible deniability that a subject participated or not. [14] [15]

Stable transformations

A transformation is -stable if the Hamming distance between and is at most -times the Hamming distance between and for any two databases .[ citation needed ] If there is a mechanism that is -differentially private, then the composite mechanism is -differentially private. [8]

This could be generalized to group privacy, as the group size could be thought of as the Hamming distance between and (where contains the group and does not). In this case is -differentially private.[ citation needed ]

Research

Early research leading to differential privacy

In 1977, Tore Dalenius formalized the mathematics of cell suppression. [16] Tore Dalenius was a Swedish statistician who contributed to statistical privacy through his 1977 paper that revealed a key point about statistical databases, which was that databases should not reveal information about an individual that is not otherwise accessible. [17] He also defined a typology for statistical disclosures. [5]

In 1979, Dorothy Denning, Peter J. Denning and Mayer D. Schwartz formalized the concept of a Tracker, an adversary that could learn the confidential contents of a statistical database by creating a series of targeted queries and remembering the results. [18] This and future research showed that privacy properties in a database could only be preserved by considering each new query in light of (possibly all) previous queries. This line of work is sometimes called query privacy, with the final result being that tracking the impact of a query on the privacy of individuals in the database was NP-hard.[ citation needed ]

21st century

In 2003, Kobbi Nissim and Irit Dinur demonstrated that it is impossible to publish arbitrary queries on a private statistical database without revealing some amount of private information, and that the entire information content of the database can be revealed by publishing the results of a surprisingly small number of random queries—far fewer than was implied by previous work. [19] The general phenomenon is known as the Fundamental Law of Information Recovery, and its key insight, namely that in the most general case, privacy cannot be protected without injecting some amount of noise, led to development of differential privacy.[ citation needed ]

In 2006, Cynthia Dwork, Frank McSherry, Kobbi Nissim and Adam D. Smith published an article [3] formalizing the amount of noise that needed to be added and proposing a generalized mechanism for doing so.[ citation needed ] This paper also created the first formal definition of differential privacy. [5] Their work was a co-recipient of the 2016 TCC Test-of-Time Award [20] and the 2017 Gödel Prize. [21]

Since then, subsequent research has shown that there are many ways to produce very accurate statistics from the database while still ensuring high levels of privacy. [1]

Adoption in real-world applications

To date there are over 12 real-world deployments of differential privacy, the most noteworthy being:

Public purpose considerations

There are several public purpose considerations regarding differential privacy that are important to consider, especially for policymakers and policy-focused audiences interested in the social opportunities and risks of the technology: [31]

Attacks in practice

Because differential privacy techniques are implemented on real computers, they are vulnerable to various attacks not possible to compensate for solely in the mathematics of the techniques themselves. In addition to standard defects of software artifacts that can be identified using testing or fuzzing, implementations of differentially private mechanisms may suffer from the following vulnerabilities:

See also

Related Research Articles

<span class="mw-page-title-main">Noether's theorem</span> Statement relating differentiable symmetries to conserved quantities

Noether's theorem states that every continuous symmetry of the action of a physical system with conservative forces has a corresponding conservation law. This is the first of two theorems proven by mathematician Emmy Noether in 1915 and published in 1918. The action of a physical system is the integral over time of a Lagrangian function, from which the system's behavior can be determined by the principle of least action. This theorem only applies to continuous and smooth symmetries of physical space.

In the calculus of variations and classical mechanics, the Euler–Lagrange equations are a system of second-order ordinary differential equations whose solutions are stationary points of the given action functional. The equations were discovered in the 1750s by Swiss mathematician Leonhard Euler and Italian mathematician Joseph-Louis Lagrange.

Vapnik–Chervonenkis theory was developed during 1960–1990 by Vladimir Vapnik and Alexey Chervonenkis. The theory is a form of computational learning theory, which attempts to explain the learning process from a statistical point of view.

<span class="mw-page-title-main">Path integral formulation</span> Formulation of quantum mechanics

The path integral formulation is a description in quantum mechanics that generalizes the stationary action principle of classical mechanics. It replaces the classical notion of a single, unique classical trajectory for a system with a sum, or functional integral, over an infinity of quantum-mechanically possible trajectories to compute a quantum amplitude.

In probability theory and related fields, Malliavin calculus is a set of mathematical techniques and ideas that extend the mathematical field of calculus of variations from deterministic functions to stochastic processes. In particular, it allows the computation of derivatives of random variables. Malliavin calculus is also called the stochastic calculus of variations. P. Malliavin first initiated the calculus on infinite dimensional space. Then, the significant contributors such as S. Kusuoka, D. Stroock, J-M. Bismut, Shinzo Watanabe, I. Shigekawa, and so on finally completed the foundations.

In mathematics, in particular in algebraic geometry and differential geometry, Dolbeault cohomology (named after Pierre Dolbeault) is an analog of de Rham cohomology for complex manifolds. Let M be a complex manifold. Then the Dolbeault cohomology groups depend on a pair of integers p and q and are realized as a subquotient of the space of complex differential forms of degree (p,q).

<span class="mw-page-title-main">Hamilton's principle</span> Formulation of the principle of stationary action

In physics, Hamilton's principle is William Rowan Hamilton's formulation of the principle of stationary action. It states that the dynamics of a physical system are determined by a variational problem for a functional based on a single function, the Lagrangian, which may contain all physical information concerning the system and the forces acting on it. The variational problem is equivalent to and allows for the derivation of the differential equations of motion of the physical system. Although formulated originally for classical mechanics, Hamilton's principle also applies to classical fields such as the electromagnetic and gravitational fields, and plays an important role in quantum mechanics, quantum field theory and criticality theories.

Stochastic approximation methods are a family of iterative methods typically used for root-finding problems or for optimization problems. The recursive update rules of stochastic approximation methods can be used, among other things, for solving linear systems when the collected data is corrupted by noise, or for approximating extreme values of functions which cannot be computed directly, but only estimated via noisy observations.

Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu in 1996. It is a density-based clustering non-parametric algorithm: given a set of points in some space, it groups together points that are closely packed, and marks as outliers points that lie alone in low-density regions . DBSCAN is one of the most common, and most commonly cited, clustering algorithms.

In mathematics, the Johnson–Lindenstrauss lemma is a result named after William B. Johnson and Joram Lindenstrauss concerning low-distortion embeddings of points from high-dimensional into low-dimensional Euclidean space. The lemma states that a set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved. In the classical proof of the lemma, the embedding is a random orthogonal projection.

Property testing is a field of theoretical computer science, concerned with the design of super-fast algorithms for approximate decision making, where the decision refers to properties or parameters of huge objects.

Ordering points to identify the clustering structure (OPTICS) is an algorithm for finding density-based clusters in spatial data. It was presented by Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel and Jörg Sander. Its basic idea is similar to DBSCAN, but it addresses one of DBSCAN's major weaknesses: the problem of detecting meaningful clusters in data of varying density. To do so, the points of the database are (linearly) ordered such that spatially closest points become neighbors in the ordering. Additionally, a special distance is stored for each point that represents the density that must be accepted for a cluster so that both points belong to the same cluster. This is represented as a dendrogram.

The exponential mechanism is a technique for designing differentially private algorithms. It was developed by Frank McSherry and Kunal Talwar in 2007. Their work was recognized as a co-winner of the 2009 PET Award for Outstanding Research in Privacy Enhancing Technologies.

A locally decodable code (LDC) is an error-correcting code that allows a single bit of the original message to be decoded with high probability by only examining a small number of bits of a possibly corrupted codeword. This property could be useful, say, in a context where information is being transmitted over a noisy channel, and only a small subset of the data is required at a particular time and there is no need to decode the entire message at once. Note that locally decodable codes are not a subset of locally testable codes, though there is some overlap between the two.

In computing, the count–min sketch is a probabilistic data structure that serves as a frequency table of events in a stream of data. It uses hash functions to map events to frequencies, but unlike a hash table uses only sub-linear space, at the expense of overcounting some events due to collisions. The count–min sketch was invented in 2003 by Graham Cormode and S. Muthu Muthukrishnan and described by them in a 2005 paper.

In PAC learning, error tolerance refers to the ability of an algorithm to learn when the examples received have been corrupted in some way. In fact, this is a very common and important issue since in many applications it is not possible to access noise-free data. Noise can interfere with the learning process at different levels: the algorithm may receive data that have been occasionally mislabeled, or the inputs may have some false information, or the classification of the examples may have been maliciously adulterated.

Local differential privacy (LDP) is a model of differential privacy with the added requirement that if an adversary has access to the personal responses of an individual in the database, that adversary will still be unable to learn much of the user's personal data. This is contrasted with global differential privacy, a model of differential privacy that incorporates a central aggregator with access to the raw data.

Differentially private analysis of graphs studies algorithms for computing accurate graph statistics while preserving differential privacy. Such algorithms are used for data represented in the form of a graph where nodes correspond to individuals and edges correspond to relationships between them. For examples, edges could correspond to friendships, sexual relationships, or communication patterns. A party that collected sensitive graph data can process it using a differentially private algorithm and publish the output of the algorithm. The goal of differentially private analysis of graphs is to design algorithms that compute accurate global information about graphs while preserving privacy of individuals whose data is stored in the graph.

Adding controlled noise from predetermined distributions is a way of designing differentially private mechanisms. This technique is useful for designing private mechanisms for real-valued functions on sensitive data. Some commonly used distributions for adding noise include Laplace and Gaussian distributions.

A reconstruction attack is any method for partially reconstructing a private dataset from public aggregate information. Typically, the dataset contains sensitive information about individuals, whose privacy needs to be protected. The attacker has no or only partial access to the dataset, but has access to public aggregate statistics about the datasets, which could be exact or distorted, for example by adding noise. If the public statistics are not sufficiently distorted, the attacker is able to accurately reconstruct a large portion of the original private data. Reconstruction attacks are relevant to the analysis of private data, as they show that, in order to preserve even a very weak notion of individual privacy, any published statistics need to be sufficiently distorted. This phenomenon was called the Fundamental Law of Information Recovery by Dwork and Roth, and formulated as "overly accurate answers to too many questions will destroy privacy in a spectacular way."

References

  1. 1 2 Hilton, M; Cal (2012). "Differential Privacy: A Historical Survey". Semantic Scholar. S2CID   16861132 . Retrieved 31 December 2023.
  2. Dwork, Cynthia (2008-04-25). "Differential Privacy: A Survey of Results". In Agrawal, Manindra; Du, Dingzhu; Duan, Zhenhua; Li, Angsheng (eds.). Theory and Applications of Models of Computation. Lecture Notes in Computer Science. Vol. 4978. Springer Berlin Heidelberg. pp. 1–19. doi:10.1007/978-3-540-79228-4_1. ISBN   978-3-540-79227-7. S2CID   2887752.
  3. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Calibrating Noise to Sensitivity in Private Data Analysis by Cynthia Dwork, Frank McSherry, Kobbi Nissim, Adam Smith. In Theory of Cryptography Conference (TCC), Springer, 2006. doi : 10.1007/11681878_14. The full version appears in Journal of Privacy and Confidentiality, 7 (3), 17-51. doi : 10.29012/jpc.v7i3.405
  4. "1790 Census Records".
  5. 1 2 3 HILTON, MICHAEL. "Differential Privacy: A Historical Survey" (PDF). S2CID   16861132. Archived from the original (PDF) on 2017-03-01.{{cite journal}}: Cite journal requires |journal= (help)
  6. 1 2 3 Dwork, Cynthia (2008). Agrawal, Manindra; Du, Dingzhu; Duan, Zhenhua; Li, Angsheng (eds.). "Differential Privacy: A Survey of Results". Theory and Applications of Models of Computation. Berlin, Heidelberg: Springer: 1–19. doi:10.1007/978-3-540-79228-4_1. ISBN   978-3-540-79228-4.
  7. The Algorithmic Foundations of Differential Privacy by Cynthia Dwork and Aaron Roth. Foundations and Trends in Theoretical Computer Science. Vol. 9, no. 3–4, pp. 211‐407, Aug. 2014. doi : 10.1561/0400000042
  8. 1 2 3 Privacy integrated queries: an extensible platform for privacy-preserving data analysis by Frank D. McSherry. In Proceedings of the 35th SIGMOD International Conference on Management of Data (SIGMOD), 2009. doi : 10.1145/1559845.1559850
  9. 1 2 Differential Privacy by Cynthia Dwork, International Colloquium on Automata, Languages and Programming (ICALP) 2006, p. 1–12. doi : 10.1007/11787006_1
  10. Kairouz, Peter, Sewoong Oh, and Pramod Viswanath. "The composition theorem for differential privacy." International conference on machine learning. PMLR, 2015.link
  11. F.McSherry and K.Talwar. Mechasim Design via Differential Privacy. Proceedings of the 48th Annual Symposium of Foundations of Computer Science, 2007.
  12. Christos Dimitrakakis, Blaine Nelson, Aikaterini Mitrokotsa, Benjamin Rubinstein. Robust and Private Bayesian Inference. Algorithmic Learning Theory 2014
  13. Warner, S. L. (March 1965). "Randomised response: a survey technique for eliminating evasive answer bias". Journal of the American Statistical Association . 60 (309). Taylor & Francis: 63–69. doi:10.1080/01621459.1965.10480775. JSTOR   2283137. PMID   12261830. S2CID   35435339.
  14. Dwork, Cynthia. "A firm foundation for private data analysis." Communications of the ACM 54.1 (2011): 86–95, supra note 19, page 91.
  15. Bambauer, Jane, Krishnamurty Muralidhar, and Rathindra Sarathy. "Fool's gold: an illustrated critique of differential privacy." Vand. J. Ent. & Tech. L. 16 (2013): 701.
  16. Tore Dalenius (1977). "Towards a methodology for statistical disclosure control". Statistik Tidskrift. 15.
  17. Dwork, Cynthia (2006). Bugliesi, Michele; Preneel, Bart; Sassone, Vladimiro; Wegener, Ingo (eds.). "Differential Privacy". Automata, Languages and Programming. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer: 1–12. doi:10.1007/11787006_1. ISBN   978-3-540-35908-1.
  18. Dorothy E. Denning; Peter J. Denning; Mayer D. Schwartz (March 1979). "The Tracker: A Threat to Statistical Database Security". ACM Transactions on Database Systems. 4 (1): 76–96. doi:10.1145/320064.320069. S2CID   207655625.
  19. Irit Dinur and Kobbi Nissim. 2003. Revealing information while preserving privacy. In Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS '03). ACM, New York, NY, USA, 202–210. doi : 10.1145/773153.773173
  20. "TCC Test-of-Time Award".
  21. "2017 Gödel Prize".
  22. Ashwin Machanavajjhala, Daniel Kifer, John M. Abowd, Johannes Gehrke, and Lars Vilhuber. "Privacy: Theory meets Practice on the Map". In Proceedings of the 24th International Conference on Data Engineering, ICDE) 2008.
  23. Úlfar Erlingsson, Vasyl Pihur, Aleksandra Korolova. "RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response". In Proceedings of the 21st ACM Conference on Computer and Communications Security (CCS), 2014. doi : 10.1145/2660267.2660348
  24. google/rappor, GitHub, 2021-07-15
  25. Tackling Urban Mobility with Technology by Andrew Eland. Google Policy Europe Blog, Nov 18, 2015.
  26. "Apple – Press Info – Apple Previews iOS 10, the Biggest iOS Release Ever". Apple. Retrieved 20 June 2023.
  27. Collecting telemetry data privately by Bolin Ding, Jana Kulkarni, Sergey Yekhanin. NIPS 2017.
  28. Messing, Solomon; DeGregorio, Christina; Hillenbrand, Bennett; King, Gary; Mahanti, Saurav; Mukerjee, Zagreb; Nayak, Chaya; Persily, Nate; State, Bogdan (2020), Facebook Privacy-Protected Full URLs Data Set, Zagreb Mukerjee, Harvard Dataverse, doi:10.7910/dvn/tdoapg , retrieved 2023-02-08
  29. Evans, Georgina; King, Gary (January 2023). "Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset". Political Analysis. 31 (1): 1–21. doi:10.1017/pan.2022.1. ISSN   1047-1987. S2CID   211137209.
  30. "Disclosure Avoidance for the 2020 Census: An Introduction". 2 November 2021.
  31. "Technology Factsheet: Differential Privacy". Belfer Center for Science and International Affairs. Retrieved 2021-04-12.
  32. McSherry, Frank (25 February 2018). "Uber's differential privacy .. probably isn't". GitHub .
  33. Lyu, Min; Su, Dong; Li, Ninghui (1 February 2017). "Understanding the sparse vector technique for differential privacy". Proceedings of the VLDB Endowment. 10 (6): 637–648. arXiv: 1603.01699 . doi:10.14778/3055330.3055331. S2CID   5449336.
  34. Haeberlen, Andreas; Pierce, Benjamin C.; Narayan, Arjun (2011). "Differential Privacy Under Fire". 20th USENIX Security Symposium.
  35. Mironov, Ilya (October 2012). "On significance of the least significant bits for differential privacy". Proceedings of the 2012 ACM conference on Computer and communications security (PDF). ACM. pp. 650–661. doi:10.1145/2382196.2382264. ISBN   9781450316514. S2CID   3421585.
  36. Andrysco, Marc; Kohlbrenner, David; Mowery, Keaton; Jhala, Ranjit; Lerner, Sorin; Shacham, Hovav (May 2015). "On Subnormal Floating Point and Abnormal Timing". 2015 IEEE Symposium on Security and Privacy. pp. 623–639. doi:10.1109/SP.2015.44. ISBN   978-1-4673-6949-7. S2CID   1903469.
  37. Kohlbrenner, David; Shacham, Hovav (August 2017). "On the Effectiveness of Mitigations Against Floating-point Timing Channels". Proceedings of the 26th USENIX Conference on Security Symposium. USENIX Association: 69–81.
  38. Dooley, Isaac; Kale, Laxmikant (September 2006). "Quantifying the interference caused by subnormal floating-point values" (PDF). Proceedings of the Workshop on Operating System Interference in High Performance Applications.

Further reading

Publications

Tutorials