The exponential mechanism is a technique for designing differentially private algorithms. It was developed by Frank McSherry[1] and Kunal Talwar[2] in 2007. Their work was recognized as a co-winner of the 2009 PET Award for Outstanding Research in Privacy Enhancing Technologies.[3]
Most of the initial research in the field of differential privacy revolved around real-valued functions which have relatively low sensitivity to change in the data of a single individual and whose usefulness is not hampered by small additive perturbations. A natural question is what happens in the situation when one wants to preserve more general sets of properties. The exponential mechanism helps to extend the notion of differential privacy to address these issues. Moreover, it describes a class of mechanisms that includes all possible differentially private mechanisms.
In very generic terms, a privacy mechanism maps a set of $n$ inputs from a domain $\mathcal{D}$ to a range $\mathcal{R}$. The map may be randomized, in which case each element of the domain $\mathcal{D}$ corresponds to a probability distribution over the range $\mathcal{R}$. The privacy mechanism makes no assumption about the nature of $\mathcal{D}$ and $\mathcal{R}$ apart from a base measure $\mu$ on $\mathcal{R}$. Let us define a function $q : \mathcal{D}^n \times \mathcal{R} \rightarrow \mathbb{R}$. Intuitively this function assigns a score to the pair $(d, r)$, where $d \in \mathcal{D}^n$ and $r \in \mathcal{R}$. The score reflects the appeal of the pair $(d, r)$, i.e. the higher the score, the more appealing the pair is. Given the input $d \in \mathcal{D}^n$, the mechanism's objective is to return an $r \in \mathcal{R}$ such that the function $q(d, r)$ is approximately maximized. To achieve this, set up the mechanism $\mathcal{E}_q^{\varepsilon}(d)$ as follows:
Definition: For any function $q : (\mathcal{D}^n \times \mathcal{R}) \rightarrow \mathbb{R}$ and a base measure $\mu$ over $\mathcal{R}$, define:

$$\mathcal{E}_q^{\varepsilon}(d) := \text{choose } r \text{ with probability proportional to } e^{\varepsilon q(d,r)} \times \mu(r), \qquad d \in \mathcal{D}^n,\ r \in \mathcal{R}.$$
This definition implies that the probability of returning an $r$ increases exponentially with the value of $q(d, r)$. Ignoring the base measure $\mu$, the value $r$ which maximizes $q(d, r)$ has the highest probability. Moreover, this mechanism is differentially private; a proof of this claim follows below. One technicality that should be kept in mind is that, in order for $\mathcal{E}_q^{\varepsilon}(d)$ to be properly defined, $\int_r e^{\varepsilon q(d,r)}\,\mu(r)\,dr$ should be finite.
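For a finite range with the uniform (counting) base measure, the mechanism reduces to sampling each candidate with probability proportional to the exponentiated score. The sketch below is a minimal illustration under those assumptions; the function name, the scaling of the score by $1/(2\,\Delta q)$ (the common rescaling that yields $\varepsilon$-differential privacy, whereas the unscaled definition above gives $2\varepsilon\Delta q$), and the most-common-value example are all chosen for illustration and are not taken from the original text.

```python
import numpy as np

def exponential_mechanism(data, candidates, score, epsilon, sensitivity, rng=None):
    """Sample one candidate with probability proportional to
    exp(epsilon * score(data, r) / (2 * sensitivity)).

    When `sensitivity` bounds the change of the score under a single-record
    change in `data`, this scaling gives epsilon-differential privacy.
    """
    rng = np.random.default_rng() if rng is None else rng
    scores = np.array([score(data, r) for r in candidates], dtype=float)
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the normalized probabilities.
    logits = epsilon * scores / (2.0 * sensitivity)
    logits -= logits.max()
    probs = np.exp(logits)
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]

# Illustrative use: privately pick the most common value in a list.
# The counting score changes by at most 1 per record, so sensitivity = 1.
data = [3, 3, 3, 5, 5, 7]
candidates = [3, 5, 7, 9]
score = lambda d, r: sum(1 for x in d if x == r)
print(exponential_mechanism(data, candidates, score, epsilon=1.0, sensitivity=1))
```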
Theorem (differential privacy): $\mathcal{E}_q^{\varepsilon}(d)$ gives $(2\varepsilon\Delta q)$-differential privacy, where $\Delta q$ is the largest possible change in $q$ due to a change in the data of a single individual.
Proof: The probability density of $\mathcal{E}_q^{\varepsilon}(d)$ at $r$ equals

$$\frac{e^{\varepsilon q(d,r)}\,\mu(r)}{\int e^{\varepsilon q(d,r')}\,\mu(r')\,dr'}.$$
Now, if a single change in $d$ changes $q$ by at most $\Delta q$, then the numerator can change by at most a factor of $e^{\varepsilon \Delta q}$ and the denominator by at least a factor of $e^{-\varepsilon \Delta q}$. Thus, the ratio of the new probability density (i.e. with the new $d$) to the earlier one is at most $e^{2\varepsilon \Delta q}$.
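Written out explicitly for neighbouring inputs $d$ and $d'$ (a routine expansion of the argument above, not in the original text):

$$\frac{\Pr[\mathcal{E}_q^{\varepsilon}(d') = r]}{\Pr[\mathcal{E}_q^{\varepsilon}(d) = r]}
= \frac{e^{\varepsilon q(d',r)}}{e^{\varepsilon q(d,r)}}
\cdot \frac{\int e^{\varepsilon q(d,r')}\,\mu(r')\,dr'}{\int e^{\varepsilon q(d',r')}\,\mu(r')\,dr'}
\;\leq\; e^{\varepsilon \Delta q} \cdot e^{\varepsilon \Delta q} = e^{2\varepsilon \Delta q}.$$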
We would ideally want the random draws of $r$ from the mechanism $\mathcal{E}_q^{\varepsilon}(d)$ to nearly maximize $q(d, r)$. If we let $OPT = \max_r q(d, r)$, then we can show that the probability of the mechanism deviating far from $OPT$ is low, as long as there is a sufficient mass (in terms of $\mu$) of values $r$ with a value of $q$ close to the optimum.
Lemma: Let $S_t = \{r : q(d, r) > OPT - t\}$ and $\bar{S}_{2t} = \{r : q(d, r) \leq OPT - 2t\}$. Then $p(\bar{S}_{2t})$, the probability that the mechanism outputs an element of $\bar{S}_{2t}$, is at most $e^{-\varepsilon t}/\mu(S_t)$. The probability is taken over the mechanism's randomness over $\mathcal{R}$.
Proof: The probability $p(\bar{S}_{2t})$ is at most $p(\bar{S}_{2t})/p(S_t)$, as $p(S_t)$ can be at most one. Since both probabilities have the same normalizing term,

$$\frac{p(\bar{S}_{2t})}{p(S_t)} = \frac{\int_{\bar{S}_{2t}} e^{\varepsilon q(d,r)}\,\mu(r)\,dr}{\int_{S_t} e^{\varepsilon q(d,r)}\,\mu(r)\,dr} \leq e^{-\varepsilon t}\,\frac{\mu(\bar{S}_{2t})}{\mu(S_t)}.$$

The value of $\mu(\bar{S}_{2t})$ is at most one, and so this bound implies the lemma statement.
Theorem (Accuracy): For those values of $t \geq \ln\!\left(\frac{OPT}{t\,\mu(S_t)}\right)/\varepsilon$, we have $E\!\left[q\!\left(d, \mathcal{E}_q^{\varepsilon}(d)\right)\right] \geq OPT - 3t$.
Proof: It follows from the previous lemma that the probability of the score being at least $OPT - 2t$ is $1 - e^{-\varepsilon t}/\mu(S_t)$. By hypothesis, $t \geq \ln\!\left(\frac{OPT}{t\,\mu(S_t)}\right)/\varepsilon$. Substituting this value of $t$, we get this probability to be at least $1 - t/OPT$. Multiplying by $OPT - 2t$ yields the desired bound.
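Spelling out the final multiplication (a routine step not written in the original, assuming as usual that the score is non-negative so the remaining probability mass contributes non-negatively to the expectation):

$$E\!\left[q\!\left(d, \mathcal{E}_q^{\varepsilon}(d)\right)\right] \geq \left(1 - \frac{t}{OPT}\right)(OPT - 2t) = OPT - 3t + \frac{2t^2}{OPT} \geq OPT - 3t.$$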
We can assume $\mu(r)$ for $r \in \mathcal{R}$ to be at most one in all the computations, because we can always normalize by $\mu(\mathcal{R})$.
Before we get into the details of the example, let us define some terms which we will be using extensively throughout our discussion.
Definition (global sensitivity): The global sensitivity of a query $Q$ is its maximum difference when evaluated on two neighbouring datasets $D_1, D_2 \in \mathcal{D}^n$:

$$GS_Q = \max_{D_1, D_2 :\, d(D_1, D_2) = 1} \left|Q(D_1) - Q(D_2)\right|.$$
Definition: A predicate query $Q_{\varphi}$ for any predicate $\varphi$ is defined to be

$$Q_{\varphi}(D) = \frac{\left|\{x \in D : \varphi(x)\}\right|}{|D|}.$$
Note that $GS_{Q_{\varphi}} \leq 1/n$ for any predicate $\varphi$.
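As a concrete illustration (not part of the original text): a predicate query is simply the fraction of records satisfying the predicate, so changing a single record moves its value by at most $1/n$. The helper name and the tiny datasets below are illustrative only.

```python
def predicate_query(dataset, predicate):
    """Fraction of records in `dataset` satisfying `predicate`."""
    return sum(1 for x in dataset if predicate(x)) / len(dataset)

# Illustrative predicate: "first attribute equals 1".
phi = lambda record: record[0] == 1

D1 = [(1, 0), (1, 1), (0, 1), (0, 0)]
D2 = [(0, 0), (1, 1), (0, 1), (0, 0)]   # neighbouring dataset: one record changed

# The two answers differ by exactly 1/n = 1/4 here, and never by more.
print(predicate_query(D1, phi), predicate_query(D2, phi))
```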
The following is due to Avrim Blum, Katrina Ligett and Aaron Roth.
Definition (Usefulness): A mechanism $\mathcal{A}$ is $(\alpha, \delta)$-useful for queries in class $H$ with probability $1 - \delta$ if, for every $h \in H$ and every dataset $D$, with $\widehat{D} = \mathcal{A}(D)$, $\left|Q_h(D) - Q_h(\widehat{D})\right| \leq \alpha$.
Informally, it means that with high probability the query $Q_h$ will behave in a similar way on the original dataset $D$ and on the synthetic dataset $\widehat{D}$.
Consider a common problem in data mining. Assume there is a database $D$ with $n$ entries. Each entry consists of a $k$-tuple of the form $(x_1, x_2, \dots, x_k)$, where $x_i \in \{0, 1\}$. Now, a user wants to learn a linear halfspace of the form $\pi_1 x_1 + \pi_2 x_2 + \cdots + \pi_{k-1} x_{k-1} \geq x_k$. In essence, the user wants to figure out the values of $\pi_1, \pi_2, \ldots, \pi_{k-1}$ such that the maximum number of tuples in the database satisfy the inequality. The algorithm we describe below can generate a synthetic database $\widehat{D}$ which will allow the user to learn (approximately) the same linear halfspace while querying this synthetic database. The motivation for such an algorithm is that the new database is generated in a differentially private manner and thus assures privacy to the individual records in the database $D$.
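For intuition (an illustrative sketch, not from the original text), the user's halfspace question is itself a predicate query over the tuples, so an $(\alpha, \delta)$-useful synthetic database answers it to within $\alpha$. The helper name and example values below are assumptions made for illustration.

```python
def halfspace_query(dataset, pi):
    """Fraction of k-tuples x in `dataset` satisfying
    pi[0]*x[0] + ... + pi[k-2]*x[k-2] >= x[k-1]."""
    satisfied = sum(
        1 for x in dataset
        if sum(p * xi for p, xi in zip(pi, x[:-1])) >= x[-1]
    )
    return satisfied / len(dataset)

# Example with k = 3 and weights pi = (1, 1): the query is x1 + x2 >= x3.
D = [(1, 0, 1), (0, 0, 1), (1, 1, 0), (0, 1, 1)]
print(halfspace_query(D, (1, 1)))   # 3 of the 4 tuples satisfy it -> 0.75
```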
In this section we show that it is possible to release a dataset which is useful for concepts from a polynomial VC-dimension class, and at the same time adhere to $\varepsilon$-differential privacy, as long as the size of the original dataset is at least polynomial in the VC-dimension of the concept class. To state it formally:
Theorem: For any class of functions $H$ and any dataset $D \subset \{0,1\}^k$ such that

$$|D| \geq O\!\left(\frac{k \cdot VCDim(H)\,\log(1/\alpha)}{\alpha^3 \varepsilon} + \frac{\log(1/\delta)}{\alpha\varepsilon}\right),$$

we can output an $(\alpha, \delta)$-useful dataset $\widehat{D}$ that preserves $\varepsilon$-differential privacy. As mentioned earlier, the algorithm need not be efficient.
One interesting fact is that the algorithm which we are going to develop generates a synthetic dataset whose size is independent of the original dataset; in fact, it only depends on the VC-dimension of the concept class and the parameter $\alpha$. The algorithm outputs a dataset of size

$$\tilde{O}\!\left(\frac{VCDim(H)}{\alpha^2}\right).$$
We borrow the Uniform Convergence Theorem from combinatorics and state a corollary of it which aligns with our needs.
Lemma: Given any dataset $D$, there exists a dataset $\widehat{D}$ of size $m = O\!\left(VCDim(H)\log(1/\alpha)/\alpha^2\right)$ such that $\max_{h \in H}\left|Q_h(D) - Q_h(\widehat{D})\right| \leq \alpha/2$.
Proof:
We know from the uniform convergence theorem that

$$\Pr\!\left[\exists h \in H : \left|Q_h(D) - Q_h(\widehat{D})\right| \geq \frac{\alpha}{2}\right] \leq 2\left(\frac{em}{VCDim(H)}\right)^{VCDim(H)} e^{-\alpha^2 m/8},$$
where the probability is over the random choice of the dataset $\widehat{D}$. Thus, if the RHS is less than one, then we know for sure that such a dataset $\widehat{D}$ exists. To bound the RHS below one we need $m \geq \lambda\!\left(VCDim(H)\log(m/VCDim(H))/\alpha^2\right)$, where $\lambda$ is some positive constant. Since we stated earlier that we will output a dataset of size $O\!\left(VCDim(H)\log(1/\alpha)/\alpha^2\right)$, this choice of $m$ satisfies the bound. Hence the lemma.
Now we invoke the exponential mechanism.
Definition: For any function $q : \left(\left(\{0,1\}^k\right)^n \times \left(\{0,1\}^k\right)^m\right) \rightarrow \mathbb{R}$ and input dataset $D$, the exponential mechanism outputs each dataset $\widehat{D}$ with probability proportional to $e^{q(D,\widehat{D})\,\varepsilon n/2}$.
From the properties of the exponential mechanism we know this preserves $(\varepsilon n \Delta q)$-differential privacy. Let us get back to the proof of the theorem.
We define $q(D, \widehat{D}) = -\max_{h \in H}\left|Q_h(D) - Q_h(\widehat{D})\right|$. Since each predicate query has sensitivity at most $1/n$, we have $\Delta q \leq 1/n$, so this instantiation preserves $\varepsilon$-differential privacy.
To show that the mechanism satisfies $(\alpha, \delta)$-usefulness, we should show that it outputs some dataset $\widehat{D}$ with $q(D, \widehat{D}) \geq -\alpha$ with probability $1 - \delta$. There are at most $2^{km}$ output datasets, and the probability that $q(D, \widehat{D}) \leq -\alpha$ is at most proportional to $e^{-\varepsilon\alpha n/2}$. Thus by the union bound, the probability of outputting any such dataset is at most proportional to $2^{km} e^{-\varepsilon\alpha n/2}$. Again, we know from the lemma that there exists some dataset $\widehat{D}$ for which $q(D, \widehat{D}) \geq -\alpha/2$. Therefore, such a dataset is output with probability at least proportional to $e^{-\varepsilon\alpha n/4}$.
Let $A :=$ the event that the exponential mechanism outputs some dataset $\widehat{D}$ such that $q(D, \widehat{D}) \geq -\alpha/2$, and let $B :=$ the event that the exponential mechanism outputs some dataset $\widehat{D}$ such that $q(D, \widehat{D}) \leq -\alpha$. Then

$$\frac{\Pr[A]}{\Pr[B]} \geq \frac{e^{-\varepsilon\alpha n/4}}{2^{km}\,e^{-\varepsilon\alpha n/2}} = \frac{e^{\varepsilon\alpha n/4}}{2^{km}}.$$

Now setting this quantity to be at least $\frac{1-\delta}{\delta}$, we find that it suffices to have

$$n \geq \frac{4}{\varepsilon\alpha}\left(km\ln 2 + \ln\frac{1}{\delta}\right) \geq O\!\left(\frac{k \cdot VCDim(H)\,\log(1/\alpha)}{\alpha^3 \varepsilon} + \frac{\log(1/\delta)}{\alpha\varepsilon}\right).$$
And hence we prove the theorem.
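The argument makes no claim of efficiency, but the mechanism itself is easy to state in code. Below is a brute-force sketch (illustrative only; the function name, the predicate class, and the tiny parameters are assumptions, not part of the original text) that enumerates every candidate synthetic dataset of size $m$ and samples one with probability proportional to $e^{q(D,\widehat{D})\varepsilon n/2}$:

```python
import itertools
import numpy as np

def private_synthetic_dataset(D, H, k, m, epsilon, rng=None):
    """Sample a synthetic dataset D_hat of m k-bit tuples with probability
    proportional to exp(q(D, D_hat) * epsilon * n / 2), where
    q(D, D_hat) = -max_h |Q_h(D) - Q_h(D_hat)|.

    H is a list of predicates on a single tuple. Only feasible for tiny
    k and m, since all (2^k)^m candidate datasets are enumerated.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(D)

    def Q(h, data):                      # predicate query
        return sum(1 for x in data if h(x)) / len(data)

    def q(D, D_hat):                     # score: negative worst-case error
        return -max(abs(Q(h, D) - Q(h, D_hat)) for h in H)

    tuples = list(itertools.product((0, 1), repeat=k))
    candidates = list(itertools.product(tuples, repeat=m))   # all size-m datasets
    logits = np.array([q(D, c) * epsilon * n / 2 for c in candidates])
    logits -= logits.max()               # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]

# Tiny illustrative run: k = 2, synthetic size m = 2, two predicates.
D = [(1, 0), (1, 1), (1, 1), (0, 1)]
H = [lambda x: x[0] == 1, lambda x: x[1] == 1]
print(private_synthetic_dataset(D, H, k=2, m=2, epsilon=1.0))
```

This enumeration is exponential in $km$ and is meant only to make the definitions concrete; as noted above, the theorem does not promise an efficient algorithm.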
In the example above, the exponential mechanism is used to output a synthetic dataset in a differentially private manner, and that dataset can then be used to answer queries with good accuracy. Other private mechanisms, such as posterior sampling,[6] which return parameters rather than datasets, can be made equivalent to the exponential mechanism.[7]
Apart from the privacy setting, the exponential mechanism has also been studied in the context of auction theory and classification algorithms.[8] In the case of auctions, the exponential mechanism helps to achieve a truthful auction setting.