In probability theory, if a large number of events are all independent of one another and each has probability less than 1, then there is a positive (possibly small) probability that none of the events will occur. The Lovász local lemma allows one to relax the independence condition slightly: As long as the events are "mostly" independent from one another and aren't individually too likely, then there will still be a positive probability that none of them occurs. It is most commonly used in the probabilistic method, in particular to give existence proofs.
There are several different versions of the lemma. The simplest and most frequently used is the symmetric version given below. A weaker version was proved in 1975 by László Lovász and Paul Erdős in the article Problems and results on 3-chromatic hypergraphs and some related questions. For other versions, see ( Alon & Spencer 2000 ). In 2020, Robin Moser and Gábor Tardos received the Gödel Prize for their algorithmic version of the Lovász Local Lemma, which uses entropy compression to provide an efficient randomized algorithm for finding an outcome in which none of the events occurs. [1] [2]
Let be a sequence of events such that each event occurs with probability at most p and such that each event is independent of all the other events except for at most d of them.
Lemma I (Lovász and Erdős 1973; published 1975) If
then there is a nonzero probability that none of the events occurs.
Lemma II (Lovász 1977; published by Joel Spencer [3] ) If
where e = 2.718... is the base of natural logarithms, then there is a nonzero probability that none of the events occurs.
Lemma II today is usually referred to as "Lovász local lemma".
Lemma III (Shearer 1985 [4] ) If
then there is a nonzero probability that none of the events occurs.
The threshold in Lemma III is optimal and it implies that the bound
is also sufficient.
A statement of the asymmetric version (which allows for events with different probability bounds) is as follows:
Lemma (asymmetric version). Let be a finite set of events in the probability space Ω. For let denote the neighbours of in the dependency graph (In the dependency graph, event is not adjacent to events which are mutually independent). If there exists an assignment of reals to the events such that
then the probability of avoiding all events in is positive, in particular
The symmetric version follows immediately from the asymmetric version by setting
to get the sufficient condition
since
Note that, as is often the case with probabilistic arguments, this theorem is nonconstructive and gives no method of determining an explicit element of the probability space in which no event occurs. However, algorithmic versions of the local lemma with stronger preconditions are also known (Beck 1991; Czumaj and Scheideler 2000). More recently, a constructive version of the local lemma was given by Robin Moser and Gábor Tardos requiring no stronger preconditions.
We prove the asymmetric version of the lemma, from which the symmetric version can be derived. By using the principle of mathematical induction we prove that for all in and all subsets of that do not include , . The induction here is applied on the size (cardinality) of the set . For base case the statement obviously holds since . We need to show that the inequality holds for any subset of of a certain cardinality given that it holds for all subsets of a lower cardinality.
Let . We have from Bayes' theorem
We bound the numerator and denominator of the above expression separately. For this, let . First, exploiting the fact that does not depend upon any event in .
Expanding the denominator by using Bayes' theorem and then using the inductive assumption, we get
The inductive assumption can be applied here since each event is conditioned on lesser number of other events, i.e. on a subset of cardinality less than . From (1) and (2), we get
Since the value of x is always in . Note that we have essentially proved . To get the desired probability, we write it in terms of conditional probabilities applying Bayes' theorem repeatedly. Hence,
which is what we had intended to prove.
Suppose 11n points are placed around a circle and colored with n different colors in such a way that each color is applied to exactly 11 points. In any such coloring, there must be a set of n points containing one point of each color but not containing any pair of adjacent points.
To see this, imagine picking a point of each color randomly, with all points equally likely (i.e., having probability 1/11) to be chosen. The 11n different events we want to avoid correspond to the 11n pairs of adjacent points on the circle. For each pair our chance of picking both points in that pair is at most 1/121 (exactly 1/121 if the two points are of different colors, otherwise 0), so we will take p = 1/121.
Whether a given pair (a, b) of points is chosen depends only on what happens in the colors of a and b, and not at all on whether any other collection of points in the other n − 2 colors are chosen. This implies the event "a and b are both chosen" is dependent only on those pairs of adjacent points which share a color either with a or with b.
There are 11 points on the circle sharing a color with a (including a itself), each of which is involved with 2 pairs. This means there are 21 pairs other than (a, b) which include the same color as a, and the same holds true for b. The worst that can happen is that these two sets are disjoint, so we can take d = 42 in the lemma. This gives
By the local lemma, there is a positive probability that none of the bad events occur, meaning that our set contains no pair of adjacent points. This implies that a set satisfying our conditions must exist.
In probability theory, the expected value is a generalization of the weighted average. Informally, the expected value is the mean of the possible values a random variable can take, weighted by the probability of those outcomes. Since it is obtained through arithmetic, the expected value sometimes may not even be included in the sample data set; it is not the value you would "expect" to get in reality.
In probability theory, the central limit theorem (CLT) states that, under appropriate conditions, the distribution of a normalized version of the sample mean converges to a standard normal distribution. This holds even if the original variables themselves are not normally distributed. There are several versions of the CLT, each applying in the context of different conditions.
In probability theory, the law of large numbers (LLN) is a mathematical law that states that the average of the results obtained from a large number of independent random samples converges to the true value, if it exists. More formally, the LLN states that given a sample of independent and identically distributed values, the sample mean converges to the true mean.
In mathematical analysis, Hölder's inequality, named after Otto Hölder, is a fundamental inequality between integrals and an indispensable tool for the study of Lp spaces.
In vector calculus, Green's theorem relates a line integral around a simple closed curve C to a double integral over the plane region D bounded by C. It is the two-dimensional special case of Stokes' theorem. In one dimension, it is equivalent to the fundamental theorem of calculus. In three dimensions, it is equivalent to the divergence theorem.
In mathematics, Fatou's lemma establishes an inequality relating the Lebesgue integral of the limit inferior of a sequence of functions to the limit inferior of integrals of these functions. The lemma is named after Pierre Fatou.
In mathematics, the Hodge star operator or Hodge star is a linear map defined on the exterior algebra of a finite-dimensional oriented vector space endowed with a nondegenerate symmetric bilinear form. Applying the operator to an element of the algebra produces the Hodge dual of the element. This map was introduced by W. V. D. Hodge.
In probability theory, the Azuma–Hoeffding inequality gives a concentration result for the values of martingales that have bounded differences.
In extremal graph theory, Szemerédi’s regularity lemma states that a graph can be partitioned into a bounded number of parts so that the edges between parts are regular. The lemma shows that certain properties of random graphs can be applied to dense graphs like counting the copies of a given subgraph within graphs. Endre Szemerédi proved the lemma over bipartite graphs for his theorem on arithmetic progressions in 1975 and for general graphs in 1978. Variants of the lemma use different notions of regularity and apply to other mathematical objects like hypergraphs.
Gauss's lemma in number theory gives a condition for an integer to be a quadratic residue. Although it is not useful computationally, it has theoretical significance, being involved in some proofs of quadratic reciprocity.
In mathematics, in particular in algebraic geometry and differential geometry, Dolbeault cohomology (named after Pierre Dolbeault) is an analog of de Rham cohomology for complex manifolds. Let M be a complex manifold. Then the Dolbeault cohomology groups depend on a pair of integers p and q and are realized as a subquotient of the space of complex differential forms of degree (p,q).
In probability theory, the Chinese restaurant process is a discrete-time stochastic process, analogous to seating customers at tables in a restaurant. Imagine a restaurant with an infinite number of circular tables, each with infinite capacity. Customer 1 sits at the first table. The next customer either sits at the same table as customer 1, or the next table. This continues, with each customer choosing to either sit at an occupied table with a probability proportional to the number of customers already there, or an unoccupied table. At time n, the n customers have been partitioned among m ≤ n tables. The results of this process are exchangeable, meaning the order in which the customers sit does not affect the probability of the final distribution. This property greatly simplifies a number of problems in population genetics, linguistic analysis, and image recognition.
In computational learning theory, Rademacher complexity, named after Hans Rademacher, measures richness of a class of sets with respect to a probability distribution. The concept can also be extended to real valued functions.
In theoretical computer science, the algorithmic Lovász local lemma gives an algorithmic way of constructing objects that obey a system of constraints with limited dependence.
Uniform convergence in probability is a form of convergence in probability in statistical asymptotic theory and probability theory. It means that, under certain conditions, the empirical frequencies of all events in a certain event-family converge to their theoretical probabilities. Uniform convergence in probability has applications to statistics as well as machine learning as part of statistical learning theory.
The exponential mechanism is a technique for designing differentially private algorithms. It was developed by Frank McSherry and Kunal Talwar in 2007. Their work was recognized as a co-winner of the 2009 PET Award for Outstanding Research in Privacy Enhancing Technologies.
In the theory of quantum communication, the quantum capacity is the highest rate at which quantum information can be communicated over many independent uses of a noisy quantum channel from a sender to a receiver. It is also equal to the highest rate at which entanglement can be generated over the channel, and forward classical communication cannot improve it. The quantum capacity theorem is important for the theory of quantum error correction, and more broadly for the theory of quantum computation. The theorem giving a lower bound on the quantum capacity of any channel is colloquially known as the LSD theorem, after the authors Lloyd, Shor, and Devetak who proved it with increasing standards of rigor.
In geometric algebra, the outermorphism of a linear function between vector spaces is a natural extension of the map to arbitrary multivectors. It is the unique unital algebra homomorphism of exterior algebras whose restriction to the vector spaces is the original function.
In PAC learning, error tolerance refers to the ability of an algorithm to learn when the examples received have been corrupted in some way. In fact, this is a very common and important issue since in many applications it is not possible to access noise-free data. Noise can interfere with the learning process at different levels: the algorithm may receive data that have been occasionally mislabeled, or the inputs may have some false information, or the classification of the examples may have been maliciously adulterated.
In fractal geometry, the parabolic Hausdorff dimension is a restricted version of the genuine Hausdorff dimension. Only parabolic cylinders, i. e. rectangles with a distinct non-linear scaling between time and space are permitted as covering sets. It is usefull to determine the Hausdorff dimension of self-similar stochastic processes, such as the geometric Brownian motion or stable Lévy processes plus Borel measurable drift function .