Discrepancy of hypergraphs

Last updated July 23, 2024

Discrepancy of hypergraphs is an area of discrepancy theory that studies the discrepancy of general set systems.

Definitions

In the classical setting, we aim at partitioning the vertices of a hypergraph ${\mathcal {H}}=(V,{\mathcal {E}})$ into two classes in such a way that ideally each hyperedge contains the same number of vertices in both classes. A partition into two classes can be represented by a coloring $\chi \colon V\rightarrow \{-1,+1\}$ . We call −1 and +1 colors. The color-classes $\chi ^{-1}(-1)$ and $\chi ^{-1}(+1)$ form the corresponding partition. For a hyperedge $E\in {\mathcal {E}}$ , set

\chi (E):=\sum _{v\in E}\chi (v).

The discrepancy of ${\mathcal {H}}$ with respect to $\chi$ and the discrepancy of ${\mathcal {H}}$ are defined by

\operatorname {disc} ({\mathcal {H}},\chi ):=\;\max _{E\in {\mathcal {E}}}|\chi (E)|,

\operatorname {disc} ({\mathcal {H}}):=\min _{\chi :V\rightarrow \{-1,+1\}}\operatorname {disc} ({\mathcal {H}},\chi ).

These notions as well as the term 'discrepancy' seem to have appeared for the first time in a paper of Beck.^[1] Earlier results on this problem include the famous lower bound on the discrepancy of arithmetic progressions by Roth^[2] and upper bounds for this problem and other results by Erdős and Spencer^[3]^[4] and Sárközi.^[5]^: 39 At that time, discrepancy problems were called quasi-Ramsey problems.

Examples

To get some intuition for this concept, let's have a look at a few examples.

If all edges of ${\mathcal {H}}$ intersect trivially, i.e. $E_{1}\cap E_{2}=\varnothing$ for any two distinct edges $E_{1},E_{2}\in {\mathcal {E}}$ , then the discrepancy is zero, if all edges have even cardinality, and one, if there is an odd cardinality edge.
The other extreme is marked by the complete hypergraph $(V,2^{V})$ . In this case the discrepancy is $\lceil {\frac {1}{2}}|V|\rceil$ . Any 2-coloring will have a color class of at least this size, and this set is also an edge. On the other hand, any coloring $\chi$ with color classes of size $\lceil {\frac {1}{2}}|V|\rceil$ and $\lfloor {\frac {1}{2}}|V|\rfloor$ proves that the discrepancy is not larger than $\lceil {\frac {1}{2}}|V|\rceil$ . It seems that the discrepancy reflects how chaotic the hyperedges of ${\mathcal {H}}$ intersect. Things are not that easy, however, as the following example shows.
Set $n=4k$ , $k\in {\mathcal {N}}$ and ${\mathcal {H}}_{n}=([n],\{E\subseteq [n]\mid |E\cap [2k]|=|E\setminus [2k]|\})$ . In words, ${\mathcal {H}}_{n}$ is the hypergraph on 4k vertices {1,...,4k}, whose edges are all subsets that have the same number of elements in {1,...,2k} as in {2k+1,...,4k}. Now ${\mathcal {H}}_{n}$ has many (more than ${\binom {n/2}{n/4}}^{2}=\Theta ({\frac {1}{n}}2^{n})$ ) complicatedly intersecting edges. However, its discrepancy is zero, since we can color {1,...,2k} in one color and {2k+1,...,4k} in another color.

The last example shows that we cannot expect to determine the discrepancy by looking at a single parameter like the number of hyperedges. Still, the size of the hypergraph yields first upper bounds.

General hypergraphs

1. For any hypergraph ${\mathcal {H}}$ with n vertices and m edges:

$\operatorname {disc} ({\mathcal {H}})\leq {\sqrt {2n\ln(2m)}}.$

The proof is a simple application of the probabilistic method. Let $\chi :V\rightarrow \{-1,1\}$ be a random coloring, i.e. we have

\Pr(\chi (v)=-1)=\Pr(\chi (v)=1)={\frac {1}{2}}

independently for all $v\in V$ . Since $\chi (E)=\sum _{v\in E}\chi (v)$ is a sum of independent −1, 1 random variables. So we have $\Pr(|\chi (E)|>\lambda )<2\exp(-\lambda ^{2}/(2n))$ for all $E\subseteq V$ and $\lambda \geq 0$ . Taking $\lambda ={\sqrt {2n\ln(2m)}}$ gives

\Pr(\operatorname {disc} ({\mathcal {H}},\chi )>\lambda )\leq \sum _{E\in {\mathcal {E}}}\Pr(|\chi (E)|>\lambda )<1.

Since a random coloring with positive probability has discrepancy at most $\lambda$ , in particular, there are colorings that have discrepancy at most $\lambda$ . Hence $\operatorname {disc} ({\mathcal {H}})\leq \lambda .\ \Box$

2. For any hypergraph ${\mathcal {H}}$ with n vertices and m edges such that $m\geq n$ :

$\operatorname {disc} ({\mathcal {H}})\in O({\sqrt {n}}).$

To prove this, a much more sophisticated approach using the entropy function was necessary. Of course this is particularly interesting for $m=O(n)$ . In the case $m=n$ , $\operatorname {disc} ({\mathcal {H}})\leq 6{\sqrt {n}}$ can be shown for n large enough. Therefore, this result is usually known to as 'Six Standard Deviations Suffice'. It is considered to be one of the milestones of discrepancy theory. The entropy method has seen numerous other applications, e.g. in the proof of the tight upper bound for the arithmetic progressions of Matoušek and Spencer^[6] or the upper bound in terms of the primal shatter function due to Matoušek.^[7]

Hypergraphs of bounded degree

Better discrepancy bounds can be attained when the hypergraph has a bounded degree, that is, each vertex of ${\mathcal {H}}$ is contained in at most t edges, for some small t. In particular:

Beck and Fiala^[8] proved that $\operatorname {disc} ({\mathcal {H}})<2t$ ; this is known as the Beck–Fiala theorem. They conjectured that $\operatorname {disc} ({\mathcal {H}})=O({\sqrt {t}})$ .
Bednarchak and Helm^[9] and Helm^[10] improved the Beck-Fiala bound in tiny steps to $\operatorname {disc} ({\mathcal {H}})\leq 2t-3$ (for a slightly restricted situation, i.e. $t\geq 3$ ).
Bukh^[11] improved this in 2016 to $2t-\log ^{*}t$ , where $\log ^{*}t$ denotes the iterated logarithm.
A corollary of Beck's paper^[1] – the first time the notion of discrepancy explicitly appeared – shows $\operatorname {disc} ({\mathcal {H}})\leq C{\sqrt {t\log m}}\log n$ for some constant C.
The latest improvement in this direction is due to Banaszczyk:^[12] $\operatorname {disc} ({\mathcal {H}})=O({\sqrt {t\log n}})$ .

Special hypergraphs

Better bounds on the discrepancy are possible for hypergraphs with a special structure, such as:

Discrepancy of permutations - when the vertices are the integers 1,...,n, and the hyperedges are all the intervals of some m given permutations on the integers.
Geometric discrepancy - when the vertices are points in a Euclidean space, and the hyperedges are geometric objects, such as rectangles or half-spaces.

Arithmetic progressions (Roth, Sárközy, Beck, Matoušek & Spencer)
Six Standard Deviations Suffice (Spencer)

Major open problems

Komlós Conjecture ^[13]

Applications

Numerical Integration: Monte Carlo methods in high dimensions.
Computational Geometry: Divide and conquer algorithms.
Image Processing: Halftoning

Notes

1 2 J. Beck: "Roth's estimate of the discrepancy of integer sequences is nearly sharp", page 319-325. Combinatorica, 1, 1981
↑ K. F. Roth: "Remark concerning integer sequences", pages 257–260. Acta Arithmetica 9, 1964
↑ J. Spencer: "A remark on coloring integers", pages 43–44. Canadian Mathematical Bulletin 15, 1972.
↑ P. Erdős and J. Spencer: "Imbalances in k-colorations", pages 379–385. Networks 1, 1972.
↑ P. Erdős and J. Spencer: "Probabilistic Methods in Combinatorics." Budapest: Akadémiai Kiadó, 1974.
↑ J. Matoušek and J. Spencer: "Discrepancy in arithmetic progressions", pages 195–204. Journal of the American Mathematical Society 9, 1996.
↑ J. Matoušek: "Tight upper bound for the discrepancy of half-spaces", pages 593–601. Discrepancy and Computational Geometry 13, 1995.
↑ J. Beck and T. Fiala: "Integer making theorems", pages 1–8. Discrete Applied Mathematics 3, 1981.
↑ D. Bednarchak and M. Helm: "A note on the Beck-Fiala theorem", pages 147–149. Combinatorica 17, 1997.
↑ M. Helm: "On the Beck-Fiala theorem", page 207. Discrete Mathematics 207, 1999.
↑ B. Bukh: "An Improvement of the Beck–Fiala Theorem", pp. 380-398. Combinatorics, Probability and Computing 25, 2016.
↑ Banaszczyk, W. (1998), "Balancing vectors and Gaussian measure of n-dimensional convex bodies", Random Structures & Algorithms, 12: 351–360, doi : 10.1002/(SICI)1098-2418(199807)12:4<351::AID-RSA3>3.0.CO;2-S.
↑ Bansal, Nikhil; Dadush, Daniel; Garg, Shashwat (January 2019). "An Algorithm for Komlós Conjecture Matching Banaszczyk's Bound". SIAM Journal on Computing. 48 (2): 534–553. doi:10.1137/17M1126795. ISSN 0097-5397.

Related Research Articles

In graph theory, an expander graph is a sparse graph that has strong connectivity properties, quantified using vertex, edge or spectral expansion. Expander constructions have spawned research in pure and applied mathematics, with several applications to complexity theory, design of robust computer networks, and the theory of error-correcting codes.

In probability theory and statistics, the exponential distribution or negative exponential distribution is the probability distribution of the distance between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate; the distance parameter could be any meaningful mono-dimensional measure of the process, such as time between production errors, or length along a roll of fabric in the weaving manufacturing process. It is a particular case of the gamma distribution. It is the continuous analogue of the geometric distribution, and it has the key property of being memoryless. In addition to being used for the analysis of Poisson point processes it is found in various other contexts.

In mathematical optimization, the method of Lagrange multipliers is a strategy for finding the local maxima and minima of a function subject to equation constraints. It is named after the mathematician Joseph-Louis Lagrange.

<span class="mw-page-title-main">Hypergraph</span> Generalization of graph theory

In mathematics, a hypergraph is a generalization of a graph in which an edge can join any number of vertices. In contrast, in an ordinary graph, an edge connects exactly two vertices.

In mathematics, a Cayley graph, also known as a Cayley color graph, Cayley diagram, group diagram, or color group, is a graph that encodes the abstract structure of a group. Its definition is suggested by Cayley's theorem, and uses a specified set of generators for the group. It is a central tool in combinatorial and geometric group theory. The structure and symmetry of Cayley graphs makes them particularly good candidates for constructing expander graphs.

In statistics, the Wishart distribution is a generalization of the gamma distribution to multiple dimensions. It is named in honor of John Wishart, who first formulated the distribution in 1928. Other names include Wishart ensemble, or Wishart–Laguerre ensemble, or LOE, LUE, LSE.

In graph theory, graph coloring is a special case of graph labeling; it is an assignment of labels traditionally called "colors" to elements of a graph subject to certain constraints. In its simplest form, it is a way of coloring the vertices of a graph such that no two adjacent vertices are of the same color; this is called a vertex coloring. Similarly, an edge coloring assigns a color to each edge so that no two adjacent edges are of the same color, and a face coloring of a planar graph assigns a color to each face or region so that no two faces that share a boundary have the same color.

In mathematics, particularly in operator theory and C*-algebra theory, the continuous functional calculus is a functional calculus which allows the application of a continuous function to normal elements of a C*-algebra.

In statistics and information theory, a maximum entropy probability distribution has entropy that is at least as great as that of all other members of a specified class of probability distributions. According to the principle of maximum entropy, if nothing is known about a distribution except that it belongs to a certain class, then the distribution with the largest entropy should be chosen as the least-informative default. The motivation is twofold: first, maximizing entropy minimizes the amount of prior information built into the distribution; second, many physical systems tend to move towards maximal entropy configurations over time.

<span class="mw-page-title-main">Noncentral chi-squared distribution</span> Noncentral generalization of the chi-squared distribution

In probability theory and statistics, the noncentral chi-squared distribution is a noncentral generalization of the chi-squared distribution. It often arises in the power analysis of statistical tests in which the null distribution is a chi-squared distribution; important examples of such tests are the likelihood-ratio tests.

In mathematics, a complex torus is a particular kind of complex manifold M whose underlying smooth manifold is a torus in the usual sense. Here N must be the even number 2n, where n is the complex dimension of M.

In graph theory, a graph is said to be a pseudorandom graph if it obeys certain properties that random graphs obey with high probability. There is no concrete definition of graph pseudorandomness, but there are many reasonable characterizations of pseudorandomness one can consider.

The expander mixing lemma intuitively states that the edges of certain $-regular graphs are evenly distributed throughout the graph. In particular, the number of edges between two vertex subsets and is always close to the expected number of edges between them in a random -regular graph, namely .$

In mathematics, the Schur orthogonality relations, which were proven by Issai Schur through Schur's lemma, express a central fact about representations of finite groups. They admit a generalization to the case of compact groups in general, and in particular compact Lie groups, such as the rotation group SO(3).

In mathematics, a Caccioppoli set is a set whose boundary is measurable and has a finite measure. A synonym is set of (locally) finite perimeter. Basically, a set is a Caccioppoli set if its characteristic function is a function of bounded variation.

In statistics, the Chapman–Robbins bound or Hammersley–Chapman–Robbins bound is a lower bound on the variance of estimators of a deterministic parameter. It is a generalization of the Cramér–Rao bound; compared to the Cramér–Rao bound, it is both tighter and applicable to a wider range of problems. However, it is usually more difficult to compute.

In probability theory and statistics, the Conway–Maxwell–Poisson distribution is a discrete probability distribution named after Richard W. Conway, William L. Maxwell, and Siméon Denis Poisson that generalizes the Poisson distribution by adding a parameter to model overdispersion and underdispersion. It is a member of the exponential family, has the Poisson distribution and geometric distribution as special cases and the Bernoulli distribution as a limiting case.

In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time if these events occur with a known constant mean rate and independently of the time since the last event. It can also be used for the number of events in other types of intervals than time, and in dimension greater than 1.

In representation theory of mathematics, the Waldspurger formula relates the special values of two L-functions of two related admissible irreducible representations. Let $k$ be the base field, $f$ be an automorphic form over $k$ , $π$ be the representation associated via the Jacquet–Langlands correspondence with f. Goro Shimura (1976) proved this formula, when $and f is a cusp form; Günter Harder made the same discovery at the same time in an unpublished paper. Marie-France Vignéras (1980) proved this formula, when and f is a newform. Jean-Loup Waldspurger, for whom the formula is named, reproved and generalized the result of Vignéras in 1985 via a totally different method which was widely used thereafter by mathematicians to prove similar formulas.$

<span class="mw-page-title-main">Locally linear graph</span> Graph where every edge is in one triangle

In graph theory, a locally linear graph is an undirected graph in which every edge belongs to exactly one triangle. Equivalently, for each vertex of the graph, its neighbors are each adjacent to exactly one other neighbor, so the neighbors can be paired up into an induced matching. Locally linear graphs have also been called locally matched graphs. Their triangles form the hyperedges of triangle-free 3-uniform linear hypergraphs and the blocks of certain partial Steiner triple systems, and the locally linear graphs are exactly the Gaifman graphs of these hypergraphs or partial Steiner systems.

References

Beck, József; Chen, William W. L. (2009). Irregularities of Distribution. Cambridge University Press. ISBN 978-0-521-09300-2.
Chazelle, Bernard (2000). The Discrepancy Method: Randomness and Complexity . Cambridge University Press. ISBN 0-521-77093-9.
Doerr, Benjamin (2005). Integral Approximation (PDF) (Habilitation thesis). University of Kiel. OCLC 255383176 . Retrieved 20 October 2019.
Matoušek, Jiří (1999). Geometric Discrepancy: An Illustrated Guide. Springer. ISBN 3-540-65528-X.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[beck_roth_estimate-1] 1 2 J. Beck: "Roth's estimate of the discrepancy of integer sequences is nearly sharp", page 319-325. Combinatorica, 1, 1981

[2] K. F. Roth: "Remark concerning integer sequences", pages 257–260. Acta Arithmetica 9, 1964

[3] J. Spencer: "A remark on coloring integers", pages 43–44. Canadian Mathematical Bulletin 15, 1972.

[4] P. Erdős and J. Spencer: "Imbalances in k-colorations", pages 379–385. Networks 1, 1972.

[5] P. Erdős and J. Spencer: "Probabilistic Methods in Combinatorics." Budapest: Akadémiai Kiadó, 1974.

[6] J. Matoušek and J. Spencer: "Discrepancy in arithmetic progressions", pages 195–204. Journal of the American Mathematical Society 9, 1996.

[7] J. Matoušek: "Tight upper bound for the discrepancy of half-spaces", pages 593–601. Discrepancy and Computational Geometry 13, 1995.

[8] J. Beck and T. Fiala: "Integer making theorems", pages 1–8. Discrete Applied Mathematics 3, 1981.

[9] D. Bednarchak and M. Helm: "A note on the Beck-Fiala theorem", pages 147–149. Combinatorica 17, 1997.

[10] M. Helm: "On the Beck-Fiala theorem", page 207. Discrete Mathematics 207, 1999.

[11] B. Bukh: "An Improvement of the Beck–Fiala Theorem", pp. 380-398. Combinatorics, Probability and Computing 25, 2016.

[12] Banaszczyk, W. (1998), "Balancing vectors and Gaussian measure of n-dimensional convex bodies", Random Structures & Algorithms, 12: 351–360, doi : 10.1002/(SICI)1098-2418(199807)12:4<351::AID-RSA3>3.0.CO;2-S.

[13] Bansal, Nikhil; Dadush, Daniel; Garg, Shashwat (January 2019). "An Algorithm for Komlós Conjecture Matching Banaszczyk's Bound". SIAM Journal on Computing. 48 (2): 534–553. doi:10.1137/17M1126795. ISSN 0097-5397.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]