Condorcet's jury theorem

Last updated May 01, 2024

Condorcet's jury theorem is a political science theorem about the relative probability of a given group of individuals arriving at a correct decision. The theorem was first expressed by the Marquis de Condorcet in his 1785 work Essay on the Application of Analysis to the Probability of Majority Decisions.^[1]

The theorem assumes that a group wishes to reach a decision by majority vote, that one of the two outcomes of the vote is correct, and that each voter has an independent probability p of voting for the correct decision. The theorem asks how many voters we should include in the group. The result depends on whether p is greater than or less than 1/2:

If p is greater than 1/2 (each voter is more likely to vote correctly), then adding more voters increases the probability that the majority decision is correct. In the limit, the probability that the majority votes correctly approaches 1 as the number of voters increases.
On the other hand, if p is less than 1/2 (each voter is more likely to vote incorrectly), then adding more voters makes things worse: the optimal jury consists of a single voter.
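
As a rough numerical illustration of the two cases (a sketch, not part of Condorcet's own argument), the exact probability of a correct majority can be computed from the binomial distribution. The function name `majority_correct` and the sample values of p and n are chosen here purely for illustration.

```python
from math import comb

def majority_correct(n: int, p: float) -> float:
    """Probability that a strict majority of n independent voters is correct (n odd)."""
    if n % 2 == 0:
        raise ValueError("n must be odd so that no tie can occur")
    need = n // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))

# One competent electorate (p = 0.6) and one incompetent electorate (p = 0.4).
for p in (0.6, 0.4):
    print(p, [round(majority_correct(n, p), 4) for n in (1, 3, 5, 11, 51)])
```

For p = 0.6 the printed values rise with n, and for p = 0.4 they fall, matching the two cases described above.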

Since Condorcet, many other researchers have proved various other jury theorems, relaxing some or all of Condorcet's assumptions.

Proofs

Proof 1: Calculating the probability that two additional voters change the outcome

To avoid the need for a tie-breaking rule, we assume n is odd. Essentially the same argument works for even n if ties are broken by adding a single voter.

Now suppose we start with n voters, and let m of these voters vote correctly.

Consider what happens when we add two more voters (to keep the total number odd). The majority vote changes in only two cases:

m was one vote too small to get a majority of the n votes, but both new voters voted correctly.
m was just equal to a majority of the n votes, but both new voters voted incorrectly.

The rest of the time, either the new votes cancel out, only increase the gap, or don't make enough of a difference. So we only care what happens when a single vote (among the first n) separates a correct from an incorrect majority.

Restricting our attention to this case, we can imagine that the first n-1 votes cancel out and that the deciding vote is cast by the n-th voter. In this case the probability of getting a correct majority is just p. Now suppose we send in the two extra voters. The probability that they change an incorrect majority to a correct majority is (1-p)p², while the probability that they change a correct majority to an incorrect majority is p(1-p)². The first of these probabilities is greater than the second if and only if p > 1/2, proving the theorem.
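
Counting the two cases exactly, going from n to n + 2 voters changes the probability of a correct majority by $\binom{n}{(n-1)/2}\,[p(1-p)]^{(n+1)/2}\,(2p-1)$, which has the same sign as p − 1/2. This closed form is worked out here from the two cases listed in the proof rather than quoted from the source. The following sketch checks it numerically; `majority_correct` and `two_voter_gain` are illustrative helper names.

```python
from math import comb, isclose

def majority_correct(n: int, p: float) -> float:
    """Probability that a strict majority of n independent voters is correct (n odd)."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))

def two_voter_gain(n: int, p: float) -> float:
    """Closed-form change in majority accuracy when two voters join n (odd) voters."""
    half = (n - 1) // 2
    return comb(n, half) * (p * (1 - p)) ** (half + 1) * (2 * p - 1)

# Compare the closed form with the direct difference for a few jury sizes.
for n in (1, 3, 5, 9):
    for p in (0.3, 0.5, 0.8):
        direct = majority_correct(n + 2, p) - majority_correct(n, p)
        assert isclose(direct, two_voter_gain(n, p), abs_tol=1e-12)
```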

Proof 2: Calculating the probability that the decision is correct

This proof is direct; it simply sums the probabilities of all possible majorities. Each term of the sum multiplies the number of ways a majority of a given size can be formed by the probability of any one such majority. The number of ways is a combination, n items taken k at a time, where n is the jury size and k is the size of the majority. Probabilities range from 0 (the vote is always wrong) to 1 (always right). Each person decides independently, so the probabilities of their decisions multiply. The probability that a given voter decides correctly is p. The probability that a voter decides incorrectly, q, is the complement of p, i.e. 1 − p. The power notation $p^{x}$ is shorthand for the product of x factors of p.
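
Written out for odd n, the sum described above is the upper tail of a binomial distribution. The symbol P(n, p) for the probability of a correct majority is borrowed from the Asymptotics section of this article; k counts the voters who decide correctly.

```latex
P(n,p) \;=\; \sum_{k=(n+1)/2}^{n} \binom{n}{k}\, p^{k}\, q^{\,n-k},
\qquad q = 1 - p .
```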

Committee or jury accuracies can be easily estimated by using this approach in computer spreadsheets or programs.

As an example, let us take the simplest case of n = 3, p = 0.8. We need to show that 3 people have higher than 0.8 chance of being right. Indeed:

0.8 × 0.8 × 0.8 + 0.8 × 0.8 × 0.2 + 0.8 × 0.2 × 0.8 + 0.2 × 0.8 × 0.8 = 0.896.
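
The same figure can be reproduced in a program by listing every possible pattern of votes, which mirrors the four terms of the sum above. This is a minimal sketch; the function name is illustrative, and brute-force enumeration is only practical for small juries because the number of patterns is 2 to the power n.

```python
from itertools import product

def majority_correct_by_enumeration(n: int, p: float) -> float:
    """Add up the probability of every vote pattern with more correct than incorrect votes."""
    total = 0.0
    for votes in product((True, False), repeat=n):  # True means "voted correctly"
        correct = sum(votes)
        if correct > n - correct:
            total += p**correct * (1 - p) ** (n - correct)
    return total

print(round(majority_correct_by_enumeration(3, 0.8), 3))  # 0.896
```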

Asymptotics

Asymptotic analysis, sometimes called "the calculus of approximations", is used to solve hard problems that cannot be solved exactly and to provide simpler forms of complicated results, from early results like Taylor's and Stirling's formulas to the prime number theorem. An important topic in asymptotics is the asymptotic distribution, a probability distribution that is in a sense the "limiting" distribution of a sequence of distributions. When the individual probability p is close to 1/2, the probability of a correct majority decision P(n, p) grows approximately linearly in p − 1/2. For n voters, each having probability p of deciding correctly, and for odd n (so that no ties are possible):

$P(n,p)=1/2+c_{1}(p-1/2)+c_{3}(p-1/2)^{3}+O\left((p-1/2)^{5}\right),$

where

$c_{1}={n \choose {\lfloor n/2\rfloor }}{\frac {\lfloor n/2\rfloor +1}{4^{\lfloor n/2\rfloor }}}={\sqrt {\frac {2n+1}{\pi }}}\left(1+{\frac {1}{16n^{2}}}+O(n^{-3})\right),$

and the asymptotic approximation in terms of n is very accurate. The expansion contains only odd powers, and $c_{3}<0$. In simple terms, this says that when the decision is difficult (p close to 1/2), the gain from having n voters grows proportionally to ${\sqrt {n}}$.^[2]
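
To get a feel for how close the two expressions for $c_{1}$ are, the sketch below evaluates the exact binomial form and the square-root approximation (dropping the $O(n^{-3})$ term), and compares both with a finite-difference estimate of the slope of P(n, p) at p = 1/2. The helper names and the step size `eps` are assumptions made for this illustration.

```python
from math import comb, pi, sqrt

def c1_exact(n: int) -> float:
    """Exact linear coefficient: C(n, floor(n/2)) * (floor(n/2) + 1) / 4**floor(n/2)."""
    h = n // 2
    return comb(n, h) * (h + 1) / 4**h

def c1_asymptotic(n: int) -> float:
    """Approximation sqrt((2n + 1) / pi) * (1 + 1 / (16 n^2)), without the O(n^-3) term."""
    return sqrt((2 * n + 1) / pi) * (1 + 1 / (16 * n**2))

def majority_correct(n: int, p: float) -> float:
    """Probability that a strict majority of n independent voters is correct (n odd)."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))

def slope_at_half(n: int, eps: float = 1e-4) -> float:
    """Finite-difference estimate of dP/dp at p = 1/2, using P(n, 1/2) = 1/2 for odd n."""
    return (majority_correct(n, 0.5 + eps) - 0.5) / eps

for n in (1, 3, 11, 101):
    print(n, c1_exact(n), c1_asymptotic(n), slope_at_half(n))
```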

The theorem in other disciplines

The Condorcet jury theorem has recently been used to conceptualize score integration when several physician readers (radiologists, endoscopists, etc.) independently evaluate images for disease activity. This task arises in central reading performed during clinical trials and has similarities to voting. According to the authors, applying the theorem can translate individual reader scores into a final score in a way that is mathematically sound (by avoiding averaging of ordinal data), mathematically tractable for further analysis, and consistent with the scoring task at hand (based on decisions about the presence or absence of features, a subjective classification task).^[3]

The Condorcet jury theorem is also used in ensemble learning in the field of machine learning.^[4] An ensemble method combines the predictions of many individual classifiers by majority voting. Assuming that each individual classifier predicts with slightly greater than 50% accuracy and that their predictions are independent, the accuracy of the ensemble's majority vote will, given enough classifiers, be far greater than that of any individual classifier.
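
The effect can be sketched with a small simulation. It assumes fully independent classifiers, which real ensembles only approximate, and the function name, the ensemble size of 101, and the accuracy of 0.55 are illustrative choices rather than values from the cited source.

```python
import random

def simulate_ensemble_accuracy(n_classifiers: int, accuracy: float,
                               n_cases: int, seed: int = 0) -> float:
    """Fraction of cases in which a majority of independent classifiers is correct."""
    rng = random.Random(seed)  # fixed seed so that repeated runs give the same estimate
    hits = 0
    for _ in range(n_cases):
        # Each classifier is independently correct with the given accuracy.
        correct_votes = sum(rng.random() < accuracy for _ in range(n_classifiers))
        if correct_votes > n_classifiers / 2:
            hits += 1
    return hits / n_cases

print("single classifier:", simulate_ensemble_accuracy(1, 0.55, 2000))
print("ensemble of 101:  ", simulate_ensemble_accuracy(101, 0.55, 2000))
```

If the classifiers' errors are correlated, the gain from voting is smaller than this independent-voter sketch suggests.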

Applicability to democratic processes

Many political theorists and philosophers use Condorcet's jury theorem (CJT) to defend democracy; see Brennan^[5] and references therein. Nevertheless, whether the theorem's assumptions hold in real life is an empirical question. Note that the CJT is a double-edged sword: it can show either that majority rule is an (almost) perfect mechanism for aggregating information, when $p>1/2$, or that it is an (almost) perfect disaster, when $p<1/2$. A disaster would mean that the wrong option is chosen systematically. Some authors have argued that we are in the latter scenario. For instance, Bryan Caplan has argued extensively that voters' knowledge is systematically biased toward (probably) wrong options. In the CJT setup, this could be interpreted as evidence for $p<1/2$.

Recently, another approach to studying the applicability of the CJT was taken.^[6] Instead of considering the homogeneous case, each voter is allowed to have a probability $p_{i}\in [0,1]$, possibly different from that of other voters. This case was previously studied by Daniel Berend and Jacob Paroush^[7] and includes the classical theorem of Condorcet (when $p_{i}=p~~\forall ~i\in \mathbb {N}$) and other results, such as the Miracle of Aggregation (when $p_{i}=1/2$ for most voters and $p_{i}=1$ for a small proportion of them). Then, following a Bayesian approach, the prior probability of the thesis predicted by the theorem is estimated. That is, if we choose an arbitrary sequence of voters (i.e., a sequence $(p_{i})_{i\in \mathbb {N} }$), will the thesis of the CJT hold? The answer is no. More precisely, if a random sequence of $p_{i}$ is drawn from an unbiased distribution that favors neither competence ($p_{i}>1/2$) nor incompetence ($p_{i}<1/2$), then the thesis predicted by the theorem almost surely fails to hold. Under this approach, proponents of the CJT need to present strong evidence of competence to overcome the low prior probability. That is, not only is there evidence against competence (posterior probability), but we also cannot expect the CJT to hold in the absence of any evidence (prior probability).
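
For a finite group with heterogeneous competences $p_{i}$, the probability of a correct majority can be computed exactly by building up the distribution of the number of correct votes one voter at a time. The sketch below is a standard dynamic-programming computation and is not taken from the cited papers; the function name and the example electorate are hypothetical.

```python
def majority_correct_heterogeneous(ps: list[float]) -> float:
    """Probability that more than half of independent voters with competences ps are correct."""
    dist = [1.0]  # dist[k] = probability that exactly k of the voters seen so far are correct
    for p in ps:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)   # this voter votes incorrectly
            new[k + 1] += prob * p     # this voter votes correctly
        dist = new
    n = len(ps)
    return sum(prob for k, prob in enumerate(dist) if 2 * k > n)

# Hypothetical electorate in the spirit of the Miracle of Aggregation:
# 95 voters who guess at random plus 6 voters who are always right.
electorate = [0.5] * 95 + [1.0] * 6
print(majority_correct_heterogeneous(electorate))

# The homogeneous case p_i = p recovers the classical computation.
print(majority_correct_heterogeneous([0.8] * 3))
```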

Notes

[1] Marquis de Condorcet (1785). Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix (PNG) (in French). Retrieved 2008-03-10.
[2] McLennan, Andrew (1998). "Consequences of the Condorcet Jury Theorem for Beneficial Information Aggregation by Rational Agents". The American Political Science Review. 92 (2): 413–418. doi:10.2307/2585673. ISSN 0003-0554.
[3] Gottlieb, Klaus; Hussain, Fez (2015-02-19). "Voting for Image Scoring and Assessment (VISA) - theory and application of a 2 + 1 reader algorithm to improve accuracy of imaging endpoints in clinical trials". BMC Medical Imaging. 15: 6. doi:10.1186/s12880-015-0049-0. ISSN 1471-2342. PMC 4349725. PMID 25880066.
[4] "Random Forest". mlu-explain.github.io. Retrieved 2022-05-24.
[5] Brennan, Jason (2011). "Condorcet's Jury Theorem and the Optimum Number of Voters". Politics. 31 (2): 55–62. doi:10.1111/j.1467-9256.2011.01403.x. ISSN 0263-3957. S2CID 152938266.
[6] Romaniega Sancho, Álvaro (2022). "On the probability of the Condorcet Jury Theorem or the Miracle of Aggregation". Mathematical Social Sciences. 119: 41–55. arXiv:2108.00733. doi:10.1016/j.mathsocsci.2022.06.002. S2CID 249921504.
[7] Berend, Daniel; Paroush, Jacob (1998). "When is Condorcet's Jury Theorem valid?". Social Choice and Welfare. 15 (4): 481–488. doi:10.1007/s003550050118. ISSN 0176-1714. JSTOR 41106274. S2CID 120012958.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
