Boschloo's test

Boschloo's test is a statistical hypothesis test for analysing 2×2 contingency tables. It examines the association of two Bernoulli distributed random variables and is a uniformly more powerful alternative to Fisher's exact test. It was proposed in 1970 by R. D. Boschloo. [1]

Setting

A 2×2 contingency table visualizes n independent observations of two binary variables A and B:

            B = 1   B = 0   Total
    A = 1   x_11    x_10    n_1
    A = 0   x_01    x_00    n_0
    Total   s_1     s_0     n

The probability distribution of such tables can be classified into three distinct cases. [2]

  1. The row sums n_1, n_0 and the column sums s_1, s_0 are fixed in advance and not random.
    Then all x_ij are determined by x_11. If A and B are independent, x_11 follows a hypergeometric distribution with parameters n, n_1, s_1:
    x_11 ~ Hypergeom(n, n_1, s_1).
  2. The row sums n_1, n_0 are fixed in advance but the column sums s_1, s_0 are not.
    Then all random parameters are determined by x_11 and x_01, which follow binomial distributions with probabilities p_1 = P(B = 1 | A = 1) and p_0 = P(B = 1 | A = 0):
    x_11 ~ Bin(n_1, p_1), x_01 ~ Bin(n_0, p_0).
  3. Only the total number n is fixed but the row sums n_1, n_0 and the column sums s_1, s_0 are not.
    Then the random vector (x_11, x_10, x_01, x_00) follows a multinomial distribution with probability vector (p_11, p_10, p_01, p_00).

Fisher's exact test is designed for the first case and is therefore an exact conditional test (because it conditions on the column sums). The typical example of such a case is the Lady tasting tea: A lady tastes 8 cups of tea with milk. In 4 of those cups the milk is poured in before the tea. In the other 4 cups the tea is poured in first. The lady tries to assign the cups to the two categories. Following our notation, the random variable A represents the used method (1 = milk first, 0 = milk last) and B represents the lady's guesses (1 = milk first guessed, 0 = milk last guessed). Then the row sums are the fixed numbers of cups prepared with each method: n_1 = n_0 = 4. The lady knows that there are 4 cups in each category, so she will assign 4 cups to each method. Thus, the column sums are also fixed in advance: s_1 = s_0 = 4. If she is not able to tell the difference, A and B are independent and the number x_11 of correctly classified cups with milk first follows the hypergeometric distribution Hypergeom(8, 4, 4).
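In this conditional setting the null distribution can be evaluated directly. A minimal sketch using SciPy's hypergeometric distribution (the code is illustrative, not part of the original example):

```python
# Lady tasting tea: under pure guessing, the number of correctly
# classified milk-first cups follows Hypergeom(8, 4, 4).
from scipy.stats import hypergeom

# SciPy parametrization: M = population size, n = success states, N = draws
rv = hypergeom(M=8, n=4, N=4)
print(rv.pmf(4))   # probability that all 4 milk-first cups are found: 1/70
print(rv.sf(3))    # same event as an upper tail: P(x_11 >= 4)
```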

Boschloo's test is designed for the second case and is therefore an exact unconditional test. Examples of such a case are often found in medical research, where a binary endpoint is compared between two patient groups. Following our notation, A = 1 represents the first group, which receives the medication of interest, and A = 0 represents the second group, which receives a placebo. B indicates whether a patient is cured (1 = cure, 0 = no cure). Then the row sums n_1, n_0 equal the group sizes and are usually fixed in advance. The column sums s_1, s_0 are the total numbers of cures and of disease continuations, respectively, and are not fixed in advance.
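A small simulation illustrates why the column sums are random in this model; the group sizes and cure probabilities below are invented for illustration:

```python
# Two-group binomial model: row sums (group sizes) are fixed by design,
# column sums are random.
import numpy as np

rng = np.random.default_rng(0)
n1, n0 = 10, 10            # fixed group sizes (row sums)
p1, p0 = 0.7, 0.2          # hypothetical true cure probabilities

x1 = rng.binomial(n1, p1)  # cures under medication
x0 = rng.binomial(n0, p0)  # cures under placebo
print(x1, x0, x1 + x0)     # the column sum s_1 = x1 + x0 varies run to run
```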

An example of the third case can be constructed as follows: Simultaneously flip two distinguishable coins A and B, and do this n times. If we count the number of results in our 2×2 table (1 = head, 0 = tail), we neither know in advance how often coin A shows head or tail (row sums n_1, n_0 random), nor how often coin B does (column sums s_1, s_0 random).
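The coin setting can be sketched the same way (sample size and probabilities assumed; independent fair coins give equal cell probabilities):

```python
# Fully random model: only the total n is fixed; both margins are random.
import numpy as np

rng = np.random.default_rng(1)
n = 100
probs = [0.25, 0.25, 0.25, 0.25]     # (p_11, p_10, p_01, p_00), fair coins
x11, x10, x01, x00 = rng.multinomial(n, probs)
print(x11 + x10, x01 + x00)          # row sums of coin A: random
print(x11 + x01, x10 + x00)          # column sums of coin B: random
```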

Test hypothesis

The null hypothesis of Boschloo's one-tailed test (high values of x_11 favor the alternative hypothesis) is:

H_0: p_1 ≤ p_0

The null hypothesis of the one-tailed test can also be formulated in the other direction (small values of x_11 favor the alternative hypothesis):

H_0: p_1 ≥ p_0

The null hypothesis of the two-tailed test is:

H_0: p_1 = p_0

There is no universal definition of the two-tailed version of Fisher's exact test. [3] Since Boschloo's test is based on Fisher's exact test, a universal two-tailed version of Boschloo's test also doesn't exist. In the following we deal with the one-tailed test with H_0: p_1 ≤ p_0 and a significance level α.

Boschloo's idea

We denote the desired significance level by α. Fisher's exact test is a conditional test and appropriate for the first of the above mentioned cases. But if we treat the observed column sum s_1 as fixed in advance, Fisher's exact test can also be applied to the second case. The true size of the test then depends on the nuisance parameters p_1 and p_0. It can be shown that the size maximum is taken for equal proportions p_1 = p_0 [4] and is still controlled by α. [1] However, Boschloo stated that for small sample sizes, the maximal size is often considerably smaller than α. This leads to an undesirable loss of power.

Boschloo proposed to use Fisher's exact test with a greater nominal level α* > α. Here, α* should be chosen as large as possible such that the maximal size is still controlled by α: max_{p_1 = p_0} P(rejection) ≤ α. This method was especially advantageous at the time of Boschloo's publication because α* could be looked up for common values of α, n_1 and n_0. This made performing Boschloo's test computationally easy.
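The conservativeness Boschloo observed can be checked numerically. The following sketch (group sizes and α are assumed values, SciPy-based) computes the true size of the one-sided Fisher test in the unconditional binomial model:

```python
# Maximal size of one-sided Fisher's exact test over the nuisance
# parameter p = p_1 = p_0, for n_1 = n_0 = 10 and nominal alpha = 0.05.
import numpy as np
from scipy.stats import binom, hypergeom

n1 = n0 = 10
alpha = 0.05

def fisher_p(x1, x0):
    """One-sided Fisher p-value P(x_11 >= x1) given s_1 = x1 + x0."""
    return hypergeom.sf(x1 - 1, n1 + n0, n1, x1 + x0)

pvals = np.array([[fisher_p(x1, x0) for x0 in range(n0 + 1)]
                  for x1 in range(n1 + 1)])
reject = pvals <= alpha                    # rejection region of tables

def size(p):
    """P(reject) when both groups share the success probability p."""
    w1 = binom.pmf(np.arange(n1 + 1), n1, p)
    w0 = binom.pmf(np.arange(n0 + 1), n0, p)
    return float(np.outer(w1, w0)[reject].sum())

max_size = max(size(p) for p in np.linspace(0, 1, 1001))
print(max_size)   # stays below the nominal 0.05
```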

Test statistic

The decision rule of Boschloo's approach is based on Fisher's exact test. An equivalent way of formulating the test is to use the p-value of Fisher's exact test as test statistic. Fisher's p-value is calculated from the hypergeometric distribution (for ease of notation we write x_1, x_0 instead of x_11, x_01):

p_F(x_1, x_0) = P(x_11 ≥ x_1 | s_1 = x_1 + x_0) = Σ_{k = x_1}^{min(n_1, s_1)} C(n_1, k) C(n_0, s_1 − k) / C(n, s_1)

The distribution of p_F is determined by the binomial distributions of x_1 and x_0 and depends on the unknown nuisance parameter p = p_1 = p_0 (on the boundary of the null hypothesis, where the size is maximal). For a specified significance level α, the critical value of p_F is the maximal value p_F* that satisfies max_p P(p_F ≤ p_F*) ≤ α. The critical value p_F* is equal to the nominal level α* of Boschloo's original approach.
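The maximization over the nuisance parameter can be sketched on a grid (the observed table and group sizes are invented for illustration; a production implementation would maximize more carefully):

```python
# Boschloo's p-value: maximize P(p_F <= observed p_F) over p = p_1 = p_0.
import numpy as np
from scipy.stats import binom, hypergeom

n1 = n0 = 10
x1_obs, x0_obs = 7, 2        # hypothetical observed table

def fisher_p(x1, x0):
    return hypergeom.sf(x1 - 1, n1 + n0, n1, x1 + x0)

pF = np.array([[fisher_p(x1, x0) for x0 in range(n0 + 1)]
               for x1 in range(n1 + 1)])
region = pF <= fisher_p(x1_obs, x0_obs)   # tables at least as extreme

def prob_region(p):
    w1 = binom.pmf(np.arange(n1 + 1), n1, p)
    w0 = binom.pmf(np.arange(n0 + 1), n0, p)
    return float(np.outer(w1, w0)[region].sum())

p_boschloo = max(prob_region(p) for p in np.linspace(0, 1, 1001))
print(p_boschloo, fisher_p(x1_obs, x0_obs))  # Boschloo p <= Fisher p
```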

Modification

Boschloo's test deals with the unknown nuisance parameter p by taking the maximum over the whole parameter space [0, 1]. The Berger & Boos procedure takes a different approach: it maximizes over a (1 − γ) confidence interval of p and adds γ to the resulting maximum. [5] γ is usually a small value such as 0.001 or 0.0001. This results in a modified Boschloo's test which is also exact. [6]
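A sketch of the modification follows; the Clopper–Pearson interval on the pooled count is one common choice of confidence interval and is assumed here, and the table and γ are invented:

```python
# Berger & Boos: maximize over a (1 - gamma) confidence interval for p,
# then add gamma so the test remains exact.
import numpy as np
from scipy.stats import beta, binom, hypergeom

n1 = n0 = 10
x1_obs, x0_obs = 7, 2
gamma = 0.001

def fisher_p(x1, x0):
    return hypergeom.sf(x1 - 1, n1 + n0, n1, x1 + x0)

pF = np.array([[fisher_p(x1, x0) for x0 in range(n0 + 1)]
               for x1 in range(n1 + 1)])
region = pF <= fisher_p(x1_obs, x0_obs)

# Clopper-Pearson (1 - gamma) interval for p from the pooled count
k, m = x1_obs + x0_obs, n1 + n0
lo = beta.ppf(gamma / 2, k, m - k + 1) if k > 0 else 0.0
hi = beta.isf(gamma / 2, k + 1, m - k) if k < m else 1.0

def prob_region(p):
    w1 = binom.pmf(np.arange(n1 + 1), n1, p)
    w0 = binom.pmf(np.arange(n0 + 1), n0, p)
    return float(np.outer(w1, w0)[region].sum())

p_modified = max(prob_region(p) for p in np.linspace(lo, hi, 1001)) + gamma
print(p_modified)
```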

Comparison to other exact tests

All exact tests hold the specified significance level but can have varying power in different situations. Mehrotra et al. compared the power of some exact tests in different situations. [6] The results regarding Boschloo's test are summarized in the following.

Modified Boschloo's test

Boschloo's test and the modified Boschloo's test have similar power in all considered scenarios. Boschloo's test has slightly more power in some cases, and vice versa in some other cases.

Fisher's exact test

Boschloo's test is by construction uniformly more powerful than Fisher's exact test. For small sample sizes (e.g. 10 per group) the power difference is large, ranging from 16 to 20 percentage points in the cases considered. The power difference is smaller for greater sample sizes.

Exact Z-Pooled test

This test is based on the test statistic

Z_P(x_1, x_0) = (p̂_1 − p̂_0) / sqrt( p̂ (1 − p̂) (1/n_1 + 1/n_0) ),

where p̂_1 = x_1/n_1 and p̂_0 = x_0/n_0 are the group event rates and p̂ = (x_1 + x_0)/n is the pooled event rate.
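As a sketch of this statistic (the function name and example counts are illustrative):

```python
import math

def z_pooled(x1, n1, x0, n0):
    """Pooled Z statistic for comparing two binomial proportions."""
    p1_hat, p0_hat = x1 / n1, x0 / n0     # group event rates
    p_hat = (x1 + x0) / (n1 + n0)         # pooled event rate
    se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n0))
    return (p1_hat - p0_hat) / se

print(z_pooled(7, 10, 2, 10))  # ~2.247
```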

The power of this test is similar to that of Boschloo's test in most scenarios. In some cases, the Z-Pooled test has greater power, with differences mostly ranging from 1 to 5 percentage points. In very few cases, the difference goes up to 9 percentage points.

This test can also be modified by the Berger & Boos procedure. However, the resulting test has very similar power to the unmodified test in all scenarios.

Exact Z-Unpooled test

This test is based on the test statistic

Z_U(x_1, x_0) = (p̂_1 − p̂_0) / sqrt( p̂_1 (1 − p̂_1)/n_1 + p̂_0 (1 − p̂_0)/n_0 ),

where p̂_1 = x_1/n_1 and p̂_0 = x_0/n_0 are the group event rates.
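The corresponding sketch, using the per-group variance estimates instead of the pooled one (again with illustrative names and counts):

```python
import math

def z_unpooled(x1, n1, x0, n0):
    """Unpooled Z statistic for comparing two binomial proportions."""
    p1_hat, p0_hat = x1 / n1, x0 / n0     # group event rates
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p0_hat * (1 - p0_hat) / n0)
    return (p1_hat - p0_hat) / se

print(z_unpooled(7, 10, 2, 10))  # ~2.599
```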

The power of this test is similar to that of Boschloo's test in many scenarios. In some cases, the Z-Unpooled test has greater power, with differences ranging from 1 to 5 percentage points. However, in some other cases, Boschloo's test has noticeably greater power, with differences up to 68 percentage points.

This test can also be modified by the Berger & Boos procedure. The resulting test has similar power to the unmodified test in most scenarios. In some cases, the power is considerably improved by the modification but the overall power comparison to Boschloo's test remains unchanged.

Software

The calculation of Boschloo's test can be performed in the following software, among others: the R package Exact and, from version 1.7, SciPy (scipy.stats.boschloo_exact).
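For example, a sketch using SciPy (scipy.stats.boschloo_exact, available from SciPy 1.7; the table below is invented):

```python
# Boschloo's test via SciPy, compared with Fisher's exact test on the
# same one-sided question.
from scipy.stats import boschloo_exact, fisher_exact

table = [[7, 3],   # group 1: 7 events out of 10
         [2, 8]]   # group 2: 2 events out of 10

res = boschloo_exact(table, alternative="greater")
_, p_fisher = fisher_exact(table, alternative="greater")
print(res.pvalue, p_fisher)   # Boschloo's p-value is never larger
```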


References

  1. Boschloo, R. D. (1970). "Raised Conditional Level of Significance for the 2×2-table when Testing the Equality of Two Probabilities". Statistica Neerlandica. 24: 1–35. doi:10.1111/j.1467-9574.1970.tb00104.x.
  2. Lydersen, S., Fagerland, M. W. and Laake, P. (2009). "Recommended tests for association in 2×2 tables". Statistics in Medicine. 28 (7): 1159–1175. doi:10.1002/sim.3531. PMID 19170020.
  3. Martín Andrés, A. and Herranz Tejedor, I. (1995). "Is Fisher's exact test very conservative?". Computational Statistics and Data Analysis. 19 (5): 579–591. doi:10.1016/0167-9473(94)00013-9.
  4. Finner, H. and Strassburger, K. (2002). "Structural properties of UMPU-tests for 2×2 tables and some applications". Journal of Statistical Planning and Inference. 104: 103–120. doi:10.1016/S0378-3758(01)00122-7.
  5. Berger, R. L. and Boos, D. D. (1994). "P Values Maximized Over a Confidence Set for the Nuisance Parameter". Journal of the American Statistical Association. 89 (427): 1012–1016. doi:10.2307/2290928. JSTOR 2290928.
  6. Mehrotra, D. V., Chan, I. S. F. and Berger, R. L. (2003). "A cautionary note on exact unconditional inference for a difference between two independent binomial proportions". Biometrics. 59 (2): 441–450. doi:10.1111/1541-0420.00051. PMID 12926729.