
In statistics, **McNemar's test** is a statistical test used on paired nominal data. It is applied to 2 × 2 contingency tables with a dichotomous trait, with matched pairs of subjects, to determine whether the row and column marginal frequencies are equal (that is, whether there is "marginal homogeneity"). It is named after Quinn McNemar, who introduced it in 1947.^{ [1] } An application of the test in genetics is the transmission disequilibrium test for detecting linkage disequilibrium.^{ [2] }


The parameters commonly used to assess a diagnostic test in the medical sciences are sensitivity and specificity. Sensitivity (or recall) is the ability of a test to correctly identify the people with the disease. Specificity is the ability of the test to correctly identify those without the disease.

Now suppose two tests are performed on the same group of patients, and suppose the two tests turn out to have identical sensitivity and specificity. It is tempting to conclude from these findings that the two tests are equivalent, but this need not be the case. To investigate further, we have to study the patients with the disease and the patients without the disease (as classified by a reference test), and in particular find out where the two tests disagree with each other. This is precisely the basis of McNemar's test, which compares the sensitivity and specificity of two diagnostic tests on the same group of patients.^{ [3] }

The test is applied to a 2 × 2 contingency table, which tabulates the outcomes of two tests on a sample of *N* subjects, as follows.

|                     | Test 2 positive | Test 2 negative | Row total |
|---------------------|-----------------|-----------------|-----------|
| **Test 1 positive** | a               | b               | a + b     |
| **Test 1 negative** | c               | d               | c + d     |
| **Column total**    | a + c           | b + d           | N         |

The null hypothesis of marginal homogeneity states that the two marginal probabilities for each outcome are the same, i.e. *p*_{a} + *p*_{b} = *p*_{a} + *p*_{c} and *p*_{c} + *p*_{d} = *p*_{b} + *p*_{d}.

Thus the null and alternative hypotheses are^{ [1] }

$$H_0 : p_b = p_c$$
$$H_1 : p_b \neq p_c$$

Here *p*_{a}, etc., denote the theoretical probability of occurrences in cells with the corresponding label.

The McNemar test statistic is:

$$\chi^2 = \frac{(b - c)^2}{b + c}$$

Under the null hypothesis, with a sufficiently large number of discordants (cells *b* and *c*), χ² has a chi-squared distribution with 1 degree of freedom. If the result is significant, this provides sufficient evidence to reject the null hypothesis in favour of the alternative hypothesis that *p*_{b} ≠ *p*_{c}, which would mean that the marginal proportions differ from each other.

If either *b* or *c* is small (*b* + *c* < 25) then χ² is not well-approximated by the chi-squared distribution. An exact binomial test can then be used, where *b* is compared to a binomial distribution with size parameter *n* = *b* + *c* and *p* = 0.5. Effectively, the exact binomial test evaluates the imbalance in the discordants *b* and *c*. To achieve a two-sided P-value, the P-value of the extreme tail should be multiplied by 2. For *b* ≥ *c*:

$$\text{exact-}P\text{-value} = 2\sum_{i=b}^{n}\binom{n}{i}\,0.5^{i}\,(1-0.5)^{n-i}$$

which is simply twice the upper tail of the binomial cumulative distribution with *p* = 0.5 and *n* = *b* + *c*.
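The exact binomial computation can be sketched in a few lines of Python using only the standard library (the function name here is illustrative, not a standard API):

```python
from math import comb

def mcnemar_exact_p(b, c):
    """Two-sided exact binomial P-value for McNemar's test.

    The larger of the two discordant counts is compared to
    Binomial(n = b + c, p = 0.5), and the tail probability is
    doubled (capped at 1)."""
    n = b + c
    k = max(b, c)  # by symmetry, take the larger count as the extreme tail
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For instance, with discordant counts *b* = 16 and *c* = 6 (the second example below), `mcnemar_exact_p(16, 6)` gives approximately 0.0525.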

Edwards^{ [4] } proposed the following continuity-corrected version of the McNemar test to approximate the binomial exact P-value:

$$\chi^2 = \frac{(|b - c| - 1)^2}{b + c}$$

The mid-P McNemar test (mid-p binomial test) is calculated by subtracting half the probability of the observed *b* from the exact one-sided P-value, then doubling it to obtain the two-sided mid-P-value:^{ [5] }^{ [6] }

$$\text{mid-}p = 2\left(\sum_{i=b}^{n}\binom{n}{i}\,0.5^{n} - \frac{1}{2}\binom{n}{b}\,0.5^{n}\right)$$

This is equivalent to:

$$\text{mid-}p = \text{exact-}P\text{-value} - \binom{n}{b}\,0.5^{n}$$

where the second term is the binomial distribution probability mass function and *n* = *b* + *c*. Binomial distribution functions are readily available in common software packages and the McNemar mid-P test can easily be calculated.^{ [6] }
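As a concrete illustration of the formula above, the mid-P value can be computed with the standard library alone (the function name is illustrative):

```python
from math import comb

def mcnemar_midp(b, c):
    """Two-sided mid-P value for McNemar's test: the exact two-sided
    P-value minus the point probability of the observed count."""
    n = b + c
    k = max(b, c)
    pmf_obs = comb(n, k) / 2 ** n                              # P(X = observed)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n   # upper tail
    return 2 * tail - pmf_obs
```

With *b* = 16 and *c* = 6 (the second example below), `mcnemar_midp(16, 6)` gives approximately 0.035, the mid-P value quoted there.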

The traditional advice has been to use the exact binomial test when *b* + *c* < 25. However, simulations have shown both the exact binomial test and the McNemar test with continuity correction to be overly conservative.^{ [6] } When *b* + *c* < 6, the exact P-value always exceeds the common significance level 0.05. The original McNemar test was most powerful, but often slightly liberal. The mid-P version was almost as powerful as the asymptotic McNemar test and was not found to exceed the nominal significance level.

In the first example, a researcher attempts to determine if a drug has an effect on a particular disease. Counts of individuals are given in the table, with the diagnosis (disease: *present* or *absent*) before treatment given in the rows, and the diagnosis after treatment in the columns. The test requires the same subjects to be included in the before-and-after measurements (matched pairs).

|                     | After: present | After: absent | Row total |
|---------------------|----------------|---------------|-----------|
| **Before: present** | 101            | 121           | 222       |
| **Before: absent**  | 59             | 33            | 92        |
| **Column total**    | 160            | 154           | 314       |

In this example, the null hypothesis of "marginal homogeneity" would mean there was no effect of the treatment. From the above data, the McNemar test statistic

$$\chi^2 = \frac{(121 - 59)^2}{121 + 59}$$

has the value 21.35, which is extremely unlikely under the distribution implied by the null hypothesis (*P* < 0.001). Thus the test provides strong evidence to reject the null hypothesis of no treatment effect.
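As a quick check, the statistic and its tail probability can be computed directly from the discordant cells. This minimal Python sketch uses `erfc` for the chi-squared (1 df) tail probability, which works because a χ²₁ variable is the square of a standard normal:

```python
from math import erfc, sqrt

# Discordant cells from the table above
b, c = 121, 59

chi2 = (b - c) ** 2 / (b + c)   # McNemar statistic
p = erfc(sqrt(chi2 / 2))        # P(chi-squared_1 > chi2)

print(round(chi2, 2))  # 21.36 (quoted as 21.35 in the text)
print(p < 0.001)       # True
```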

A second example illustrates differences between the asymptotic McNemar test and alternatives.^{ [6] } The data table is formatted as before, with different numbers in the cells:

|                     | After: present | After: absent | Row total |
|---------------------|----------------|---------------|-----------|
| **Before: present** | 59             | 6             | 65        |
| **Before: absent**  | 16             | 80            | 96        |
| **Column total**    | 75             | 86            | 161       |

With these data, the sample size (161 patients) is not small, yet results from the McNemar test and its variants differ. The exact binomial test gives *P* = 0.053 and McNemar's test with continuity correction gives χ² = 3.68 and *P* = 0.055. The asymptotic McNemar test gives χ² = 4.55 and *P* = 0.033, and the mid-P McNemar test gives *P* = 0.035. Both the asymptotic McNemar test and the mid-P version provide stronger evidence for a statistically significant treatment effect in this second example.
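The four analyses above can be reproduced with a short standard-library sketch (again using `erfc` for the chi-squared 1-df tail probability):

```python
from math import comb, erfc, sqrt

def chi2_sf1(x):
    """P(X > x) for X ~ chi-squared with 1 degree of freedom."""
    return erfc(sqrt(x / 2))

b, c = 6, 16               # discordant cells from the table above
n, k = b + c, max(b, c)

asymptotic = (b - c) ** 2 / n               # asymptotic McNemar statistic
corrected = (abs(b - c) - 1) ** 2 / n       # Edwards continuity correction
exact = min(1.0, 2 * sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n)
midp = exact - comb(n, k) / 2 ** n

print(round(asymptotic, 2), round(chi2_sf1(asymptotic), 3))  # 4.55 0.033
print(round(corrected, 2), round(chi2_sf1(corrected), 3))    # 3.68 0.055
print(round(exact, 4), round(midp, 3))                       # 0.0525 0.035
```

The exact P-value comes out as 0.0525, matching the 0.053 quoted above up to rounding convention.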

An interesting observation when interpreting McNemar's test is that the elements of the main diagonal do not contribute to the decision about whether (in the above example) pre- or post-treatment condition is more favourable. Thus, the sum *b* + *c* can be small and statistical power of the tests described above can be low even though the number of pairs *a* + *b* + *c* + *d* is large (see second example above).

An extension of McNemar's test exists in situations where independence does not necessarily hold between the pairs; instead, there are clusters of paired data where the pairs in a cluster may not be independent, but independence holds between different clusters.^{ [7] } An example is analyzing the effectiveness of a dental procedure; in this case, a pair corresponds to the treatment of an individual tooth in patients who might have multiple teeth treated; the effectiveness of treatment of two teeth in the same patient is not likely to be independent, but the treatment of two teeth in different patients is more likely to be independent.^{ [8] }

In the 1970s, it was conjectured that retaining one's tonsils might protect against Hodgkin's lymphoma. John Rice wrote:^{ [9] }

85 Hodgkin's patients [...] had a sibling of the same sex who was free of the disease and whose age was within 5 years of the patient's. These investigators presented the following table:

They calculated a chi-squared statistic [...] [they] had made an error in their analysis by ignoring the pairings.[...] [their] samples were not independent, because the siblings were paired [...] we set up a table that exhibits the pairings:

It is to the second table that McNemar's test can be applied. Notice that the sum of the numbers in the second table is 85—the number of *pairs* of siblings—whereas the sum of the numbers in the first table is twice as big, 170—the number of individuals. The second table gives more information than the first. The numbers in the first table can be found by using the numbers in the second table, but not vice versa. The numbers in the first table give only the marginal totals of the numbers in the second table.
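The recoverability point — that the cells of the individual-level table are just marginal sums of the paired table — can be sketched numerically. The counts below are hypothetical placeholders chosen only to sum to 85 pairs; they are not the study's actual figures, which are not reproduced above:

```python
# Hypothetical paired-table counts (NOT the actual study data):
# rows = patient had tonsillectomy (yes/no), columns = sibling had it.
paired = {
    ("yes", "yes"): 20,
    ("yes", "no"): 15,
    ("no", "yes"): 10,
    ("no", "no"): 40,
}

n_pairs = sum(paired.values())   # number of sibling pairs
n_individuals = 2 * n_pairs      # each pair contributes two people

# Individual-level counts recovered from the paired table's margins:
patients_with_tonsillectomy = paired[("yes", "yes")] + paired[("yes", "no")]
siblings_with_tonsillectomy = paired[("yes", "yes")] + paired[("no", "yes")]

print(n_pairs, n_individuals)                                    # 85 170
print(patients_with_tonsillectomy, siblings_with_tonsillectomy)  # 35 30
```

Going the other way is impossible: many different paired tables share the same margins, which is exactly why the first table carries less information than the second.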

- The binomial sign test gives an exact version of McNemar's test.
- Cochran's Q test is an extension of McNemar's test to more than two "treatments".
- Liddell's exact test is an exact alternative to McNemar's test.^{ [10] }^{ [11] }
- The Stuart–Maxwell test is a different generalization of the McNemar test, used for testing marginal homogeneity in a square table with more than two rows/columns.^{ [12] }^{ [13] }^{ [14] }
- Bhapkar's test (1966) is a more powerful alternative to the Stuart–Maxwell test,^{ [15] }^{ [16] } but it tends to be liberal. Competitive alternatives to the extant methods are available.^{ [17] }
- McNemar's test is a special case of the Cochran–Mantel–Haenszel test; it is equivalent to a CMH test with one stratum for each of the *N* pairs and, in each stratum, a 2 × 2 table showing the paired binary responses.^{ [18] }


1. McNemar, Quinn (1947). "Note on the sampling error of the difference between correlated proportions or percentages". *Psychometrika*. **12** (2): 153–157. doi:10.1007/BF02295996. PMID 20254758.
2. Spielman, R.S.; McGinnis, R.E.; Ewens, W.J. (1993). "Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM)". *Am J Hum Genet*. **52** (3): 506–516. PMC 1682161. PMID 8447318.
3. Hawass, N.E. (1997). "Comparing the sensitivities and specificities of two diagnostic procedures performed on the same group of patients". *The British Journal of Radiology*. **70** (832): 360–366. doi:10.1259/bjr.70.832.9166071. ISSN 0007-1285. PMID 9166071.
4. Edwards, A. (1948). "Note on the "correction for continuity" in testing the significance of the difference between correlated proportions". *Psychometrika*. **13** (3): 185–187. doi:10.1007/bf02289261. PMID 18885738.
5. Lancaster, H.O. (1961). "Significance tests in discrete distributions". *J Am Stat Assoc*. **56** (294): 223–234. doi:10.1080/01621459.1961.10482105.
6. Fagerland, M.W.; Lydersen, S.; Laake, P. (2013). "The McNemar test for binary matched-pairs data: mid-p and asymptotic are better than exact conditional". *BMC Medical Research Methodology*. **13**: 91. doi:10.1186/1471-2288-13-91. PMC 3716987. PMID 23848987.
7. Yang, Z.; Sun, X.; Hardin, J.W. (2010). "A note on the tests for clustered matched-pair binary data". *Biometrical Journal*. **52** (5): 638–652. doi:10.1002/bimj.201000035. PMID 20976694.
8. Durkalski, V.L.; Palesch, Y.Y.; Lipsitz, S.R.; Rust, P.F. (2003). "Analysis of clustered matched-pair data". *Statistics in Medicine*. **22** (15): 2417–2428. doi:10.1002/sim.1438. PMID 12872299.
9. Rice, John (1995). *Mathematical Statistics and Data Analysis* (2nd ed.). Belmont, California: Duxbury Press. pp. 492–494. ISBN 978-0-534-20934-6.
10. Liddell, D. (1976). "Practical Tests of 2 × 2 Contingency Tables". *Journal of the Royal Statistical Society*. **25** (4): 295–304. JSTOR 2988087.
11. "Maxwell's test, McNemar's test, Kappa test". Rimarcik.com. Retrieved 2012-11-22.
12. Sun, Xuezheng; Yang, Zhao (2008). "Generalized McNemar's Test for Homogeneity of the Marginal Distributions" (PDF). SAS Global Forum.
13. Stuart, Alan (1955). "A Test for Homogeneity of the Marginal Distributions in a Two-Way Classification". *Biometrika*. **42** (3/4): 412–416. doi:10.1093/biomet/42.3-4.412. JSTOR 2333387.
14. Maxwell, A.E. (1970). "Comparing the Classification of Subjects by Two Independent Judges". *The British Journal of Psychiatry*. **116** (535): 651–655. doi:10.1192/bjp.116.535.651. PMID 5452368.
15. "McNemar Tests of Marginal Homogeneity". John-uebersax.com. 2006-08-30. Retrieved 2012-11-22.
16. Bhapkar, V.P. (1966). "A Note on the Equivalence of Two Test Criteria for Hypotheses in Categorical Data". *Journal of the American Statistical Association*. **61** (313): 228–235. doi:10.1080/01621459.1966.10502021. JSTOR 2283057.
17. Yang, Z.; Sun, X.; Hardin, J.W. (2012). "Testing Marginal Homogeneity in Matched-Pair Polytomous Data". *Therapeutic Innovation & Regulatory Science*. **46** (4): 434–438. doi:10.1177/0092861512442021.
18. Agresti, Alan (2002). *Categorical Data Analysis* (2nd ed.). Hoboken, New Jersey: John Wiley & Sons. p. 413. ISBN 978-0-471-36093-3.

This page is based on this Wikipedia article

Text is available under the CC BY-SA 4.0 license; additional terms may apply.

Images, videos and audio are available under their respective licenses.
