False coverage rate

In statistics, a false coverage rate (FCR) is the average rate of false coverage, i.e. not covering the true parameters, among the selected intervals.

The FCR gives a simultaneous coverage at a (1 − α)×100% level for all of the parameters considered in the problem. The FCR has a strong connection to the false discovery rate (FDR). Both address the problem of multiple comparisons: the FCR from the point of view of confidence intervals (CIs), and the FDR from the point of view of p-values.

FCR was needed because of the dangers of selective inference. Researchers and scientists tend to report or highlight only the portion of the data that is considered significant, without clearly indicating the various hypotheses that were considered. It is therefore necessary to understand how the data is falsely covered. Several FCR procedures are available, and they differ in the length of the CIs they produce: Bonferroni-selected–Bonferroni-adjusted CIs and FCR-adjusted BH-selected CIs (Benjamini and Yekutieli 2005 [1] ). The reason to choose one procedure over another is to make the CIs as narrow as possible while still controlling the FCR. Microarray experiments and other modern applications involve a huge number of parameters, often tens of thousands or more, so it is very important to choose the most powerful procedure.

The FCR was first introduced by Daniel Yekutieli in his PhD thesis in 2001. [2]

Definitions

Not keeping the FCR means FCR > q when q = V/R = αm0/R, where m0 is the number of true null hypotheses, R is the number of rejected hypotheses, V is the number of false positives, and α is the significance level. Intervals with simultaneous coverage probability 1 − q can control the FCR to be bounded by q.

Classification of multiple hypothesis tests

The following table defines the possible outcomes when testing multiple null hypotheses. Suppose we have a number m of null hypotheses, denoted by H1, H2, ..., Hm. Using a statistical test, we reject the null hypothesis if the test is declared significant. We do not reject the null hypothesis if the test is non-significant. Summing each type of outcome over all Hi yields the following random variables:

                                   Null hypothesis is true (H0)   Alternative hypothesis is true (HA)   Total
Test is declared significant       V                              S                                     R
Test is declared non-significant   U                              T                                     m − R
Total                              m0                             m − m0                                m

In m hypothesis tests of which m0 are true null hypotheses, R is an observable random variable, and S, T, U, and V are unobservable random variables.
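
As an illustration of the table, the following minimal Python sketch tallies the counts for a simulated setting in which the truth of each null hypothesis is known. The names `Outcomes` and `classify` are hypothetical and chosen only for this example; with real data only R and m can be computed, since the truth of each null hypothesis is unknown.

```python
from typing import NamedTuple, Sequence


class Outcomes(NamedTuple):
    V: int   # true nulls declared significant (false positives)
    S: int   # false nulls declared significant
    U: int   # true nulls declared non-significant
    T: int   # false nulls declared non-significant
    R: int   # all tests declared significant, R = V + S
    m0: int  # number of true null hypotheses
    m: int   # total number of hypotheses


def classify(null_is_true: Sequence[bool], significant: Sequence[bool]) -> Outcomes:
    """Tally the cells of the outcome table (hypothetical helper)."""
    if len(null_is_true) != len(significant):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(null_is_true, significant))
    V = sum(1 for null, sig in pairs if null and sig)
    S = sum(1 for null, sig in pairs if not null and sig)
    U = sum(1 for null, sig in pairs if null and not sig)
    T = sum(1 for null, sig in pairs if not null and not sig)
    return Outcomes(V=V, S=S, U=U, T=T, R=V + S, m0=V + U, m=len(pairs))
```

The row and column totals of the table follow directly: R = V + S, m0 = V + U, and m − R = U + T.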

The problems addressed by FCR

Selection

Selection causes reduced average coverage. Selection can be presented as conditioning on an event defined by the data, and it may affect the coverage probability of a CI for a single parameter. Equivalently, selection changes the basic sense of p-values. FCR procedures start from the observation that conditional coverage following any selection rule, for any set of (unknown) parameter values, is impossible to achieve. A weaker property of selective CIs is achievable and still avoids false coverage statements. The FCR is a measure of interval coverage following selection. Therefore, even though a 1 − α CI does not offer selective (conditional) coverage, the probability of constructing a non-covering CI is at most α: Pr[θ ∉ CI, CI constructed] ≤ Pr[θ ∉ CI] ≤ α.
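
The effect of selection on a single interval can be seen in a small simulation. The sketch below is only an illustration under assumed conditions that are not part of the article: one estimate X drawn from a normal distribution with mean θ and unit variance, the usual CI X ± z, and a selection rule that reports the CI only when it excludes 0. The function name `simulate_selection` and its defaults are hypothetical.

```python
import random
from statistics import NormalDist


def simulate_selection(theta: float, alpha: float = 0.05,
                       n: int = 200_000, seed: int = 0) -> dict:
    """Compare marginal, joint and conditional non-coverage (illustrative)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided normal quantile
    missed = selected = missed_and_selected = 0
    for _ in range(n):
        x = rng.gauss(theta, 1.0)
        miss = abs(x - theta) > z   # CI x ± z fails to cover theta
        sel = abs(x) > z            # CI excludes 0, so it gets reported
        missed += miss
        selected += sel
        missed_and_selected += miss and sel
    return {
        "marginal_noncoverage": missed / n,
        "joint_noncoverage": missed_and_selected / n,
        "conditional_noncoverage":
            missed_and_selected / selected if selected else 0.0,
    }
```

For a small true effect such as θ = 0.5, the conditional non-coverage among reported intervals comes out far above α, while the joint probability of constructing a non-covering CI stays below the marginal non-coverage, which is about α.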

Selection and multiplicity

When facing both multiplicity (inference about multiple parameters) and selection, not only is the expected proportion of coverage over selected parameters at 1 − α not equivalent to the expected proportion of non-coverage at α, but the latter can also no longer be ensured by constructing marginal CIs for each selected parameter. FCR procedures address this by considering the expected proportion of parameters not covered by their CIs among the selected parameters, where the proportion is 0 if no parameter is selected. This false coverage-statement rate (FCR) is a property of any procedure that is defined by the way in which parameters are selected and the way in which the multiple intervals are constructed.
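
The verbal definition above can be written compactly. In the following expression the symbol names are assumptions introduced for illustration: R_CI denotes the number of selected parameters (the number of CIs constructed) and V_CI the number of those CIs that fail to cover their parameter.

```latex
\mathrm{FCR} \;=\; \mathbb{E}\!\left[\frac{V_{CI}}{\max(R_{CI},\,1)}\right]
```

Dividing by max(R_CI, 1) rather than R_CI encodes the convention that the proportion is 0 when no parameter is selected.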

Controlling procedures

Bonferroni procedure (Bonferroni-selected–Bonferroni-adjusted) for simultaneous CI

With m parameters, the Bonferroni procedure constructs simultaneous CIs by building each marginal CI at the 1 − α/m level. Without selection, these CIs offer simultaneous coverage, in the sense that the probability that all CIs cover their respective parameters is at least 1 − α. Unfortunately, even such a strong property does not ensure the conditional confidence property following selection.
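
A minimal sketch of the Bonferroni construction follows. It assumes, purely for illustration, normally distributed estimates with known standard errors and two-sided intervals; the function name `bonferroni_cis` is hypothetical.

```python
from statistics import NormalDist
from typing import List, Sequence, Tuple


def bonferroni_cis(estimates: Sequence[float], std_errors: Sequence[float],
                   alpha: float = 0.05) -> List[Tuple[float, float]]:
    """Two-sided CIs, each at marginal level 1 - alpha/m (illustrative)."""
    m = len(estimates)
    if m == 0:
        return []
    # Each interval spends alpha/m in total, i.e. alpha/(2m) per tail.
    z = NormalDist().inv_cdf(1 - alpha / (2 * m))
    return [(e - z * s, e + z * s) for e, s in zip(estimates, std_errors)]
```

With m = 5 and α = 0.05 the multiplier is about 2.58 instead of the unadjusted 1.96, which shows how the intervals widen as m grows.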

FCR for Bonferroni-selected–Bonferroni-adjusted simultaneous CI

The Bonferroni–Bonferroni procedure cannot offer conditional coverage; however, it does control the FCR at a level below α. In fact, it does so too well, in the sense that the FCR is much too close to 0 for large values of θ. Interval selection is based on Bonferroni testing, and Bonferroni CIs are then constructed. The FCR is estimated as the proportion of intervals failing to cover their respective parameters among the constructed CIs (setting the proportion to 0 when none are selected). The same estimate applies where selection is based on unadjusted individual testing and unadjusted CIs are constructed.
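
The estimate described above can be sketched as a Monte Carlo experiment. The setup is an assumption made for illustration and is not taken from the article: m independent estimates, each normal with the same mean θ and unit variance, a parameter being selected when its test of θ = 0 rejects, and the CI built with the same critical value used for selection. The function name `estimate_fcr` and its defaults are hypothetical.

```python
import random
from statistics import NormalDist


def estimate_fcr(theta: float, m: int = 20, alpha: float = 0.05,
                 bonferroni: bool = True, reps: int = 5000,
                 seed: int = 0) -> float:
    """Average proportion of non-covering CIs among selected ones."""
    rng = random.Random(seed)
    level = alpha / m if bonferroni else alpha   # per-parameter level
    c = NormalDist().inv_cdf(1 - level / 2)      # two-sided critical value
    total = 0.0
    for _ in range(reps):
        selected = missed = 0
        for _ in range(m):
            x = rng.gauss(theta, 1.0)
            if abs(x) > c:                # selected: test of theta = 0 rejects
                selected += 1
                if abs(x - theta) > c:    # CI x ± c fails to cover theta
                    missed += 1
        if selected:                      # proportion is 0 if none selected
            total += missed / selected
    return total / reps
```

Under these assumptions, `estimate_fcr(0.5, bonferroni=False)` comes out far above α, `estimate_fcr(0.5)` stays below α, and `estimate_fcr(5.0)` is close to 0, which matches the remark that the Bonferroni–Bonferroni procedure is overly conservative for large θ.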

FCR-adjusted BH-selected CIs

In the BH procedure for FDR, after sorting the p-values P(1) ≤ ··· ≤ P(m) and calculating R = max{ j : P(j) ≤ jq/m }, the R null hypotheses for which P(i) ≤ Rq/m are rejected. If testing is done using the Bonferroni procedure, then the lower bound of the FCR may drop well below the desired level q, implying that the intervals are too long. In contrast, applying the following procedure, which combines the general procedure with the FDR-controlling testing of the BH procedure, also yields a lower bound for the FCR, q/2 ≤ FCR. This procedure is sharp in the sense that for some configurations the FCR approaches q.

1. Sort the p-values used for testing the m hypotheses regarding the parameters, P(1) ≤ ··· ≤ P(m).

2. Calculate R = max{i : P(i) ≤ iq/m}.

3. Select the R parameters for which P(i) ≤ Rq/m, corresponding to the rejected hypotheses.

4. Construct a 1 − Rq/m CI for each parameter selected.
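
The four steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes normally distributed estimates with known standard errors, two-sided p-values for the null hypothesis that each parameter equals 0, and two-sided intervals. The function name `fcr_adjusted_bh_cis` is hypothetical.

```python
from statistics import NormalDist
from typing import Dict, Sequence, Tuple


def fcr_adjusted_bh_cis(estimates: Sequence[float],
                        std_errors: Sequence[float],
                        q: float = 0.05) -> Dict[int, Tuple[float, float]]:
    """Map each selected index to its 1 - R*q/m CI (illustrative)."""
    std = NormalDist()
    m = len(estimates)
    # Two-sided p-values for H0: parameter = 0 (an assumption of this sketch).
    pvals = [2 * std.cdf(-abs(e / s)) for e, s in zip(estimates, std_errors)]
    # Steps 1-2: sort the p-values and find R = max{i : P(i) <= i*q/m}.
    R = max((i for i, p in enumerate(sorted(pvals), start=1)
             if p <= i * q / m), default=0)
    if R == 0:
        return {}  # nothing selected, so no CI is constructed
    # Step 3: select the parameters with p-value at most R*q/m.
    threshold = R * q / m
    selected = [i for i, p in enumerate(pvals) if p <= threshold]
    # Step 4: build a 1 - R*q/m CI for each selected parameter.
    z = std.inv_cdf(1 - threshold / 2)
    return {i: (estimates[i] - z * std_errors[i],
                estimates[i] + z * std_errors[i]) for i in selected}
```

For example, with estimates `[4.0, 3.5, 0.3, -0.2, 0.1]`, unit standard errors and q = 0.05, two parameters are selected and their intervals are built at the 1 − 2(0.05)/5 = 0.98 level, which is wider than the marginal 95% level but narrower than the Bonferroni level 1 − 0.05/5 = 0.99.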

References

Footnotes

  1. Benjamini, Yoav; Yekutieli, Daniel (March 2005). "False Discovery Rate–Adjusted Multiple Confidence Intervals for Selected Parameters". Journal of the American Statistical Association. 100 (469): 71–93. doi:10.1198/016214504000001907.
  2. Yekutieli, Daniel (April 2001). Theoretical Results Needed for Applying the False Discovery Rate in Statistical Problems (PhD thesis). Section 3.2, p. 51.
