False discovery rate

In statistics, the false discovery rate (FDR) is a method of conceptualizing the rate of type I errors in null hypothesis testing when conducting multiple comparisons. FDR-controlling procedures are designed to control the FDR, which is the expected proportion of "discoveries" (rejected null hypotheses) that are false (incorrect rejections of the null). [1] Equivalently, the FDR is the expected ratio of the number of false positive classifications (false discoveries) to the total number of positive classifications (rejections of the null). The total number of rejections of the null includes both the number of false positives (FP) and the number of true positives (TP). Simply put, FDR = FP / (FP + TP). FDR-controlling procedures provide less stringent control of type I errors than family-wise error rate (FWER) controlling procedures (such as the Bonferroni correction), which control the probability of at least one type I error. Thus, FDR-controlling procedures have greater power, at the cost of increased numbers of type I errors. [2]

History

Technological motivations

The modern widespread use of the FDR is believed to stem from, and be motivated by, the development in technologies that allowed the collection and analysis of a large number of distinct variables in several individuals (e.g., the expression level of each of 10,000 different genes in 100 different persons). [3] By the late 1980s and 1990s, the development of "high-throughput" sciences, such as genomics, allowed for rapid data acquisition. This, coupled with the growth in computing power, made it possible to seamlessly perform a very high number of statistical tests on a given data set. The technology of microarrays was a prototypical example, as it enabled thousands of genes to be tested simultaneously for differential expression between two biological conditions. [4]

As high-throughput technologies became common, technological and/or financial constraints led researchers to collect datasets with relatively small sample sizes (e.g., few individuals being tested) and large numbers of variables measured per sample (e.g., thousands of gene expression levels). In these datasets, too few of the measured variables showed statistical significance after classic correction for multiple tests with standard multiple comparison procedures. This created a need within many scientific communities to abandon FWER and unadjusted multiple hypothesis testing in favor of other ways to highlight and rank in publications those variables showing marked effects across individuals or treatments that would otherwise be dismissed as non-significant after standard correction for multiple tests. In response, a variety of error rates have been proposed, and have become commonly used in publications, that are less conservative than FWER in flagging possibly noteworthy observations. The FDR is useful when researchers are looking for "discoveries" that will give them follow-up work (e.g., detecting promising genes for follow-up studies) and are interested in controlling the proportion of "false leads" they are willing to accept.

Literature

The FDR concept was formally described by Yoav Benjamini and Yosef Hochberg in 1995 [1] (BH procedure) as a less conservative and arguably more appropriate approach for identifying the important few from the trivial many effects tested. The FDR has been particularly influential, as it was the first alternative to the FWER to gain broad acceptance in many scientific fields (especially in the life sciences, from genetics to biochemistry, oncology and plant sciences). [3] In 2005, the Benjamini and Hochberg paper from 1995 was identified as one of the 25 most-cited statistical papers. [5]

Prior to the 1995 introduction of the FDR concept, various precursor ideas had been considered in the statistics literature. In 1979, Holm proposed the Holm procedure, [6] a stepwise algorithm for controlling the FWER that is at least as powerful as the well-known Bonferroni adjustment. This stepwise algorithm sorts the p-values and sequentially rejects the hypotheses starting from the smallest p-values.

Benjamini (2010) said that the false discovery rate, [3] and the paper Benjamini and Hochberg (1995), had their origins in two papers concerned with multiple testing: Schweder and Spjøtvoll (1982) [7] and Soric (1989). [9]

The BH procedure was proven to control the FDR for independent tests in 1995 by Benjamini and Hochberg. [1] In 1986, R. J. Simes offered the same procedure as the "Simes procedure", in order to control the FWER in the weak sense (under the intersection null hypothesis) when the statistics are independent. [10]

Definitions

Based on the definitions below, we can define $Q$ as the proportion of false discoveries among the discoveries (rejections of the null hypothesis):

$$Q = \frac{V}{R} = \frac{V}{V + S},$$

where $V$ is the number of false discoveries and $S$ is the number of true discoveries.

The false discovery rate (FDR) is then simply: [1]

$$\mathrm{FDR} = Q_e = \mathrm{E}[Q],$$

where $\mathrm{E}[Q]$ is the expected value of $Q$. The goal is to keep the FDR below a given threshold q. To avoid division by zero, $Q$ is defined to be 0 when $R = 0$. Formally, $\mathrm{FDR} = \mathrm{E}[V/R \mid R > 0] \cdot \mathrm{P}(R > 0)$. [1]
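The formal expression follows from the convention that the proportion Q = V/R (with R = V + S the number of rejections) is set to 0 when R = 0. The short derivation below is added here for clarity and simply splits the expectation according to whether any rejection is made:

```latex
\mathrm{E}[Q]
  = \mathrm{E}[Q \mid R > 0]\,\mathrm{P}(R > 0)
    + \mathrm{E}[Q \mid R = 0]\,\mathrm{P}(R = 0)
  = \mathrm{E}\!\left[\frac{V}{R} \,\middle|\, R > 0\right]\mathrm{P}(R > 0)
    + 0 \cdot \mathrm{P}(R = 0).
```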

Classification of multiple hypothesis tests

The following table defines the possible outcomes when testing multiple null hypotheses. Suppose we have a number m of null hypotheses, denoted by $H_1, H_2, \ldots, H_m$. Using a statistical test, we reject the null hypothesis if the test is declared significant. We do not reject the null hypothesis if the test is non-significant. Summing each type of outcome over all $H_i$ yields the following random variables:

| | Null hypothesis is true ($H_0$) | Alternative hypothesis is true ($H_A$) | Total |
|---|---|---|---|
| Test is declared significant | $V$ | $S$ | $R$ |
| Test is declared non-significant | $U$ | $T$ | $m - R$ |
| Total | $m_0$ | $m - m_0$ | $m$ |

Here $V$ counts the false positives (type I errors), $S$ the true positives, $U$ the true negatives, and $T$ the false negatives (type II errors); $R = V + S$ is the number of rejected null hypotheses.

In m hypothesis tests of which $m_0$ are true null hypotheses, R is an observable random variable, and S, T, U, and V are unobservable random variables.
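In a simulation, where the truth of each null hypothesis is known, the entries of the table can be tallied directly. The following Python sketch is illustrative only: the function name `tally_outcomes` and its two boolean inputs are assumptions made for this example, and with real data the unobservable counts (V, S, U, T) cannot be computed this way.

```python
def tally_outcomes(null_is_true, rejected):
    """Count V, S, U, T, R and m0 for m tests (simulation use only).

    null_is_true[i] -- True if the i-th null hypothesis is actually true
    rejected[i]     -- True if the i-th test was declared significant
    """
    if len(null_is_true) != len(rejected):
        raise ValueError("inputs must have the same length")

    counts = {"V": 0, "S": 0, "U": 0, "T": 0}
    for is_null, is_rejected in zip(null_is_true, rejected):
        if is_rejected:
            # Rejected: false discovery if the null is true, true discovery otherwise.
            counts["V" if is_null else "S"] += 1
        else:
            # Not rejected: true negative if the null is true, false negative otherwise.
            counts["U" if is_null else "T"] += 1

    counts["R"] = counts["V"] + counts["S"]    # total rejections (observable)
    counts["m0"] = counts["V"] + counts["U"]   # number of true null hypotheses
    # Proportion of false discoveries, defined as 0 when nothing is rejected.
    counts["Q"] = counts["V"] / counts["R"] if counts["R"] > 0 else 0.0
    return counts


# Hypothetical example: six tests, the first three nulls are true.
truth = [True, True, True, False, False, False]
decisions = [True, False, False, True, True, False]
print(tally_outcomes(truth, decisions))
```

In this made-up example one of the three rejections is a false discovery, so Q = 1/3.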

Controlling procedures

The setting for many procedures is that we have m null hypotheses $H_1, \ldots, H_m$ tested and their corresponding p-values $P_1, \ldots, P_m$. We list these p-values in ascending order and denote them by $P_{(1)}, \ldots, P_{(m)}$. A procedure that goes from a small test statistic to a large one is called a step-up procedure. Similarly, in a step-down procedure we move from a large corresponding test statistic to a smaller one.

Benjamini–Hochberg procedure

The Benjamini–Hochberg procedure applied to a set of m = 20 p-values in ascending order, with a false discovery control level α = 0.05. The p-values of the rejected null hypotheses (i.e. declared discoveries) are colored in red. Note that there are rejected p-values which are above the rejection line (in blue), since all null hypotheses whose p-values are ranked before the p-value of the last intersection are rejected. The approximations here are MFDR = 0.02625 and AFDR = 0.00730.

The Benjamini–Hochberg procedure (BH step-up procedure) controls the FDR at level $\alpha$. [1] It works as follows:

  1. For a given $\alpha$, find the largest k such that $P_{(k)} \le \frac{k}{m} \alpha$
  2. Reject the null hypothesis (i.e., declare discoveries) for all $H_{(i)}$ for $i = 1, \ldots, k$

Geometrically, this corresponds to plotting $P_{(k)}$ vs. k (on the y and x axes respectively), drawing the line through the origin with slope $\frac{\alpha}{m}$, and declaring discoveries for all points on the left, up to and including the last point that is not above the line.
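The step-up rule described above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function name `benjamini_hochberg` and its arguments are chosen for this example, and ties or invalid inputs receive no special handling.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected."""
    m = len(p_values)
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step 1: find the largest rank k (1-based) with P_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank

    # Step 2: reject the hypotheses with the k_max smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, chosen only to illustrate the step-up behaviour.
print(benjamini_hochberg([0.001, 0.03, 0.035, 0.9], alpha=0.05))
# prints [True, True, True, False]
```

In this example the second p-value (0.03) exceeds its own threshold of 0.025, yet it is still rejected because the third p-value (0.035) falls below its threshold of 0.0375. This is the situation in which rejected points lie above the rejection line.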

The BH procedure is valid when the m tests are independent, and also in various scenarios of dependence, but is not universally valid. [11] It also satisfies the inequality:

$$\mathrm{E}(Q) \le \frac{m_0}{m} \alpha \le \alpha$$
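The bound E(Q) ≤ (m0/m)·α ≤ α, where m0 is the number of true null hypotheses, can be checked empirically for independent tests. The Monte Carlo sketch below is an illustration under stated assumptions, not part of the cited work: the one-sided z-test setting, the effect size, the helper names and all default parameter values are hypothetical choices. The helper `bh_rejections` restates the step-up rule so that the block stands alone.

```python
import math
import random


def bh_rejections(p_values, alpha):
    """Indices rejected by the BH step-up rule at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    return set(order[:k_max])


def one_sided_p(z):
    """Upper-tail p-value P(Z > z) for a standard normal test statistic."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def estimate_fdr(m=20, m0=15, effect=3.0, alpha=0.05, reps=4000, seed=0):
    """Average the false discovery proportion over simulated experiments."""
    rng = random.Random(seed)
    total_q = 0.0
    for _ in range(reps):
        # The first m0 statistics come from true nulls (mean 0),
        # the remaining m - m0 from alternatives with mean `effect`.
        p = [one_sided_p(rng.gauss(0.0 if i < m0 else effect, 1.0))
             for i in range(m)]
        rejected = bh_rejections(p, alpha)
        v = sum(1 for i in rejected if i < m0)   # false discoveries
        r = len(rejected)                        # all discoveries
        total_q += v / r if r > 0 else 0.0       # Q = 0 when R = 0
    return total_q / reps


print(estimate_fdr())
```

With these settings the theoretical bound is (15/20) × 0.05 = 0.0375, so the printed estimate should lie near that value, up to simulation noise, and below α = 0.05.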

If an estimator of $m_0$ is inserted into the BH procedure, it is no longer guaranteed to achieve FDR control at the desired level. [3] Adjustments may be needed in the estimator and several modifications have been proposed. [12] [13] [14] [15]

Note that the mean $\alpha$ for these m tests is $\frac{\alpha(m+1)}{2m}$, the Mean(FDR $\alpha$) or MFDR, adjusted for m independent or positively correlated tests (see AFDR below). The MFDR expression here is for a single recomputed value of $\alpha$ and is not part of the Benjamini and Hochberg method.
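One way to read the MFDR expression (an interpretation added here, not taken from the cited sources) is as the average of the m Benjamini–Hochberg thresholds kα/m for k = 1, ..., m. Under that reading, the arithmetic for the setting of the figure (m = 20, α = 0.05) reproduces the value quoted in its caption:

```latex
\frac{1}{m}\sum_{k=1}^{m}\frac{k}{m}\,\alpha
  = \frac{\alpha}{m^{2}}\cdot\frac{m(m+1)}{2}
  = \frac{\alpha\,(m+1)}{2m},
\qquad
m = 20,\ \alpha = 0.05:\quad
\frac{0.05 \times 21}{40} = 0.02625 .
```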

Benjamini–Yekutieli procedure

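The Benjamini–Yekutieli procedure controls the false discovery rate under arbitrary dependence assumptions. [11] This refinement modifies the threshold and finds the largest k such that:

$$P_{(k)} \le \frac{k}{m \cdot c(m)} \alpha,$$

where $c(m) = 1$ if the tests are independent or positively correlated (as in the Benjamini–Hochberg procedure), and $c(m) = \sum_{i=1}^{m} \frac{1}{i}$ under arbitrary dependence.

Note that $c(m)$ can be approximated by using the Taylor series expansion and the Euler–Mascheroni constant ($\gamma = 0.57721\ldots$):

$$\sum_{i=1}^{m} \frac{1}{i} \approx \ln(m) + \gamma + \frac{1}{2m}.$$

Using MFDR and the formulas above, an adjusted MFDR (or AFDR) is the minimum of the mean $\alpha$ for m dependent tests, i.e., $\frac{\mathrm{MFDR}}{c(m)} = \frac{\alpha(m+1)}{2m[\ln(m)+\gamma]+1}$. Another way to address dependence is by bootstrapping and rerandomization. [4] [16] [17]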
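The modified threshold can be sketched in Python as follows. The sketch assumes the arbitrary-dependence case, in which the correction factor c(m) is the harmonic sum 1 + 1/2 + ... + 1/m; the function names are chosen for this example only. It also evaluates the logarithmic approximation of c(m) and the AFDR expression for m = 20 and α = 0.05.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant


def harmonic_number(m):
    """Exact correction factor c(m) = 1 + 1/2 + ... + 1/m."""
    return sum(1.0 / i for i in range(1, m + 1))


def harmonic_approx(m):
    """Approximation c(m) ~ ln(m) + gamma + 1/(2m)."""
    return math.log(m) + EULER_GAMMA + 1.0 / (2 * m)


def benjamini_yekutieli(p_values, alpha=0.05):
    """Step-up rule with thresholds k * alpha / (m * c(m))."""
    m = len(p_values)
    c_m = harmonic_number(m)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * c_m):
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def afdr(m, alpha):
    """Adjusted mean level: alpha (m + 1) / (2 m [ln(m) + gamma] + 1)."""
    return alpha * (m + 1) / (2 * m * (math.log(m) + EULER_GAMMA) + 1)


# Hypothetical p-values, chosen only for illustration.
print(benjamini_yekutieli([0.001, 0.03, 0.035, 0.9], alpha=0.05))
print(harmonic_number(20), harmonic_approx(20))
print(afdr(20, 0.05))
```

For m = 20 and α = 0.05 the last expression evaluates to approximately 0.00730, the AFDR value quoted in the figure caption. On the four example p-values only the smallest is rejected, which illustrates how much stricter the thresholds become once c(m) is applied.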

Storey-Tibshirani procedure

Schematic representation of the Storey-Tibshirani procedure for correcting for multiple hypothesis testing, assuming correctly calculated p-values. y-axis is frequency.

In the Storey-Tibshirani procedure, q-values are used for controlling the FDR.

Properties

Adaptive and scalable

Using a multiplicity procedure that controls the FDR criterion is adaptive and scalable, meaning that controlling the FDR can be very permissive (if the data justify it) or conservative (acting close to control of FWER for sparse problems), depending on the number of hypotheses tested and the level of significance. [3]

The FDR criterion adapts so that the same number of false discoveries (V) will have different implications, depending on the total number of discoveries (R). This contrasts with the family-wise error rate criterion. For example, if inspecting 100 hypotheses (say, 100 genetic mutations or SNPs for association with some phenotype in some population):

The FDR criterion is scalable in that the same proportion of false discoveries out of the total number of discoveries (Q), remains sensible for different number of total discoveries (R). For example:

Dependency among the test statistics

Controlling the FDR using the linear step-up BH procedure, at level q, has several properties related to the dependency structure between the test statistics of the m null hypotheses that are being corrected for. If the test statistics are:

Proportion of true hypotheses

If all of the null hypotheses are true ($m_0 = m$), then controlling the FDR at level q guarantees control over the FWER (this is also called "weak control of the FWER"): $\mathrm{FWER} = \mathrm{P}(V \ge 1) = \mathrm{E}(V/R) = \mathrm{FDR} \le q$, simply because the event of rejecting at least one true null hypothesis $\{V \ge 1\}$ is exactly the event $\{V/R = 1\}$, and the event $\{V = 0\}$ is exactly the event $\{V/R = 0\}$ (when $V = R = 0$, $V/R = 0$ by definition). [1] But if there are some true discoveries to be made ($m_0 < m$) then $\mathrm{FWER} \ge \mathrm{FDR}$. In that case there will be room for improving detection power. It also means that any procedure that controls the FWER will also control the FDR.
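The step behind the general inequality between the two error rates can be written out explicitly. Because the number of false discoveries V never exceeds the number of rejections R, the ratio V/R (taken as 0 when R = 0) is bounded by the indicator of the event that at least one false discovery occurs. Taking expectations gives the result; this short derivation is added here for clarity:

```latex
0 \le \frac{V}{R} \le \mathbf{1}\{V \ge 1\}
\quad\Longrightarrow\quad
\mathrm{FDR} = \mathrm{E}\!\left[\frac{V}{R}\right]
\le \mathrm{E}\big[\mathbf{1}\{V \ge 1\}\big]
= \mathrm{P}(V \ge 1) = \mathrm{FWER}.
```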

Average power

The average power of the Benjamini–Hochberg procedure can be computed analytically. [18]

Related concepts

The discovery of the FDR was preceded and followed by many other types of error rates.

False coverage rate

The false coverage rate (FCR) is, in a sense, the FDR analog to the confidence interval. FCR indicates the average rate of false coverage, namely, not covering the true parameters, among the selected intervals. The FCR gives a simultaneous coverage at a $1 - \alpha$ level for all of the parameters considered in the problem. Intervals with simultaneous coverage probability 1−q can control the FCR to be bounded by q. There are many FCR procedures, such as Bonferroni-Selected–Bonferroni-Adjusted, Adjusted BH-Selected CIs (Benjamini and Yekutieli (2005)), [24] Bayes FCR (Yekutieli (2008)), and other Bayes methods. [25]

Bayesian approaches

Connections have been made between the FDR and Bayesian approaches (including empirical Bayes methods), [21] [26] [27] thresholding of wavelet coefficients and model selection, [28] [29] [30] [31] and generalizing the confidence interval into the false coverage statement rate (FCR). [24]

References

  1. Benjamini Y, Hochberg Y (1995). "Controlling the false discovery rate: a practical and powerful approach to multiple testing". Journal of the Royal Statistical Society, Series B. 57 (1): 289–300. MR 1325392.
  2. Shaffer JP (January 1995). "Multiple Hypothesis Testing". Annual Review of Psychology. 46 (1): 561–584. doi:10.1146/annurev.ps.46.020195.003021. S2CID 7696063. Gale A16629837.
  3. Benjamini Y (2010). "Discovering the false discovery rate". Journal of the Royal Statistical Society, Series B. 72 (4): 405–416. doi:10.1111/j.1467-9868.2010.00746.x.
  4. Storey JD, Tibshirani R (August 2003). "Statistical significance for genomewide studies". Proceedings of the National Academy of Sciences of the United States of America. 100 (16): 9440–5. Bibcode:2003PNAS..100.9440S. doi:10.1073/pnas.1530509100. PMC 170937. PMID 12883005.
  5. Ryan TP, Woodall WH (2005). "The most-cited statistical papers". Journal of Applied Statistics. 32 (5): 461–474. Bibcode:2005JApSt..32..461R. doi:10.1080/02664760500079373. S2CID 109615204.
  6. Holm S (1979). "A simple sequentially rejective multiple test procedure". Scandinavian Journal of Statistics. 6 (2): 65–70. JSTOR 4615733. MR 0538597.
  7. Schweder T, Spjøtvoll E (1982). "Plots of P-values to evaluate many tests simultaneously". Biometrika. 69 (3): 493–502. doi:10.1093/biomet/69.3.493.
  8. Hochberg Y, Benjamini Y (July 1990). "More powerful procedures for multiple significance testing". Statistics in Medicine. 9 (7): 811–8. doi:10.1002/sim.4780090710. PMID 2218183.
  9. Soric B (June 1989). "Statistical "Discoveries" and Effect-Size Estimation". Journal of the American Statistical Association. 84 (406): 608–610. doi:10.1080/01621459.1989.10478811. JSTOR 2289950.
  10. Simes RJ (1986). "An improved Bonferroni procedure for multiple tests of significance". Biometrika. 73 (3): 751–754. doi:10.1093/biomet/73.3.751.
  11. Benjamini Y, Yekutieli D (2001). "The control of the false discovery rate in multiple testing under dependency". Annals of Statistics. 29 (4): 1165–1188. doi:10.1214/aos/1013699998. MR 1869245.
  12. Storey JD, Taylor JE, Siegmund D (2004). "Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach". Journal of the Royal Statistical Society, Series B. 66: 187–205. doi:10.1111/j.1467-9868.2004.00439.x. S2CID 12646251.
  13. Benjamini Y, Krieger AM, Yekutieli D (2006). "Adaptive linear step-up procedures that control the false discovery rate". Biometrika. 93 (3): 491–507. doi:10.1093/biomet/93.3.491.
  14. Gavrilov Y, Benjamini Y, Sarkar SK (2009). "An adaptive step-down procedure with proven FDR control under independence". The Annals of Statistics. 37 (2): 619. arXiv:0903.5373. doi:10.1214/07-AOS586. S2CID 16913244.
  15. Blanchard G, Roquain E (2008). "Two simple sufficient conditions for FDR control". Electronic Journal of Statistics. 2: 963–992. arXiv:0802.1406. doi:10.1214/08-EJS180. S2CID 16662020.
  16. Yekutieli D, Benjamini Y (1999). "Resampling based False Discovery Rate controlling procedure for dependent test statistics". Journal of Statistical Planning and Inference. 82 (1–2): 171–196. doi:10.1016/S0378-3758(99)00041-5.
  17. van der Laan MJ, Dudoit S (2007). Multiple Testing Procedures with Applications to Genomics. New York: Springer.
  18. Glueck DH, Mandel J, Karimpour-Fard A, Hunter L, Muller KE (30 January 2008). "Exact Calculations of Average Power for the Benjamini-Hochberg Procedure". The International Journal of Biostatistics. 4 (1): Article 11. doi:10.2202/1557-4679.1103. PMC 3020656. PMID 21243075.
  19. Sarkar SK (2007). "Stepup procedures controlling generalized FWER and generalized FDR". The Annals of Statistics. 35 (6): 2405–20. arXiv:0803.2934. doi:10.1214/009053607000000398. S2CID 14784911.
  20. Sarkar SK, Guo W (June 2009). "On a generalized false discovery rate". The Annals of Statistics. 37 (3): 1545–65. arXiv:0906.3091. doi:10.1214/08-AOS617. JSTOR 30243677. S2CID 15746841.
  21. Efron B (2008). "Microarrays, empirical Bayes and the two groups model". Statistical Science. 23: 1–22. arXiv:0808.0603. doi:10.1214/07-STS236. S2CID 8417479.
  22. Storey JD (2002). "A direct approach to false discovery rates". Journal of the Royal Statistical Society, Series B. 64 (3): 479–498. CiteSeerX 10.1.1.320.7131. doi:10.1111/1467-9868.00346. S2CID 122987911.
  23. Benjamini Y (December 2010). "Simultaneous and selective inference: Current successes and future challenges". Biometrical Journal. 52 (6): 708–21. doi:10.1002/bimj.200900299. PMID 21154895. S2CID 8806192.
  24. Benjamini Y, Yekutieli D (2005). "False discovery rate controlling confidence intervals for selected parameters". Journal of the American Statistical Association. 100 (469): 71–80. doi:10.1198/016214504000001907. S2CID 23202143.
  25. Zhao Z, Gene Hwang JT (2012). "Empirical Bayes false coverage rate controlling confidence intervals". Journal of the Royal Statistical Society, Series B. 74 (5): 871–891. doi:10.1111/j.1467-9868.2012.01033.x. hdl:10.1111/j.1467-9868.2012.01033.x. S2CID 111420152.
  26. Storey JD (2003). "The positive false discovery rate: A Bayesian interpretation and the q-value". Annals of Statistics. 31 (6): 2013–2035. doi:10.1214/aos/1074290335.
  27. Efron B (2010). Large-Scale Inference. Cambridge University Press. ISBN 978-0-521-19249-1.
  28. Abramovich F, Benjamini Y, Donoho D, Johnstone IM (2006). "Adapting to unknown sparsity by controlling the false discovery rate". Annals of Statistics. 34 (2): 584–653. arXiv:math/0505374. Bibcode:2005math......5374A. doi:10.1214/009053606000000074. S2CID 7581060.
  29. Donoho D, Jin J (2006). "Asymptotic minimaxity of false discovery rate thresholding for sparse exponential data". Annals of Statistics. 34 (6): 2980–3018. arXiv:math/0602311. Bibcode:2006math......2311D. doi:10.1214/009053606000000920. S2CID 9080115.
  30. Benjamini Y, Gavrilov Y (2009). "A simple forward selection procedure based on false discovery rate control". Annals of Applied Statistics. 3 (1): 179–198. arXiv:0905.2819. Bibcode:2009arXiv0905.2819B. doi:10.1214/08-AOAS194. S2CID 15719154.
  31. Donoho D, Jin JS (2004). "Higher criticism for detecting sparse heterogeneous mixtures". Annals of Statistics. 32 (3): 962–994. arXiv:math/0410072. Bibcode:2004math.....10072D. doi:10.1214/009053604000000265. S2CID 912325.