The Berk-Jones test (or BJ test) refers to a class of non-parametric goodness-of-fit statistical tests used to determine whether a set of observed data follow a specific probability distribution. Introduced by Robert H. Berk and Douglas H. Jones in 1979, the test is specifically designed to be more powerful than the Kolmogorov-Smirnov test in the tails of the distribution.[1]
In statistical hypothesis testing, a goodness-of-fit test compares an empirical distribution function (EDF) to a theoretical cumulative distribution function (CDF). While the Kolmogorov-Smirnov test is arguably the best-known method, it is often criticized for its lack of sensitivity to deviations occurring at the extremes (tails) of the distribution.
The Berk-Jones test addresses this shortcoming by adopting a "pointwise" maximum likelihood ratio approach. It still belongs to the family of supremum-type statistics but also incorporates information-theoretic properties, specifically the Kullback-Leibler divergence.
Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed (i.i.d.) random variables with an empirical distribution function $F_n$. We wish to test the null hypothesis $H_0 : F = F_0$, where $F_0$ is a fully specified continuous distribution function.
The Berk-Jones statistic is defined as:

$$ BJ_n = n \sup_{x} K\!\big(F_n(x),\, F_0(x)\big), $$

where $K(p, q)$ is the binary Kullback-Leibler divergence (or relative entropy), defined as:

$$ K(p, q) = p \ln\frac{p}{q} + (1 - p)\ln\frac{1 - p}{1 - q}, $$

with the convention $0 \ln 0 = 0$.
The statistic can also be expressed in terms of the order statistics $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$. For a fixed $x$, $n F_n(x)$ follows a Binomial distribution $\operatorname{Bin}(n, F_0(x))$ under the null hypothesis. The BJ statistic effectively finds the $x$ that maximizes the likelihood ratio for testing the success probability of this binomial.[2]
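As an illustration, the order-statistic form can be computed directly. The following is a minimal NumPy/SciPy sketch (the helper names `binary_kl` and `berk_jones` are ours, not a standard API). Because $K(p, q)$ is convex in $q$ for fixed $p$, the supremum over $x$ is attained at an order statistic, where $F_n$ jumps from $(i-1)/n$ to $i/n$:

```python
import numpy as np
from scipy.special import rel_entr  # rel_entr(p, q) = p*log(p/q), with 0*log(0) = 0


def binary_kl(p, q):
    """Binary Kullback-Leibler divergence K(p, q)."""
    return rel_entr(p, q) + rel_entr(1.0 - p, 1.0 - q)


def berk_jones(x, cdf):
    """Two-sided Berk-Jones statistic BJ_n = n * sup_x K(F_n(x), F0(x))."""
    x = np.asarray(x)
    n = x.size
    u = np.sort(cdf(x))                 # F0 evaluated at the order statistics
    i = np.arange(1, n + 1)
    # Evaluate K at both sides of each jump of the EDF: i/n and (i-1)/n.
    kl = np.maximum(binary_kl(i / n, u), binary_kl((i - 1) / n, u))
    return n * np.max(kl)
```

For example, with `from scipy.stats import norm`, the call `berk_jones(sample, norm.cdf)` tests a sample against a standard normal null.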
The Berk-Jones test is closely related to the Higher Criticism statistic (HC), originally suggested by John Tukey in 1976 and further developed in 2004 by David Donoho and Jiashun Jin. Both tests are used for sparse signal detection.
While the BJ test is based on the likelihood ratio, the HC statistic is based on a standardized distance (essentially a Z-score approach):

$$ HC_n = \sqrt{n}\, \max_{1 \le i \le n} \frac{i/n - U_{(i)}}{\sqrt{U_{(i)}\,\big(1 - U_{(i)}\big)}}, \qquad U_{(i)} = F_0\big(X_{(i)}\big). $$
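Under the same conventions as the sketch above (hypothetical helper names, $F_0$ evaluated at the order statistics), the HC statistic can be computed as:

```python
def higher_criticism(x, cdf):
    """Higher Criticism statistic: maximal standardized EDF deviation.

    Donoho and Jin restrict the maximum (e.g. to 1/n <= F0(x) <= 1/2)
    to stabilize the denominator; this sketch takes it over all order
    statistics for simplicity.
    """
    x = np.asarray(x)
    n = x.size
    u = np.sort(cdf(x))                 # F0 at the order statistics
    i = np.arange(1, n + 1)
    z = np.sqrt(n) * (i / n - u) / np.sqrt(u * (1.0 - u))
    return np.max(z)
```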
Asymptotically (as $n$ approaches infinity), the BJ test and HC are equivalent in their ability to detect sparse mixtures. Specifically, they both belong to the Rényi family of statistics. However, the BJ test is often viewed as the "information-theoretic version" of HC. Applying a Taylor expansion to the Kullback-Leibler divergence $K(p, q)$ around $p = q$, the first non-vanishing term in the expansion is proportional to the square of the HC statistic:[3]

$$ n\,K\!\big(F_n(x),\, F_0(x)\big) \approx \frac{n\,\big(F_n(x) - F_0(x)\big)^2}{2\,F_0(x)\,\big(1 - F_0(x)\big)}. $$
This shows that the Higher Criticism statistic is essentially a local quadratic approximation of the Berk-Jones statistic.
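The expansion itself is a short calculation. Treating $q$ as fixed and expanding $K(p, q)$ in $p$ around $p = q$:

$$
K(q, q) = 0,
\qquad
\frac{\partial K}{\partial p}\bigg|_{p=q} = \ln\frac{p}{1-p} - \ln\frac{q}{1-q}\,\bigg|_{p=q} = 0,
\qquad
\frac{\partial^2 K}{\partial p^2} = \frac{1}{p(1-p)},
$$

so that

$$
K(p, q) = \frac{(p - q)^2}{2\,q\,(1 - q)} + O\big((p - q)^3\big).
$$

Substituting $p = F_n(x)$ and $q = F_0(x)$ recovers the quadratic term above.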
In high-dimensional settings where the signal is "rare and weak," both tests have the same detection boundary (the point at which it becomes mathematically possible to distinguish signal from noise). However, the BJ test is often preferred in finite samples because its likelihood-ratio form handles probabilities near the endpoints of the unit interval more naturally than the Z-score denominator in HC, which becomes unstable when $F_0(x)$ is close to 0 or 1.
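One practical consequence: under a continuous null, the transformed values $U_i = F_0(X_i)$ are i.i.d. Uniform(0,1), so both statistics are distribution-free and finite-sample critical values can be obtained by simulation. A minimal Monte Carlo sketch, reusing the hypothetical helpers above:

```python
def mc_pvalue(stat_fn, observed, n, n_sim=10_000, seed=0):
    """Monte Carlo p-value: fraction of null simulations at least as extreme."""
    rng = np.random.default_rng(seed)
    # For Uniform(0,1) data, the null CDF F0 is the identity map.
    null_stats = np.array([stat_fn(rng.uniform(size=n), lambda u: u)
                           for _ in range(n_sim)])
    return (1 + np.sum(null_stats >= observed)) / (1 + n_sim)
```

For instance, `mc_pvalue(berk_jones, observed_bj, n=len(sample))` calibrates the BJ statistic for the sample size at hand.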
The effectiveness of the Berk-Jones test compared with other methods depends heavily on the nature of the deviation from the null hypothesis.
| Test Method | Sensitivity Focus | Best Use Case |
|---|---|---|
| Kolmogorov–Smirnov | Median / Center | General deviations in the bulk of the data. |
| Anderson–Darling | Weighted Tails | Moderate tail sensitivity; common in engineering. |
| Berk–Jones | Extreme Tails | Sparse signals; likelihood-ratio based optimality. |
| Higher Criticism | Extreme Tails | Sparse signals; Z-score based optimality. |
The Berk-Jones test is widely used in sparse signal detection and related goodness-of-fit problems.