Logrank test

The logrank test, or log-rank test, is a hypothesis test to compare the survival distributions of two samples. It is a nonparametric test and appropriate to use when the data are right skewed and censored (technically, the censoring must be non-informative). It is widely used in clinical trials to establish the efficacy of a new treatment in comparison with a control treatment when the measurement is the time to event (such as the time from initial treatment to a heart attack). The test is sometimes called the Mantel–Cox test. The logrank test can also be viewed as a time-stratified Cochran–Mantel–Haenszel test.

History

The test was first proposed by Nathan Mantel and was named the logrank test by Richard and Julian Peto. [1] [2] [3]

Definition

The logrank test statistic compares estimates of the hazard functions of the two groups at each observed event time. It is constructed by computing the observed and expected number of events in one of the groups at each observed event time and then adding these to obtain an overall summary across all time points where there is an event.

Consider two groups of patients, e.g., treatment vs. control. Let $1, \ldots, J$ be the distinct times of observed events in either group. Let $N_{1,j}$ and $N_{2,j}$ be the number of subjects "at risk" (who have not yet had an event or been censored) at the start of period $j$ in the two groups, respectively. Let $O_{1,j}$ and $O_{2,j}$ be the observed number of events in the groups at time $j$. Finally, define $N_j = N_{1,j} + N_{2,j}$ and $O_j = O_{1,j} + O_{2,j}$.

The null hypothesis is that the two groups have identical hazard functions, $H_0\colon h_1(t) = h_2(t)$. Hence, under $H_0$, for each group $i = 1, 2$, $O_{i,j}$ follows a hypergeometric distribution with parameters $N_j$, $N_{i,j}$, $O_j$. This distribution has expected value $E_{i,j} = O_j \frac{N_{i,j}}{N_j}$ and variance $V_{i,j} = E_{i,j} \left( \frac{N_j - O_j}{N_j} \right) \left( \frac{N_j - N_{i,j}}{N_j - 1} \right)$.
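As a concrete illustration with made-up numbers, the expected count and variance at a single event time can be computed directly (the variable names here are ad hoc, not from the source):

```python
# Hypothetical single event time: 10 subjects at risk in group 1,
# 12 at risk in group 2, and 3 events observed in total at that time.
n1, n2, oj = 10, 12, 3
nj = n1 + n2

e1 = oj * n1 / nj                                    # E_{1,j} = O_j * N_{1,j} / N_j
v1 = e1 * ((nj - oj) / nj) * ((nj - n1) / (nj - 1))  # hypergeometric variance V_{1,j}

print(e1, v1)
```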

For all $j = 1, \ldots, J$, the logrank statistic compares $O_{i,j}$ to its expectation $E_{i,j}$ under $H_0$. It is defined as

$$ Z_i = \frac{\sum_{j=1}^{J} \left( O_{i,j} - E_{i,j} \right)}{\sqrt{\sum_{j=1}^{J} V_{i,j}}} \qquad (\text{for } i = 1 \text{ or } 2) $$

By the central limit theorem, the distribution of each $Z_i$ converges to that of a standard normal distribution as $J$ approaches infinity and therefore can be approximated by the standard normal distribution for a sufficiently large $J$. An improved approximation can be obtained by equating this quantity to Pearson type I or II (beta) distributions with matching first four moments, as described in Appendix B of the Peto and Peto paper. [2]
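The definition above can be sketched directly in Python from two samples of observation times and event indicators. This is a minimal illustrative helper, not a library API; the function name and data layout are assumptions:

```python
import math

def logrank_z(times1, events1, times2, events2):
    """Two-sample logrank statistic for group 1; approx. standard normal under H0.

    times*:  observation times; events*: 1 if the event was observed, 0 if censored.
    (Illustrative sketch, not a library function.)
    """
    # Distinct times at which an event occurred in either group
    event_times = sorted({t for t, e in zip(times1, events1) if e} |
                         {t for t, e in zip(times2, events2) if e})
    num = 0.0  # running sum of O_{1,j} - E_{1,j}
    var = 0.0  # running sum of V_{1,j}
    for tj in event_times:
        n1 = sum(1 for t in times1 if t >= tj)   # at risk in group 1, N_{1,j}
        n2 = sum(1 for t in times2 if t >= tj)   # at risk in group 2, N_{2,j}
        o1 = sum(1 for t, e in zip(times1, events1) if e and t == tj)
        o2 = sum(1 for t, e in zip(times2, events2) if e and t == tj)
        nj, oj = n1 + n2, o1 + o2
        e1 = oj * n1 / nj                        # hypergeometric mean E_{1,j}
        num += o1 - e1
        if nj > 1:
            var += e1 * ((nj - oj) / nj) * ((nj - n1) / (nj - 1))
    return num / math.sqrt(var)
```

By symmetry of the construction, swapping the two groups flips the sign of the statistic but leaves its magnitude unchanged.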

Asymptotic distribution

If the two groups have the same survival function, the logrank statistic is approximately standard normal. A one-sided level $\alpha$ test will reject the null hypothesis if $Z > z_\alpha$, where $z_\alpha$ is the upper $\alpha$ quantile of the standard normal distribution. If the hazard ratio is $\lambda$, there are $n$ total subjects, $d$ is the probability a subject in either group will eventually have an event (so that $nd$ is the expected number of events at the time of the analysis), and the proportion of subjects randomized to each group is 50%, then the logrank statistic is approximately normal with mean $(\log \lambda) \, \sqrt{\frac{n \, d}{4}}$ and variance 1. [4] For a one-sided level $\alpha$ test with power $1 - \beta$, the sample size required is $n = \frac{4 \, (z_\alpha + z_\beta)^2}{d \, (\log \lambda)^2}$, where $z_\alpha$ and $z_\beta$ are the upper $\alpha$ and $\beta$ quantiles of the standard normal distribution.

Joint distribution

Suppose $Z_1$ and $Z_2$ are the logrank statistics at two different time points in the same study ($Z_1$ earlier). Again, assume the hazard functions in the two groups are proportional with hazard ratio $\lambda$, and let $d_1$ and $d_2$ be the probabilities that a subject will have an event at the two time points, where $d_1 \leq d_2$. Then $Z_1$ and $Z_2$ are approximately bivariate normal with means $\log \lambda \, \sqrt{\frac{n \, d_1}{4}}$ and $\log \lambda \, \sqrt{\frac{n \, d_2}{4}}$ and correlation $\sqrt{\frac{d_1}{d_2}}$. Calculations involving the joint distribution are needed to correctly maintain the error rate when the data are examined multiple times within a study by a Data Monitoring Committee.
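Under these approximations, the interim means and correlation can be computed directly; the helper below is an ad hoc sketch (its name and signature are not from the source):

```python
import math

def interim_joint_params(n, hr, d1, d2):
    """Means and correlation of (Z1, Z2) at two interim analyses (d1 <= d2).

    n: total subjects; hr: hazard ratio; d1, d2: event probabilities at the
    two analysis times.  (Illustrative sketch, not a library function.)
    """
    mu1 = math.log(hr) * math.sqrt(n * d1 / 4)
    mu2 = math.log(hr) * math.sqrt(n * d2 / 4)
    rho = math.sqrt(d1 / d2)   # correlation between the two logrank statistics
    return mu1, mu2, rho
```

Note that under the null ($\lambda = 1$) both means are zero, and the correlation depends only on the ratio of information accrued at the two looks.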

Test assumptions

The logrank test is based on the same assumptions as the Kaplan–Meier survival curve—namely, that censoring is unrelated to prognosis, the survival probabilities are the same for subjects recruited early and late in the study, and the events happened at the times specified. Deviations from these assumptions matter most if they are satisfied differently in the groups being compared, for example if censoring is more likely in one group than another. [5]


References

  1. Mantel, Nathan (1966). "Evaluation of survival data and two new rank order statistics arising in its consideration". Cancer Chemotherapy Reports. 50 (3): 163–70. PMID 5910392.
  2. Peto, Richard; Peto, Julian (1972). "Asymptotically Efficient Rank Invariant Test Procedures". Journal of the Royal Statistical Society, Series A. Blackwell Publishing. 135 (2): 185–207. doi:10.2307/2344317. hdl:10338.dmlcz/103602. JSTOR 2344317.
  3. Harrington, David (2005). "Linear Rank Tests in Survival Analysis". Encyclopedia of Biostatistics. Wiley Interscience. doi:10.1002/0470011815.b2a11047. ISBN 047084907X.
  4. Schoenfeld, D. (1981). "The asymptotic properties of nonparametric tests for comparing survival distributions". Biometrika. 68 (1): 316–319. doi:10.1093/biomet/68.1.316. JSTOR 2335833.
  5. Bland, J. M.; Altman, D. G. (2004). "The logrank test". BMJ. 328 (7447): 1073. doi:10.1136/bmj.328.7447.1073. PMC 403858. PMID 15117797.