Accelerated failure time model

In the statistical area of survival analysis, an accelerated failure time model (AFT model) is a parametric model that provides an alternative to the commonly used proportional hazards models. Whereas a proportional hazards model assumes that the effect of a covariate is to multiply the hazard by some constant, an AFT model assumes that the effect of a covariate is to accelerate or decelerate the life course of a disease by some constant. There is strong basic science evidence from C. elegans experiments by Stroustrup et al. [1] indicating that AFT models are the correct model for biological survival processes.

Model specification

In full generality, the accelerated failure time model can be specified as [2]

$$\lambda(t \mid \theta) = \theta\,\lambda_0(\theta t),$$

where $\theta$ denotes the joint effect of covariates, typically $\theta = \exp(-[\beta_1 X_1 + \cdots + \beta_p X_p])$. (Specifying the regression coefficients with a negative sign implies that high values of the covariates increase the survival time, but this is merely a sign convention; without a negative sign, they increase the hazard.)

This is satisfied if the probability density function of the event is taken to be $f(t \mid \theta) = \theta f_0(\theta t)$; it then follows for the survival function that $S(t \mid \theta) = S_0(\theta t)$. From this it is easy[citation needed] to see that the moderated life time $T$ is distributed such that $T\theta$ and the unmoderated life time $T_0$ have the same distribution. Consequently, $\log(T)$ can be written as

$$\log(T) = -\log(\theta) + \log(T\theta) =: -\log(\theta) + \epsilon,$$

where the last term, $\epsilon$, is distributed as $\log(T_0)$, i.e., independently of $\theta$. This reduces the accelerated failure time model to regression analysis (typically a linear model) where $-\log(\theta)$ represents the fixed effects and $\epsilon$ represents the noise. Different distributions of $\epsilon$ imply different distributions of $T_0$, i.e., different baseline distributions of the survival time. Typically, in survival-analytic contexts, many of the observations are censored: we only know that $T_i > t_i$, not $T_i = t_i$. The former case represents survival beyond the follow-up time, while the latter case represents an event/death during the follow-up. These right-censored observations can pose technical challenges for estimating the model if the distribution of $T_0$ is unusual.
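Because observed events contribute the density $f(t \mid \theta) = \theta f_0(\theta t)$ while right-censored observations contribute only the survival function $S_0(\theta t)$, the model can be estimated by maximum likelihood. The following sketch (an illustration with assumed data and names, not code from the sources cited here) simulates right-censored data from a Weibull-baseline AFT model in Python and recovers the regression coefficient:

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    n, beta_true, k_true = 2000, 0.5, 1.5           # one covariate; Weibull shape k
    x = rng.normal(size=n)
    theta = np.exp(-beta_true * x)                  # theta = exp(-beta * x)
    t_event = rng.weibull(k_true, size=n) / theta   # theta * T has the baseline law
    t_cens = rng.uniform(0.1, 3.0, size=n)          # independent censoring times
    time = np.minimum(t_event, t_cens)
    event = (t_event <= t_cens).astype(float)       # 1 = event observed, 0 = censored

    def negloglik(params):
        beta, log_k = params
        k = np.exp(log_k)
        u = np.exp(-beta * x) * time                # u = theta * t
        # events: log[theta * f0(theta t)]; censored: log S0(theta t) = -u**k
        log_f = np.log(k) + (k - 1) * np.log(u) - u**k - beta * x
        log_S = -(u**k)
        return -np.sum(event * log_f + (1 - event) * log_S)

    fit = minimize(negloglik, x0=[0.0, 0.0])
    print(fit.x[0], np.exp(fit.x[1]))               # approximately 0.5 and 1.5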

The interpretation of $\theta$ in accelerated failure time models is straightforward: $\theta = 2$ means that everything in the relevant life history of an individual happens twice as fast. For example, if the model concerns the development of a tumor, it means that all of the pre-stages progress twice as fast as for the unexposed individual, implying that the expected time until a clinical disease is 0.5 of the baseline time. However, this does not mean that the hazard function is always twice as high; that would be the proportional hazards model.
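As a quick numerical check of this distinction, the sketch below (using a log-normal baseline, a family that is not closed under proportional hazards, with an arbitrary shape parameter) verifies that $\theta = 2$ halves the median survival time while the implied hazard ratio varies over time:

    import numpy as np
    from scipy.stats import lognorm

    base = lognorm(s=0.8)                 # hypothetical baseline lifetime T0

    def hazard(u):
        return base.pdf(u) / base.sf(u)   # hazard = density / survival

    t = np.linspace(0.2, 5.0, 25)
    # theta = 2 halves the median: S(t|theta) = S0(theta*t) crosses 0.5 at median/2
    print(base.median(), "->", base.median() / 2)
    # ...but the implied hazard ratio theta*h0(theta*t) / h0(t) is not constant in t:
    print(2 * hazard(2 * t) / hazard(t))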

Statistical issues

Unlike proportional hazards models, in which Cox's semi-parametric proportional hazards model is more widely used than parametric models, AFT models are predominantly fully parametric, i.e., a probability distribution is specified for $\log(T_0)$ (equivalently, for the noise term $\epsilon$). (Buckley and James [3] proposed a semi-parametric AFT model, but its use is relatively uncommon in applied research; in a 1992 paper, Wei [4] pointed out that the Buckley–James model has no theoretical justification and lacks robustness, and reviewed alternatives.) This can be a problem if a degree of realistic detail is required for modelling the distribution of a baseline lifetime. Hence, technical developments in this direction would be highly desirable.
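In practice, the parametric choice amounts to picking a baseline family before fitting. A minimal sketch using the third-party Python package lifelines (assuming it is installed; the toy data frame and column names are illustrative):

    import pandas as pd
    from lifelines import WeibullAFTFitter

    df = pd.DataFrame({
        "T": [5.0, 6.2, 8.1, 3.3, 9.7, 4.4, 7.0, 2.8],  # toy durations
        "E": [1, 0, 1, 1, 0, 1, 1, 1],                   # 1 = event, 0 = censored
        "age": [52, 60, 47, 71, 55, 63, 50, 68],
    })

    aft = WeibullAFTFitter()                  # fully parametric: Weibull baseline
    aft.fit(df, duration_col="T", event_col="E")
    aft.print_summary()                       # exp(coef) act as time ratios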

When a frailty term is incorporated in the survival model, the regression parameter estimates from AFT models are robust to omitted covariates, unlike proportional hazards models. They are also less affected by the choice of probability distribution for the frailty term. [5] [6]

The results of AFT models are easily interpreted. [7] For example, the results of a clinical trial with mortality as the endpoint could be interpreted as a certain percentage increase in future life expectancy on the new treatment compared to the control. So a patient could be informed that he would be expected to live (say) 15% longer if he took the new treatment. Hazard ratios can prove harder to explain in layman's terms.
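For instance, if a fitted AFT coefficient for a treatment indicator were $\beta \approx 0.14$ (a hypothetical value used purely for illustration), the implied time ratio would be $\exp(0.14) \approx 1.15$, i.e. roughly 15% longer expected survival on treatment:

    import numpy as np

    beta = 0.14                    # hypothetical fitted treatment coefficient
    time_ratio = np.exp(beta)      # about 1.15
    print(f"Expected survival is {100 * (time_ratio - 1):.0f}% longer on treatment")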

Distributions used in AFT models

The log-logistic distribution provides the most commonly used AFT model.[citation needed] Unlike the Weibull distribution, it can exhibit a non-monotonic hazard function which increases at early times and decreases at later times. It is somewhat similar in shape to the log-normal distribution but it has heavier tails. The log-logistic cumulative distribution function has a simple closed form, which becomes important computationally when fitting data with censoring. For the censored observations one needs the survival function, which is the complement of the cumulative distribution function, i.e., one needs to be able to evaluate $S(t \mid \theta) = 1 - F(t \mid \theta)$.
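Concretely, with scale $\alpha$ and shape $\beta$, both the log-logistic distribution function and its survival function are elementary expressions, so censored likelihood contributions can be evaluated directly (a minimal sketch; the parameter values are arbitrary):

    def loglogistic_cdf(t, alpha, beta):
        """F(t) = 1 / (1 + (t/alpha)**(-beta)), in closed form."""
        return 1.0 / (1.0 + (t / alpha) ** (-beta))

    def loglogistic_sf(t, alpha, beta):
        """Survival S(t) = 1 - F(t), needed for right-censored observations."""
        return 1.0 / (1.0 + (t / alpha) ** beta)

    # likelihood contribution of an observation right-censored at t = 2.0:
    print(loglogistic_sf(2.0, alpha=1.5, beta=2.0))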

The Weibull distribution (including the exponential distribution as a special case) can be parameterised as either a proportional hazards model or an AFT model, and is the only family of distributions to have this property. The results of fitting a Weibull model can therefore be interpreted in either framework. However, the biological applicability of this model may be limited by the fact that the hazard function is monotonic, i.e. either decreasing or increasing.
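This dual interpretation can be checked directly: for a unit-scale Weibull baseline with shape $k$, accelerating time by $\theta$ gives $S_0(\theta t) = \exp(-(\theta t)^k) = S_0(t)^{\theta^k}$, which is exactly a proportional hazards effect with hazard ratio $\theta^k$. A short numerical confirmation (parameter values arbitrary):

    import numpy as np

    k, theta = 1.7, 2.0
    t = np.linspace(0.1, 4.0, 20)
    S0 = np.exp(-t**k)                   # unit-scale Weibull survival
    S_aft = np.exp(-(theta * t)**k)      # AFT: time accelerated by theta
    S_ph = S0 ** (theta**k)              # PH: hazard multiplied by theta**k
    print(np.allclose(S_aft, S_ph))      # True: the parameterisations coincide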

Any distribution on a multiplicatively closed group, such as the positive real numbers, is suitable for an AFT model. Other distributions include the log-normal, gamma, hypertabastic, Gompertz, and inverse Gaussian distributions, although they are less popular than the log-logistic, partly because their cumulative distribution functions do not have a closed form. Finally, the generalized gamma distribution is a three-parameter distribution that includes the Weibull, log-normal and gamma distributions as special cases.
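The nesting can be verified numerically with SciPy's gengamma(a, c) parameterisation, in which c = 1 recovers the gamma distribution and a = 1 recovers the Weibull (the log-normal arises only as a limiting case). A brief check (parameter values arbitrary):

    import numpy as np
    from scipy.stats import gengamma, gamma, weibull_min

    x = np.linspace(0.1, 5.0, 50)
    print(np.allclose(gengamma(a=2.5, c=1.0).pdf(x), gamma(a=2.5).pdf(x)))        # gamma
    print(np.allclose(gengamma(a=1.0, c=1.8).pdf(x), weibull_min(c=1.8).pdf(x)))  # Weibull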

References

  1. Stroustrup, Nicholas; et al. (16 January 2016). "The temporal scaling of Caenorhabditis elegans ageing". Nature. 530 (7588): 103–107. doi:10.1038/nature16550. PMC 4828198.
  2. Kalbfleisch, John D.; Prentice, Ross L. (2002). The Statistical Analysis of Failure Time Data (2nd ed.). Hoboken, NJ: Wiley (Wiley Series in Probability and Statistics).
  3. Buckley, Jonathan; James, Ian (1979). "Linear regression with censored data". Biometrika. 66 (3): 429–436. doi:10.1093/biomet/66.3.429. JSTOR 2335161.
  4. Wei, L. J. (1992). "The accelerated failure time model: A useful alternative to the Cox regression model in survival analysis". Statistics in Medicine. 11 (14–15): 1871–1879. doi:10.1002/sim.4780111409. PMID 1480879.
  5. Lambert, Philippe; Collett, Dave; Kimber, Alan; Johnson, Rachel (2004). "Parametric accelerated failure time models with random effects and an application to kidney transplant survival". Statistics in Medicine. 23 (20): 3177–3192. doi:10.1002/sim.1876. PMID 15449337.
  6. Keiding, N.; Andersen, P. K.; Klein, J. P. (1997). "The Role of Frailty Models and Accelerated Failure Time Models in Describing Heterogeneity Due to Omitted Covariates". Statistics in Medicine. 16 (1–3): 215–224. doi:10.1002/(SICI)1097-0258(19970130)16:2<215::AID-SIM481>3.0.CO;2-J. PMID 9004393.
  7. Kay, Richard; Kinnersley, Nelson (2002). "On the use of the accelerated failure time model as an alternative to the proportional hazards model in the treatment of time to event data: A case study in influenza". Drug Information Journal. 36 (3): 571–579. doi:10.1177/009286150203600312.
