**Proportional hazards models** are a class of survival models in statistics. Survival models relate the time that passes before some event occurs to one or more covariates that may be associated with that quantity of time. In a proportional hazards model, the unique effect of a unit increase in a covariate is multiplicative with respect to the hazard rate. For example, taking a drug may halve one's hazard rate for a stroke, or changing the material from which a manufactured component is constructed may double its hazard rate for failure. Other types of survival models, such as accelerated failure time models, do not exhibit proportional hazards. The accelerated failure time model describes a situation where the biological or mechanical life history of an event is accelerated (or decelerated).

Survival models can be viewed as consisting of two parts: the underlying baseline hazard function, often denoted *λ*_{0}(*t*), describing how the risk of event per time unit changes over time at *baseline* levels of covariates; and the effect parameters, describing how the hazard varies in response to explanatory covariates. A typical medical example would include covariates such as treatment assignment, as well as patient characteristics such as age at start of study, gender, and the presence of other diseases at start of study, in order to reduce variability and/or control for confounding.

The *proportional hazards condition*^{ [1] } states that covariates are multiplicatively related to the hazard. In the simplest case of stationary coefficients, for example, a treatment with a drug may halve a subject's hazard at any given time *t*, while the baseline hazard may vary. Note, however, that this does not double the lifetime of the subject; the precise effect of the covariates on the lifetime depends on the type of *λ*_{0}(*t*). The covariate is not restricted to binary predictors; in the case of a continuous covariate *x*, it is typically assumed that the hazard responds exponentially; each unit increase in *x* results in proportional scaling of the hazard.
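As a short worked illustration of this condition (a sketch for a single covariate $x$ with coefficient $\beta$ and baseline hazard $\lambda_0(t)$; the survival functions $S$ and $S_0$ are notation introduced here), the hazard ratio for a unit increase in $x$ does not depend on time, while the survival function is raised to a power rather than rescaled in time:

```latex
\frac{\lambda(t \mid x + 1)}{\lambda(t \mid x)}
  = \frac{\lambda_0(t)\, e^{\beta (x + 1)}}{\lambda_0(t)\, e^{\beta x}}
  = e^{\beta},
\qquad
S(t \mid x) = \exp\!\left( -\int_0^t \lambda_0(u)\, e^{\beta x}\, du \right)
  = S_0(t)^{\exp(\beta x)} .
```

With $e^{\beta} = 1/2$, for instance, the treated survival curve is $S_0(t)^{1/2}$, which in general differs from $S_0(t/2)$; this is why halving the hazard does not double the lifetime.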

The Cox partial likelihood, shown below, is obtained by using Breslow's estimate of the baseline hazard function, plugging it into the full likelihood and then observing that the result is a product of two factors. The first factor is the partial likelihood shown below, in which the baseline hazard has "canceled out". The second factor is free of the regression coefficients and depends on the data only through the censoring pattern. The effect of covariates estimated by any proportional hazards model can thus be reported as hazard ratios.

Sir David Cox observed that if the proportional hazards assumption holds (or is assumed to hold), then it is possible to estimate the effect parameter(s) without any consideration of the baseline hazard function. This approach to survival data is called application of the *Cox proportional hazards model*.

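Let *X*_{i} = (*X*_{i1}, … , *X*_{ip}) be the realized values of the covariates for subject *i*. The hazard function for the Cox proportional hazards model has the form

$$\lambda(t \mid X_i) = \lambda_0(t) \exp(\beta_1 X_{i1} + \cdots + \beta_p X_{ip}) = \lambda_0(t) \exp(X_i \cdot \beta).$$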

This expression gives the hazard function at time *t* for subject *i* with covariate vector (explanatory variables) *X*_{i}.
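The hazard expression can be evaluated numerically. The following is a minimal sketch, not a reference implementation: the function name `cox_hazard`, the linear baseline hazard, and the coefficient values are all hypothetical choices made for illustration.

```python
import math

def cox_hazard(t, x, beta, baseline_hazard):
    """Hazard at time t for covariate vector x: lambda_0(t) * exp(x . beta)."""
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    return baseline_hazard(t) * math.exp(linear_predictor)

def baseline(t):
    # Hypothetical baseline hazard that grows linearly with time.
    return 0.01 * t

# Illustrative coefficients: treatment halves the hazard, age raises it slightly.
beta = [math.log(0.5), 0.03]
treated = [1, 60]   # (treatment indicator, age)
control = [0, 60]

# The baseline hazard cancels, so the ratio is the same at every time point.
ratios = [
    cox_hazard(t, treated, beta, baseline) / cox_hazard(t, control, beta, baseline)
    for t in (1.0, 5.0, 10.0)
]
```

Each entry of `ratios` equals exp(log 0.5) = 0.5 up to floating-point error, regardless of the shape chosen for the baseline hazard.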

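The likelihood of the event to be observed occurring for subject *i* at time *Y*_{i} can be written as:

$$L_i(\beta) = \frac{\lambda(Y_i \mid X_i)}{\sum_{j : Y_j \ge Y_i} \lambda(Y_i \mid X_j)} = \frac{\lambda_0(Y_i)\, \theta_i}{\sum_{j : Y_j \ge Y_i} \lambda_0(Y_i)\, \theta_j} = \frac{\theta_i}{\sum_{j : Y_j \ge Y_i} \theta_j},$$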

where *θ*_{j} = exp(*X*_{j} ⋅ *β*) and the summation is over the set of subjects *j* where the event has not occurred before time *Y*_{i} (including subject *i* itself). Obviously 0 < *L*_{i}(β) ≤ 1. This is a partial likelihood: the effect of the covariates can be estimated without the need to model the change of the hazard over time.

Treating the subjects as if they were statistically independent of each other, the joint probability of all realized events^{ [5] } is the following partial likelihood, where the occurrence of the event is indicated by *C*_{i} = 1:

$$L(\beta) = \prod_{i : C_i = 1} L_i(\beta).$$

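The corresponding log partial likelihood is

$$\ell(\beta) = \sum_{i : C_i = 1} \left( X_i \cdot \beta - \log \sum_{j : Y_j \ge Y_i} \theta_j \right).$$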

This function can be maximized over *β* to produce maximum partial likelihood estimates of the model parameters.
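The log partial likelihood can be evaluated directly from its definition. The sketch below is a deliberately simple, quadratic-time illustration; the function name, argument layout, and toy data are assumptions made here, and no correction for tied event times is applied.

```python
import math

def log_partial_likelihood(beta, times, events, covariates):
    """Cox log partial likelihood, with no correction for ties.

    times[i]      -> observed time Y_i
    events[i]     -> C_i (1 if the event occurred, 0 if censored)
    covariates[i] -> list of covariate values X_i
    """
    # Linear predictors X_j . beta and theta_j = exp(X_j . beta).
    eta = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    theta = [math.exp(e) for e in eta]

    total = 0.0
    for i, (y_i, c_i) in enumerate(zip(times, events)):
        if c_i != 1:
            continue  # censored subjects enter only through the risk sets
        # Risk set: subjects whose event has not occurred before Y_i.
        risk_sum = sum(th for th, y_j in zip(theta, times) if y_j >= y_i)
        total += eta[i] - math.log(risk_sum)
    return total

# Toy data (hypothetical): four subjects, one censored at time 3.
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
covariates = [[1.0], [0.0], [1.0], [0.0]]

ll_at_zero = log_partial_likelihood([0.0], times, events, covariates)
```

At *β* = 0 every *θ*_{j} equals 1, so each event contributes minus the log of its risk-set size; here the risk sets have sizes 4, 2 and 1, giving −log 8.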

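The partial score function is

$$\ell'(\beta) = \sum_{i : C_i = 1} \left( X_i - \frac{\sum_{j : Y_j \ge Y_i} \theta_j X_j}{\sum_{j : Y_j \ge Y_i} \theta_j} \right),$$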

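and the Hessian matrix of the partial log likelihood is

$$\ell''(\beta) = -\sum_{i : C_i = 1} \left( \frac{\sum_{j : Y_j \ge Y_i} \theta_j X_j X_j^{\top}}{\sum_{j : Y_j \ge Y_i} \theta_j} - \frac{\left[ \sum_{j : Y_j \ge Y_i} \theta_j X_j \right] \left[ \sum_{j : Y_j \ge Y_i} \theta_j X_j^{\top} \right]}{\left[ \sum_{j : Y_j \ge Y_i} \theta_j \right]^2} \right).$$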

Using this score function and Hessian matrix, the partial likelihood can be maximized using the Newton-Raphson algorithm. The inverse of the negative Hessian matrix (the observed information), evaluated at the estimate of *β*, can be used as an approximate variance-covariance matrix for the estimate, and used to produce approximate standard errors for the regression coefficients.
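A Newton-Raphson fit can be sketched for the simplest case of a single covariate, where *β*, the score, and the Hessian are all scalars. The function names, tolerance, and toy data below are assumptions for illustration; the sketch has no step-halving or other safeguards, and it assumes there are no tied event times.

```python
import math

def score_and_hessian(beta, times, events, x):
    """Partial score and Hessian for one covariate (scalar beta)."""
    theta = [math.exp(beta * xj) for xj in x]
    score, hessian = 0.0, 0.0
    for y_i, c_i, x_i in zip(times, events, x):
        if c_i != 1:
            continue
        risk = [j for j, y_j in enumerate(times) if y_j >= y_i]
        s0 = sum(theta[j] for j in risk)
        s1 = sum(theta[j] * x[j] for j in risk)
        s2 = sum(theta[j] * x[j] ** 2 for j in risk)
        score += x_i - s1 / s0
        hessian -= s2 / s0 - (s1 / s0) ** 2
    return score, hessian

def fit_cox_newton(times, events, x, tol=1e-10, max_iter=50):
    """Maximize the partial likelihood by Newton-Raphson, starting at 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, hessian = score_and_hessian(beta, times, events, x)
        step = score / hessian
        beta -= step  # Newton-Raphson update: beta - score / Hessian
        if abs(step) < tol:
            break
    _, hessian = score_and_hessian(beta, times, events, x)
    # Approximate standard error from the inverse of the negative Hessian.
    std_err = math.sqrt(-1.0 / hessian)
    return beta, std_err

# Toy data (hypothetical): eight subjects, two censored, binary covariate.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 0, 1, 1, 0, 1]
x = [1, 0, 1, 1, 0, 0, 1, 0]

beta_hat, std_err = fit_cox_newton(times, events, x)
```

The estimated hazard ratio is `math.exp(beta_hat)`. In practice, established software adds safeguards that this sketch omits, for example against data sets in which the partial likelihood has no finite maximizer.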

Several approaches have been proposed to handle situations in which there are ties in the time data. *Breslow's method* describes the approach in which the procedure described above is used unmodified, even when ties are present. An alternative approach that is considered to give better results is *Efron's method*.^{ [6] } Let *t*_{j} denote the unique times, let *H*_{j} denote the set of indices *i* such that *Y*_{i} = *t*_{j} and *C*_{i} = 1, and let *m*_{j} = |*H*_{j}|. Efron's approach maximizes the following partial likelihood.

$$L(\beta) = \prod_j \frac{\prod_{i \in H_j} \theta_i}{\prod_{\ell = 0}^{m_j - 1} \left[ \sum_{i : Y_i \ge t_j} \theta_i - \frac{\ell}{m_j} \sum_{i \in H_j} \theta_i \right]}.$$

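The corresponding log partial likelihood is

$$\ell(\beta) = \sum_j \left( \sum_{i \in H_j} X_i \cdot \beta - \sum_{\ell = 0}^{m_j - 1} \log \left( \sum_{i : Y_i \ge t_j} \theta_i - \frac{\ell}{m_j} \sum_{i \in H_j} \theta_i \right) \right),$$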

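the score function is

$$\ell'(\beta) = \sum_j \left( \sum_{i \in H_j} X_i - \sum_{\ell = 0}^{m_j - 1} \frac{\sum_{i : Y_i \ge t_j} \theta_i X_i - \frac{\ell}{m_j} \sum_{i \in H_j} \theta_i X_i}{\sum_{i : Y_i \ge t_j} \theta_i - \frac{\ell}{m_j} \sum_{i \in H_j} \theta_i} \right),$$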

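and the Hessian matrix is

$$\ell''(\beta) = -\sum_j \sum_{\ell = 0}^{m_j - 1} \left( \frac{\sum_{i : Y_i \ge t_j} \theta_i X_i X_i^{\top} - \frac{\ell}{m_j} \sum_{i \in H_j} \theta_i X_i X_i^{\top}}{\phi_{j,\ell,m_j}} - \frac{Z_{j,\ell,m_j} Z_{j,\ell,m_j}^{\top}}{\phi_{j,\ell,m_j}^2} \right),$$

where

$$\phi_{j,\ell,m_j} = \sum_{i : Y_i \ge t_j} \theta_i - \frac{\ell}{m_j} \sum_{i \in H_j} \theta_i, \qquad Z_{j,\ell,m_j} = \sum_{i : Y_i \ge t_j} \theta_i X_i - \frac{\ell}{m_j} \sum_{i \in H_j} \theta_i X_i.$$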

Note that when *H*_{j} is empty (all observations with time *t*_{j} are censored), the summands in these expressions are treated as zero.
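Efron's adjustment can be sketched in code for a single covariate. The function name and toy data are hypothetical, and the implementation favours readability over speed. Unique times at which every observation is censored never enter the dictionary of tied events, so they contribute nothing to the sum, in line with the convention just noted.

```python
import math
from collections import defaultdict

def efron_log_partial_likelihood(beta, times, events, x):
    """Efron's log partial likelihood for one covariate (scalar beta)."""
    theta = [math.exp(beta * xi) for xi in x]

    # H_j: indices of subjects with an observed event at unique time t_j.
    tied = defaultdict(list)
    for i, (y, c) in enumerate(zip(times, events)):
        if c == 1:
            tied[y].append(i)

    total = 0.0
    for t_j, h_j in tied.items():
        m_j = len(h_j)
        risk_sum = sum(th for th, y in zip(theta, times) if y >= t_j)
        tied_sum = sum(theta[i] for i in h_j)
        total += sum(beta * x[i] for i in h_j)
        for k in range(m_j):
            # The k-th tied event sees the risk set with a fraction k / m_j
            # of the tied subjects' weight removed.
            total -= math.log(risk_sum - (k / m_j) * tied_sum)
    return total

# Toy data (hypothetical): two events tied at time 1, one censored at time 3.
times = [1.0, 1.0, 2.0, 3.0]
events = [1, 1, 1, 0]
x = [1.0, 0.0, 1.0, 0.0]

efron_at_zero = efron_log_partial_likelihood(0.0, times, events, x)
```

At *β* = 0 the tied pair contributes −log 4 − log 3 and the event at time 2 contributes −log 2, so the value is −log 24. Applying the unmodified procedure (Breslow's method) to the same data would give −log 32, since both tied events would use the full risk-set sum of 4.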

Extensions to time-dependent variables, time-dependent strata, and multiple events per subject can be incorporated by the counting process formulation of Andersen and Gill.^{ [7] } One example of the use of hazard models with time-varying regressors is estimating the effect of unemployment insurance on unemployment spells.^{ [8] }^{ [9] }

In addition to allowing time-varying covariates (i.e., predictors), the Cox model may be generalized to time-varying coefficients as well. That is, the proportional effect of a treatment may vary with time; e.g. a drug may be very effective if administered within one month of morbidity, and become less effective as time goes on. The hypothesis of no change with time (stationarity) of the coefficient may then be tested. Details and software (R package) are available in Martinussen and Scheike (2006).^{ [10] }^{ [11] } The application of the Cox model with time-varying covariates is considered in reliability mathematics.^{ [12] }

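In this context, it could also be mentioned that it is theoretically possible to specify the effect of covariates by using additive hazards,^{ [13] } i.e. specifying

$$\lambda(t \mid X_i) = \lambda_0(t) + \beta_1 X_{i1} + \cdots + \beta_p X_{ip} = \lambda_0(t) + X_i \cdot \beta.$$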

If such additive hazards models are used in situations where (log-)likelihood maximization is the objective, care must be taken to restrict the hazard *λ*(*t* | *X*_{i}) to non-negative values. Perhaps as a result of this complication, such models are seldom seen. If the objective is instead least squares, the non-negativity restriction is not strictly required.

The Cox model may be specialized if a reason exists to assume that the baseline hazard follows a particular form. In this case, the baseline hazard *λ*_{0}(*t*) is replaced by a given function. For example, assuming the hazard function to be the *Weibull* hazard function gives the *Weibull proportional hazards model*.

Incidentally, using the Weibull baseline hazard is the only circumstance under which the model satisfies both the proportional hazards and the accelerated failure time assumptions.
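That a Weibull baseline leads to an accelerated failure time form can be checked directly. As a sketch, assume the baseline hazard is parameterized as $\lambda_0(t) = \alpha \gamma t^{\gamma - 1}$ with $\alpha, \gamma > 0$ (the symbols $\alpha$, $\gamma$, $\Lambda$ and $S$ are introduced here for illustration). The cumulative hazard under proportional hazards is then

```latex
\Lambda(t \mid X_i)
  = \int_0^t \alpha \gamma u^{\gamma - 1} e^{X_i \cdot \beta}\, du
  = \alpha\, t^{\gamma} e^{X_i \cdot \beta}
  = \alpha \left( t\, e^{X_i \cdot \beta / \gamma} \right)^{\gamma}
  = \Lambda_0\!\left( t\, e^{X_i \cdot \beta / \gamma} \right),
\qquad\text{so}\qquad
S(t \mid X_i) = S_0\!\left( t\, e^{X_i \cdot \beta / \gamma} \right).
```

The covariates therefore rescale time by the constant factor $e^{X_i \cdot \beta / \gamma}$, which is the defining property of an accelerated failure time model. This shows only that the Weibull case satisfies both forms; it does not by itself prove that no other baseline does.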

The generic term *parametric proportional hazards models* can be used to describe proportional hazards models in which the baseline hazard function is specified. The Cox proportional hazards model is sometimes called a *semiparametric model* by contrast.

Some authors use the term *Cox proportional hazards model* even when specifying the underlying hazard function,^{ [14] } to acknowledge the debt of the entire field to David Cox.

The term *Cox regression model* (omitting *proportional hazards*) is sometimes used to describe the extension of the Cox model to include time-dependent factors. However, this usage is potentially ambiguous since the Cox proportional hazards model can itself be described as a regression model.

There is a relationship between proportional hazards models and Poisson regression models which is sometimes used to fit approximate proportional hazards models in software for Poisson regression. The usual reason for doing this is that calculation is much quicker. This was more important in the days of slower computers but can still be useful for particularly large data sets or complex problems. Laird and Olivier (1981)^{ [15] } provide the mathematical details. They note, "we do not assume [the Poisson model] is true, but simply use it as a device for deriving the likelihood." McCullagh and Nelder's^{ [16] } book on generalized linear models has a chapter on converting proportional hazards models to generalized linear models.
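One standard way to see the connection is sketched below under the added assumption that the hazard is constant within an interval; this is an illustration and not necessarily the exact argument of the cited references. Suppose subject *i* is at risk for time $t_i$ in the interval, experiences $d_i \in \{0, 1\}$ events, and has hazard $\lambda_i = \lambda_0 \exp(X_i \cdot \beta)$ there. The survival likelihood contribution and the Poisson probability of $d_i$ with mean $\mu_i = \lambda_i t_i$ are

```latex
\lambda_i^{d_i}\, e^{-\lambda_i t_i}
\qquad\text{and}\qquad
\frac{(\lambda_i t_i)^{d_i}\, e^{-\lambda_i t_i}}{d_i!}
  = \lambda_i^{d_i}\, e^{-\lambda_i t_i} \cdot \frac{t_i^{d_i}}{d_i!},
\qquad
\log \mu_i = \log t_i + \log \lambda_0 + X_i \cdot \beta .
```

The two differ only by the factor $t_i^{d_i} / d_i!$, which does not involve the parameters, so both are maximized by the same values. Fitting a Poisson regression with $\log t_i$ as an offset therefore reproduces the survival fit, even though the event indicator is not actually Poisson distributed.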

In high-dimensional settings, when the number of covariates *p* is large compared to the sample size *n*, the LASSO method is one of the classical model-selection strategies. Tibshirani (1997) proposed a Lasso procedure for the proportional hazards regression parameter.^{ [17] } The Lasso estimator of the regression parameter *β* is defined as the minimizer of the negative Cox partial log-likelihood under an L^{1}-norm type constraint.
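Written out, with $\ell(\beta)$ denoting the Cox log partial likelihood and $s \ge 0$ a tuning parameter (the symbol $s$ and the Lagrangian form with penalty weight $\lambda_{\text{pen}}$ are notation assumed here), the estimator takes the form

```latex
\hat{\beta}
  = \operatorname*{arg\,min}_{\beta}\; \bigl\{ -\ell(\beta) \bigr\}
  \quad\text{subject to}\quad \sum_{k=1}^{p} |\beta_k| \le s,
\qquad\text{or equivalently, for a suitable } \lambda_{\text{pen}} \ge 0, \qquad
\hat{\beta}
  = \operatorname*{arg\,min}_{\beta}\; \Bigl\{ -\ell(\beta) + \lambda_{\text{pen}} \sum_{k=1}^{p} |\beta_k| \Bigr\}.
```

Smaller values of $s$ (larger penalty weights) shrink more coefficients exactly to zero, which is what makes the procedure usable for variable selection.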

Theoretical properties of this estimator in the high-dimensional setting, such as oracle inequalities, have since been studied.^{ [18] }^{ [19] }^{ [20] }^{ [21] }

1. Breslow, N. E. (1975). "Analysis of Survival Data under the Proportional Hazards Model". *International Statistical Review / Revue Internationale de Statistique*. **43** (1): 45–57. doi:10.2307/1402659. JSTOR 1402659.
2. Cox, David R. (1972). "Regression Models and Life-Tables". *Journal of the Royal Statistical Society, Series B*. **34** (2): 187–220. JSTOR 2985181. MR 0341758.
3. Reid, N. (1994). "A Conversation with Sir David Cox". *Statistical Science*. **9** (3): 439–455. doi:10.1214/ss/1177010394.
4. Cox, D. R. (1997). *Some remarks on the analysis of survival data*. The First Seattle Symposium of Biostatistics: Survival Analysis.
5. "Each failure contributes to the likelihood function", Cox (1972), page 191.
6. Efron, Bradley (1977). "The Efficiency of Cox's Likelihood Function for Censored Data". *Journal of the American Statistical Association*. **72** (359): 557–565. doi:10.1080/01621459.1977.10480613. JSTOR 2286217.
7. Andersen, P.; Gill, R. (1982). "Cox's regression model for counting processes, a large sample study". *Annals of Statistics*. **10** (4): 1100–1120. doi:10.1214/aos/1176345976. JSTOR 2240714.
8. Meyer, B. D. (1990). "Unemployment Insurance and Unemployment Spells". *Econometrica*. **58** (4): 757–782. doi:10.2307/2938349. JSTOR 2938349.
9. Bover, O.; Arellano, M.; Bentolila, S. (2002). "Unemployment Duration, Benefit Duration, and the Business Cycle". *The Economic Journal*. **112** (479): 223–265. doi:10.1111/1468-0297.00034.
10. Martinussen; Scheike (2006). *Dynamic Regression Models for Survival Data*. Springer. doi:10.1007/0-387-33960-4. ISBN 978-0-387-20274-7.
11. "timereg: Flexible Regression Models for Survival Data". *CRAN*.
12. Wu, S.; Scarf, P. (2015). "Decline and repair, and covariate effects". *European Journal of Operational Research*. **244** (1): 219–226. doi:10.1016/j.ejor.2015.01.041.
13. Cox, D. R. (1997). *Some remarks on the analysis of survival data*. The First Seattle Symposium of Biostatistics: Survival Analysis.
14. Bender, R.; Augustin, T.; Blettner, M. (2006). "Generating survival times to simulate Cox proportional hazards models". *Statistics in Medicine*. **24** (11): 1713–1723. doi:10.1002/sim.2369. PMID 16680804.
15. Laird, Nan; Olivier, Donald (1981). "Covariance Analysis of Censored Survival Data Using Log-Linear Analysis Techniques". *Journal of the American Statistical Association*. **76** (374): 231–240. doi:10.2307/2287816. JSTOR 2287816.
16. McCullagh, P.; Nelder, J. A. (2000). "Chapter 13: Models for Survival Data". *Generalized Linear Models* (Second ed.). Boca Raton, Florida: Chapman & Hall/CRC. ISBN 978-0-412-31760-6. (Second edition 1989; first CRC reprint 1999.)
17. Tibshirani, R. (1997). "The Lasso method for variable selection in the Cox model". *Statistics in Medicine*. **16** (4): 385–395. CiteSeerX 10.1.1.411.8024. doi:10.1002/(SICI)1097-0258(19970228)16:4<385::AID-SIM380>3.0.CO;2-3.
18. Bradić, J.; Fan, J.; Jiang, J. (2011). "Regularization for Cox's proportional hazards model with NP-dimensionality". *Annals of Statistics*. **39** (6): 3092–3120. arXiv:1010.5233. doi:10.1214/11-AOS911. PMC 3468162. PMID 23066171.
19. Bradić, J.; Song, R. (2015). "Structured Estimation in Nonparametric Cox Model". *Electronic Journal of Statistics*. **9** (1): 492–534. arXiv:1207.4510. doi:10.1214/15-EJS1004.
20. Kong, S.; Nan, B. (2014). "Non-asymptotic oracle inequalities for the high-dimensional Cox regression via Lasso". *Statistica Sinica*. **24** (1): 25–42. arXiv:1204.1992. doi:10.5705/ss.2012.240. PMC 3916829. PMID 24516328.
21. Huang, J.; Sun, T.; Ying, Z.; Yu, Y.; Zhang, C. H. (2013). "Oracle inequalities for the lasso in the Cox model". *The Annals of Statistics*. **41** (3): 1142–1165. arXiv:1306.4847. doi:10.1214/13-AOS1098. PMC 3786146. PMID 24086091.

This page is based on this Wikipedia article

Text is available under the CC BY-SA 4.0 license; additional terms may apply.

Images, videos and audio are available under their respective licenses.
