Hazard ratio

In survival analysis, the hazard ratio (HR) is the ratio of the hazard rates corresponding to the conditions characterised by two distinct levels of a treatment variable of interest. For example, in a clinical study of a drug, the treated population may die at twice the rate per unit time of the control population. The hazard ratio would be 2, indicating a higher hazard of death from the treatment.

For example, a scientific paper might use an HR to state something such as: "Adequate COVID-19 vaccination status was associated with significantly decreased risk for the composite of severe COVID-19 or mortality with a[n] HR of 0.20 (95% CI, 0.17–0.22)." [1] In essence, the hazard for the composite outcome was 80% lower among the vaccinated relative to those who were unvaccinated in the same study. So, for a hazardous outcome (e.g., severe disease or death), an HR below 1 indicates that the treatment (e.g., vaccination) is protective against the outcome of interest. In other cases, an HR greater than 1 indicates the treatment is favorable. For example, if the outcome is actually favorable (e.g., accepting a job offer to end a spell of unemployment), an HR greater than 1 indicates that seeking a job is favorable to not seeking one (if "treatment" is defined as seeking a job). [2]

Hazard ratios differ from relative risks (RRs) and odds ratios (ORs) in that RRs and ORs are cumulative over an entire study, using a defined endpoint, while HRs represent instantaneous risk over the study time period, or some subset thereof. Hazard ratios suffer somewhat less from selection bias with respect to the endpoints chosen and can indicate risks that happen before the endpoint.

Definition and derivation

Regression models are used to obtain hazard ratios and their confidence intervals. [3]

The instantaneous hazard rate is the limit of the number of events per unit time divided by the number at risk, as the time interval approaches zero:

$$h(t) = \lim_{\Delta t \to 0} \frac{\text{events observed in } [t, t + \Delta t)\,/\,N(t)}{\Delta t},$$

where N(t) is the number at risk at the beginning of the interval. A hazard is the probability that a patient fails between $t$ and $t + \Delta t$, given that they have survived up to time $t$, divided by $\Delta t$, as $\Delta t$ approaches zero. [4]
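To make the definition concrete, here is a minimal sketch of the discrete-time approximation (events divided by person-time at risk in a short interval); the counts are invented for illustration and are not from any study cited here:

```python
# Minimal sketch: approximate the hazard in a short interval as
#   events in [t, t + dt) / (number at risk at t * dt)
def interval_hazard(events: int, at_risk: int, dt: float) -> float:
    """Empirical hazard rate for one interval."""
    return events / (at_risk * dt)

# e.g. 4 deaths among 200 patients at risk during a 0.5-year interval
h_treated = interval_hazard(events=4, at_risk=200, dt=0.5)   # 0.04 per year
h_control = interval_hazard(events=8, at_risk=200, dt=0.5)   # 0.08 per year
print(h_treated, h_control, h_control / h_treated)           # ratio of hazards = 2.0
```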

The hazard ratio is the effect on this hazard rate of a difference, such as group membership (for example, treatment or control, male or female), as estimated by regression models that treat the logarithm of the HR as a function of a baseline hazard $h_0(t)$ and a linear combination of explanatory variables:

$$h(t \mid x_1, \ldots, x_k) = h_0(t)\, e^{\beta_1 x_1 + \cdots + \beta_k x_k}, \qquad \log \frac{h(t \mid x_1, \ldots, x_k)}{h_0(t)} = \beta_1 x_1 + \cdots + \beta_k x_k .$$
Such models are generally classed as proportional hazards regression models; the best known is the Cox proportional hazards model, [3] [5] alongside the exponential, Gompertz and Weibull parametric models.

For two groups that differ only in treatment condition, the ratio of the hazard functions is given by $e^\beta$, where $\beta$ is the estimate of treatment effect derived from the regression model. This hazard ratio, that is, the ratio between the predicted hazard for a member of one group and that for a member of the other group, is $e^\beta$ holding everything else constant, i.e. assuming proportionality of the hazard functions. [4]

For a continuous explanatory variable, the same interpretation applies to a unit difference. Other HR models have different formulations and the interpretation of the parameter estimates differs accordingly.
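As a concrete illustration of estimating $e^\beta$ from data, the following minimal sketch fits a Cox model with the third-party lifelines package; the library choice, the simulated data and all variable names are assumptions made only for this example, not something prescribed by the article:

```python
# Sketch of hazard ratio estimation via a Cox model, using the third-party
# `lifelines` package (an assumption; the article names no software).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 500
treatment = rng.integers(0, 2, size=n)                 # 0 = control, 1 = treated
baseline_hazard = 0.10                                  # events per unit time
true_hr = 0.5                                           # treated hazard is halved
rate = baseline_hazard * np.where(treatment == 1, true_hr, 1.0)
time_to_event = rng.exponential(1.0 / rate)
censor_time = rng.uniform(0, 30, size=n)                # administrative censoring
duration = np.minimum(time_to_event, censor_time)
event = (time_to_event <= censor_time).astype(int)

df = pd.DataFrame({"duration": duration, "event": event, "treatment": treatment})
cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="event")
print(cph.hazard_ratios_["treatment"])   # exp(beta); should land near the simulated HR of 0.5
```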

Interpretation

[Figure: Kaplan–Meier curves illustrating overall survival based on volume of brain metastases. Elaimy et al. (2011) [6]]

In its simplest form, the hazard ratio can be interpreted as the chance of an event occurring in the treatment arm divided by the chance of the event occurring in the control arm, or vice versa, of a study. The resolution of these endpoints is usually depicted using Kaplan–Meier survival curves. These curves show, for each group, the proportion in which the endpoint has not yet been reached. The endpoint could be any dependent variable associated with the covariate (independent variable), e.g. death, remission of disease or contraction of disease. The curve represents the odds of the endpoint having occurred at each point in time (the hazard). The hazard ratio is simply the relationship between the instantaneous hazards in the two groups and represents, in a single number, the magnitude of distance between the Kaplan–Meier plots. [7]
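For readers who want to see how such curves are built, here is a minimal sketch of the Kaplan–Meier estimator with invented data; it is an illustration only, not the method of any particular study cited here:

```python
import numpy as np

def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate: product over event times of (1 - d_i / n_i)."""
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=int)            # 1 = event observed, 0 = censored
    order = np.argsort(durations)
    durations, events = durations[order], events[order]
    at_risk = len(durations)
    times, survival = [], []
    s = 1.0
    for t in np.unique(durations):
        mask = durations == t
        d = events[mask].sum()                         # events at time t
        if d > 0:
            s *= 1.0 - d / at_risk
            times.append(t)
            survival.append(s)
        at_risk -= mask.sum()                          # events and censorings leave the risk set
    return np.array(times), np.array(survival)

# Invented example: follow-up times with 1 = event, 0 = censored
t, s = kaplan_meier([2, 3, 3, 5, 8, 9], [1, 1, 0, 1, 0, 1])
print(dict(zip(t, np.round(s, 3))))
```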

Hazard ratios do not reflect a time unit of the study. The difference between hazard-based and time-based measures is akin to the difference between the odds of winning a race and the margin of victory. [3] When a study reports a single hazard ratio for the study period, it is assumed that the difference between groups was proportional over time. Hazard ratios become meaningless when this assumption of proportionality is not met. [7]

If the proportional hazards assumption holds, a hazard ratio of one means equivalence in the hazard rate of the two groups, whereas a hazard ratio other than one indicates a difference in hazard rates between groups. The researcher indicates the probability of this sample difference being due to chance by reporting the probability associated with some test statistic. [8] For instance, the p-value from the Cox model or the log-rank test might then be used to assess the significance of any differences observed in these survival curves. [9]
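As one possible illustration (the article prescribes no software), the log-rank comparison of two survival curves can be run with the third-party lifelines package; the durations and event indicators below are invented:

```python
# Minimal sketch of a log-rank comparison of two groups, using the third-party
# `lifelines` package (an assumption; the article names no software).
from lifelines.statistics import logrank_test

durations_a = [5, 6, 6, 2, 4, 4, 9, 12]   # e.g. treatment arm, months to event or censoring
events_a    = [1, 0, 1, 1, 1, 0, 1, 1]    # 1 = event observed, 0 = censored
durations_b = [1, 2, 2, 3, 3, 4, 5, 7]    # e.g. control arm
events_b    = [1, 1, 1, 1, 0, 1, 1, 1]

result = logrank_test(durations_a, durations_b,
                      event_observed_A=events_a, event_observed_B=events_b)
print(result.test_statistic, result.p_value)   # chi-squared statistic and its p-value
```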

Conventionally, probabilities lower than 0.05 are considered significant and researchers provide a 95% confidence interval for the hazard ratio, e.g. derived from the standard error of the Cox model regression coefficient, i.e. $\exp\!\bigl(\hat\beta \pm 1.96\,\mathrm{SE}(\hat\beta)\bigr)$. [9] [10] Statistically significant hazard ratios cannot include unity (one) in their confidence intervals. [7]
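A worked example of this interval calculation, using an invented coefficient and standard error chosen so that the hazard ratio is similar in size to the vaccination example above:

```python
import math

# Invented numbers for illustration: a Cox regression coefficient and its standard error
beta, se = -1.609, 0.064          # exp(beta) is about 0.20

hr = math.exp(beta)
ci_low  = math.exp(beta - 1.96 * se)
ci_high = math.exp(beta + 1.96 * se)
print(round(hr, 2), (round(ci_low, 2), round(ci_high, 2)))   # 0.2 (0.18, 0.23)
```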

The proportional hazards assumption

The proportional hazards assumption for hazard ratio estimation is strong and often unreasonable. [11] Complications, adverse effects and late effects are all possible causes of change in the hazard rate over time. For instance, a surgical procedure may carry high early risk but excellent long-term outcomes.

If the hazard ratio between groups remains constant, this is not a problem for interpretation. However, interpretation of hazard ratios becomes impossible when selection bias exists between groups. For instance, a particularly risky surgery might result in the survival of a systematically more robust group who would have fared better under any of the competing treatment conditions, making it look as if the risky procedure were better. Follow-up time is also important. A cancer treatment associated with better remission rates might on follow-up be associated with higher relapse rates. The researchers' decision about when to follow up is arbitrary and may lead to very different reported hazard ratios. [12]

The hazard ratio and survival

Hazard ratios are often treated as a ratio of death probabilities. [4] For example, a hazard ratio of 2 is thought to mean that a group has twice the chance of dying as a comparison group. In the Cox model, this can be shown to translate to the following relationship between group survival functions: $S_1(t) = S_0(t)^r$ (where r is the hazard ratio). [4] Therefore, with a hazard ratio of 2, if $S_0(t) = 0.2$ (20% survived at time t), then $S_1(t) = 0.2^2 = 0.04$ (4% survived at t). The corresponding death probabilities are 0.8 and 0.96. [11] It should be clear that the hazard ratio is a relative measure of effect and tells us nothing about absolute risk. [13]
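A quick numerical check of this relationship, using the values from the example:

```python
# Check of S1(t) = S0(t) ** r for the example above (r = 2, 20% surviving in the comparison group)
r = 2.0
s0 = 0.20
s1 = s0 ** r
print(s1, 1 - s0, 1 - s1)   # 0.04 survival, and death probabilities 0.8 and 0.96
```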

While hazard ratios allow for hypothesis testing, they should be considered alongside other measures for interpretation of the treatment effect, e.g. the ratio of median times (median ratio) at which treatment and control group participants are at some endpoint. If the analogy of a race is applied, the hazard ratio is equivalent to the odds that an individual in the group with the higher hazard reaches the end of the race first. The probability of being first can be derived from the odds, which is the probability of being first divided by the probability of not being first:

$\mathrm{HR} = \dfrac{P}{1 - P}$; conversely, $P = \dfrac{\mathrm{HR}}{1 + \mathrm{HR}}$.

In the previous example, a hazard ratio of 2 corresponds to a 67% chance of an early death. The hazard ratio does not convey information about how soon the death will occur. [3]
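The same arithmetic as a one-line check:

```python
# Probability that a member of the higher-hazard group has the event first,
# derived from the odds relationship above
hr = 2.0
p_first = hr / (1 + hr)
print(round(p_first, 3))   # 0.667, i.e. about a 67% chance of the earlier death
```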

The hazard ratio, treatment effect and time-based endpoints

The treatment effect depends on the underlying survival function of the disease, not just the hazard ratio. Since the hazard ratio does not give direct time-to-event information, researchers have to report median endpoint times and calculate the median endpoint time ratio by dividing the control group median value by the treatment group median value.

While the median endpoint ratio is a relative speed measure, the hazard ratio is not. [3] The hazard ratio therefore translates into a treatment effect on time-based endpoints only indirectly. A statistically significant, but practically insignificant, effect can produce a large hazard ratio, e.g. a treatment increasing the number of one-year survivors in a population from one in 10,000 to one in 1,000 has a hazard ratio of 10. It is unlikely that such a treatment would have had much impact on the median endpoint time ratio, which likely would have been close to unity, i.e. mortality was largely the same regardless of group membership and clinically insignificant.

By contrast, a treatment group in which 50% of infections are resolved after one week (versus 25% in the control) yields a hazard ratio of two. If it takes ten weeks for all cases in the treatment group and half of cases in the control group to resolve, the ten-week hazard ratio remains at two, but the median endpoint time ratio is ten, a clinically significant difference.
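The dependence of the median-time ratio on the underlying survival curve, for a fixed hazard ratio, can be sketched as follows; the Weibull baseline and all numbers are assumptions made only for this illustration:

```python
import numpy as np

# Under proportional hazards, S_treat(t) = S_control(t) ** HR, so each group's
# median time is where its survival curve crosses 0.5.  With a Weibull baseline
# S_control(t) = exp(-(t / scale) ** shape) (assumed only for this sketch),
# the median ratio depends on the shape even though the hazard ratio is fixed.
def weibull_median(scale, shape, hr=1.0):
    # Solve exp(-(t / scale) ** shape) ** hr = 0.5 for t
    return scale * (np.log(2.0) / hr) ** (1.0 / shape)

hr = 2.0
for shape in (0.5, 1.0, 3.0):
    m_control = weibull_median(scale=10.0, shape=shape)
    m_treat = weibull_median(scale=10.0, shape=shape, hr=hr)
    print(shape, round(m_control / m_treat, 2))   # median endpoint time ratio: 4.0, 2.0, 1.26
```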

References

  1. Najjar-Debbiny, R.; Gronich, N.; Weber, G.; Khoury, J.; Amar, M.; Stein, N.; Goldstein, L. H.; Saliba, W. (2 June 2022). "Effectiveness of Paxlovid in Reducing Severe COVID-19 and Mortality in High Risk Patients". Clinical Infectious Diseases. 76 (3): e342–e349. doi:10.1093/cid/ciac443. PMC 9214014. PMID 35653428.
  2. Flinn, C.; Heckman, J. (1982). "New Methods for Analyzing Labor Force Structural Dynamics". Journal of Econometrics. 18 (1): 115–168. doi:10.1016/0304-4076(82)90097-5. S2CID 16100294.
  3. Spruance, Spotswood; Reid, Julia E.; Grace, Michael; Samore, Matthew (August 2004). "Hazard Ratio in Clinical Trials". Antimicrobial Agents and Chemotherapy. 48 (8): 2787–2792. doi:10.1128/AAC.48.8.2787-2792.2004. PMC 478551. PMID 15273082.
  4. Case, L. Douglas; Kimmick, Gretchen; Paskett, Electra D.; Lohman, Kurt; Tucker, Robert (June 2002). "Interpreting Measures of Treatment Effect in Cancer Clinical Trials". The Oncologist. 7 (3): 181–187. doi:10.1634/theoncologist.7-3-181. PMID 12065789. S2CID 46520247. Retrieved 7 December 2012.
  5. Cox, D. R. (1972). "Regression Models and Life-Tables". Journal of the Royal Statistical Society, Series B (Methodological). 34 (2): 187–220. Archived from the original (PDF) on 20 June 2013. Retrieved 5 December 2012.
  6. Elaimy, Ameer; Mackay, Alexander R.; Lamoreaux, Wayne T.; Fairbanks, Robert K.; Demakas, John J.; Cooke, Barton S.; Peressini, Benjamin J.; Holbrook, John T.; Lee, Christopher M. (5 July 2011). "Multimodality treatment of brain metastases: an institutional survival analysis of 275 patients". World Journal of Surgical Oncology. 9: 69. doi:10.1186/1477-7819-9-69. PMC 3148547. PMID 21729314.
  7. Brody, Tom (2011). Clinical Trials: Study Design, Endpoints and Biomarkers, Drug Safety, and FDA and ICH Guidelines. Academic Press. pp. 165–168. ISBN 9780123919137.
  8. Motulsky, Harvey (2010). Intuitive Biostatistics: A Nonmathematical Guide to Statistical Thinking. Oxford University Press. pp. 210–218. ISBN 9780199730063.
  9. Norman, Geoffrey R.; Streiner, David L. (2008). Biostatistics: The Bare Essentials. PMPH-USA. pp. 283–287. ISBN 9781550093476. Retrieved 7 December 2012.
  10. Kleinbaum, David G.; Klein, Mitchel (2005). Survival Analysis: A Self-Learning Text (2nd ed.). Springer. ISBN 9780387239187. Retrieved 7 December 2012.
  11. Cantor, Alan (2003). SAS Survival Analysis Techniques for Medical Research. SAS Institute. pp. 111–150. ISBN 9781590471357.
  12. Hernán, Miguel (January 2010). "The Hazards of Hazard Ratios". Epidemiology. 21 (1): 13–15. doi:10.1097/EDE.0b013e3181c1ea43. PMC 3653612. PMID 20010207.
  13. Newman, Stephen (2003). Biostatistical Methods in Epidemiology. John Wiley & Sons. ISBN 9780471461609.