Relative risk

Figure: The group exposed to treatment (left) has half the risk (RR = 4/8 = 0.5) of an adverse outcome (black) compared to the unexposed group (right).

The relative risk (RR) or risk ratio is the ratio of the probability of an outcome in an exposed group to the probability of an outcome in an unexposed group. Together with risk difference and odds ratio, relative risk measures the association between the exposure and the outcome. [1]

Statistical use and meaning

Relative risk is used in the statistical analysis of the data of ecological, cohort, medical and intervention studies to estimate the strength of the association between exposures (treatments or risk factors) and outcomes. [2] Mathematically, it is the incidence rate of the outcome in the exposed group, $I_e$, divided by the rate of the unexposed group, $I_u$: $RR = I_e / I_u$. [3] As such, it is used to compare the risk of an adverse outcome when receiving a medical treatment versus no treatment (or placebo), or for environmental risk factors. For example, in a study examining the effect of the drug apixaban on the occurrence of thromboembolism, 8.8% of placebo-treated patients experienced the disease, but only 1.7% of patients treated with the drug did, so the relative risk is 0.19 (1.7/8.8): patients receiving apixaban had 19% the disease risk of patients receiving the placebo. [4] In this case, apixaban is a protective factor rather than a risk factor, because it reduces the risk of disease.
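
As a minimal sketch, the apixaban figures above translate directly into the following calculation (the variable names are illustrative):

```python
# Relative risk: incidence of the outcome in the exposed group
# divided by incidence in the unexposed group.
risk_treated = 0.017   # 1.7% of apixaban-treated patients had the outcome
risk_placebo = 0.088   # 8.8% of placebo-treated patients had the outcome

relative_risk = risk_treated / risk_placebo
print(f"RR = {relative_risk:.2f}")  # RR = 0.19, i.e. apixaban is protective (RR < 1)
```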

Assuming a causal effect between the exposure and the outcome, values of relative risk can be interpreted as follows: [2]

RR = 1 means that the exposure does not affect the outcome.
RR < 1 means that the risk of the outcome is decreased by the exposure (the exposure is a protective factor).
RR > 1 means that the risk of the outcome is increased by the exposure (the exposure is a risk factor).

As always, correlation does not mean causation; the causation could be reversed, or they could both be caused by a common confounding variable. The relative risk of having cancer when in the hospital versus at home, for example, would be greater than 1, but that is because having cancer causes people to go to the hospital.

Usage in reporting

Relative risk is commonly used to present the results of randomized controlled trials. [5] This can be problematic if the relative risk is presented without absolute measures, such as absolute risk or risk difference. [6] In cases where the base rate of the outcome is low, large or small values of relative risk may not translate to large absolute effects, and the importance of the exposure to public health can be overestimated. Conversely, in cases where the base rate of the outcome is high, values of relative risk close to 1 may still correspond to a large absolute effect, which can be underestimated. Thus, presentation of both absolute and relative measures is recommended. [7]
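
A toy sketch of this base-rate effect (the numbers are hypothetical, chosen only to illustrate the point):

```python
# Hypothetical numbers: the same relative risk implies very different
# absolute effects depending on the base rate in the unexposed group.
for base_rate in (0.0001, 0.2):
    exposed_rate = 2 * base_rate          # relative risk of 2 in both scenarios
    rr = exposed_rate / base_rate
    risk_difference = exposed_rate - base_rate
    print(f"base rate {base_rate}: RR = {rr:.1f}, "
          f"risk difference = {risk_difference:.4f}")
# base rate 0.0001: RR = 2.0, risk difference = 0.0001 (tiny absolute effect)
# base rate 0.2:    RR = 2.0, risk difference = 0.2000 (large absolute effect)
```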

Inference

Relative risk can be estimated from a 2×2 contingency table:

                 Intervention (I)   Control (C)
Events (E)       IE                 CE
Non-events (N)   IN                 CN

The point estimate of the relative risk is

$$RR = \frac{IE/(IE + IN)}{CE/(CE + CN)}.$$

The sampling distribution of the $\log(RR)$ is closer to normal than the distribution of RR, [8] with standard error

$$SE(\log(RR)) = \sqrt{\frac{1}{IE} + \frac{1}{CE} - \frac{1}{IE + IN} - \frac{1}{CE + CN}}.$$

The $1 - \alpha$ confidence interval for the $\log(RR)$ is then

$$CI_{1-\alpha}(\log(RR)) = \log(RR) \pm SE(\log(RR)) \times z_\alpha,$$

where $z_\alpha$ is the standard score for the chosen level of significance. [9] [10] To find the confidence interval around the RR itself, the two bounds of the above confidence interval can be exponentiated. [9]
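
These formulas combine into a short routine; a sketch (the function name and the 95% default are my own choices):

```python
import math

def relative_risk_ci(ie, in_, ce, cn, z=1.96):
    """Point estimate of RR and its confidence interval from a 2x2 table.

    ie, in_ -- events and non-events in the intervention group
    ce, cn  -- events and non-events in the control group
    z       -- standard score for the chosen significance level
               (1.96 for a 95% interval)
    """
    rr = (ie / (ie + in_)) / (ce / (ce + cn))
    # Standard error of log(RR), per the formula above
    se_log_rr = math.sqrt(1/ie + 1/ce - 1/(ie + in_) - 1/(ce + cn))
    # Confidence interval for log(RR), exponentiated to bound RR itself
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, (lower, upper)

# Counts from the numerical example below: EE = 15, EN = 135, CE = 100, CN = 150
print(relative_risk_ci(15, 135, 100, 150))  # RR = 0.25, 95% CI roughly (0.15, 0.41)
```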

In regression models, the exposure is typically included as an indicator variable along with other factors that may affect risk. The relative risk is usually reported as calculated for the mean of the sample values of the explanatory variables. [citation needed]

Comparison to the odds ratio

Figure: risk ratio vs odds ratio.

The relative risk is different from the odds ratio, although the odds ratio asymptotically approaches the relative risk for small probabilities of outcomes. If IE is substantially smaller than IN, then IE/(IE + IN) ≈ IE/IN. Similarly, if CE is much smaller than CN, then CE/(CN + CE) ≈ CE/CN. Thus, under the rare disease assumption

$$OR = \frac{IE/IN}{CE/CN} \approx \frac{IE/(IE + IN)}{CE/(CE + CN)} = RR.$$

In practice the odds ratio is commonly used for case-control studies, as the relative risk cannot be estimated. [1]

In fact, the odds ratio is much more commonly used in statistics, since logistic regression, often associated with clinical trials, works with the log of the odds ratio, not the relative risk. Because the (natural log of the) odds of a record is estimated as a linear function of the explanatory variables, in a logistic regression model relating the outcome to drug and age, the estimated odds ratio associated with the type of treatment would be the same for 70-year-olds and 60-year-olds, although the relative risk might be significantly different. [citation needed]

Since relative risk is a more intuitive measure of effectiveness, the distinction is especially important in cases of medium to high probabilities. If action A carries a risk of 99.9% and action B a risk of 99.0%, then the relative risk is just over 1, while the odds associated with action A are more than 10 times higher than the odds with B. [citation needed]
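
This divergence is easy to check directly; a quick sketch:

```python
# At medium-to-high outcome probabilities the two measures diverge sharply.
p_a, p_b = 0.999, 0.990            # risks under actions A and B

rr = p_a / p_b                     # relative risk: just over 1
odds_a = p_a / (1 - p_a)           # odds under A: 999
odds_b = p_b / (1 - p_b)           # odds under B: 99
odds_ratio = odds_a / odds_b       # odds ratio: about 10

print(f"RR = {rr:.4f}, OR = {odds_ratio:.2f}")  # RR = 1.0091, OR = 10.09
```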

In statistical modelling, approaches like Poisson regression (for counts of events per unit exposure) have relative risk interpretations: the estimated effect of an explanatory variable is multiplicative on the rate and thus leads to a relative risk. Logistic regression (for binary outcomes, or counts of successes out of a number of trials) must be interpreted in odds-ratio terms: the effect of an explanatory variable is multiplicative on the odds and thus leads to an odds ratio. [citation needed]
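
A minimal sketch of the two interpretations, assuming statsmodels and pandas are available and using simulated data (none of this comes from the article's examples):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate a rare binary outcome whose risk doubles with exposure.
rng = np.random.default_rng(0)
n = 100_000
exposed = rng.integers(0, 2, n)
p = np.where(exposed == 1, 0.02, 0.01)
df = pd.DataFrame({"y": rng.binomial(1, p), "exposed": exposed})

# Poisson regression: the exposure coefficient is multiplicative on the
# rate, so its exponential estimates a rate (relative risk) ratio.
poisson_fit = smf.poisson("y ~ exposed", data=df).fit(disp=0)
print("rate ratio:", np.exp(poisson_fit.params["exposed"]))   # approx. 2

# Logistic regression: the coefficient is multiplicative on the odds,
# so its exponential estimates an odds ratio (close to 2 here only
# because the outcome is rare).
logit_fit = smf.logit("y ~ exposed", data=df).fit(disp=0)
print("odds ratio:", np.exp(logit_fit.params["exposed"]))
```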

Bayesian interpretation

We could assume a disease noted by $D$, no disease noted by $\neg D$, exposure noted by $E$, and no exposure noted by $\neg E$. The relative risk can be written as

$$RR = \frac{P(D \mid E)}{P(D \mid \neg E)} = \frac{P(E \mid D)/P(\neg E \mid D)}{P(E)/P(\neg E)}.$$

This way the relative risk can be interpreted in Bayesian terms as the posterior ratio of the exposure (i.e. after seeing the disease) normalized by the prior ratio of the exposure. [11] If the posterior ratio of exposure is similar to the prior ratio, the relative risk is approximately 1, indicating no association with the disease, since seeing the disease did not change beliefs about the exposure. If, on the other hand, the posterior ratio of exposure is smaller or larger than the prior ratio, then the disease has changed the view of the exposure danger, and the magnitude of this change is the relative risk.
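
This identity can be verified on any 2×2 table; a sketch using the counts from the numerical example below:

```python
# Verify that RR equals the posterior odds of exposure (given disease)
# divided by the prior odds of exposure, using joint counts:
# exposed & disease = 15, exposed & no disease = 135,
# unexposed & disease = 100, unexposed & no disease = 150.
ed, en, ud, un = 15, 135, 100, 150

rr = (ed / (ed + en)) / (ud / (ud + un))   # P(D|E) / P(D|not E) = 0.25

posterior_odds = ed / ud                   # P(E|D) / P(not E|D) = 0.15
prior_odds = (ed + en) / (ud + un)         # P(E)   / P(not E)   = 0.60

print(rr, posterior_odds / prior_odds)     # both are 0.25
```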

Numerical example

Example of risk reduction

                     Experimental group (E)      Control group (C)           Total
Events (E)           EE = 15                     CE = 100                    115
Non-events (N)       EN = 135                    CN = 150                    285
Total subjects (S)   ES = EE + EN = 150          CS = CE + CN = 250          400
Event rate (ER)      EER = EE / ES = 0.1 (10%)   CER = CE / CS = 0.4 (40%)

Variable                                    Abbr.   Formula                         Value
Absolute risk reduction                     ARR     CER − EER                       0.3, or 30%
Number needed to treat                      NNT     1 / (CER − EER)                 3.33
Relative risk (risk ratio)                  RR      EER / CER                       0.25
Relative risk reduction                     RRR     (CER − EER) / CER, or 1 − RR    0.75, or 75%
Preventable fraction among the unexposed    PFu     (CER − EER) / CER               0.75
Odds ratio                                  OR      (EE / EN) / (CE / CN)           0.167
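
The derived quantities in this table can be reproduced in a few lines; a sketch:

```python
# Counts from the example table
EE, EN = 15, 135     # experimental group: events, non-events
CE, CN = 100, 150    # control group: events, non-events

EER = EE / (EE + EN)           # experimental event rate = 0.10
CER = CE / (CE + CN)           # control event rate      = 0.40

ARR = CER - EER                # absolute risk reduction = 0.30
NNT = 1 / ARR                  # number needed to treat  = 3.33
RR  = EER / CER                # relative risk           = 0.25
RRR = 1 - RR                   # relative risk reduction = 0.75
OR  = (EE / EN) / (CE / CN)    # odds ratio              = 0.167

print(ARR, NNT, RR, RRR, OR)
```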

See also

Odds ratio
Hazard ratio
Risk difference
Rare disease assumption
Diagnostic odds ratio
Population impact measures


References

  1. Sistrom CL, Garvan CW (January 2004). "Proportions, odds, and risk". Radiology. 230 (1): 12–19. doi:10.1148/radiol.2301031028. PMID 14695382.
  2. Carneiro I, Howard N (2011). Introduction to Epidemiology (2nd ed.). Maidenhead, Berkshire: Open University Press. p. 27. ISBN 978-0-335-24462-1. OCLC 773348873.
  3. Bruce N, Pope D, Stanistreet D (2017). Quantitative Methods for Health Research: A Practical Interactive Guide to Epidemiology and Statistics (2nd ed.). Hoboken, NJ. p. 199. ISBN 978-1-118-66526-8. OCLC 992438133.
  4. Motulsky H (2018). Intuitive Biostatistics: A Nonmathematical Guide to Statistical Thinking (4th ed.). New York. p. 266. ISBN 978-0-19-064356-0. OCLC 1006531983.
  5. Nakayama T, Zaman MM, Tanaka H (April 1998). "Reporting of attributable and relative risks, 1966–97". Lancet. 351 (9110): 1179. doi:10.1016/s0140-6736(05)79123-6. PMID 9643696. S2CID 28195147.
  6. Noordzij M, van Diepen M, Caskey FC, Jager KJ (April 2017). "Relative risk versus absolute risk: one cannot be interpreted without the other". Nephrology, Dialysis, Transplantation. 32 (suppl_2): ii13–ii18. doi:10.1093/ndt/gfw465. PMID 28339913.
  7. Moher D, Hopewell S, Schulz KF, Montori V, Gøtzsche PC, Devereaux PJ, Elbourne D, Egger M, Altman DG (March 2010). "CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials". BMJ. 340: c869. doi:10.1136/bmj.c869. PMC 2844943. PMID 20332511.
  8. "Standard errors, confidence intervals, and significance tests". StataCorp LLC.
  9. Szklo M, Nieto FJ (2019). Epidemiology: Beyond the Basics (4th ed.). Burlington, Massachusetts: Jones & Bartlett Learning. p. 488. ISBN 978-1-284-11659-5. OCLC 1019839414.
  10. Katz D, Baptista J, Azen SP, Pike MC (1978). "Obtaining Confidence Intervals for the Relative Risk in Cohort Studies". Biometrics. 34 (3): 469–474. doi:10.2307/2530610. JSTOR 2530610.
  11. Armitage P, Berry G, Matthews JNS (2002). Statistical Methods in Medical Research. p. 1168. doi:10.1002/9780470773666. ISBN 978-0-470-77366-6.