Risk score

Last updated October 23, 2022

Risk score (or risk scoring) is the name given to a general practice in applied statistics, bio-statistics, econometrics and other related disciplines, of creating an easily calculated number (the score) that reflects the level of risk in the presence of some risk factors (e.g. risk of mortality or disease in the presence of symptoms or genetic profile, risk financial loss considering credit and financial history, etc.).

Simple to calculate: In many cases all you need to calculate a score is a pen and a piece of paper (although some scores use rely on more sophisticated or less transparent calculations that require a computer program).
Easily interpreted: The result of the calculation is a single number, and higher score usually means higher risk. Furthermore, many scoring methods enforce some form of monotonicity along the measured risk factors to allow a straight forward interpretation of the score (e.g. risk of mortality only increases with age, risk of payment default only increase with the amount of total debt the customer has, etc.).
Actionable: Scores are designed around a set of possible actions that should be taken as a result of the calculated score. Effective score-based policies can be designed and executed by setting thresholds on the value of the score and associating them with escalating actions.

Formal definition

A typical scoring method is composed of 3 components:^[1]

A set of consistent rules (or weights) that assign a numerical value ("points") to each risk factor that reflect our estimation of underlying risk.
A formula (typically a simple sum of all accumulated points) that calculates the score.
A set of thresholds that helps to translate the calculated score into a level of risk, or an equivalent formula or set of rules to translate the calculated score back into probabilities (leaving the nominal evaluation of severity to the practitioner).

Items 1 & 2 can be achieved by using some form of regression, that will provide both the risk estimation and the formula to calculate the score. Item 3 requires setting an arbitrary set of thresholds and will usually involve expert opinion.

Estimating risk with GLM

Risk score are designed to represent an underlying probability of an adverse event denoted $\lbrace Y=1\rbrace$ given a vector of $P$ explaining variables $\mathbf {X}$ containing measurements of the relevant risk factors. In order to establish the connection between the risk factors and the probability we estimate a set of weights $\beta$ is estimated using a generalized linear model:

{\begin{aligned}\operatorname {E} (\mathbf {Y} |\mathbf {X} )=\mathbf {P} (\mathbf {Y} =1|\mathbf {X} )=g^{-1}(\mathbf {X} \beta )\end{aligned}}

Where $g^{-1}:\mathbb {R} \rightarrow [0,1]$ is a real-valued, monotonically increasing function that maps the values of the linear predictor $\mathbf {X} \beta$ to the interval $[0,1]$ . GLM methods typically uses the logit or probit as the link function.

Estimating risk with other methods

While it's possible to estimate $\mathbf {P} (\mathbf {Y} =1|\mathbf {X} )$ using other statistical or machine learning methods, the requirements of simplicity and easy interpretation (and monotonicity per risk factor) make most of these methods difficult to use for scoring in this context:

With more sophisticated methods it becomes difficult to attribute simple weights for each risk factor and to provide a simple formula for the calculation of the score. A notable exception are tree-based methods like CART, that can provide a simple set of decision rules and calculations, but cannot ensure the monotonicity of the scale across the different risk factors.
The fact that we are estimating underlying risk across the population, and therefore cannot tag people in advance on an ordinal scale (we can't know in advance if a person belongs to a "high risk" group, we only see observed incidences^{[ spelling? ]}) classification methods are only relevant if we want to classify people into 2 groups or 2 possible actions.

Constructing the score

When using GLM, the set of estimated weights $\beta$ can be used to assign different values (or "points") to different values of the risk factors in $\mathbf {X}$ (continuous or nominal as indicators). The score can then be expressed as a weighted sum:

{\begin{aligned}{\text{Score}}=\mathbf {X} \beta =\sum _{j=1}^{P}\mathbf {X} _{j}\beta _{j}\end{aligned}}

Some scoring methods will translate the score into probabilities by using $g^{-1}$ (e.g. SAPS II score ^[2] that gives an explicit function to calculate mortality from the score^[3]) or a look-up table (e.g. ABCD² score ^[4]^[5] or the ISM7 (NI) Scorecard^[6]). This practice makes the process of obtaining the score more complicated computationally but has the advantage of translating an arbitrary number to a more familiar scale of 0 to 1.
The columns of $\mathbf {X}$ can represent complex transformations of the risk factors (including multiple interactions) and not just the risk factors themselves.
The values of $\beta$ are sometimes scaled or rounded to allow working with integers instead of very small fractions (making the calculation simpler). While scaling has no impact ability of the score to estimate risk, rounding has the potential of disrupting the "optimality" of the GLM estimation.

Making score-based decisions

Let $\mathbf {A} =\lbrace \mathbf {a} _{1},...,\mathbf {a} _{m}\rbrace$ denote a set of $m\geq 2$ "escalating" actions available for the decision maker (e.g. for credit risk decisions: $\mathbf {a} _{1}$ = "approve automatically", $\mathbf {a} _{2}$ = "require more documentation and check manually", $\mathbf {a} _{3}$ = "decline automatically"). In order to define a decision rule, we want to define a map between different values of the score and the possible decisions in $\mathbf {A}$ . Let $\tau =\lbrace \tau _{1},...\tau _{m-1}\rbrace$ be a partition of $\mathbb {R}$ into $m$ consecutive, non-overlapping intervals, such that $\tau _{1}<\tau _{2}<\ldots <\tau _{m-1}$ .

The map is defined as follows:

{\begin{aligned}{\text{If Score}}\in [\tau _{j-1},\tau _{j})\rightarrow {\text{Take action }}\mathbf {a} _{j}\end{aligned}}

The values of $\tau$ are set based on expert opinion, the type and prevalence of the measured risk, consequences of miss-classification, etc. For example, a risk of 9 out of 10 will usually be considered as "high risk", but a risk of 7 out of 10 can be considered either "high risk" or "medium risk" depending on context.
The definition of the intervals is on right open-ended intervals but can be equivalently defined using left open ended intervals $(\tau _{j-1},\tau _{j}]$ .
For scoring methods that are already translated the score into probabilities we either define the partition $\tau$ directly on the interval $[0,1]$ or translate the decision criteria into $[g^{-1}(\tau _{j-1}),g^{-1}(\tau _{j}))$ , and the monotonicity of $g$ ensures a 1-to-1 translation.

Examples

Biostatistics

(see more examples on the category page Category:Medical scoring system)

Financial industry

The primary use of scores in the financial sector is for Credit scorecards, or credit scores:

In many countries (such as the US) credit score are calculated by commercial entities and therefore the exact method is not public knowledge (for example the Bankruptcy risk score, FICO score and others). Credit scores in Australia and UK are often calculated by using logistic regression to estimate probability of default, and are therefore a type of risk score.
Other financial industries, such as the insurance industry also use scoring methods, but the exact implementation remains a trade secret, except for some rare cases^[6]

Social Sciences

COMPAS score for recidivism, as reverse-engineered by ProPublica^[7] using logistic regression and Cox's proportional hazard model.

Related Research Articles

Autocorrelation, sometimes known as serial correlation in the discrete time case, is the correlation of a signal with a delayed copy of itself as a function of delay. Informally, it is the similarity between observations of a random variable as a function of the time lag between them. The analysis of autocorrelation is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals.

In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is

The likelihood function is the joint probability of the observed data viewed as a function of the parameters of the chosen statistical model.

<span class="mw-page-title-main">Fokker–Planck equation</span> Partial differential equation

In statistical mechanics, the Fokker–Planck equation is a partial differential equation that describes the time evolution of the probability density function of the velocity of a particle under the influence of drag forces and random forces, as in Brownian motion. The equation can be generalized to other observables as well.

In statistics, the logistic model is a statistical model that models the probability of an event taking place by having the log-odds for the event be a linear combination of one or more independent variables. In regression analysis, logistic regression is estimating the parameters of a logistic model. Formally, in binary logistic regression there is a single binary dependent variable, coded by an indicator variable, where the two values are labeled "0" and "1", while the independent variables can each be a binary variable or a continuous variable. The corresponding probability of the value labeled "1" can vary between 0 and 1, hence the labeling; the function that converts log-odds to probability is the logistic function, hence the name. The unit of measurement for the log-odds scale is called a logit, from logistic unit, hence the alternative names. See § Background and § Definition for formal mathematics, and § Example for a worked example.

An instanton is a notion appearing in theoretical and mathematical physics. An instanton is a classical solution to equations of motion with a finite, non-zero action, either in quantum mechanics or in quantum field theory. More precisely, it is a solution to the equations of motion of the classical field theory on a Euclidean spacetime.

In statistics, a generalized linear model (GLM) is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.

Variational Bayesian methods are a family of techniques for approximating intractable integrals arising in Bayesian inference and machine learning. They are typically used in complex statistical models consisting of observed variables as well as unknown parameters and latent variables, with various sorts of relationships among the three types of random variables, as might be described by a graphical model. As typical in Bayesian inference, the parameters and latent variables are grouped together as "unobserved variables". Variational Bayesian methods are primarily used for two purposes:

To provide an analytical approximation to the posterior probability of the unobserved variables, in order to do statistical inference over these variables.
To derive a lower bound for the marginal likelihood of the observed data. This is typically used for performing model selection, the general idea being that a higher marginal likelihood for a given model indicates a better fit of the data by that model and hence a greater probability that the model in question was the one that generated the data.

Weighted least squares (WLS), also known as weighted linear regression, is a generalization of ordinary least squares and linear regression in which knowledge of the variance of observations is incorporated into the regression. WLS is also a specialization of generalized least squares.

In statistics, binomial regression is a regression analysis technique in which the response has a binomial distribution: it is the number of successes in a series of $independent Bernoulli trials, where each trial has probability of success . In binomial regression, the probability of a success is related to explanatory variables: the corresponding concept in ordinary regression is to relate the mean value of the unobserved response to explanatory variables.$

The softmax function, also known as softargmax or normalized exponential function, converts a vector of $K$ real numbers into a probability distribution of $K$ possible outcomes. It is a generalization of the logistic function to multiple dimensions, and used in multinomial logistic regression. The softmax function is often used as the last activation function of a neural network to normalize the output of a network to a probability distribution over predicted output classes, based on Luce's choice axiom.

In many-body theory, the term Green's function is sometimes used interchangeably with correlation function, but refers specifically to correlators of field operators or creation and annihilation operators.

In probability theory and statistics, the normal-gamma distribution is a bivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a normal distribution with unknown mean and precision.

In mathematics – specifically, in stochastic analysis – an Itô diffusion is a solution to a specific type of stochastic differential equation. That equation is similar to the Langevin equation used in physics to describe the Brownian motion of a particle subjected to a potential in a viscous fluid. Itô diffusions are named after the Japanese mathematician Kiyosi Itô.

The partition function or configuration integral, as used in probability theory, information theory and dynamical systems, is a generalization of the definition of a partition function in statistical mechanics. It is a special case of a normalizing constant in probability theory, for the Boltzmann distribution. The partition function occurs in many problems of probability theory because, in situations where there is a natural symmetry, its associated probability measure, the Gibbs measure, has the Markov property. This means that the partition function occurs not only in physical systems with translation symmetry, but also in such varied settings as neural networks, and applications such as genomics, corpus linguistics and artificial intelligence, which employ Markov networks, and Markov logic networks. The Gibbs measure is also the unique measure that has the property of maximizing the entropy for a fixed expectation value of the energy; this underlies the appearance of the partition function in maximum entropy methods and the algorithms derived therefrom.

The Swendsen–Wang algorithm is the first non-local or cluster algorithm for Monte Carlo simulation for large systems near criticality. It has been introduced by Robert Swendsen and Jian-Sheng Wang in 1987 at Carnegie Mellon.

The Generalized Additive Model for Location, Scale and Shape (GAMLSS) is an approach to statistical modelling and learning. GAMLSS is a modern distribution-based approach to (semiparametric) regression. A parametric distribution is assumed for the response (target) variable but the parameters of this distribution can vary according to explanatory variables using linear, nonlinear or smooth functions. In machine learning parlance, GAMLSS is a form of supervised machine learning.

In probability theory, tau-leaping, or τ-leaping, is an approximate method for the simulation of a stochastic system. It is based on the Gillespie algorithm, performing all reactions for an interval of length tau before updating the propensity functions. By updating the rates less often this sometimes allows for more efficient simulation and thus the consideration of larger systems.

In statistics, the class of vector generalized linear models (VGLMs) was proposed to enlarge the scope of models catered for by generalized linear models (GLMs). In particular, VGLMs allow for response variables outside the classical exponential family and for more than one parameter. Each parameter can be transformed by a link function. The VGLM framework is also large enough to naturally accommodate multiple responses; these are several independent responses each coming from a particular statistical distribution with possibly different parameter values.

In premixed turbulent combustion, Bray–Moss–Libby (BML) model is a closure model for a scalar field, built on the assumption that the reaction sheet is infinitely thin compared with the turbulent scales, so that the scalar can be found either at the state of burnt gas or unburnt gas. The model is named after Kenneth Bray, J. B. Moss and Paul A. Libby.

References

Hastie, T. J.; Tibshirani, R. J. (1990). Generalized Additive Models. Chapman & Hall/CRC. ISBN 978-0-412-34390-2.

↑ Toren, Yizhar (2011). "Ordinal Risk-Group Classification". arXiv: 1012.5487 [stat.ML].
↑ Le Gall, JR; Lemeshow, S; Saulnier, F (1993). "A new Simplified Acute Physiology Score (SAPS II) based on a European/North American multicenter study". JAMA. 270 (24): 2957–63. doi:10.1001/jama.1993.03510240069035. PMID 8254858.
↑ "Simplified Acute Physiology Score (SAPS II) Calculator - ClinCalc.com". clincalc.com. Retrieved August 20, 2018.
↑ Johnston SC; Rothwell PM; Nguyen-Huynh MN; Giles MF; Elkins JS; Bernstein AL; Sidney S. "Validation and refinement of scores to predict very early stroke risk after transient ischaemic attack" Lancet (2007): 369(9558):283-292
↑ "ABCD² Score for TIA". www.mdcalc.com. Retrieved December 16, 2018.
1 2 "ISM7 (NI) Scorecard, Allstate Property & Casualty Company" (PDF). Retrieved December 16, 2018.
↑ "How We Analyzed the COMPAS Recidivism Algorithm" . Retrieved December 16, 2018.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[ORGC-1] Toren, Yizhar (2011). "Ordinal Risk-Group Classification". arXiv: 1012.5487 [stat.ML].

[2] Le Gall, JR; Lemeshow, S; Saulnier, F (1993). "A new Simplified Acute Physiology Score (SAPS II) based on a European/North American multicenter study". JAMA. 270 (24): 2957–63. doi:10.1001/jama.1993.03510240069035. PMID 8254858.

[3] "Simplified Acute Physiology Score (SAPS II) Calculator - ClinCalc.com". clincalc.com. Retrieved August 20, 2018.

[4] Johnston SC; Rothwell PM; Nguyen-Huynh MN; Giles MF; Elkins JS; Bernstein AL; Sidney S. "Validation and refinement of scores to predict very early stroke risk after transient ischaemic attack" Lancet (2007): 369(9558):283-292

[5] "ABCD² Score for TIA". www.mdcalc.com. Retrieved December 16, 2018.

[AllState2006-6] 1 2 "ISM7 (NI) Scorecard, Allstate Property & Casualty Company" (PDF). Retrieved December 16, 2018.

[7] "How We Analyzed the COMPAS Recidivism Algorithm" . Retrieved December 16, 2018.

[1]

[2]

[3]

[4]

[5]

[6]

[7]