A risk matrix is a matrix used during risk assessment to define the level of risk by weighing the category of likelihood (often confused with probability, which is only one of its possible quantitative metrics) against the category of consequence severity. It is a simple mechanism for increasing the visibility of risks and assisting management decision-making. [1]
Risk is the lack of certainty about the outcome of making a particular choice. Statistically, the level of downside risk can be calculated as the product of the probability that harm occurs (e.g., that an accident happens) and the severity of that harm (i.e., the average amount of harm or, more conservatively, the maximum credible amount of harm). In practice, the risk matrix is a useful approach where either the probability or the harm severity cannot be estimated with accuracy and precision.
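The product described above can be sketched in a few lines of Python. The function name `downside_risk` and the example figures are illustrative assumptions and do not come from any standard or from the cited sources.

```python
def downside_risk(probability: float, severity: float) -> float:
    """Return the expected harm: probability of harm times its severity."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if severity < 0:
        raise ValueError("severity must be non-negative")
    return probability * severity


# Hypothetical figures: a 1-in-200 chance of an accident, with an average
# loss of 50,000 or a maximum credible loss of 400,000 (arbitrary units).
average_case = downside_risk(1 / 200, 50_000)
conservative_case = downside_risk(1 / 200, 400_000)
```

Using the maximum credible harm instead of the average harm gives the more conservative (larger) estimate for the same probability.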
Although standard risk matrices exist in certain contexts (e.g. US DoD, NASA, ISO), [2] [3] [4] individual projects and organizations may need to create their own or tailor an existing risk matrix. For example, the harm severity can be categorized as 'minor', 'marginal', 'critical' and 'catastrophic'.
The likelihood of harm occurring might be categorized as 'certain', 'likely', 'possible', 'unlikely' and 'rare'. However, estimates of very low likelihoods may not be very reliable.
The resulting risk matrix could be:
Likelihood \ Harm severity | Minor | Marginal | Critical | Catastrophic |
---|---|---|---|---|
Certain | High | High | Very high | Very high |
Likely | Medium | High | High | Very high |
Possible | Low | Medium | High | Very high |
Unlikely | Low | Medium | Medium | High |
Rare | Low | Low | Medium | Medium |
Eliminated | Eliminated | Eliminated | Eliminated | Eliminated |
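The example matrix above can be encoded as a simple lookup table. This is a minimal sketch: the names `SEVERITIES`, `RISK_MATRIX` and `risk_level` are hypothetical, and the cell values are copied from the example table rather than from any particular standard.

```python
# Column order of the example matrix (harm severity, least to most severe).
SEVERITIES = ("Minor", "Marginal", "Critical", "Catastrophic")

# One row per likelihood category; values follow the SEVERITIES column order.
RISK_MATRIX = {
    "Certain":  ("High", "High", "Very high", "Very high"),
    "Likely":   ("Medium", "High", "High", "Very high"),
    "Possible": ("Low", "Medium", "High", "Very high"),
    "Unlikely": ("Low", "Medium", "Medium", "High"),
    "Rare":     ("Low", "Low", "Medium", "Medium"),
}


def risk_level(likelihood: str, severity: str) -> str:
    """Look up the risk level for a likelihood and severity category."""
    if likelihood == "Eliminated":
        # An eliminated hazard carries no residual risk at any severity.
        return "Eliminated"
    row = RISK_MATRIX[likelihood]
    return row[SEVERITIES.index(severity)]


example = risk_level("Possible", "Critical")
```

An unknown category raises `KeyError` or `ValueError`, which makes typos in category names visible instead of silently returning a default level.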
The company or organization would then determine what level of risk it can accept for different events. This is done by weighing the risk of an event occurring against the cost of implementing safety measures and the benefit gained from them.
The following is an example matrix of possible personal injuries, with particular accidents allocated to appropriate cells within the matrix:
Likelihood \ Impact | Negligible | Marginal | Critical | Catastrophic |
---|---|---|---|---|
Certain | Stubbing toe | | | |
Likely | | Fall | | |
Possible | | | Major car accident | |
Unlikely | | | | Aircraft crash |
Rare | | | | Major tsunami |
The risk matrix is approximate and can often be challenged. For example, the likelihood of death in an aircraft crash is about 1 in 11 million, [5] whereas the likelihood of death in a motor vehicle accident is about 1 in 5,000. [5] However, because a plane crash usually leaves no survivors, it is far more catastrophic.[citation needed]
On January 30, 1978, [6] a new version of US Department of Defense Instruction 6055.1 ("Department of Defense Occupational Safety and Health Program") was released. It is said to have been an important step towards the development of the risk matrix. [7]
In August 1978, business textbook author David E. Hussey defined an investment "risk matrix" with risk on one axis and profitability on the other. The values on the risk axis were obtained by first determining risk impact and risk probability values, in a manner identical to completing a 7 x 7 version of the modern risk matrix. [8]
A 5 x 4 version of the risk matrix was defined by the US Department of Defense on March 30, 1984, in "MIL-STD-882B System Safety Program Requirements". [9] [10]
The risk matrix was in use by the acquisition reengineering team at the US Air Force Electronic Systems Center in 1995. [11]
Huihui Ni, An Chen and Ning Chen proposed some refinements of the approach in 2010. [12]
In 2019, the three most popular forms of the matrix were:
Other standards are also in use. [14]
In his article 'What's Wrong with Risk Matrices?', [15] Tony Cox argues that risk matrices have several problematic mathematical features that make it harder to assess risks. These are:
Thomas, Bratvold, and Bickel [16] demonstrate that risk matrices produce arbitrary risk rankings. Rankings depend upon the design of the risk matrix itself, such as how large the bins are and whether or not one uses an increasing or decreasing scale. In other words, changing the scale can change the answer.
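The dependence of rankings on bin design can be illustrated with a small sketch. Everything here is a hypothetical assumption chosen for illustration: the two risks, the cut points, and the use of a rank-product score are not taken from Thomas, Bratvold, and Bickel.

```python
from bisect import bisect_right


def category(value, cut_points):
    """Return the 1-based bin index of value for ascending cut points."""
    return bisect_right(cut_points, value) + 1


def score(risk, prob_cuts, loss_cuts):
    """Score a (probability, loss) pair as the product of its bin indices."""
    probability, loss = risk
    return category(probability, prob_cuts) * category(loss, loss_cuts)


# Two hypothetical risks as (probability, loss in arbitrary units).
risk_a = (0.45, 100.0)  # expected loss of about 45
risk_b = (0.55, 40.0)   # expected loss of about 22

prob_cuts = (0.1, 0.5)       # same likelihood bins in both designs
loss_cuts_1 = (50.0, 200.0)  # design 1: lowest severity bin ends at 50
loss_cuts_2 = (30.0, 200.0)  # design 2: lowest severity bin ends at 30

scores_1 = (score(risk_a, prob_cuts, loss_cuts_1),
            score(risk_b, prob_cuts, loss_cuts_1))
scores_2 = (score(risk_a, prob_cuts, loss_cuts_2),
            score(risk_b, prob_cuts, loss_cuts_2))
```

Under design 1 the scores are (4, 3), so risk A ranks above risk B. Under design 2 they are (4, 6), so the ranking reverses, even though neither risk changed and risk A has the larger expected loss in both cases.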
An additional problem is the imprecision of the categories of likelihood. For example, 'certain', 'likely', 'possible', 'unlikely' and 'rare' are not hierarchically related. A better choice might be to build every category on the same base term, such as 'extremely common', 'very common', 'fairly common', 'less common', 'very uncommon' and 'extremely uncommon', or a similar hierarchy on a base "frequency" term.[citation needed]
Another common problem arises when rank indices are assigned to the matrix axes and multiplied to obtain a "risk score". While this seems intuitive, it results in an uneven distribution of scores.[citation needed]
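The uneven distribution can be seen by enumerating every cell of a matrix. The sketch below assumes a matrix with five likelihood ranks and four severity ranks, each numbered from 1; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

likelihood_ranks = range(1, 6)  # 1 = rare ... 5 = certain
severity_ranks = range(1, 5)    # 1 = minor ... 4 = catastrophic

# Count how many cells of the matrix produce each product score.
score_counts = Counter(
    likelihood * severity
    for likelihood, severity in product(likelihood_ranks, severity_ranks)
)

max_score = max(score_counts)

# Scores between 1 and the maximum that no cell can ever produce.
unreachable = [s for s in range(1, max_score + 1) if s not in score_counts]

# Number of cells whose score falls in the lower half of the score range.
cells_in_lower_half = sum(
    count for s, count in score_counts.items() if s <= max_score / 2
)
```

With these ranks, 15 of the 20 cells score 10 or less, and the scores 7, 11, 13, 14, 17, 18 and 19 can never occur, so equal steps in score do not correspond to equal steps across the matrix.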
Douglas W. Hubbard and Richard Seiersen take the general research from Cox, Thomas, Bratvold, and Bickel, and provide specific discussion in the realm of cybersecurity risk. They point out that since 61% of cybersecurity professionals use some form of risk matrix, this can be a serious problem. Hubbard and Seiersen consider these problems in the context of other measured human errors and conclude that "The errors of the experts are simply further exacerbated by the additional errors introduced by the scales and matrices themselves. We agree with the solution proposed by Thomas et al. There is no need for cybersecurity (or other areas of risk analysis that also use risk matrices) to reinvent well-established quantitative methods used in many equally complex problems." [17]
Risk management is the identification, evaluation, and prioritization of risks, followed by the minimization, monitoring, and control of the impact or probability of those risks occurring.
Safety engineering is an engineering discipline which assures that engineered systems provide acceptable levels of safety. It is strongly related to industrial engineering/systems engineering, and the subset system safety engineering. Safety engineering assures that a life-critical system behaves as needed, even when components fail.
Fault tree analysis (FTA) is a type of failure analysis in which an undesired state of a system is examined. This analysis method is mainly used in safety engineering and reliability engineering to understand how systems can fail, to identify the best ways to reduce risk and to determine event rates of a safety accident or a particular system level (functional) failure. FTA is used in the aerospace, nuclear power, chemical and process, pharmaceutical, petrochemical and other high-hazard industries, but it is also used in fields as diverse as risk factor identification relating to social service system failure. FTA is also used in software engineering for debugging purposes and is closely related to the cause-elimination technique used to detect bugs.
Risk assessment determines possible mishaps, their likelihood and consequences, and the tolerances for such events. The results of this process may be expressed in a quantitative or qualitative fashion. Risk assessment is an inherent part of a broader risk management strategy to help reduce any potential risk-related consequences.
Failure mode and effects analysis (FMEA) is the process of reviewing as many components, assemblies, and subsystems as possible to identify potential failure modes in a system and their causes and effects. For each component, the failure modes and their resulting effects on the rest of the system are recorded in a specific FMEA worksheet. There are numerous variations of such worksheets. An FMEA can be a qualitative analysis, but may be put on a quantitative basis when mathematical failure rate models are combined with a statistical failure mode ratio database. It was one of the first highly structured, systematic techniques for failure analysis. It was developed by reliability engineers in the late 1950s to study problems that might arise from malfunctions of military systems. An FMEA is often the first step of a system reliability study.
Reliability engineering is a sub-discipline of systems engineering that emphasizes the ability of equipment to function without failure. Reliability is defined as the probability that a product, system, or service will perform its intended function adequately for a specified period of time, or will operate in a defined environment without failure. Reliability is closely related to availability, which is typically described as the ability of a component or system to function at a specified moment or interval of time.
Probabilistic risk assessment (PRA) is a systematic and comprehensive methodology to evaluate risks associated with a complex engineered technological entity or the effects of stressors on the environment.
A hazard analysis is one of many methods that may be used to assess risk. At its core, the process entails describing a system object that intends to conduct some activity. During the performance of that activity, an adverse event may be encountered that could cause or contribute to an occurrence. Finally, that occurrence will result in some outcome that may be measured in terms of the degree of loss or harm. This outcome may be measured on a continuous scale, such as an amount of monetary loss, or the outcomes may be categorized into various levels of severity.
Failure mode effects and criticality analysis (FMECA) is an extension of failure mode and effects analysis (FMEA).
IEC 61508 is an international standard published by the International Electrotechnical Commission (IEC) consisting of methods on how to apply, design, deploy and maintain automatic protection systems called safety-related systems. It is titled Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems.
Accident analysis is a process carried out in order to determine the cause or causes of an accident so as to prevent further accidents of a similar kind. It is part of accident investigation or incident investigation. These analyses may be performed by a range of experts, including forensic scientists, forensic engineers or health and safety advisers. Accident investigators, particularly those in the aircraft industry, are colloquially known as "tin-kickers". Health and safety and patient safety professionals prefer using the term "incident" in place of the term "accident". Its retrospective nature means that accident analysis is primarily an exercise of directed explanation, conducted using the theories or methods the analyst has to hand, which direct the way in which the events, aspects, or features of accident phenomena are highlighted and explained. These analyses are also invaluable in determining ways to prevent future incidents from occurring. By determining root causes, they provide good insight into what failures occurred that led to the incident.
A job safety analysis (JSA) is a procedure that helps integrate accepted safety and health principles and practices into a particular task or job operation. The goal of a JSA is to identify potential hazards of a specific role and recommend procedures to control or prevent these hazards.
Information technology risk, IT risk, IT-related risk, or cyber risk is any risk relating to information technology. While information has long been appreciated as a valuable and important asset, the rise of the knowledge economy and the Digital Revolution has led to organizations becoming increasingly dependent on information, information processing and especially IT. Various events or incidents that compromise IT in some way can therefore cause adverse impacts on the organization's business processes or mission, ranging from inconsequential to catastrophic in scale.
A hazard is a potential source of harm. Substances, events, or circumstances can constitute hazards when their nature would potentially allow them to cause damage to health, life, property, or any other interest of value. The probability of that harm being realized in a specific incident, combined with the magnitude of potential harm, makes up its risk. The terms "hazard" and "risk" are often used synonymously in colloquial speech.
In simple terms, risk is the possibility of something bad happening. Risk involves uncertainty about the effects/implications of an activity with respect to something that humans value, often focusing on negative, undesirable consequences. Many different definitions have been proposed. One international standard definition of risk is the "effect of uncertainty on objectives".
ISO 26262, titled "Road vehicles – Functional safety", is an international standard for functional safety of electrical and/or electronic systems that are installed in serial production road vehicles, defined by the International Organization for Standardization (ISO) in 2011, and revised in 2018.
Qualitative risk analysis is a technique used to evaluate the risk associated with a particular hazard. Risk assessment is used for uncertain events that could have many outcomes and for which there could be significant consequences. Risk is a function of the probability of an event and the consequences given that the event occurs. Probability refers to the likelihood that a hazard will occur. In a qualitative assessment, probability and consequence are not numerically estimated, but are evaluated verbally using qualifiers such as "high likelihood" and "low likelihood". Qualitative assessments are well suited to screening-level assessments, when comparing or screening multiple alternatives, or when sufficient data are not available to support numerical probability or consequence estimates. Once numbers are inserted into the analysis, it becomes a semi-quantitative or quantitative risk assessment.
Automotive Safety Integrity Level (ASIL) is a risk classification scheme defined by the ISO 26262 standard, "Road vehicles – Functional safety". It is an adaptation of the Safety Integrity Level (SIL) used in IEC 61508 for the automotive industry. This classification helps define the safety requirements necessary to be in line with the ISO 26262 standard. The ASIL is established by performing a risk analysis of a potential hazard by looking at the Severity, Exposure and Controllability of the vehicle operating scenario. The safety goal for that hazard in turn carries the ASIL requirements.
A cyber PHA or cyber HAZOP is a safety-oriented methodology to conduct a cybersecurity risk assessment for an industrial control system (ICS) or safety instrumented system (SIS). It is a systematic, consequence-driven approach that is based upon industry standards such as ISA 62443-3-2, ISA TR84.00.09, ISO/IEC 27005:2018, ISO 31000:2009 and NIST Special Publication (SP) 800-39.
An occupational risk assessment is an evaluation of how much potential danger a hazard can pose to a person in a workplace environment. The assessment takes into account possible scenarios, the probability of their occurrence, and their results. The five types of hazards to be aware of are safety, chemical, biological, physical, and ergonomic.