Absolute probability judgement

Absolute probability judgement is a technique used in the field of human reliability assessment (HRA) to evaluate the probability of a human error occurring throughout the completion of a specific task. From such analyses, measures can then be taken to reduce the likelihood of errors occurring within a system and so improve the overall levels of safety. There are three primary reasons for conducting an HRA: error identification, error quantification and error reduction. The many techniques used for these purposes can be split into one of two classifications: first-generation techniques and second-generation techniques. First-generation techniques work on the basis of the simple dichotomy of 'fits/doesn't fit' in matching an error situation in context with related error identification and quantification, while second-generation techniques are more theory-based in their assessment and quantification of errors. HRA techniques have been utilised in a range of industries, including healthcare, engineering, nuclear, transportation and business; each technique has varying uses within different disciplines.

Absolute probability judgement, also known as direct numerical estimation, [1] is based on the quantification of human error probabilities (HEPs). It is grounded in the premise that people cannot recall, or are unable to estimate with certainty, the probability of a given event occurring. Expert judgement is typically employed when there is little or no data with which to calculate HEPs, or when the available data is unsuitable or difficult to interpret. In theory, qualitative knowledge built through the experts' experience can be translated into quantitative data such as HEPs.

Experts are required to have a good level of both substantive experience (i.e. a suitable level of knowledge of the problem domain) and normative experience (i.e. the ability, perhaps with the aid of a facilitator, to translate this knowledge explicitly into probabilities). If experts possess the required substantive knowledge but lack normative experience, they may be trained or assisted to ensure that the knowledge and expertise to be captured is translated into the correct probabilities, i.e. that the result is an accurate representation of the experts' judgements.

Background

Absolute probability judgement is an expert judgement-based approach which involves using the beliefs of experts (e.g. front-line staff, process engineers) to estimate HEPs. There are two primary forms of the technique: group methods and single-expert methods, i.e. it can be conducted either as a group or as an individual exercise. Group methods tend to be more popular and widely used, as they are more robust and less subject to bias. Moreover, it is unusual for a single individual to possess all the information and expertise required to estimate, accurately and alone, the human reliability in question. In the group approach, the aggregation of individual knowledge and opinions produces a more reliable outcome.

Methodologies

There are four main group methods by which absolute probability judgement can be conducted.

Aggregated individual method

Utilising this method, experts make their estimates individually without meeting or discussing the task. The estimates are then aggregated by taking the geometric mean of the individual experts' estimates for each task. The major drawback of this method is that no expertise is shared through the group; a positive, however, is that because the process is individual, conflicts such as dominating or clashing personalities are avoided, and the results are therefore free of such group-driven biases.
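
The geometric mean of n estimates x1, x2, ..., xn is (x1 · x2 · ... · xn)^(1/n). A minimal sketch of this aggregation step in Python, with hypothetical expert figures (the function name and values are illustrative, not from the source):

```python
import math

def aggregate_hep(estimates):
    """Aggregate individual expert HEP estimates with the geometric mean,
    (x1 * x2 * ... * xn) ** (1/n)."""
    return math.prod(estimates) ** (1.0 / len(estimates))

# Hypothetical estimates of one task's error probability from four experts
estimates = [1e-3, 5e-4, 2e-3, 8e-4]
print(f"Aggregated HEP: {aggregate_hep(estimates):.1e}")  # ~9.5e-04
```

The geometric mean is used rather than the arithmetic mean because HEPs typically span several orders of magnitude, so a single large estimate would otherwise dominate the aggregate.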

Delphi method

Developed by Dalkey, [2] [3] the Delphi method is very similar to the aggregated individual method in that experts make their initial estimates in isolation. Following this stage, however, the experts are shown the estimates at which all the other participants have arrived and are then able to reconsider their initial estimates. The re-estimates are then aggregated using the geometric mean. This allows some information sharing whilst avoiding most group-led biases; however, the lack of discussion remains a problem.
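
The two-round structure can be sketched as follows; the initial and revised figures are hypothetical, since the revision itself is a human judgement rather than a computation:

```python
import math

def geometric_mean(xs):
    return math.prod(xs) ** (1.0 / len(xs))

# Round 1: hypothetical estimates of one task's HEP, made in isolation.
round_1 = [2e-3, 4e-4, 1e-3]
print(f"Round-1 geometric mean shown to the panel: {geometric_mean(round_1):.1e}")  # ~9.3e-04

# Round 2: having seen everyone's figures, experts may revise their own.
# The revised values below are illustrative only.
round_2 = [1.5e-3, 6e-4, 1e-3]
print(f"Final HEP: {geometric_mean(round_2):.1e}")  # ~9.7e-04
```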

Nominal group technique (NGT)

This technique takes the Delphi method and introduces limited discussion/consultation between the experts. By this means, information sharing is superior, and group domination is mitigated by having the experts come to their own conclusions separately before the HEP scores are aggregated.

Consensus group method

This is the most group-centred approach and requires the group to come to a consensus on the HEP estimates through discussion and mutual agreement. This method maximises knowledge sharing and the exchange of ideas, and promotes equal opportunity to participate in the discussion. However, it can prove logistically awkward to coordinate, as it requires all experts to be in the same location for the discussion to take place. Because of this, personalities and other biasing mechanisms, such as overconfidence, recency/availability and anchoring, may become a factor, increasing the potential for the results to be skewed. If a deadlock or breakdown in group dynamics arises, it becomes necessary to revert to one of the other group absolute probability judgement methods.

Procedure

1. Select subject matter experts

The chosen experts must have a good working knowledge of the tasks to be assessed. The correct number of experts depends on what is most practicable, taking into account constraints such as the availability of space and funds; however, the larger the group, the more likely problems are to arise.

2. Prepare task statement

Task statements are a necessary component of the method; they specify the tasks in detail. The fuller the explanation of the task within the statement, the less likely it is that the experts will resort to making individual guesses about the tasks. The statement should also ensure that any assumptions are clearly stated in a format that all experts can interpret. The optimal level of detail is governed by the nature of the task under consideration and the intended use of the final HEP estimate.

3. Prepare response booklet

These booklets detail the task statement and the design of the scale used in assessing error probability, by which experts can indicate their judgements. [1] The scale must be one which allows differences to be made apparent. The booklet also includes instructions, assumptions and sample items.

4. Develop instructions for subjects

Instructions are required to specify to the experts the reasons for the session; otherwise they may guess at those reasons, which may bias the resulting estimates of human reliability.

5. Obtain judgements

Experts are required to provide their judgements on each of the tasks; this can be done in a group or individually. If done in a group, a facilitator is often used to prevent bias and to help overcome any problems.

6. Calculate inter-judge consistency

This is a method by which differences between the HEP estimates of individual experts can be compared; a statistical formulation is used for this purpose.
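
The source does not name the statistical formulation. One plausible choice, shown here as a sketch, is the sample standard deviation of the experts' log10 estimates, which measures disagreement in order-of-magnitude terms (the function name and figures are illustrative, not from the source):

```python
import math
import statistics

def interjudge_consistency(estimates):
    """Spread of the experts' HEP estimates on a log10 scale.
    Smaller values indicate closer agreement between judges."""
    return statistics.stdev(math.log10(x) for x in estimates)

# Hypothetical estimates for one task from four experts
print(f"{interjudge_consistency([1e-3, 5e-4, 2e-3, 8e-4]):.2f}")  # ~0.25
```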

7. Aggregate individual estimates

Where group consensus methods are not used, it is necessary to compute an aggregate of the individual estimates for each HEP.

8. Uncertainty bound estimation

Uncertainty bounds are calculated using statistical approaches involving confidence ranges.
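
The source does not detail the statistical approach. A minimal sketch, assuming the expert estimates are approximately lognormal so that 90% bounds can be placed around the aggregated HEP in log space (the function name, z value and figures are illustrative, not from the source):

```python
import math
import statistics

def uncertainty_bounds(estimates, z=1.645):
    """Approximate 90% uncertainty bounds on an aggregated HEP, under
    the (assumed) lognormal model. 10**mu is the geometric mean."""
    logs = [math.log10(x) for x in estimates]
    mu, sd = statistics.mean(logs), statistics.stdev(logs)
    return 10 ** (mu - z * sd), 10 ** mu, 10 ** (mu + z * sd)

lower, centre, upper = uncertainty_bounds([1e-3, 5e-4, 2e-3, 8e-4])
print(f"{lower:.1e} <= HEP ~ {centre:.1e} <= {upper:.1e}")  # 3.7e-04 ... 2.4e-03
```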

Worked example

Context

In this example, absolute probability judgement was utilised by Eurocontrol at its experimental centre in Brétigny-sur-Orge, near Paris, using a group consensus methodology.

Required inputs

The staff included in the session, comprising ground staff, pilots and controllers, took turns to provide estimates of the error probabilities. Before the session began, an introductory exercise was conducted to allow the participants to become more comfortable with the technique; this involved an explanation of the background of the method and an overview of what the session would entail. To increase familiarity with the method, example templates were used to show how errors are estimated.

Method

During the session it emerged that the experts found it difficult to reach a consensus on the differing estimates of the various HEP values. Discussions often changed individuals' thinking, e.g. in the light of new information or interpretations, but this did not make reaching an agreement easier. Because of this difficulty, it was necessary to aggregate the individual estimates and calculate their geometric mean. The following table displays a sample of the results obtained.

Table: Pilot absolute probability judgement session – extract of results (codes denote potential errors in the risk model)

Potential error (code in risk model)   Maximum   Minimum   Range (max/min)   Geometric mean
C1a                                    1.1E-03   2.0E-05   55                2.1E-04
C1b                                    2.5E-04   1.0E-05   25                3.5E-05
D1                                     1.0E-03   1.0E-04   10                4.3E-04
F1a                                    4.0E-04   1.0E-05   40                6.9E-05
F1b                                    1.0E-03   1.0E-04   10                4.0E-04
F1c                                    1.0E-03   1.0E-04   10                4.6E-04

In various cases, the range separating the maximum and minimum values proved too large for the aggregated value to be accepted with confidence. The values in the table are the events in the risk model which require to be quantified; there are three primary errors in the model that may occur, corresponding to the code families C1, D1 and F1.
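
The Range column can be reproduced directly from the table as the ratio of the maximum to the minimum estimate for each potential error, a simple indicator of inter-expert disagreement:

```python
# Maximum and minimum estimates per error code, taken from the table above.
data = {
    "C1a": (1.1e-3, 2.0e-5),
    "C1b": (2.5e-4, 1.0e-5),
    "D1":  (1.0e-3, 1.0e-4),
    "F1a": (4.0e-4, 1.0e-5),
    "F1b": (1.0e-3, 1.0e-4),
    "F1c": (1.0e-3, 1.0e-4),
}
for code, (maximum, minimum) in data.items():
    print(f"{code}: range = {maximum / minimum:.0f}")
# C1a: 55, C1b: 25, D1: 10, F1a: 40, F1b: 10, F1c: 10 — matching the table.
```

Note that the reported geometric means cannot be recovered from the extremes alone (e.g. the geometric mean of C1a's extremes is about 1.5E-04, not 2.1E-04), which is consistent with their having been computed over all the experts' individual estimates.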

Various reasons can explain why there was such a large spread in the estimates provided by the group: the panel of experts was highly diverse, the experience of the individuals differed, and experience with the Ground Based Augmentation System (GBAS) also varied. The process was new to all of the participating experts, and only a single day, during which the session took place, was available to become familiar with the technique and use it correctly. Most significantly, the assessments were required at a very fine level of detail, which the staff were not used to. Experts also became confused about the way in which the assessment took place: errors were not considered on their own but were analysed as a group, meaning that the estimated values represented the errors' combined contribution to a system failure rather than each error's individual contribution.

References

  1. Humphreys, P. (1995). Human Reliability Assessor's Guide. Human Factors in Reliability Group.
  2. Dalkey, N. & Helmer, O. (1963). An experimental application of the Delphi method to the use of experts. Management Science, 9(3), 458–467.
  3. Linstone, H.A. & Turoff, M. (1978). The Delphi Method: Techniques and Applications. Addison-Wesley, London.
  4. Kirwan, B. (1994). A Guide to Practical Human Reliability Assessment. CRC Press.
  5. Eurocontrol Experimental Centre (2004). Review of Techniques to Support the EATMP Safety Assessment Methodology. Eurocontrol, Vol. 1.