Absolute probability judgement

Absolute probability judgement is a technique used in the field of human reliability assessment (HRA) to evaluate the probability of a human error occurring throughout the completion of a specific task. From such analyses, measures can then be taken to reduce the likelihood of errors occurring within a system and so improve the overall levels of safety. There are three primary reasons for conducting an HRA: error identification, error quantification and error reduction. The many techniques used for these purposes can be split into one of two classifications: first-generation techniques and second-generation techniques. First-generation techniques work on the basis of the simple dichotomy of 'fits/doesn't fit' in matching an error situation in context with related error identification and quantification, while second-generation techniques are more theory-based in their assessment and quantification of errors. HRA techniques have been utilised in a range of industries, including healthcare, engineering, nuclear, transportation and business; each technique has varying uses within different disciplines.

Absolute probability judgement, also known as direct numerical estimation, [1] is based on the quantification of human error probabilities (HEPs). It is grounded in the premise that people cannot recall, or are unable to estimate with certainty, the probability of a given event occurring. Expert judgement is typically employed when there is little or no data with which to calculate HEPs, or when the available data is unsuitable or difficult to interpret. In theory, qualitative knowledge built through the experts' experience can be translated into quantitative data such as HEPs.

Experts are required to have a good level of both substantive experience (i.e. a suitable level of knowledge of the problem domain) and normative experience (i.e. the ability, perhaps with the aid of a facilitator, to translate this knowledge explicitly into probabilities). If experts possess the required substantive knowledge but lack normative experience, they may be trained or assisted to ensure that the knowledge and expertise to be captured is translated into the correct probabilities, i.e. that the result is an accurate representation of the experts' judgements.

Background

Absolute probability judgement is an expert judgement-based approach which involves using the beliefs of experts (e.g. front-line staff, process engineers) to estimate HEPs. There are two primary forms of the technique: group methods and single-expert methods, i.e. it can be conducted either as a group or as an individual exercise. Group methods tend to be more popular and widely used, as they are more robust and less subject to bias. Moreover, it is unusual for a single individual to possess all the information and expertise required to estimate, accurately and alone, the human reliability in question. In the group approach, the aggregation of individual knowledge and opinions produces a more reliable outcome.

Methodologies

There are four main group methods by which absolute probability judgement can be conducted.

Aggregated individual method

Utilising this method, experts make their estimates individually without meeting or discussing the task. The estimates are then aggregated by taking the geometric mean of the individual experts' estimates for each task. The major drawback of this method is that no expertise is shared through the group; a positive, however, is that because the process is individual, conflicts such as dominating or clashing personalities are avoided, and the results are therefore free of such group-driven biases.
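
The geometric mean of n estimates x1, x2, ..., xn is (x1 · x2 · ... · xn)^(1/n). A minimal sketch of this aggregation step in Python, with hypothetical expert figures (the function name and values are illustrative, not from the source):

```python
import math

def aggregate_hep(estimates):
    """Aggregate individual expert HEP estimates with the geometric mean,
    (x1 * x2 * ... * xn) ** (1/n)."""
    return math.prod(estimates) ** (1.0 / len(estimates))

# Hypothetical estimates of one task's error probability from four experts
estimates = [1e-3, 5e-4, 2e-3, 8e-4]
print(f"Aggregated HEP: {aggregate_hep(estimates):.1e}")  # ~9.5e-04
```

The geometric mean is used rather than the arithmetic mean because HEPs typically span several orders of magnitude, so a single large estimate would otherwise dominate the aggregate.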

Delphi method

Developed by Dalkey, [2] [3] the Delphi method is very similar to the aggregated individual method in that experts make their initial estimates in isolation. Following this stage, however, the experts are shown the estimates at which all the other participants have arrived and are then able to reconsider their initial estimates. The re-estimates are then aggregated using the geometric mean. This allows some information sharing whilst avoiding most group-led biases; however, the lack of discussion remains a problem.
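
The two-round structure can be sketched as follows; the initial and revised figures are hypothetical, since the revision itself is a human judgement rather than a computation:

```python
import math

def geometric_mean(xs):
    return math.prod(xs) ** (1.0 / len(xs))

# Round 1: hypothetical estimates of one task's HEP, made in isolation.
round_1 = [2e-3, 4e-4, 1e-3]
print(f"Round-1 geometric mean shown to the panel: {geometric_mean(round_1):.1e}")  # ~9.3e-04

# Round 2: having seen everyone's figures, experts may revise their own.
# The revised values below are illustrative only.
round_2 = [1.5e-3, 6e-4, 1e-3]
print(f"Final HEP: {geometric_mean(round_2):.1e}")  # ~9.7e-04
```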

Nominal group technique (NGT)

This technique takes the Delphi method and introduces limited discussion/consultation between the experts. By this means, information sharing is superior, and group domination is mitigated by having the experts come to their own conclusions separately before the HEP scores are aggregated.

Consensus group method

This is the most group-centred approach and requires the group to come to a consensus on the HEP estimates through discussion and mutual agreement. This method maximises knowledge sharing and the exchange of ideas, and promotes equal opportunity to participate in the discussion. However, it can prove logistically awkward to coordinate, as it requires all experts to be in the same location for the discussion to take place. Because of this, personalities and other biasing mechanisms, such as overconfidence, recency/availability and anchoring, may become a factor, increasing the potential for the results to be skewed. If a deadlock or breakdown in group dynamics arises, it becomes necessary to revert to one of the other group absolute probability judgement methods.

Procedure

1. Select subject matter experts

The chosen experts must have a good working knowledge of the tasks to be assessed. The correct number of experts depends on what is most practicable, taking into account constraints such as the availability of space and funds; however, the larger the group, the more likely problems are to arise.

2. Prepare task statement

Task statements are a necessary component of the method; they specify the tasks in detail. The fuller the explanation of the task within the statement, the less likely it is that the experts will resort to making individual guesses about the tasks. The statement should also ensure that any assumptions are clearly stated in a format that all experts can interpret. The optimal level of detail is governed by the nature of the task under consideration and the intended use of the final HEP estimate.

3. Prepare response booklet

These booklets detail the task statement and the design of the scale used in assessing error probability, by which experts can indicate their judgements. [1] The scale must be one which allows differences to be made apparent. The booklet also includes instructions, assumptions and sample items.

4. Develop instructions for subjects

Instructions are required to specify to the experts the reasons for the session; otherwise they may guess at those reasons, which may bias the resulting estimates of human reliability.

5. Obtain judgements

Experts are required to provide their judgements on each of the tasks; this can be done in a group or individually. If done in a group, a facilitator is often used to prevent bias and to help overcome any problems.

6. Calculate inter-judge consistency

This is a method by which differences between the HEP estimates of individual experts can be compared; a statistical formulation is used for this purpose.
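
The source does not name the statistical formulation. One plausible choice, shown here as a sketch, is the sample standard deviation of the experts' log10 estimates, which measures disagreement in order-of-magnitude terms (the function name and figures are illustrative, not from the source):

```python
import math
import statistics

def interjudge_consistency(estimates):
    """Spread of the experts' HEP estimates on a log10 scale.
    Smaller values indicate closer agreement between judges."""
    return statistics.stdev(math.log10(x) for x in estimates)

# Hypothetical estimates for one task from four experts
print(f"{interjudge_consistency([1e-3, 5e-4, 2e-3, 8e-4]):.2f}")  # ~0.25
```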

7. Aggregate individual estimates

Where group consensus methods are not used, it is necessary to compute an aggregate of the individual estimates for each HEP.

8. Uncertainty bound estimation

Uncertainty bounds are calculated using statistical approaches involving confidence ranges.
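
The source does not detail the statistical approach. A minimal sketch, assuming the expert estimates are approximately lognormal so that 90% bounds can be placed around the aggregated HEP in log space (the function name, z value and figures are illustrative, not from the source):

```python
import math
import statistics

def uncertainty_bounds(estimates, z=1.645):
    """Approximate 90% uncertainty bounds on an aggregated HEP, under
    the (assumed) lognormal model. 10**mu is the geometric mean."""
    logs = [math.log10(x) for x in estimates]
    mu, sd = statistics.mean(logs), statistics.stdev(logs)
    return 10 ** (mu - z * sd), 10 ** mu, 10 ** (mu + z * sd)

lower, centre, upper = uncertainty_bounds([1e-3, 5e-4, 2e-3, 8e-4])
print(f"{lower:.1e} <= HEP ~ {centre:.1e} <= {upper:.1e}")  # 3.7e-04 ... 2.4e-03
```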

Worked example

Context

In this example, absolute probability judgement was utilised by Eurocontrol at its experimental centre in Brétigny-sur-Orge, near Paris, using a group consensus methodology.

Required inputs

The staff included in the session, comprising ground staff, pilots and controllers, took turns to provide estimates of the error probabilities. Before the session began, an introductory exercise was conducted to allow the participants to become more comfortable with the technique; this involved an explanation of the background of the method and an overview of what the session would entail. To increase familiarity with the method, example templates were used to show how errors are estimated.

Method

During the session it emerged that the experts found it difficult to reach a consensus on the differing estimates of the various HEP values. Discussions often changed individuals' thinking, e.g. in the light of new information or interpretations, but this did not make reaching an agreement easier. Because of this difficulty, it was necessary to aggregate the individual estimates and calculate their geometric mean. The following table displays a sample of the results obtained.

Table: Pilot absolute probability judgement session – extract of results (codes denote potential errors in the risk model)

Potential error (code in risk model)   Maximum   Minimum   Range (max/min)   Geometric mean
C1a                                    1.1E-03   2.0E-05   55                2.1E-04
C1b                                    2.5E-04   1.0E-05   25                3.5E-05
D1                                     1.0E-03   1.0E-04   10                4.3E-04
F1a                                    4.0E-04   1.0E-05   40                6.9E-05
F1b                                    1.0E-03   1.0E-04   10                4.0E-04
F1c                                    1.0E-03   1.0E-04   10                4.6E-04

In various cases, the range separating the maximum and minimum values proved too large for the aggregated value to be accepted with confidence. The values in the table are the events in the risk model which require to be quantified; there are three primary errors in the model that may occur, corresponding to the code families C1, D1 and F1.
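
The Range column can be reproduced directly from the table as the ratio of the maximum to the minimum estimate for each potential error, a simple indicator of inter-expert disagreement:

```python
# Maximum and minimum estimates per error code, taken from the table above.
data = {
    "C1a": (1.1e-3, 2.0e-5),
    "C1b": (2.5e-4, 1.0e-5),
    "D1":  (1.0e-3, 1.0e-4),
    "F1a": (4.0e-4, 1.0e-5),
    "F1b": (1.0e-3, 1.0e-4),
    "F1c": (1.0e-3, 1.0e-4),
}
for code, (maximum, minimum) in data.items():
    print(f"{code}: range = {maximum / minimum:.0f}")
# C1a: 55, C1b: 25, D1: 10, F1a: 40, F1b: 10, F1c: 10 — matching the table.
```

Note that the reported geometric means cannot be recovered from the extremes alone (e.g. the geometric mean of C1a's extremes is about 1.5E-04, not 2.1E-04), which is consistent with their having been computed over all the experts' individual estimates.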

Various reasons can explain why there was such a large spread in the estimates provided by the group: the panel of experts was highly diverse, the experience of the individuals differed, and experience with the Ground Based Augmentation System (GBAS) also varied. The process was new to all of the participating experts, and only a single day, during which the session took place, was available to become familiar with the technique and use it correctly. Most significantly, the assessments were required at a very fine level of detail, which the staff were not used to. Experts also became confused about the way in which the assessment took place: errors were not considered on their own but were analysed as a group, meaning that the estimated values represented the errors' combined contribution to a system failure rather than each error's individual contribution.

References

  1. Humphreys, P. (1995). Human Reliability Assessor's Guide. Human Factors in Reliability Group.
  2. Dalkey, N. & Helmer, O. (1963). An experimental application of the Delphi method to the use of experts. Management Science, 9(3), 458–467.
  3. Linstone, H.A. & Turoff, M. (1978). The Delphi Method: Techniques and Applications. Addison-Wesley, London.
  4. Kirwan, B. (1994). A Guide to Practical Human Reliability Assessment. CRC Press.
  5. Eurocontrol Experimental Centre (2004). Review of Techniques to Support the EATMP Safety Assessment Methodology. Eurocontrol, Vol. 1.