Resilience engineering

Resilience engineering is a subfield of safety science research that focuses on understanding how complex adaptive systems cope when encountering a surprise. The term resilience in this context refers to the capabilities that a system must possess in order to deal effectively with unanticipated events. Resilience engineering examines how systems build, sustain, degrade, and lose these capabilities. [1]

Resilience engineering researchers have studied multiple safety-critical domains, including aviation, anesthesia, fire safety, space mission control, military operations, power plants, air traffic control, rail engineering, health care, and emergency response to both natural and industrial disasters. [1] [2] [3] Resilience engineering researchers have also studied the non-safety-critical domain of software operations. [4]

Whereas other approaches to safety (e.g., behavior-based safety, probabilistic risk assessment) focus on designing controls to prevent or mitigate specific known hazards (e.g., hazard analysis), or on assuring that a particular system is safe (e.g., safety cases), resilience engineering looks at a more general capability of systems to deal with hazards that were not known before they were encountered.

In particular, resilience engineering researchers study how people are able to cope effectively with complexity to ensure safe system operation, especially when they are experiencing time pressure. [5] Under the resilience engineering paradigm, accidents are not attributable to human error. Instead, the assumption is that humans working in a system are always faced with goal conflicts and limited resources, requiring them to constantly make trade-offs while under time pressure. When failures happen, they are understood as being due to the system temporarily being unable to cope with complexity. [6] Hence, resilience engineering is related to other perspectives in safety that have reassessed the nature of human error, such as the "new look", [7] the "new view", [8] "safety differently", [9] and Safety-II. [10]

Resilience engineering researchers ask questions such as:

Because incidents often involve unforeseen challenges, resilience engineering researchers often use incident analysis as a research method. [3] [2]

Resilience engineering symposia

The first symposium on resilience engineering was held in October 2004 in Söderköping, Sweden. [5] It brought together fourteen safety science researchers with an interest in complex systems. [11]

A second symposium on resilience engineering was held in November 2006 in Sophia Antipolis, France. [12] The symposium had eighty participants. [13] The Resilience Engineering Association, an association of researchers and practitioners with an interest in resilience engineering, continues to hold biennial symposia. [14]

These symposia led to a series of books being published (see Books section below).

Themes

This section discusses aspects of the resilience engineering perspective that are different from traditional approaches to safety.

Normal work leads to both success and failure

The resilience engineering perspective assumes that the work people do within a system that contributes to an accident is fundamentally the same in nature as the work that contributes to successful outcomes. As a consequence, if work practices are examined only after an accident and interpreted only in the context of that accident, the resulting analysis is subject to selection bias. [11]
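The selection-bias argument can be illustrated with a small simulation. The sketch below is a toy model, not drawn from the cited literature: the "workaround" practice, the probabilities, and the function names are all hypothetical. It assumes that a work practice is used equally often regardless of outcome, and then compares how often the practice shows up in accidents versus in successful episodes.

```python
import random


def simulate_episodes(n_episodes, p_workaround, p_accident, seed=0):
    """Simulate work episodes in which practice and outcome are independent.

    All parameters are illustrative assumptions, not empirical values.
    """
    rng = random.Random(seed)
    episodes = []
    for _ in range(n_episodes):
        used_workaround = rng.random() < p_workaround
        accident = rng.random() < p_accident
        episodes.append((used_workaround, accident))
    return episodes


def workaround_rate(episodes, accident):
    """Share of episodes with the given outcome in which the workaround was used."""
    selected = [used for used, had_accident in episodes if had_accident == accident]
    return sum(selected) / len(selected)


episodes = simulate_episodes(n_episodes=100_000, p_workaround=0.8, p_accident=0.01)
rate_in_accidents = workaround_rate(episodes, accident=True)
rate_in_successes = workaround_rate(episodes, accident=False)

print(f"workaround seen in accidents: {rate_in_accidents:.2f}")
print(f"workaround seen in successes: {rate_in_successes:.2f}")
```

In this toy model, an investigator who looks only at accidents would find the workaround in most of them and might conclude that it is a cause, even though it appears at about the same rate in episodes that went well.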

Fundamental surprise

The resilience engineering perspective posits that a significant number of failure modes are literally inconceivable before they happen, because the environments that systems operate in are very dynamic and the perspectives of the people within the system are always inherently limited. [11] These sorts of events are sometimes referred to as fundamental surprise. Contrast this with the approach of probabilistic risk assessment, which focuses on evaluating conceivable risks.

Human performance variability as an asset

The resilience engineering perspective holds that human performance variability has positive effects as well as negative ones, and that safety is increased by amplifying the positive effects of human variability as well as adding controls to mitigate the negative effects. For example, the ability of humans to adapt their behavior based on novel circumstances is a positive effect that creates safety. [11] As a consequence, adding controls to mitigate the effects of human variability can reduce safety in certain circumstances. [15]

The centrality of expertise and experience

Expert operators are an important source of resilience within systems. These operators become experts through previous experience in dealing with failures. [11] [16]

Risk is unavoidable

Under the resilience engineering perspective, operators are always required to trade off risks. As a consequence, in order to create safety, it is sometimes necessary for a system to take on some risk. [11]

Bringing existing resilience to bear vs generating new resilience

The researcher Richard Cook distinguishes two separate kinds of work that tend to be conflated under the heading resilience engineering: [17]

Bringing existing resilience to bear

The first type of resilience engineering work is determining how to best take advantage of the resilience that is already present in the system. Cook uses the example of setting a broken bone as this type of work: the resilience is already present in the physiology of bone, and setting the bone uses this resilience to achieve better healing outcomes.

Cook notes that this first type of resilience work does not require a deep understanding of the underlying mechanisms of resilience: humans were setting bones long before the mechanism by which bone heals was understood.

Generating new resilience

The second type of resilience engineering work involves altering mechanisms in the system in order to increase the amount of resilience. Cook uses the example of new drugs such as abaloparatide and teriparatide, which mimic parathyroid hormone-related protein and are used to treat osteoporosis.

Cook notes that this second type of resilience work requires a much deeper understanding of the underlying existing resilience mechanisms in order to create interventions that can effectively increase resilience.

Hollnagel perspective

The safety researcher Erik Hollnagel views resilient performance as requiring four systemic potentials: [18]

  1. The potential to respond
  2. The potential to monitor
  3. The potential to learn
  4. The potential to anticipate

These potentials are described in a Eurocontrol white paper on Systemic Potentials Management: https://skybrary.aero/bookshelf/systemic-potentials-management-building-basis-resilient-performance

Woods perspective

The safety researcher David Woods considers the following two concepts in his definition of resilience: [19]

  1. graceful extensibility
  2. sustained adaptability

These two concepts are elaborated in Woods's theory of graceful extensibility.

Woods contrasts resilience with robustness, which is the ability of a system to deal effectively with potential challenges that were anticipated in advance.

The safety researcher Richard Cook argued that bone should serve as the archetype for understanding what resilience is in the Woods perspective. [17] Cook notes that bone has both graceful extensibility (it has a soft boundary at which it can extend function) and sustained adaptability (bone is constantly adapting through a dynamic balance between creation and destruction that is directed by mechanical strain).

In Woods's view, there are three common patterns to the failure of complex adaptive systems: [20]

  1. decompensation: exhaustion of capacity when encountering a disturbance
  2. working at cross purposes: when individual agents in a system behave in a way that achieves local goals but goes against global goals
  3. getting stuck in outdated behaviors: relying on strategies that were previously adaptive but are no longer so due to changes in the environment

Resilient Health Care

In 2012, the growing interest in resilience engineering gave rise to the sub-field of Resilient Health Care. This led to a series of annual conferences on the topic that are still ongoing, a series of books on Resilient Health Care, and, in 2022, the establishment of the Resilient Health Care Society (registered in Sweden; https://rhcs.se/).

Books

References

  1. Woods, D.D. (2018). "Resilience is a Verb" (PDF). In Trump, B.D.; Florin, M.-V.; Linkov, I. (eds.). IRGC resource guide on resilience (vol. 2): Domains of resilience for complex interconnected systems. Lausanne, CH: EPFL International Risk Governance Center.
  2. Pariès, Jean (15 May 2017). Resilience Engineering in Practice. CRC Press. ISBN 978-1-317-06525-8. OCLC 1151009227.
  3. Hollnagel, Erik; Christopher P. Nemeth; Sidney Dekker, eds. (2019). Resilience engineering perspectives. Vol. 2: Preparation and Restoration. CRC Press. ISBN 978-0-367-38540-8. OCLC 1105725342.
  4. Woods, D.D. (2017). STELLA: Report from the SNAFUcatchers Workshop on Coping With Complexity. Columbus, OH: Ohio State University.
  5. Dekker, Sidney (2019). Foundations of safety science: a century of understanding accidents and disasters. Boca Raton. ISBN 978-1-351-05977-0. OCLC 1091899791.
  6. Woods, David D. (2017). Resilience Engineering: Concepts and Precepts. CRC Press. ISBN 978-1-317-06528-9. OCLC 1011232533.
  7. Woods, David D.; Sidney Dekker; Richard Cook; Leila Johannesen (2017). Behind human error (2nd ed.). Boca Raton. ISBN 978-1-317-17553-7. OCLC 1004974951.
  8. Dekker, Sidney W. A. (2002-10-01). "Reconstructing human contributions to accidents: the new view on error and performance". Journal of Safety Research. 33 (3): 371–385. doi:10.1016/S0022-4375(02)00032-4. ISSN 0022-4375. PMID 12404999. S2CID 46350729.
  9. Dekker, Sidney (2015). Safety differently: human factors for a new era (Second ed.). Boca Raton, FL. ISBN 978-1-4822-4200-3. OCLC 881430177.
  10. Hollnagel, Erik (2014). Safety-I and safety-II: the past and future of safety management. Farnham. ISBN 978-1-4724-2306-1. OCLC 875819877.
  11. Erik Hollnagel; Christopher P. Nemeth; Sidney Dekker, eds. (2008–2009). Resilience engineering perspectives. Aldershot, Hampshire, England: Ashgate. ISBN 978-0-7546-7127-5. OCLC 192027611.
  12. "2006 Sophia Antipolis (F)". Resilience Engineering Association. Retrieved 2022-09-25.
  13. Erik Hollnagel; Christopher P. Nemeth; Sidney Dekker, eds. (2008–2009). Resilience engineering perspectives. Aldershot, Hampshire, England: Ashgate. ISBN 978-0-7546-7127-5. OCLC 192027611.
  14. "Symposium". Resilience Engineering Association. Retrieved 2022-09-25.
  15. Dekker, Sidney (2018). The safety anarchist: relying on human expertise and innovation, reducing bureaucracy and compliance. London. ISBN 978-1-351-40364-1. OCLC 1022761874.
  16. "Hindsight 31 | SKYbrary Aviation Safety". skybrary.aero. Retrieved 2022-09-25.
  17. A Few Observations on the Marvelous Resilience of Bone & Resilience Engineering - Dr. Richard Cook, retrieved 2022-09-25
  18. Hollnagel, Erik (2017-05-15), "Epilogue: RAG – The Resilience Analysis Grid", Resilience Engineering in Practice, CRC Press, pp. 275–296, doi:10.1201/9781317065265-19, ISBN 978-1-315-60569-2, retrieved 2022-09-17
  19. Woods, David D. (September 2015). "Four concepts for resilience and the implications for the future of resilience engineering". Reliability Engineering & System Safety. 141: 5–9. doi:10.1016/j.ress.2015.03.018.
  20. Woods, David D.; Branlat, Matthieu (2017-05-15), "Basic Patterns in How Adaptive Systems Fail", Resilience Engineering in Practice, CRC Press, pp. 127–143, doi:10.1201/9781317065265-10, ISBN 978-1-315-60569-2, retrieved 2022-09-24