Software system safety

Last updated

In software engineering, software system safety optimizes system safety in the design, development, use, and maintenance of software systems and their integration with safety-critical hardware systems in an operational environment.

Contents

Overview

Software system safety is a subset of system safety and system engineering and is synonymous with the software engineering aspects of Functional Safety. As part of the total safety and software development program, software cannot be allowed to function independently of the total effort. Both simple and highly integrated multiple systems are experiencing an extraordinary growth in the use of computers and software to monitor and/or control safety-critical subsystems or functions. A software specification error, design flaw, or the lack of generic safety-critical requirements can contribute to or cause a system failure or erroneous human decision. To achieve an acceptable level of safety for software used in critical applications, software system safety engineering must be given primary emphasis early in the requirements definition and system conceptual design process. Safety-critical software must then receive continuous management emphasis and engineering analysis throughout the development and operational lifecycles of the system. Software with safety-critical functionality must be thoroughly verified with objective analysis.

Functional Hazard Analyses (FHA) are often conducted early on - in parallel with or as part of system engineering Functional Analyses - to determine the safety-critical functions (SCF) of the systems for further analyses and verification. Software system safety is directly related to the more critical design aspects and safety attributes in software and system functionality, whereas software quality attributes are inherently different and require standard scrutiny and development rigor. Development Assurance levels (DAL) and associated Level of Rigor (LOR) is a graded approach to software quality and software design assurance as a pre-requisite that a suitable software process is followed for confidence. LOR concepts and standards such as DO-178C are NOT a substitute for software safety. Software safety per IEEE STD-1228 and MIL-STD-882E focuses on ensuring explicit safety requirements are met and verified using functional approaches from a safety requirements analysis and test perspective. Software safety hazard analysis required for more complex systems where software is controlling critical functions generally are in the following sequential categories and are conducted in phases as part of the system safety or safety engineering process: software safety requirements analysis; software safety design analyses (top level, detailed design and code level); software safety test analysis, and software safety change analysis. Once these "functional" software safety analyses are completed the software engineering team will know where to place safety emphasis and what functional threads, functional paths, domains and boundaries to focus on when designing in software safety attributes to ensure correct functionality and to detect malfunctions, failures, faults and to implement a host of mitigation strategies to control hazards. Software security and various software protection technologies are similar to software safety attributes in the design to mitigate various types of threats vulnerability and risks. Deterministic software is sought in the design by verifying correct and predictable behavior at the system level.

Goals

See also

Related Research Articles

Systems engineering Interdisciplinary field of engineering

Systems engineering is an interdisciplinary field of engineering and engineering management that focuses on how to design, integrate, and manage complex systems over their life cycles. At its core, systems engineering utilizes systems thinking principles to organize this body of knowledge. The individual outcome of such efforts, an engineered system, can be defined as a combination of components that work in synergy to collectively perform a useful function.

Safety engineering Engineering discipline which assures that engineered systems provide acceptable levels of safety

Safety engineering is an engineering discipline which assures that engineered systems provide acceptable levels of safety. It is strongly related to industrial engineering/systems engineering, and the subset system safety engineering. Safety engineering assures that a life-critical system behaves as needed, even when components fail.

Configuration management

Configuration management (CM) is a systems engineering process for establishing and maintaining consistency of a product's performance, functional, and physical attributes with its requirements, design, and operational information throughout its life. The CM process is widely used by military engineering organizations to manage changes throughout the system lifecycle of complex systems, such as weapon systems, military vehicles, and information systems. Outside the military, the CM process is also used with IT service management as defined by ITIL, and with other domain models in the civil engineering and other industrial engineering segments such as roads, bridges, canals, dams, and buildings.

Avionics software is embedded software with legally mandated safety and reliability concerns used in avionics. The main difference between avionic software and conventional embedded software is that the development process is required by law and is optimized for safety. It is claimed that the process described below is only slightly slower and more costly than the normal ad hoc processes used for commercial software. Since most software fails because of mistakes, eliminating the mistakes at the earliest possible step is also a relatively inexpensive and reliable way to produce software. In some projects however, mistakes in the specifications may not be detected until deployment. At that point, they can be very expensive to fix.

In software project management, software testing, and software engineering, verification and validation (V&V) is the process of checking that a software system meets specifications and requirements so that it fulfills its intended purpose. It may also be referred to as software quality control. It is normally the responsibility of software testers as part of the software development lifecycle. In simple terms, software verification is: "Assuming we should build X, does our software achieve its goals without any bugs or gaps?" On the other hand, software validation is: "Was X what we should have built? Does X meet the high-level requirements?"

Failure mode and effects analysis is the process of reviewing as many components, assemblies, and subsystems as possible to identify potential failure modes in a system and their causes and effects. For each component, the failure modes and their resulting effects on the rest of the system are recorded in a specific FMEA worksheet. There are numerous variations of such worksheets. An FMEA can be a qualitative analysis, but may be put on a quantitative basis when mathematical failure rate models are combined with a statistical failure mode ratio database. It was one of the first highly structured, systematic techniques for failure analysis. It was developed by reliability engineers in the late 1950s to study problems that might arise from malfunctions of military systems. An FMEA is often the first step of a system reliability study.

The safety life cycle is the series of phases from initiation and specifications of safety requirements, covering design and development of safety features in a safety-critical system, and ending in decommissioning of that system. This article uses software as the context but the safety life cycle applies to other areas such as construction of buildings, for example. In software development, a process is used and this process consists of a few phases, typically covering initiation, analysis, design, programming, testing and implementation. The focus is to build the software. Some software have safety concerns while others do not. For example, a Leave Application System does not have safety requirements. But we are concerned about safety if a software that is used to control the components in a plane fails. So for the latter, the question is how safety, being so important, should be managed within the software life cycle.

Reliability engineering is a sub-discipline of systems engineering that emphasizes the ability of equipment to function without failure. Reliability describes the ability of a system or component to function under stated conditions for a specified period of time. Reliability is closely related to availability, which is typically described as the ability of a component or system to function at a specified moment or interval of time.

DO-178B, Software Considerations in Airborne Systems and Equipment Certification is a guideline dealing with the safety of safety-critical software used in certain airborne systems. It was jointly developed by the safety-critical working group RTCA SC-167 of the Radio Technical Commission for Aeronautics (RTCA) and WG-12 of the European Organisation for Civil Aviation Equipment (EUROCAE). RTCA published the document as RTCA/DO-178B, while EUROCAE published the document as ED-12B. Although technically a guideline, it was a de facto standard for developing avionics software systems until it was replaced in 2012 by DO-178C.

Safety integrity level (SIL) is defined as a relative levels of risk-reduction provided by a safety function, or to specify a target level of risk reduction. In simple terms, SIL is a measurement of performance required for a safety instrumented function (SIF).

A hazard analysis is used as the first step in a process used to assess risk. The result of a hazard analysis is the identification of different type of hazards. A hazard is a potential condition and exists or not. It may in single existence or in combination with other hazards and conditions become an actual Functional Failure or Accident (Mishap). The way this exactly happens in one particular sequence is called a scenario. This scenario has a probability of occurrence. Often a system has many potential failure scenarios. It also is assigned a classification, based on the worst case severity of the end condition. Risk is the combination of probability and severity. Preliminary risk levels can be provided in the hazard analysis. The validation, more precise prediction (verification) and acceptance of risk is determined in the Risk assessment (analysis). The main goal of both is to provide the best selection of means of controlling or eliminating the risk. The term is used in several engineering specialties, including avionics, chemical process safety, safety engineering, reliability engineering and food safety.

ARP4761

ARP4761, Guidelines and Methods for Conducting the Safety Assessment Process on Civil Airborne Systems and Equipment is an Aerospace Recommended Practice from SAE International. In conjunction with ARP4754, ARP4761 is used to demonstrate compliance with 14 CFR 25.1309 in the U.S. Federal Aviation Administration (FAA) airworthiness regulations for transport category aircraft, and also harmonized international airworthiness regulations such as European Aviation Safety Agency (EASA) CS–25.1309.

ARP4754

ARP4754, Aerospace Recommended Practice (ARP) ARP4754A, is a guideline from SAE International, dealing with the development processes which support certification of Aircraft systems, addressing "the complete aircraft development cycle, from systems requirements through systems verification." Revision A was released in December 2010. It was recognized by the FAA in AC 20-174 published November 2011. EUROCAE jointly issues the document as ED–79.

RTCA DO-254 / EUROCAE ED-80, Design Assurance Guidance for Airborne Electronic Hardware is a document providing guidance for the development of airborne electronic hardware, published by RTCA, Incorporated and EUROCAE. The DO-254/ED-80 standard was formally recognized by the FAA in 2005 via AC 20-152 as a means of compliance for the design assurance of electronic hardware in airborne systems. The guidance in this document is applicable, but not limited, to such electronic hardware items as

IEC 61508 is an international standard published by the International Electrotechnical Commission consisting of methods on how to apply, design, deploy and maintain automatic protection systems called safety-related systems. It is titled Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems.

The system safety concept calls for a risk management strategy based on identification, analysis of hazards and application of remedial controls using a systems-based approach. This is different from traditional safety strategies which rely on control of conditions and causes of an accident based either on the epidemiological analysis or as a result of investigation of individual past accidents. The concept of system safety is useful in demonstrating adequacy of technologies when difficulties are faced with probabilistic risk analysis. The underlying principle is one of synergy: a whole is more than sum of its parts. Systems-based approach to safety requires the application of scientific, technical and managerial skills to hazard identification, hazard analysis, and elimination, control, or management of hazards throughout the life-cycle of a system, program, project or an activity or a product. "Hazop" is one of several techniques available for identification of hazards.

Functional safety is the part of the overall safety of a system or piece of equipment that depends on automatic protection operating correctly in response to its inputs or failure in a predictable manner (fail-safe). The automatic protection system should be designed to properly handle likely human errors, systematic errors, hardware failures and operational/environmental stress.

In the United States military integrated acquisition lifecycle the Technical section has multiple acquisition "Technical Reviews". Technical reviews and audits assist the acquisition and the number and types are tailored to the acquisition. Overall guidance flows from the Defense Acquisition Guidebook chapter 4, with local details further defined by the review organizations. Typical topics examined include adequacy of program/contract metrics, proper staffing, risks, budget, and schedule.

ISO 26262, titled "Road vehicles – Functional safety", is an international standard for functional safety of electrical and/or electronic systems that are installed in serial production road vehicles, defined by the International Organization for Standardization (ISO) in 2011, and revised in 2018.

Arcadia (engineering)

ARCADIA is a system and software architecture engineering method, based on architecture-centric and model-driven engineering activities.

References

    PD-icon.svg This article incorporates  public domain material from the United States Army document: "Software handbook".