A critical system is a system that must be highly reliable and retain this reliability as it evolves, without incurring prohibitive costs. [1]
There are four types of critical systems: safety-critical, mission-critical, business-critical and security-critical. [1]
For such systems, trusted methods and techniques must be used for development. Developers of critical systems are therefore naturally conservative, preferring well-tested techniques whose strengths and weaknesses are understood over newer techniques that may appear better but whose long-term problems are unknown. [2]
Expensive software engineering techniques that are not cost-effective for non-critical systems may sometimes be used for critical systems development. For example, formal mathematical methods of software development have been successfully used for safety- and security-critical systems. One reason these formal methods are used is that they help reduce the amount of testing required. For critical systems, the costs of verification and validation are usually very high: more than 50% of the total system development costs. [2]
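To give a flavor of what a formal specification looks like in practice, the sketch below annotates a hypothetical C function with an ACSL contract, the specification language used by the Frama-C verification platform; the function and its bounds are illustrative assumptions, not drawn from any particular project.

```c
/* A hypothetical bounded-add routine with an ACSL contract. Deductive
 * verifiers such as Frama-C/WP try to prove that, for every input
 * satisfying the preconditions (requires), the postcondition (ensures)
 * holds -- replacing some classes of testing with a machine-checked proof. */

/*@ requires 0 <= a <= 1000000;
    requires 0 <= b <= 1000000;
    assigns \nothing;
    ensures \result == a + b;
*/
long bounded_add(long a, long b)
{
    return a + b; /* cannot overflow: both operands are bounded above */
}
```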
A critical system is distinguished by the consequences associated with system or function failure. Critical systems are further classified as fail-operational or fail-safe, according to the tolerance they must exhibit to failures: [3]
Safety-critical systems deal with scenarios that may lead to loss of life, serious personal injury, or damage to the natural environment. Examples of safety-critical systems are the control system of a chemical manufacturing plant, an aircraft, the controller of an unmanned metro train system, and the controller of a nuclear plant. [2] [1] [3]
Mission-critical systems are designed to avoid the inability to complete the overall system or project objectives, or one of the goals for which the system was designed. Examples of mission-critical systems are the navigational system of a spacecraft and the software controlling the baggage-handling system of an airport. [2] [1] [3]
Business-critical systems are designed to avoid significant tangible or intangible economic costs, e.g., loss of business or damage to reputation, often due to the interruption of service caused by the system being unusable. Examples of business-critical systems are the customer accounting system of a bank, stock-trading systems, enterprise resource planning systems, and search engines. [2] [1] [3] These are often identified via a business impact analysis. The term is sometimes used interchangeably with 'mission critical'; however, business-critical systems can be defined as those not strictly necessary during an incident, whereas mission-critical systems are seen as essential for any operations at any time. [4]
Security-critical systems deal with the loss of sensitive data through theft or accident. [1]
Safety engineering is an engineering discipline which assures that engineered systems provide acceptable levels of safety. It is strongly related to industrial engineering and systems engineering, and to their subset, system safety engineering. Safety engineering assures that a life-critical system behaves as needed, even when components fail.
In engineering, a fail-safe is a design feature or practice that, in the event of a specific type of failure, inherently responds in a way that will cause minimal or no harm to other equipment, the environment, or people. Unlike inherent safety to a particular hazard, a system being "fail-safe" does not mean that failure is naturally inconsequential, but rather that the system's design prevents or mitigates unsafe consequences of the system's failure. If and when a "fail-safe" system fails, it remains at least as safe as it was before the failure. Since many types of failure are possible, failure mode and effects analysis is used to examine failure situations and recommend safety design and procedures.
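The distinction between fail-operational and fail-safe behavior can be made concrete with a minimal sketch. In the hypothetical C fragment below, the hardware-access functions are assumed externals: a fail-operational response switches to a redundant channel and keeps running, while a fail-safe response drives the system to a predefined safe state.

```c
#include <stdbool.h>

/* Hypothetical hardware-access functions, assumed to be provided
 * elsewhere -- they are placeholders for this sketch. */
extern bool primary_sensor_ok(void);
extern bool backup_sensor_ok(void);
extern double read_primary_sensor(void);
extern double read_backup_sensor(void);
extern void drive_actuator(double command);
extern void enter_safe_state(void);  /* e.g., close valves, apply brakes */

/* One control cycle: prefer the primary channel, fall back to the
 * redundant one (fail-operational), and if no trustworthy input
 * remains, fail to a harmless state (fail-safe). */
void control_step(void)
{
    if (primary_sensor_ok()) {
        drive_actuator(read_primary_sensor());
    } else if (backup_sensor_ok()) {
        drive_actuator(read_backup_sensor()); /* fail-operational */
    } else {
        enter_safe_state();                   /* fail-safe */
    }
}
```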
Safety is the state of being "safe", the condition of being protected from harm or other danger. Safety can also refer to the control of recognized hazards in order to achieve an acceptable level of risk.
A safety-critical system or life-critical system is a system whose failure or malfunction may result in one of the following outcomes: death or serious injury to people; loss of, or severe damage to, equipment or property; or harm to the environment.
In systems engineering, dependability is a measure of a system's availability, reliability, maintainability, and in some cases other characteristics such as durability, safety and security. In real-time computing, dependability is the ability to provide services that can be trusted within a time period. The service guarantees must hold even when the system is subject to attacks or natural failures.
In software project management, software testing, and software engineering, verification and validation is the process of checking that a software system meets specifications and requirements so that it fulfills its intended purpose. It may also be referred to as software quality control. It is normally the responsibility of software testers as part of the software development lifecycle. In simple terms, software verification is: "Assuming we should build X, does our software achieve its goals without any bugs or gaps?" On the other hand, software validation is: "Was X what we should have built? Does X meet the high-level requirements?"
A mission-critical factor of a system is any factor that is essential to business, organizational, or governmental operations. Failure or disruption of mission-critical factors results in a serious impact on business, organization, or government operations, and can even cause social turmoil and catastrophes.
Failure mode and effects analysis is the process of reviewing as many components, assemblies, and subsystems as possible to identify potential failure modes in a system and their causes and effects. For each component, the failure modes and their resulting effects on the rest of the system are recorded in a specific FMEA worksheet. There are numerous variations of such worksheets. An FMEA can be a qualitative analysis, but may be put on a quantitative basis when mathematical failure rate models are combined with a statistical failure mode ratio database. It was one of the first highly structured, systematic techniques for failure analysis. It was developed by reliability engineers in the late 1950s to study problems that might arise from malfunctions of military systems. An FMEA is often the first step of a system reliability study.
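A common quantitative device in FMEA worksheets is the risk priority number (RPN), the product of severity, occurrence, and detection ratings. The following C sketch shows one minimal way a worksheet row might be represented; the field names and 1-10 scales are illustrative assumptions, since real worksheets vary widely.

```c
#include <stdio.h>

/* One illustrative FMEA worksheet row. Ratings use the common 1..10
 * scales; the exact scales and fields vary between standards. */
struct fmea_row {
    const char *component;
    const char *failure_mode;
    const char *effect;
    int severity;    /* 1 = negligible        .. 10 = catastrophic */
    int occurrence;  /* 1 = rare              .. 10 = frequent     */
    int detection;   /* 1 = certain to detect .. 10 = undetectable */
};

/* Risk priority number: higher values are addressed first. */
int rpn(const struct fmea_row *r)
{
    return r->severity * r->occurrence * r->detection;
}

int main(void)
{
    struct fmea_row row = {
        "coolant pump", "bearing seizure", "loss of coolant flow",
        9, 3, 4
    };
    printf("%s / %s: RPN = %d\n", row.component, row.failure_mode, rpn(&row));
    return 0;
}
```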
In the context of software engineering, software quality refers to two related but distinct notions: functional quality, which reflects how well the software complies with or conforms to a given design based on functional requirements or specifications, and structural quality, which refers to how it meets non-functional requirements that support the delivery of the functional requirements, such as robustness or maintainability.
Reliability engineering is a sub-discipline of systems engineering that emphasizes the ability of equipment to function without failure. Reliability describes the ability of a system or component to function under stated conditions for a specified period. Reliability is closely related to availability, which is typically described as the ability of a component or system to function at a specified moment or interval of time.
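The link between reliability and availability is often quantified by the steady-state availability formula A = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR the mean time to repair. The short C snippet below simply evaluates this formula with made-up figures.

```c
#include <stdio.h>

/* Steady-state availability from mean time between failures (MTBF)
 * and mean time to repair (MTTR): A = MTBF / (MTBF + MTTR). */
double availability(double mtbf_hours, double mttr_hours)
{
    return mtbf_hours / (mtbf_hours + mttr_hours);
}

int main(void)
{
    /* Illustrative figures: fails every 5000 h, takes 2 h to repair. */
    printf("A = %.5f\n", availability(5000.0, 2.0));  /* ~0.99960 */
    return 0;
}
```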
In engineering and systems theory, redundancy is the intentional duplication of critical components or functions of a system with the goal of increasing reliability of the system, usually in the form of a backup or fail-safe, or to improve actual system performance, such as in the case of GNSS receivers, or multi-threaded computer processing.
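A classic instance of redundancy in computing is triple modular redundancy (TMR), in which three independent units compute the same value and a majority voter masks a single faulty result. A minimal voter might look like the following sketch.

```c
/* Triple modular redundancy: majority vote over three independent
 * channels. A single faulty channel is outvoted by the other two;
 * the design masks one failure rather than detecting and repairing it. */
int tmr_vote(int a, int b, int c)
{
    if (a == b || a == c) {
        return a;   /* a agrees with at least one other channel */
    }
    return b;       /* b and c agree; if all three differ there is
                     * no majority and this choice is arbitrary */
}
```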
Fault tolerance is the ability of a system to maintain proper operation despite failures or faults in one or more of its components. This capability is essential for high-availability, mission-critical, or even life-critical systems.
Integrated logistics support (ILS) is an approach used in systems engineering to lower a product's life-cycle cost and decrease the demand for logistics by optimizing the maintenance system and easing product support. Although originally developed for military purposes, it is also widely used in commercial customer service organisations.
Software assurance (SwA) is a critical process in software development that ensures the reliability, safety, and security of software products. It involves a variety of activities, including requirements analysis, design reviews, code inspections, testing, and formal verification. One crucial component of software assurance is secure coding practices, which follow industry-accepted standards and best practices, such as those outlined by the Software Engineering Institute (SEI) in their CERT Secure Coding Standards (SCS).
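As a small illustration of the class of defect such secure coding practices target, the sketch below replaces an unbounded string copy, a classic source of buffer overflows, with an explicitly bounded one; it is a generic example rather than a quotation from the CERT standards.

```c
#include <stdio.h>
#include <string.h>

#define NAME_MAX 32

/* An unbounded strcpy() overruns 'dst' when 'src' is too long -- the
 * kind of defect secure coding standards are written to prevent. The
 * bounded version truncates instead and always leaves a terminated
 * string. */
void set_name(char dst[NAME_MAX], const char *src)
{
    /* BAD (overflow risk):  strcpy(dst, src); */
    strncpy(dst, src, NAME_MAX - 1u);
    dst[NAME_MAX - 1u] = '\0';   /* strncpy does not always terminate */
}

int main(void)
{
    char name[NAME_MAX];
    set_name(name, "a deliberately long input that would overflow the buffer");
    printf("%s\n", name);
    return 0;
}
```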
A hazard analysis is one of many methods that may be used to assess risk. At its core, the process entails describing a system object that intends to conduct some activity. During the performance of that activity, an adverse event may be encountered that could cause or contribute to an occurrence. Finally, that occurrence will result in some outcome that may be measured in terms of the degree of loss or harm. This outcome may be measured on a continuous scale, such as an amount of monetary loss, or the outcomes may be categorized into various levels of severity.
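Hazard analyses frequently summarize severity and likelihood in a risk matrix. The C sketch below encodes one hypothetical scheme; the category names and the acceptance threshold are assumptions for illustration, as real schemes differ in detail.

```c
/* A hypothetical hazard risk matrix: risk index = severity x likelihood.
 * Category names and the acceptance threshold are illustrative only. */
enum severity   { NEGLIGIBLE = 1, MARGINAL, CRITICAL, CATASTROPHIC };
enum likelihood { IMPROBABLE = 1, REMOTE, OCCASIONAL, PROBABLE, FREQUENT };

int risk_index(enum severity s, enum likelihood l)
{
    return (int)s * (int)l;
}

/* Example policy: any hazard scoring above the threshold needs
 * mitigation before the risk is accepted. */
int risk_acceptable(enum severity s, enum likelihood l)
{
    return risk_index(s, l) <= 6;   /* assumed threshold */
}
```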
Failure mode effects and criticality analysis (FMECA) is an extension of failure mode and effects analysis (FMEA).
Software safety is an engineering discipline that aims to ensure that software used in safety-related systems does not contribute to any hazards such a system might pose. There are numerous standards that govern how safety-related software should be developed and assured in various domains. Most of them classify software according to its criticality and propose techniques and measures that should be employed during development and assurance.
Operational risk management (ORM) is defined as a continual recurring process that includes risk assessment, risk decision making, and the implementation of risk controls, resulting in the acceptance, mitigation, or avoidance of risk.
The Motor Industry Software Reliability Association (MISRA) is an organization that produces guidelines for the software developed for electronic components used in the automotive industry. It is a collaboration between vehicle manufacturers, component suppliers and engineering consultancies. In 2021, the loose consortium restructured as The MISRA Consortium Limited.
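The flavor of such guidelines can be suggested with a small before/after sketch. The rules paraphrased here, preferring fixed-width integer types and giving every switch statement a default clause, reflect the spirit of MISRA C; the code itself is illustrative and does not quote any numbered rule.

```c
#include <stdint.h>

/* Before: plain 'int' has an implementation-defined width, and a
 * switch without 'default' silently ignores unexpected values.
 * After: fixed-width types and an explicit default make both the
 * data layout and the unexpected-input behavior deliberate. */
typedef enum { MODE_IDLE = 0, MODE_RUN = 1, MODE_STOP = 2 } op_mode_t;

uint8_t mode_to_code(op_mode_t m)
{
    uint8_t code;
    switch (m) {
    case MODE_IDLE: code = 0u;    break;
    case MODE_RUN:  code = 1u;    break;
    case MODE_STOP: code = 2u;    break;
    default:        code = 0xFFu; break; /* explicit unexpected-value path */
    }
    return code;
}
```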
Software reliability testing is a field of software testing that concerns a software system's ability to function, under given environmental conditions, for a particular amount of time. Software reliability testing helps discover many problems in the software design and functionality.
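One simple model used in such testing is the exponential reliability function R(t) = e^(-λt), where the failure rate λ is estimated from failures observed over accumulated test time. The snippet below evaluates this model with made-up test data.

```c
#include <math.h>
#include <stdio.h>

/* Constant-failure-rate (exponential) model: R(t) = exp(-lambda * t),
 * with lambda estimated as observed failures per unit of test time. */
int main(void)
{
    double failures   = 4.0;       /* made-up test observations */
    double test_hours = 10000.0;
    double lambda     = failures / test_hours;   /* failures per hour */
    double mission    = 500.0;                   /* mission length, hours */

    printf("R(%g h) = %.4f\n", mission, exp(-lambda * mission));
    /* lambda = 4e-4 /h  ->  R(500) = exp(-0.2) ~= 0.8187 */
    return 0;
}
```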