In engineering, a fault is a defect or problem in a system that causes it to fail or act abnormally.
The ISO document 10303-226 defines fault as an abnormal condition or defect at the component, equipment, or sub-system level which may lead to a failure.
The United States Glossary of Telecommunication Terms defines fault for telecommunications as:
A random fault occurs as a result of wear or other deterioration.
Since deterioration progresses somewhat randomly, predicting when a particular unit will develop a fault is not possible. But the rate at which a particular fault occurs among a large number of units often can be predicted with significant accuracy.
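The contrast between unpredictable individual units and a predictable fleet-wide rate can be illustrated with a minimal sketch. It assumes, purely for illustration, that each unit independently develops the fault within some fixed period with the same probability; the function names and the figures used are hypothetical and not taken from any standard.

```python
import math
import random


def expected_faults(units: int, fault_probability: float) -> float:
    """Mean number of faulty units, assuming independent identical units."""
    return units * fault_probability


def fault_count_std(units: int, fault_probability: float) -> float:
    """Standard deviation of the faulty-unit count (binomial model)."""
    return math.sqrt(units * fault_probability * (1.0 - fault_probability))


def simulate_faults(units: int, fault_probability: float, seed: int = 0) -> int:
    """Draw one random outcome: which individual units fail is unpredictable."""
    rng = random.Random(seed)
    return sum(rng.random() < fault_probability for _ in range(units))


if __name__ == "__main__":
    n, p = 100_000, 0.02  # hypothetical fleet size and per-unit fault probability
    print("expected:", expected_faults(n, p))
    print("std dev :", round(fault_count_std(n, p), 1))
    print("one simulated fleet:", simulate_faults(n, p))
```

Under these assumptions the spread of the count grows only with the square root of the fleet size, so the relative uncertainty of the observed rate shrinks as the number of units increases.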
Manufacturers often accept random faults as a risk if the chances are virtually negligible.
A fault can occur in virtually any object or appliance, most commonly in electronics and machinery.
For example, an Xbox 360 console deteriorates over time as dust builds up in its fans. The dust can cause the console to overheat, raise an error, and shut down.
A systematic fault results from an error in design, such that every copy has the same fault. Sometimes a systematic fault remains undetected for a long time even if many copies are in use. The fault might be triggered when conditions change, and it could then cause every copy to fail at the same time.
Software can have faults, also known as bugs, but since software cannot deteriorate, all software faults are systematic.[citation needed]
In computing, a segmentation fault or access violation is a fault, or failure condition, raised by hardware with memory protection, notifying an operating system (OS) that software has attempted to access a restricted area of memory. On standard x86 computers, this is a form of general protection fault. In response, the operating system kernel usually performs some corrective action, generally passing the fault on to the offending process by sending the process a signal. Processes can in some cases install a custom signal handler, allowing them to recover on their own, but otherwise the OS default signal handler is used, generally causing abnormal termination of the process, and sometimes a core dump.
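The default outcome, abnormal termination by a signal, can be observed from a parent process. The sketch below is an illustration only: it does not trigger a real hardware fault, but instead has a child process send itself `SIGSEGV` with `os.kill`, which is assumed here to be a reasonable stand-in on POSIX systems. The helper names `run_child_that_faults` and `describe` are hypothetical.

```python
import signal
import subprocess
import sys


def run_child_that_faults() -> int:
    """Start a child interpreter that sends SIGSEGV to itself.

    The signal is simulated with os.kill; no invalid memory access occurs.
    Returns the child's return code as reported by subprocess.
    """
    child_code = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
    completed = subprocess.run([sys.executable, "-c", child_code])
    return completed.returncode


def describe(returncode: int) -> str:
    """Translate a subprocess return code into a readable description.

    On POSIX, subprocess reports death by signal N as the return code -N.
    """
    if returncode < 0:
        return f"terminated by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    print(describe(run_child_that_faults()))
```

Because the child installs no handler of its own, the default action applies and the parent sees a non-zero, signal-derived return code rather than a normal exit.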
Software testing is the act of checking whether software satisfies expectations.
A software bug is a design defect (bug) in computer software. A computer program with many or serious bugs may be described as buggy.
Fault commonly refers to:
A debugger or debugging tool is a computer program used to test and debug other programs. The main use of a debugger is to run the target program under controlled conditions that permit the programmer to track its execution and monitor changes in computer resources that may indicate malfunctioning code. Typical debugging facilities include the ability to run or halt the target program at specific points, display the contents of memory, CPU registers or storage devices, and modify memory or register contents in order to enter selected test data that might be a cause of faulty program execution.
In computing, a crash, or system crash, occurs when a computer program such as a software application or an operating system stops functioning properly and exits. On some operating systems or individual applications, a crash reporting service will report the crash and any details relating to it, usually to the developer(s) of the application. If the program is a critical part of the operating system, the entire system may crash or hang, often resulting in a kernel panic or fatal system error.
A glitch is a short-lived technical fault, such as a transient one that corrects itself, making it difficult to troubleshoot. The term is particularly common in the computing and electronics industries, in circuit bending, as well as among players of video games. More generally, all types of systems including human organizations and nature experience glitches.
Ariane flight V88 was the failed maiden flight of the Arianespace Ariane 5 rocket, vehicle no. 501, on 4 June 1996. It carried the Cluster spacecraft, a constellation of four European Space Agency research satellites.
In systems engineering, dependability is a measure of a system's availability, reliability, maintainability, and in some cases, other characteristics such as durability, safety and security. In real-time computing, dependability is the ability to provide services that can be trusted within a time-period. The service guarantees must hold even when the system is subject to attacks or natural failures.
In computing, a page fault is an exception that the memory management unit (MMU) raises when a process accesses a memory page without proper preparations. Accessing the page requires a mapping to be added to the process's virtual address space. Furthermore, the actual page contents may need to be loaded from a back-up, e.g. a disk. The MMU detects the page fault, but the operating system's kernel handles the exception by making the required page accessible in the physical memory or denying an illegal memory access.
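The division of labour described above, detection of a missing mapping followed by either loading the page or denying the access, can be mimicked with a toy simulation. This is a sketch under stated assumptions, not how any real kernel is written: the class `DemandPager`, the exception `IllegalAccess`, and the choice of least-recently-used eviction are all hypothetical and chosen only to keep the example short.

```python
from collections import OrderedDict


class IllegalAccess(Exception):
    """Raised when a page outside the process's address space is touched."""


class DemandPager:
    """Toy model of demand paging with a fixed number of physical frames."""

    def __init__(self, frames: int, valid_pages):
        self.frames = frames
        self.valid_pages = set(valid_pages)   # pages the process may use
        self.resident = OrderedDict()         # resident pages, oldest use first
        self.page_faults = 0

    def access(self, page: int) -> str:
        if page not in self.valid_pages:
            # Analogous to the kernel denying an illegal memory access.
            raise IllegalAccess(f"page {page} is not mapped for this process")
        if page in self.resident:
            self.resident.move_to_end(page)   # mark as most recently used
            return "hit"
        # Page fault: the page is valid but not yet in physical memory.
        self.page_faults += 1
        if len(self.resident) >= self.frames:
            self.resident.popitem(last=False)  # evict least recently used (assumed policy)
        self.resident[page] = True             # "load" the page from backing store
        return "fault"


if __name__ == "__main__":
    pager = DemandPager(frames=2, valid_pages=range(4))
    print([pager.access(p) for p in [0, 1, 0, 2, 1]], pager.page_faults)
```

The distinction the model preserves is that a fault on a valid page is a normal, recoverable event, while an access to an invalid page is refused.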
Troubleshooting is a form of problem solving, often applied to repair failed products or processes on a machine or a system. It is a logical, systematic search for the source of a problem in order to solve it, and make the product or process operational again. Troubleshooting is needed to identify the symptoms. Determining the most likely cause is a process of elimination—eliminating potential causes of a problem. Finally, troubleshooting requires confirmation that the solution restores the product or process to its working state.
Reliability engineering is a sub-discipline of systems engineering that emphasizes the ability of equipment to function without failure. Reliability is defined as the probability that a product, system, or service will perform its intended function adequately for a specified period of time, or will operate in a defined environment without failure. Reliability is closely related to availability, which is typically described as the ability of a component or system to function at a specified moment or interval of time.
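Both quantities are often estimated with simple textbook formulas. The sketch below assumes a constant failure rate, so that reliability over a period t is exp(-t / MTBF), and uses the common steady-state approximation of availability as MTBF / (MTBF + MTTR). These modelling choices, the function names, and the example figures are assumptions made for illustration, not part of the definitions above.

```python
import math


def reliability(hours: float, mtbf_hours: float) -> float:
    """Probability of operating for `hours` without failure.

    Assumes a constant failure rate of 1 / MTBF (exponential model).
    """
    return math.exp(-hours / mtbf_hours)


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state fraction of time the system is able to function.

    MTBF is mean time between failures, MTTR is mean time to repair.
    """
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical component: fails on average every 990 h, repaired in 10 h.
    print("reliability over 100 h:", round(reliability(100, 990), 3))
    print("availability:", availability(990, 10))
```

The example shows why the two measures differ: a component can be available most of the time while still being fairly likely to fail at least once during a long mission.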
Game testing, also called quality assurance (QA) testing within the video game industry, is a software testing process for quality control of video games. The primary function of game testing is the discovery and documentation of software defects. Interactive entertainment software testing is a highly technical field requiring computing expertise, analytic competence, critical evaluation skills, and endurance. In recent years the field of game testing has come under fire for being extremely strenuous and unrewarding, both financially and emotionally.
In computer programming jargon, a heisenbug is a software bug that seems to disappear or alter its behavior when one attempts to study it. The term is a pun on the name of Werner Heisenberg, the physicist who first asserted the observer effect of quantum mechanics, which states that the act of observing a system inevitably alters its state. In electronics, the traditional term is probe effect, where attaching a test probe to a device changes its behavior.
The Xbox 360 video game console is subject to a number of technical problems and failures that can render it unusable. However, many of the issues can be identified by a series of glowing red lights flashing on the face of the console; the three flashing red lights nicknamed the "Red Ring of Death" or the "RRoD" being the most infamous. There are also other issues that arise with the console, such as discs becoming scratched in the drive and "bricking" of consoles due to dashboard updates. Since its release on November 22, 2005, many articles have appeared in the media portraying the Xbox 360's failure rates, with the latest estimate by warranty provider SquareTrade to be 23.7% in 2009, and currently the highest estimate being 54.2% by a Game Informer survey.
The term downtime is used to refer to periods when a system is unavailable. The unavailability is the proportion of a time-span that a system is unavailable or offline. This is usually a result of the system failing to function because of an unplanned event, or because of routine maintenance.
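As a worked illustration of the proportion described above, the following sketch computes unavailability from a list of outage intervals. It assumes the outages do not overlap and all fall within the observation period; the function name and the sample figures are hypothetical.

```python
def unavailability(outages, period: float) -> float:
    """Fraction of `period` during which the system was unavailable.

    `outages` is an iterable of (start, end) pairs in the same time unit
    as `period`; the intervals are assumed not to overlap.
    """
    downtime = sum(end - start for start, end in outages)
    return downtime / period


if __name__ == "__main__":
    # Two 30-minute outages, one unplanned and one for maintenance,
    # observed over a 600-minute window (made-up numbers).
    print(unavailability([(0, 30), (100, 130)], 600))
```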
The blue screen of death is a critical error screen displayed by the Microsoft Windows operating systems. It indicates a system crash, in which the operating system reaches a critical condition where it can no longer operate safely.
In engineering, debugging is the process of finding the root cause of bugs, along with workarounds and possible fixes for them.
ISO 26262, titled "Road vehicles – Functional safety", is an international standard for functional safety of electrical and/or electronic systems that are installed in serial production road vehicles, defined by the International Organization for Standardization (ISO) in 2011, and revised in 2018.
Condition monitoring of transformers in electrical engineering is the process of acquiring and processing data related to various parameters of transformers to determine their state of quality and predict their failure. This is done by observing the deviation of the transformer parameters from their expected values. Transformers are the most critical assets of electrical transmission and distribution systems, and their failures could cause power outages, personal and environmental hazards, and expensive rerouting or purchase of power from other suppliers. Identifying a transformer which is near failure can allow it to be replaced under controlled conditions at a non-critical time and avoid a system failure.