Fault (technology)


In engineering, a fault is a defect or problem in a system that causes it to fail or act abnormally.


The ISO document 10303-226 defines fault as an abnormal condition or defect at the component, equipment, or sub-system level which may lead to a failure.

The United States Glossary of Telecommunication Terms defines fault for telecommunications as:

  1. An accidental condition that causes a functional unit to fail to perform its required function. See § Random fault.
  2. A defect that causes a reproducible or catastrophic malfunction. A malfunction is considered reproducible if it occurs consistently under the same circumstances. See § Systematic fault.
  3. In power systems, an unintentional short circuit, or partial short circuit, between energized conductors or between an energized conductor and ground. A distinction can be made between symmetric and asymmetric faults. See Fault (power engineering).

Random fault

A random fault occurs as a result of wear or other deterioration.

Because deterioration progresses somewhat randomly, it is not possible to predict when a particular unit will develop a fault. However, the rate at which a particular fault occurs among a large number of units can often be predicted with significant accuracy.
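The contrast between an unpredictable single unit and a predictable population can be illustrated with a short simulation. This is only a sketch: it assumes a constant per-unit failure rate (an exponential lifetime model), which the text does not specify, and the fleet size, rate, and function names are hypothetical.

```python
import math
import random


def expected_failures(units, rate_per_hour, hours):
    """Expected number of failed units, assuming a constant failure rate."""
    return units * (1.0 - math.exp(-rate_per_hour * hours))


def simulate_failures(units, rate_per_hour, hours, seed=0):
    """Draw a random lifetime for each unit and count those that fail in time."""
    rng = random.Random(seed)
    return sum(
        1 for _ in range(units) if rng.expovariate(rate_per_hour) <= hours
    )


if __name__ == "__main__":
    # Hypothetical fleet: 100,000 units, 1e-5 failures per unit-hour, 10,000 hours.
    units, rate, hours = 100_000, 1e-5, 10_000
    print("expected:", round(expected_failures(units, rate, hours)))
    print("simulated:", simulate_failures(units, rate, hours))
```

Under these assumptions the simulated count for the whole fleet stays close to the expected value, even though the lifetime drawn for any one unit is random.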

Manufacturers often accept random faults as a risk if the probability of their occurrence is negligible.

Faults can occur in virtually any object or appliance, but they are most common in electronics and machinery.

For example, an Xbox 360 console can deteriorate over time due to dust buildup in its fans. The console then overheats, reports an error, and shuts down.

Systematic fault

A systematic fault results from an error in design such that every copy has the same fault. Sometimes a systematic fault remains undetected for a long time even if many copies are in use. The fault might be triggered when conditions change, and it could then cause every copy to fail at the same time.

Software can have faults, also known as bugs, but since software cannot deteriorate, all software faults are systematic.[citation needed]
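The way a design error shows up identically in every copy can be sketched in a few lines. The routine and its omission below are hypothetical, chosen only for illustration: the designer never considered a zero elapsed time, so every copy works until that condition arises and then every copy fails in the same way.

```python
def average_rate(total, elapsed_seconds):
    """Hypothetical routine with a design error: zero elapsed time is unhandled."""
    return total / elapsed_seconds


def run_copies(copies, total, elapsed_seconds):
    """Run the same routine on several identical copies and record each outcome."""
    outcomes = []
    for _ in range(copies):
        try:
            average_rate(total, elapsed_seconds)
            outcomes.append("ok")
        except ZeroDivisionError:
            outcomes.append("failed")
    return outcomes


if __name__ == "__main__":
    print(run_copies(3, 10, 5))  # normal conditions: no copy fails
    print(run_copies(3, 10, 0))  # changed conditions: every copy fails
```

Because the copies are identical, the failure is reproducible: it occurs consistently under the same circumstances, not at a random time on a random unit.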


Related Research Articles

In computing, a segmentation fault or access violation is a fault, or failure condition, raised by hardware with memory protection, notifying an operating system (OS) that the software has attempted to access a restricted area of memory. On standard x86 computers, this is a form of general protection fault. In response, the operating system kernel usually performs some corrective action, generally passing the fault on to the offending process by sending the process a signal. Processes can in some cases install a custom signal handler, allowing them to recover on their own, but otherwise the OS default signal handler is used, generally causing abnormal termination of the process, and sometimes a core dump.

Software testing: Checking software against a standard

Software testing is the act of checking whether software satisfies expectations.

A software bug is a defect in computer software that causes it to behave incorrectly or unexpectedly.

Debugger: Computer program used to test and debug other programs

A debugger or debugging tool is a computer program used to test and debug other programs. The main use of a debugger is to run the target program under controlled conditions that permit the programmer to track its execution and monitor changes in computer resources that may indicate malfunctioning code. Typical debugging facilities include the ability to run or halt the target program at specific points, display the contents of memory, CPU registers or storage devices, and modify memory or register contents in order to enter selected test data that might be a cause of faulty program execution.

Crash (computing): When a computer program stops functioning properly and self-terminates

In computing, a crash, or system crash, occurs when a computer program such as a software application or an operating system stops functioning properly and exits. On some operating systems or individual applications, a crash reporting service will report the crash and any details relating to it, usually to the developer(s) of the application. If the program is a critical part of the operating system, the entire system may crash or hang, often resulting in a kernel panic or fatal system error.

Glitch: Short-lived fault in a computer system

A glitch is a short-lived technical fault, such as a transient one that corrects itself, making it difficult to troubleshoot. The term is particularly common in the computing and electronics industries, in circuit bending, as well as among players of video games. More generally, all types of systems including human organizations and nature experience glitches.

Residual-current device: Electrical safety device used in household wiring

A residual-current device (RCD), residual-current circuit breaker (RCCB) or ground fault circuit interrupter (GFCI) is an electrical safety device that interrupts an electrical circuit when the current passing through a conductor is not equal and opposite in both directions, which indicates leakage current to ground or current flowing to another powered conductor. The device's purpose is to reduce the severity of injury caused by an electric shock. Injury from shock is limited to the time before the electrical circuit is interrupted, but the victim may also sustain further injury, e.g. by falling after receiving a shock. This type of circuit interrupter cannot protect a person who touches both circuit conductors at the same time, since it then cannot distinguish normal current from that passing through a person.
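The trip decision described above amounts to comparing the outgoing and returning currents. The following is a minimal sketch of that comparison only, not a model of a real device: the function name is hypothetical, and the 30 mA default threshold is an assumption made for illustration.

```python
def should_trip(outgoing_amps, returning_amps, threshold_amps=0.03):
    """Return True when the current imbalance exceeds the trip threshold.

    threshold_amps is an assumed value for illustration, not taken from the text.
    """
    return abs(outgoing_amps - returning_amps) > threshold_amps


if __name__ == "__main__":
    print(should_trip(10.0, 10.0))  # balanced currents
    print(should_trip(10.0, 9.9))   # 0.1 A is leaking elsewhere
```

The sketch also shows the limitation noted above: if a person touches both conductors, the outgoing and returning currents remain equal, so the comparison sees no imbalance.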

Ariane flight V88: Failed maiden flight of Ariane 5, 1996

Ariane flight V88 was the failed maiden flight of the Arianespace Ariane 5 rocket, vehicle no. 501, on 4 June 1996. It carried the Cluster spacecraft, a constellation of four European Space Agency research satellites.

In computing, a page fault is an exception that the memory management unit (MMU) raises when a process accesses a memory page without proper preparations. Accessing the page requires a mapping to be added to the process's virtual address space. In addition, the actual page contents may need to be loaded from a backing store, such as a disk. The MMU detects the page fault, but the operating system's kernel handles the exception by making the required page accessible in physical memory or by denying an illegal memory access.

Troubleshooting is a form of problem solving, often applied to repair failed products or processes on a machine or a system. It is a logical, systematic search for the source of a problem in order to solve it, and make the product or process operational again. Troubleshooting is needed to identify the symptoms. Determining the most likely cause is a process of elimination—eliminating potential causes of a problem. Finally, troubleshooting requires confirmation that the solution restores the product or process to its working state.

Reliability engineering is a sub-discipline of systems engineering that emphasizes the ability of equipment to function without failure. Reliability describes the ability of a system or component to function under stated conditions for a specified period. Reliability is closely related to availability, which is typically described as the ability of a component or system to function at a specified moment or interval of time.

Game testing, also called quality assurance (QA) testing within the video game industry, is a software testing process for quality control of video games. The primary function of game testing is the discovery and documentation of software defects. Interactive entertainment software testing is a highly technical field requiring computing expertise, analytic competence, critical evaluation skills, and endurance. In recent years the field of game testing has come under fire for being extremely strenuous and unrewarding, both financially and emotionally.

Xbox 360 technical problems: Errors on Xbox 360

The Xbox 360 video game console is subject to a number of technical problems and failures that can render it unusable. Many of the issues can be identified by a series of glowing red lights flashing on the face of the console; the most infamous is the pattern of three flashing red lights nicknamed the "Red Ring of Death" or "RRoD". Other issues also arise with the console, such as discs becoming scratched in the drive and "bricking" of consoles due to dashboard updates. Since the console's release on November 22, 2005, many articles have appeared in the media reporting on its failure rates. Warranty provider SquareTrade estimated the failure rate at 23.7% in 2009, and the highest estimate is 54.2%, from a Game Informer survey.

In an electric power system, a fault or fault current is any abnormal electric current. For example, a short circuit is a fault in which a live wire touches a neutral or ground wire. An open-circuit fault occurs if a circuit is interrupted by a failure of a current-carrying wire or a blown fuse or circuit breaker. In three-phase systems, a fault may involve one or more phases and ground, or may occur only between phases. In a "ground fault" or "earth fault", current flows into the earth. The prospective short-circuit current of a predictable fault can be calculated for most situations. In power systems, protective devices can detect fault conditions and operate circuit breakers and other devices to limit the loss of service due to a failure.

An intermittent fault, often called simply an "intermittent", is a malfunction that occurs at intervals, usually irregular, in a device or system that functions normally at other times. Intermittent faults are common to all branches of technology, including computer software. An intermittent fault is caused by several contributing factors that occur simultaneously, some of which may be effectively random. The more complex the system or mechanism involved, the greater the likelihood of an intermittent fault.

The term downtime is used to refer to periods when a system is unavailable. The unavailability is the proportion of a time-span that a system is unavailable or offline. This is usually a result of the system failing to function because of an unplanned event, or because of routine maintenance.
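The proportion described above can be computed directly. This is a small sketch with hypothetical function names and example figures; it assumes that downtime and the observation period are measured in the same unit.

```python
def unavailability(downtime_hours, period_hours):
    """Fraction of the observation period during which the system was unavailable."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime_hours must lie between 0 and period_hours")
    return downtime_hours / period_hours


def availability(downtime_hours, period_hours):
    """Complement of unavailability over the same period."""
    return 1.0 - unavailability(downtime_hours, period_hours)


if __name__ == "__main__":
    # Hypothetical example: 8.76 hours of downtime in an 8,760-hour year,
    # which gives an unavailability of about 0.001.
    print(unavailability(8.76, 8760))
    print(availability(8.76, 8760))
```

The calculation does not distinguish unplanned outages from routine maintenance; both count toward downtime here, as in the definition above.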

Blue screen of death: Fatal system error screen on a computer running Microsoft Windows or ReactOS

The blue screen of death is a critical error screen displayed by Microsoft Windows. It indicates a system crash, in which the operating system reaches a critical condition where it can no longer operate safely.

In engineering, debugging is the process of finding the root cause of bugs, along with workarounds and possible fixes for them.

ISO 26262, titled "Road vehicles – Functional safety", is an international standard for functional safety of electrical and/or electronic systems that are installed in serial production road vehicles, defined by the International Organization for Standardization (ISO) in 2011, and revised in 2018.

Condition monitoring of transformers in electrical engineering is the process of acquiring and processing data related to various parameters of transformers to determine their state of quality and predict their failure. This is done by observing the deviation of the transformer parameters from their expected values. Transformers are the most critical assets of electrical transmission and distribution systems, and their failures could cause power outages, personal and environmental hazards, and expensive rerouting or purchase of power from other suppliers. Identifying a transformer which is near failure can allow it to be replaced under controlled conditions at a non-critical time and avoid a system failure.