Error-tolerant design

Last updated

An error-tolerant design (or human-error-tolerant design [1] ) is one that does not unduly penalize user or human errors. It is the human equivalent of fault tolerant design that allows equipment to continue functioning in the presence of hardware faults, such as a "limp-in" mode for an automobile electronics unit that would be employed if something like the oxygen sensor failed.

Contents

Use of behavior shaping constraints to prevent errors

Use of forcing functions or behavior-shaping constraints is one technique in error-tolerant design. An example is the interlock or lockout of reverse in the transmission of a moving car. This prevents errors, and prevention of errors is the most effective technique in error-tolerant design. The practice is known as poka-yoke in Japan where it was introduced by Shigeo Shingo as part of the Toyota Production System.

Mitigation of the effects of errors

The next most effective technique in error-tolerant design is the mitigation or limitation of the effects of errors after they have been made. An example is a checking or confirmation function such as an "Are you sure" dialog box with the harmless option preselected in computer software for an action that could have severe consequences if made in error, such as deleting or overwriting files (although the consequence of inadvertent file deletion has been reduced from the DOS days by a concept like the trash can in Mac OS, which has been introduced in most GUI interfaces). Adding too great a mitigating factor in some circumstances can become a hindrance, where the confirmation becomes mechanical this may become detrimental - for example, if a prompt is asked for every file in a batch delete, one may be tempted to simply agree to each prompt, even if a file is deleted accidentally.

Another example is Google's use of spell checking on searches performed through their search engine. The spell checking minimises the problems caused by incorrect spelling by not only highlighting the error to the user, but by also providing a link to search using the correct spelling instead. Searches like this are commonly performed using a combination of edit distance, soundex, and metaphone calculations.

See also

Related Research Articles

Software testing is the act of examining the artifacts and the behavior of the software under test by validation and verification. Software testing can also provide an objective, independent view of the software to allow the business to appreciate and understand the risks of software implementation. Test techniques include, but are not limited to:

Data integrity is the maintenance of, and the assurance of, data accuracy and consistency over its entire life-cycle. It is a critical aspect to the design, implementation, and usage of any system that stores, processes, or retrieves data. The term is broad in scope and may have widely different meanings depending on the specific context even under the same general umbrella of computing. It is at times used as a proxy term for data quality, while data validation is a prerequisite for data integrity.

In computing and computer programming, exception handling is the process of responding to the occurrence of exceptions – anomalous or exceptional conditions requiring special processing – during the execution of a program. In general, an exception breaks the normal flow of execution and executes a pre-registered exception handler; the details of how this is done depend on whether it is a hardware or software exception and how the software exception is implemented.

<span class="mw-page-title-main">Spell checker</span> Software to help correct spelling errors

In software, a spell checker is a software feature that checks for misspellings in a text. Spell-checking features are often embedded in software or services, such as a word processor, email client, electronic dictionary, or search engine.

<span class="mw-page-title-main">Typographical error</span> Mistake made in typing printed material

A typographical error, also called a misprint, is a mistake made in the typing of printed or electronic material. Historically, this referred to mistakes in manual typesetting. Technically, the term includes errors due to mechanical failure or slips of the hand or finger, but excludes errors of ignorance, such as spelling errors, or changing and misuse of words such as "than" and "then". Before the arrival of printing, the copyist's mistake or scribal error was the equivalent for manuscripts. Most typos involve simple duplication, omission, transposition, or substitution of a small number of characters.

Code injection is the exploitation of a computer bug that is caused by processing invalid data. The injection is used by an attacker to introduce code into a vulnerable computer program and change the course of execution. The result of successful code injection can be disastrous, for example, by allowing computer viruses or computer worms to propagate.

Data remanence is the residual representation of digital data that remains even after attempts have been made to remove or erase the data. This residue may result from data being left intact by a nominal file deletion operation, by reformatting of storage media that does not remove data previously written to the media, or through physical properties of the storage media that allow previously written data to be recovered. Data remanence may make inadvertent disclosure of sensitive information possible should the storage media be released into an uncontrolled environment.

Data loss is an error condition in information systems in which information is destroyed by failures or neglect in storage, transmission, or processing. Information systems implement backup and disaster recovery equipment and processes to prevent data loss or restore lost data. Data loss can also occur if the physical medium containing the data is lost or stolen.

Troubleshooting is a form of problem solving, often applied to repair failed products or processes on a machine or a system. It is a logical, systematic search for the source of a problem in order to solve it, and make the product or process operational again. Troubleshooting is needed to identify the symptoms. Determining the most likely cause is a process of elimination—eliminating potential causes of a problem. Finally, troubleshooting requires confirmation that the solution restores the product or process to its working state.

Reliability engineering is a sub-discipline of systems engineering that emphasizes the ability of equipment to function without failure. Reliability describes the ability of a system or component to function under stated conditions for a specified period of time. Reliability is closely related to availability, which is typically described as the ability of a component or system to function at a specified moment or interval of time.

Fault tolerance is the ability of a system to maintain proper operation in the event of failures or faults in one or more of its components. If its operating quality decreases at all, the decrease is proportional to the severity of the failure, as compared to a naively designed system, in which even a small failure can lead to total breakdown. Fault tolerance is particularly sought after in high-availability, mission-critical, or even life-critical systems. The ability of maintaining functionality when portions of a system break down is referred to as graceful degradation.

In computing, data recovery is a process of retrieving deleted, inaccessible, lost, corrupted, damaged, or formatted data from secondary storage, removable media or files, when the data stored in them cannot be accessed in a usual way. The data is most often salvaged from storage media such as internal or external hard disk drives (HDDs), solid-state drives (SSDs), USB flash drives, magnetic tapes, CDs, DVDs, RAID subsystems, and other electronic devices. Recovery may be required due to physical damage to the storage devices or logical damage to the file system that prevents it from being mounted by the host operating system (OS).

<span class="mw-page-title-main">User interface design</span> Planned operator–machine interaction

User interface (UI) design or user interface engineering is the design of user interfaces for machines and software, such as computers, home appliances, mobile devices, and other electronic devices, with the focus on maximizing usability and the user experience. In computer or software design, user interface (UI) design primarily focuses on information architecture. It is the process of building interfaces that clearly communicate to the user what's important. UI design refers to graphical user interfaces and other forms of interface design. The goal of user interface design is to make the user's interaction as simple and efficient as possible, in terms of accomplishing user goals.

Reliability, availability and serviceability (RAS), also known as reliability, availability, and maintainability (RAM), is a computer hardware engineering term involving reliability engineering, high availability, and serviceability design. The phrase was originally used by International Business Machines (IBM) as a term to describe the robustness of their mainframe computers.

<span class="mw-page-title-main">Error message</span> Computer message indicating an error

An error message is the information displayed when an unforeseen problem occurs, usually on a computer or other device. Modern operating systems with graphical user interfaces, often display error messages using dialog boxes. Error messages are used when user intervention is required, to indicate that a desired operation has failed, or to relay important warnings. Error messages are seen widely throughout computing, and are part of every operating system or computer hardware device. The proper design of error messages is an important topic in usability and other fields of human–computer interaction.

In systems design, a fail-fast system is one which immediately reports at its interface any condition that is likely to indicate a failure. Fail-fast systems are usually designed to stop normal operation rather than attempt to continue a possibly flawed process. Such designs often check the system's state at several points in an operation, so any failures can be detected early. The responsibility of a fail-fast module is detecting errors, then letting the next-highest level of the system handle them.

The system safety concept calls for a risk management strategy based on identification, analysis of hazards and application of remedial controls using a systems-based approach. This is different from traditional safety strategies which rely on control of conditions and causes of an accident based either on the epidemiological analysis or as a result of investigation of individual past accidents. The concept of system safety is useful in demonstrating adequacy of technologies when difficulties are faced with probabilistic risk analysis. The underlying principle is one of synergy: a whole is more than sum of its parts. Systems-based approach to safety requires the application of scientific, technical and managerial skills to hazard identification, hazard analysis, and elimination, control, or management of hazards throughout the life-cycle of a system, program, project or an activity or a product. "Hazop" is one of several techniques available for identification of hazards.

In computer science, robustness is the ability of a computer system to cope with errors during execution and cope with erroneous input. Robustness can encompass many areas of computer science, such as robust programming, robust machine learning, and Robust Security Network. Formal techniques, such as fuzz testing, are essential to showing robustness since this type of testing involves invalid or unexpected inputs. Alternatively, fault injection can be used to test robustness. Various commercial products perform robustness testing of software analysis.

Software fault tolerance is the ability of computer software to continue its normal operation despite the presence of system or hardware faults. Fault-tolerant software has the ability to satisfy requirements despite failures.

Database testing usually consists of a layered process, including the user interface (UI) layer, the business layer, the data access layer and the database itself. The UI layer deals with the interface design of the database, while the business layer includes databases supporting business strategies.

References

  1. Chauncey Wilson (September 11, 2009). "User Needs Analysis". User experience re-mastered : your guide to getting the right design. Morgan Kaufmann Publishers. ISBN   9780123751140. Archived from the original on Dec 30, 2019.