Crash-only software

Last updated

Crash-only software refers to computer programs that handle failures by simply restarting, without attempting any sophisticated recovery. [1] Correctly written components of crash-only software can microreboot to a known-good state without the help of a user. Since failure-handling and normal startup use the same methods, this can increase the chance that bugs in failure-handling code will be noticed,[ clarification needed ] except when there are leftover artifacts, such as data corruption from a severe failure, that don't occur during normal startup.[ citation needed ]

Contents

Crash-only software also has benefits for end-users. All too often, applications do not save their data and settings while running, only at the end of their use. For example, word processors usually save settings when they are closed. A crash-only application is designed to save all changed user settings soon after they are changed, so that the persistent state matches that of the running machine. No matter how an application terminates (be it a clean close or the sudden failure of a laptop battery), the state will persist.

See also

Related Research Articles

In telecommunication, provisioning involves the process of preparing and equipping a network to allow it to provide new services to its users. In National Security/Emergency Preparedness telecommunications services, "provisioning" equates to "initiation" and includes altering the state of an existing priority service or capability.

<span class="mw-page-title-main">Crash (computing)</span> When a computer program stops functioning properly and self-terminates

In computing, a crash, or system crash, occurs when a computer program such as a software application or an operating system stops functioning properly and exits. On some operating systems or individual applications, a crash reporting service will report the crash and any details relating to it, usually to the developer(s) of the application. If the program is a critical part of the operating system, the entire system may crash or hang, often resulting in a kernel panic or fatal system error.

Checkpointing is a technique that provides fault tolerance for computing systems. It basically consists of saving a snapshot of the application's state, so that applications can restart from that point in case of failure. This is particularly important for long running applications that are executed in failure-prone computing systems.

Execution in computer and software engineering is the process by which a computer or virtual machine interpret and acts on the instructions of a computer program. Each instruction of a program is a description of a particular action which must be carried out, in order for a specific problem to be solved. Execution involves repeatedly following a "fetch–decode–execute" cycle for each instruction done by control unit. As the executing machine follows the instructions, specific effects are produced in accordance with the semantics of those instructions.

In computer science and networking in particular, a session is a time-delimited two-way link, a practical layer in the TCP/IP protocol enabling interactive expression and information exchange between two or more communication devices or ends – be they computers, automated systems, or live active users. A session is established at a certain point in time, and then ‘torn down’ - brought to an end - at some later point. An established communication session may involve more than one message in each direction. A session is typically stateful, meaning that at least one of the communicating parties needs to hold current state information and save information about the session history to be able to communicate, as opposed to stateless communication, where the communication consists of independent requests with responses.

REST is a software architectural style that was created to guide the design and development of the architecture for the World Wide Web. REST defines a set of constraints for how the architecture of a distributed, Internet-scale hypermedia system, such as the Web, should behave. The REST architectural style emphasises uniform interfaces, independent deployment of components, the scalability of interactions between them, and creating a layered architecture to promote caching to reduce user-perceived latency, enforce security, and encapsulate legacy systems.

<span class="mw-page-title-main">Windows Registry</span> Database for Microsoft Windows

The Windows Registry is a hierarchical database that stores low-level settings for the Microsoft Windows operating system and for applications that opt to use the registry. The kernel, device drivers, services, Security Accounts Manager, and user interfaces can all use the registry. The registry also allows access to counters for profiling system performance.

High-availability clusters are groups of computers that support server applications that can be reliably utilized with a minimum amount of down-time. They operate by using high availability software to harness redundant computers in groups or clusters that provide continued service when system components fail. Without clustering, if a server running a particular application crashes, the application will be unavailable until the crashed server is fixed. HA clustering remedies this situation by detecting hardware/software faults, and immediately restarting the application on another system without requiring administrative intervention, a process known as failover. As part of this process, clustering software may configure the node before starting the application on it. For example, appropriate file systems may need to be imported and mounted, network hardware may have to be configured, and some supporting applications may need to be running as well.

In systems design, a fail-fast system is one which immediately reports at its interface any condition that is likely to indicate a failure. Fail-fast systems are usually designed to stop normal operation rather than attempt to continue a possibly flawed process. Such designs often check the system's state at several points in an operation, so any failures can be detected early. The responsibility of a fail-fast module is detecting errors, then letting the next-highest level of the system handle them.

In computer programming, error hiding is the practice of catching an error or exception, and then continuing without logging, processing, or reporting the error to other parts of the software. Handling errors in this manner is considered bad practice and an anti-pattern in computer programming. In languages with exception handling support, this practice is called exception swallowing.

<span class="mw-page-title-main">Windows Error Reporting</span> Crash reporting technology

Windows Error Reporting (WER) is a crash reporting technology introduced by Microsoft with Windows XP and included in later Windows versions and Windows Mobile 5.0 and 6.0. Not to be confused with the Dr. Watson debugging tool which left the memory dump on the user's local machine, Windows Error Reporting collects and offers to send post-error debug information using the Internet to Microsoft when an application crashes or stops responding on a user's desktop. No data is sent without the user's consent. When a crash dump reaches the Microsoft server, it is analyzed, and information about a solution is sent back to the user if available. Solutions are served using Windows Error Reporting Responses. Windows Error Reporting runs as a Windows service. Kinshuman Kinshumann is the original architect of WER. WER was also included in the Association for Computing Machinery (ACM) hall of fame for its impact on the computing industry.

Stress testing is a software testing activity that determines the robustness of software by testing beyond the limits of normal operation. Stress testing is particularly important for "mission critical" software, but is used for all types of software. Stress tests commonly put a greater emphasis on robustness, availability, and error handling under a heavy load, than on what would be considered correct behavior under normal circumstances.

Oracle Secure Global Desktop (SGD) software provides secure access to both published applications and published desktops running on Microsoft Windows, Unix, mainframe and IBM i systems via a variety of clients ranging from fat PCs to thin clients such as Sun Rays.

A roaming user profile is a file synchronization concept in the Windows NT family of operating systems that allows users with a computer joined to a Windows domain to log on to any computer on the same domain and access their documents and have a consistent desktop experience, such as applications remembering toolbar positions and preferences, or the desktop appearance staying the same, while keeping all related files stored locally, to not continuously depend on a fast and reliable network connection to a file server.

Turbo is a set of software products and services developed by the Code Systems Corporation for application virtualization, portable application creation, and digital distribution. Code Systems Corporation is an American corporation headquartered in Seattle, Washington, and is best known for its Turbo products that include Browser Sandbox, Turbo Studio, TurboServer, and Turbo.

The Handle System is the Corporation for National Research Initiatives's proprietary registry assigning persistent identifiers, or handles, to information resources, and for resolving "those handles into the information necessary to locate, access, and otherwise make use of the resources".

Cross-site request forgery, also known as one-click attack or session riding and abbreviated as CSRF or XSRF, is a type of malicious exploit of a website or web application where unauthorized commands are submitted from a user that the web application trusts. There are many ways in which a malicious website can transmit such commands; specially-crafted image tags, hidden forms, and JavaScript fetch or XMLHttpRequests, for example, can all work without the user's interaction or even knowledge. Unlike cross-site scripting (XSS), which exploits the trust a user has for a particular site, CSRF exploits the trust that a site has in a user's browser. In a CSRF attack, an innocent end user is tricked by an attacker into submitting a web request that they did not intend. This may cause actions to be performed on the website that can include inadvertent client or server data leakage, change of session state, or manipulation of an end user's account.

In computing, rebooting is the process by which a running computer system is restarted, either intentionally or unintentionally. Reboots can be either a cold reboot in which the power to the system is physically turned off and back on again ; or a warm reboot in which the system restarts while still powered up. The term restart is used to refer to a reboot when the operating system closes all programs and finalizes all pending input and output operations before initiating a soft reboot.

Data scraping is a technique where a computer program extracts data from human-readable output coming from another program.

References

  1. Candea, George; Fox, Armando (May 2003). "Crash-only software". 9th Workshop on Hot Topics in Operating Systems. Lihue, Hawaii, USA.