Crash-only software

Last updated

Crash-only software refers to computer programs that handle failures by simply restarting, without attempting any sophisticated recovery. [1] Correctly written components of crash-only software can microreboot to a known-good state without the help of a user. Since failure-handling and normal startup use the same methods, this can increase the chance that bugs in failure-handling code will be noticed,[ clarification needed ] except when there are leftover artifacts, such as data corruption from a severe failure, that don't occur during normal startup.[ citation needed ]

Contents

Crash-only software also has benefits for end-users. All too often, applications do not save their data and settings while running, only at the end of their use. For example, word processors usually save settings when they are closed. A crash-only application is designed to save all changed user settings soon after they are changed, so that the persistent state matches that of the running machine. No matter how an application terminates (be it a clean close or the sudden failure of a laptop battery), the state will persist.

See also

Related Research Articles

In telecommunication, provisioning involves the process of preparing and equipping a network to allow it to provide new services to its users. In National Security/Emergency Preparedness telecommunications services, "provisioning" equates to "initiation" and includes altering the state of an existing priority service or capability.

<span class="mw-page-title-main">Crash (computing)</span> When a computer program stops functioning properly and self-terminates

In computing, a crash, or system crash, occurs when a computer program such as a software application or an operating system stops functioning properly and exits. On some operating systems or individual applications, a crash reporting service will report the crash and any details relating to it, usually to the developer(s) of the application. If the program is a critical part of the operating system, the entire system may crash or hang, often resulting in a kernel panic or fatal system error.

Checkpointing is a technique that provides fault tolerance for computing systems. It basically consists of saving a snapshot of the application's state, so that applications can restart from that point in case of failure. This is particularly important for long running applications that are executed in failure-prone computing systems.

In computer science and networking in particular, a session is a time-delimited two-way link, a practical layer in the TCP/IP protocol enabling interactive expression and information exchange between two or more communication devices or ends – be they computers, automated systems, or live active users. A session is established at a certain point in time, and then ‘torn down’ - brought to an end - at some later point. An established communication session may involve more than one message in each direction. A session is typically stateful, meaning that at least one of the communicating parties needs to hold current state information and save information about the session history to be able to communicate, as opposed to stateless communication, where the communication consists of independent requests with responses.

REST is a software architectural style that was created to guide the design and development of the architecture for the World Wide Web. REST defines a set of constraints for how the architecture of a distributed, Internet-scale hypermedia system, such as the Web, should behave. The REST architectural style emphasises uniform interfaces, independent deployment of components, the scalability of interactions between them, and creating a layered architecture to promote caching to reduce user-perceived latency, enforce security, and encapsulate legacy systems.

<span class="mw-page-title-main">Windows Registry</span> Database for Microsoft Windows

The Windows Registry is a hierarchical database that stores low-level settings for the Microsoft Windows operating system and for applications that opt to use the registry. The kernel, device drivers, services, Security Accounts Manager, and user interfaces can all use the registry. The registry also allows access to counters for profiling system performance.

High-availability clusters are groups of computers that support server applications that can be reliably utilized with a minimum amount of down-time. They operate by using high availability software to harness redundant computers in groups or clusters that provide continued service when system components fail. Without clustering, if a server running a particular application crashes, the application will be unavailable until the crashed server is fixed. HA clustering remedies this situation by detecting hardware/software faults, and immediately restarting the application on another system without requiring administrative intervention, a process known as failover. As part of this process, clustering software may configure the node before starting the application on it. For example, appropriate file systems may need to be imported and mounted, network hardware may have to be configured, and some supporting applications may need to be running as well.

In systems design, a fail-fast system is one which immediately reports at its interface any condition that is likely to indicate a failure. Fail-fast systems are usually designed to stop normal operation rather than attempt to continue a possibly flawed process. Such designs often check the system's state at several points in an operation, so any failures can be detected early. The responsibility of a fail-fast module is detecting errors, then letting the next-highest level of the system handle them.

In computer programming, error hiding is the practice of catching an error or exception, and then continuing without logging, processing, or reporting the error to other parts of the software. Handling errors in this manner is considered bad practice and an anti-pattern in computer programming. In languages with exception handling support, this practice is called exception swallowing.

Client Server Runtime Subsystem, or csrss.exe, is a component of the Windows NT family of operating systems that provides the user mode side of the Win32 subsystem and is included in Windows NT 3.1 and later. Because most of the Win32 subsystem operations have been moved to kernel mode drivers in Windows NT 4 and later, CSRSS is mainly responsible for Win32 console handling and GUI shutdown. It is critical to system operation; therefore, terminating this process will result in system failure. Under normal circumstances, CSRSS cannot be terminated with the taskkill command or with Windows Task Manager, although it is possible in Windows Vista if the Task Manager is run in Administrator mode. On Windows 7 and later, Task Manager will inform the user that terminating the process may result in system failure, and prompt if they want to continue. In Windows NT 4.0 however, terminating CSRSS without the Session Manager Subsystem (SMSS) watching will not crash the system. However, in Windows XP, terminating CSRSS without SMSS watching will crash the system due to the critical bit being set in RAM for csrss.exe.

<span class="mw-page-title-main">Windows Error Reporting</span> Crash reporting technology

Windows Error Reporting (WER) is a crash reporting technology introduced by Microsoft with Windows XP and included in later Windows versions and Windows Mobile 5.0 and 6.0. Not to be confused with the Dr. Watson debugging tool which left the memory dump on the user's local machine, Windows Error Reporting collects and offers to send post-error debug information using the Internet to Microsoft when an application crashes or stops responding on a user's desktop. No data is sent without the user's consent. When a crash dump reaches the Microsoft server, it is analyzed, and information about a solution is sent back to the user if available. Solutions are served using Windows Error Reporting Responses. Windows Error Reporting runs as a Windows service. Kinshuman Kinshumann is the original architect of WER. WER was also included in the Association for Computing Machinery (ACM) hall of fame for its impact on the computing industry.

Stress testing is a software testing activity that determines the robustness of software by testing beyond the limits of normal operation. Stress testing is particularly important for "mission critical" software, but is used for all types of software. Stress tests commonly put a greater emphasis on robustness, availability, and error handling under a heavy load, than on what would be considered correct behavior under normal circumstances.

Oracle Secure Global Desktop (SGD) software provides secure access to both published applications and published desktops running on Microsoft Windows, Unix, mainframe and IBM i systems via a variety of clients ranging from fat PCs to thin clients such as Sun Rays.

A roaming user profile is a file synchronization concept in the Windows NT family of operating systems that allows users with a computer joined to a Windows domain to log on to any computer on the same domain and access their documents and have a consistent desktop experience, such as applications remembering toolbar positions and preferences, or the desktop appearance staying the same, while keeping all related files stored locally, to not continuously depend on a fast and reliable network connection to a file server.

Turbo is a set of software products and services developed by the Code Systems Corporation for application virtualization, portable application creation, and digital distribution. Code Systems Corporation is an American corporation headquartered in Seattle, Washington, and is best known for its Turbo products that include Browser Sandbox, Turbo Studio, TurboServer, and Turbo.

Cross-site request forgery, also known as one-click attack or session riding and abbreviated as CSRF or XSRF, is a type of malicious exploit of a website or web application where unauthorized commands are submitted from a user that the web application trusts. There are many ways in which a malicious website can transmit such commands; specially-crafted image tags, hidden forms, and JavaScript fetch or XMLHttpRequests, for example, can all work without the user's interaction or even knowledge. Unlike cross-site scripting (XSS), which exploits the trust a user has for a particular site, CSRF exploits the trust that a site has in a user's browser. In a CSRF attack, an innocent end user is tricked by an attacker into submitting a web request that they did not intend. This may cause actions to be performed on the website that can include inadvertent client or server data leakage, change of session state, or manipulation of an end user's account.

In computing, rebooting is the process by which a running computer system is restarted, either intentionally or unintentionally. Reboots can be either a cold reboot in which the power to the system is physically turned off and back on again ; or a warm reboot in which the system restarts while still powered up. The term restart is used to refer to a reboot when the operating system closes all programs and finalizes all pending input and output operations before initiating a soft reboot.

Kubernetes is an open-source container orchestration system for automating software deployment, scaling, and management. Originally designed by Google, the project is now maintained by the Cloud Native Computing Foundation.

Data scraping is a technique where a computer program extracts data from human-readable output coming from another program.

References

  1. Candea, George; Fox, Armando (May 2003). "Crash-only software". 9th Workshop on Hot Topics in Operating Systems. Lihue, Hawaii, USA.