Switchover

Last updated

Switchover is the manual switch from one system to a redundant or standby computer server, system, or network upon the failure or abnormal termination of the previously active server, system, or network, or to perform system maintenance, such as installing patches, and upgrading software or hardware. [1]

Redundancy (engineering) Duplication of critical components to increase reliability of a system

In engineering, redundancy is the duplication of critical components or functions of a system with the intention of increasing reliability of the system, usually in the form of a backup or fail-safe, or to improve actual system performance, such as in the case of GNSS receivers, or multi-threaded computer processing.

A computer is a machine that can be instructed to carry out sequences of arithmetic or logical operations automatically via computer programming. Modern computers have the ability to follow generalized sets of operations, called programs. These programs enable computers to perform an extremely wide range of tasks. A "complete" computer including the hardware, the operating system, and peripheral equipment required and used for "full" operation can be referred to as a computer system. This term may as well be used for a group of computers that are connected and work together, in particular a computer network or computer cluster.

Server (computing) Computer to access a central resource or service on a network

In computing, a server is a computer program or a device that provides functionality for other programs or devices, called "clients". This architecture is called the client–server model, and a single overall computation is distributed across multiple processes or devices. Servers can provide various functionalities, often called "services", such as sharing data or resources among multiple clients, or performing computation for a client. A single server can serve multiple clients, and a single client can use multiple servers. A client process may run on the same device or may connect over a network to a server on a different device. Typical servers are database servers, file servers, mail servers, print servers, web servers, game servers, and application servers.

Automatic switchover of a redundant system on an error condition, without human intervention, is called failover. Manual switchover on error would be used if automatic failover is not available, possibly because the overall system is too complex.

In computing and related technologies such as networking, failover is switching to a redundant or standby computer server, system, hardware component or network upon the failure or abnormal termination of the previously active application, server, system, hardware component, or network. Failover and switchover are essentially the same operation, except that failover is automatic and usually operates without warning, while switchover requires human intervention.

See also

Safety engineering engineering discipline which assures that engineered systems provide acceptable levels of safety

Safety engineering is an engineering discipline which assures that engineered systems provide acceptable levels of safety. It is strongly related to industrial engineering/systems engineering, and the subset system safety engineering. Safety engineering assures that a life-critical system behaves as needed, even when components fail.

Data integrity is the maintenance of, and the assurance of the accuracy and consistency of, data over its entire life-cycle, and is a critical aspect to the design, implementation and usage of any system which stores, processes, or retrieves data. The term is broad in scope and may have widely different meanings depending on the specific context – even under the same general umbrella of computing. It is at times used as a proxy term for data quality, while data validation is a pre-requisite for data integrity. Data integrity is the opposite of data corruption. The overall intent of any data integrity technique is the same: ensure data is recorded exactly as intended and upon later retrieval, ensure the data is the same as it was when it was originally recorded. In short, data integrity aims to prevent unintentional changes to information. Data integrity is not to be confused with data security, the discipline of protecting data from unauthorized parties.

High-availability clusters are groups of computers that support server applications that can be reliably utilized with a minimum amount of down-time. They operate by using high availability software to harness redundant computers in groups or clusters that provide continued service when system components fail. Without clustering, if a server running a particular application crashes, the application will be unavailable until the crashed server is fixed. HA clustering remedies this situation by detecting hardware/software faults, and immediately restarting the application on another system without requiring administrative intervention, a process known as failover. As part of this process, clustering software may configure the node before starting the application on it. For example, appropriate file systems may need to be imported and mounted, network hardware may have to be configured, and some supporting applications may need to be running as well.

Related Research Articles

Load balancing (computing) set of techniques to improve the distribution of workloads across multiple computing resources

In computing, load balancing improves the distribution of workloads across multiple computing resources, such as computers, a computer cluster, network links, central processing units, or disk drives. Load balancing aims to optimize resource use, maximize throughput, minimize response time, and avoid overload of any single resource. Using multiple components with load balancing instead of a single component may increase reliability and availability through redundancy. Load balancing usually involves dedicated software or hardware, such as a multilayer switch or a Domain Name System server process.

Network Load Balancing Services (NLBS) is a Microsoft implementation of clustering and load balancing that is intended to provide high availability and high reliability, as well as high scalability. NLBS is intended for applications with relatively small data sets that rarely change, and do not have long-running in-memory states. These types of applications are called stateless applications, and typically include Web, File Transfer Protocol (FTP), and virtual private networking (VPN) servers. Every client request to a stateless application is a separate transaction, so it is possible to distribute the requests among multiple servers to balance the load. One attractive feature of NLBS is that all servers in a cluster monitor each other with a heartbeat signal, so there is no single point of failure.

Network monitoring is the use of a system that constantly monitors a computer network for slow or failing components and that notifies the network administrator in case of outages or other trouble. Network monitoring is part of network management.

Windows Server 2008 server operating system by Microsoft released in 2008

Windows Server 2008 is a server operating system produced by Microsoft. It was released to manufacturing on February 4, 2008, and reached general availability on February 27, 2008. It is the successor of Windows Server 2003, released nearly five years earlier.

Reliability, availability and serviceability (RAS) is a computer hardware engineering term involving reliability engineering, high availability, and serviceability design. The phrase was originally used by International Business Machines (IBM) as a term to describe the robustness of their mainframe computers.

A hot spare or warm spare or hot standby is used as a failover mechanism to provide reliability in system configurations. The hot spare is active and connected as part of a working system. When a key component fails, the hot spare is switched into operation. More generally, a hot standby can be used to refer to any device or system that is held in readiness to overcome an otherwise significant start-up delay.

High availability (HA) is a characteristic of a system, which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period.

The software which Oracle Corporation markets as Oracle Data Guard forms an extension to the Oracle relational database management system (RDBMS). It aids in establishing and maintaining secondary standby databases as alternative/supplementary repositories to production primary databases.

The term downtime is used to refer to periods when a system is unavailable. Downtime or outage duration refers to a period of time that a system fails to provide or perform its primary function. Reliability, availability, recovery, and unavailability are related concepts. The unavailability is the proportion of a time-span that a system is unavailable or offline. This is usually a result of the system failing to function because of an unplanned event, or because of routine maintenance.

N+1 redundancy is a form of resilience that ensures system availability in the event of component failure. Components (N) have at least one independent backup component (+1). The level of resilience is referred to as active/passive or standby as backup components do not actively participate within the system during normal operation. The level of transparency during failover is dependent on a specific solution, though degradation to system resilience will occur during failover.

A database shard is a horizontal partition of data in a database or search engine. Each individual partition is referred to as a shard or database shard. Each shard is held on a separate database server instance, to spread load.

Log shipping is the process of automating the backup of transaction log files on a primary (production) database server, and then restoring them onto a standby server. This technique is supported by Microsoft SQL Server, 4D Server, MySQL, and PostgreSQL. Similar to replication, the primary purpose of log shipping is to increase database availability by maintaining a backup server that can replace a production server quickly. Other databases such as Adaptive Server Enterprise and Oracle Database support the technique but require the Database Administrator to write code or scripts to perform the work.

Cluster Shared Volumes (CSV) is a feature of Failover Clustering first introduced in Windows Server 2008 R2 for use with the Hyper-V role. A Cluster Shared Volume is a shared disk containing an NTFS or ReFS volume that is made accessible for read and write operations by all nodes within a Windows Server Failover Cluster.

A cluster-aware application is a software application designed to call cluster APIs in order to determine its running state, in case a manual failover is triggered between cluster nodes for planned technical maintenance, or an automatic failover is required, if a computing cluster node encounters hardware or software failure, to maintain business continuity. A cluster-aware application may be capable of failing over LAN or WAN.

Adaptable Modular Storage 2000 is the brand name of Hitachi Data Systems mid-range storage platforms.

SQL Server Agent is a component of Microsoft SQL Server which schedules jobs and handles other automated tasks. It runs as a Windows service so it can start automatically when the system boots or it can be started manually.

DM-Multipathing (DM-MPIO) provides input-output (I/O) fail-over and load-balancing by using multipath I/O within Linux for block devices. By utilizing device-mapper, the multipathd daemon provides the host-side logic to use multiple paths of a redundant network to provide continuous availability and higher-bandwidth connectivity between the host server and the block-level device. DM-MPIO handles the rerouting of block I/O to an alternate path in the event of a path failure. DM-MPIO can also balance the I/O load across all of the available paths that are typically utilized in Fibre Channel (FC) and iSCSI SAN environments. DM-MPIO is based on the device mapper, which provides the basic framework that maps one block device onto another.

Wireless failover is an automated function in telephone networks and computer networks where a standard hardwired connection is switched to a redundant wireless connection upon failure or irregular closure of a default hardwired connection or component in the network such as a router, server, or computer.

References

  1. "Switchovers and Failovers". Microsoft . Retrieved November 9, 2015. A switchover is a scheduled outage of a database or server that's explicitly initiated by an administrator, typically in preparation for performing a maintenance operation