Switchover

Last updated

Switchover is the manual switch from one system to a redundant or standby computer server, system, or network upon the failure or abnormal termination of the previously active server, system, or network, or to perform system maintenance, such as installing patches, and upgrading software or hardware. [1]

Automatic switchover of a redundant system on an error condition, without human intervention, is called failover. Manual switchover on error would be used if automatic failover is not available, possibly because the overall system is too complex.

See also

Related Research Articles

<span class="mw-page-title-main">Failover</span> Automatic switching from failed computer system to standby computers

Failover is switching to a redundant or standby computer server, system, hardware component or network upon the failure or abnormal termination of the previously active application, server, system, hardware component, or network in a computer network. Failover and switchover are essentially the same operation, except that failover is automatic and usually operates without warning, while switchover requires human intervention.

MySQL Cluster, also known as MySQL Ndb Cluster is a technology providing shared-nothing clustering and auto-sharding for the MySQL database management system. It is designed to provide high availability and high throughput with low latency, while allowing for near linear scalability. MySQL Cluster is implemented through the NDB or NDBCLUSTER storage engine for MySQL.

Network Load Balancing Services (NLBS) is a Microsoft implementation of clustering and load balancing that is intended to provide high availability and high reliability, as well as high scalability. NLBS is intended for applications with relatively small data sets that rarely change, and do not have long-running in-memory states. These types of applications are called stateless applications, and typically include Web, File Transfer Protocol (FTP), and virtual private networking (VPN) servers. Every client request to a stateless application is a separate transaction, so it is possible to distribute the requests among multiple servers to balance the load. One attractive feature of NLBS is that all servers in a cluster monitor each other with a heartbeat signal, so there is no single point of failure.

Fault tolerance is the ability of a system to maintain proper operation despite failures or faults in one or more of its components. This capability is essential for high-availability, mission-critical, or even life-critical systems.

<span class="mw-page-title-main">Windows Server 2008</span> Fourth version of Windows Server, released in 2008

Windows Server 2008, codenamed "Longhorn Server", is the eighth major version of the Windows NT operating system produced by Microsoft to be released under the Windows Server brand name. It was released to manufacturing on February 4, 2008, and generally to retail on February 27, 2008. Derived from Windows Vista, Windows Server 2008 is the successor to Windows Server 2003 R2 and the predecessor to Windows Server 2008 R2. It removed support for processors without ACPI, and is the first version that includes Hyper-V.

High-availability clusters are groups of computers that support server applications that can be reliably utilized with a minimum amount of down-time. They operate by using high availability software to harness redundant computers in groups or clusters that provide continued service when system components fail. Without clustering, if a server running a particular application crashes, the application will be unavailable until the crashed server is fixed. HA clustering remedies this situation by detecting hardware/software faults, and immediately restarting the application on another system without requiring administrative intervention, a process known as failover. As part of this process, clustering software may configure the node before starting the application on it. For example, appropriate file systems may need to be imported and mounted, network hardware may have to be configured, and some supporting applications may need to be running as well.

<span class="mw-page-title-main">Solaris Cluster</span> High-availability cluster software

Oracle Solaris Cluster is a high-availability cluster software product for Solaris, originally created by Sun Microsystems, which was acquired by Oracle Corporation in 2010. It is used to improve the availability of software services such as databases, file sharing on a network, electronic commerce websites, or other applications. Sun Cluster operates by having redundant computers or nodes where one or more computers continue to provide service if another fails. Nodes may be located in the same data center or on different continents.

Reliability, availability and serviceability (RAS), also known as reliability, availability, and maintainability (RAM), is a computer hardware engineering term involving reliability engineering, high availability, and serviceability design. The phrase was originally used by IBM as a term to describe the robustness of their mainframe computers.

A hot spare or warm spare or hot standby is used as a failover mechanism to provide reliability in system configurations. The hot spare is active and connected as part of a working system. When a key component fails, the hot spare is switched into operation. More generally, a hot standby can be used to refer to any device or system that is held in readiness to overcome an otherwise significant start-up delay.

The software which Oracle Corporation markets as Oracle Data Guard forms an extension to the Oracle relational database management system (RDBMS). It aids in establishing and maintaining secondary standby databases as alternative/supplementary repositories to production primary databases.

The Red Hat Cluster Suite (RHCS) includes software to create a high availability and load balancing cluster. Both can be used on the same system although this use case is unlikely. Both products, the High Availability Add-On and Load Balancer Add-On, are based on open-source community projects. Red Hat Cluster developers contribute code upstream for the community. Computational clustering is not part of cluster suite, but instead provided by Red Hat MRG.

Redundancy is a form of resilience that ensures system availability in the event of component failure. Components have at least one independent backup component (+1). The level of resilience is referred to as active/passive or standby as backup components do not actively participate within the system during normal operation. The level of transparency during failover is dependent on a specific solution, though degradation to system resilience will occur during failover.

<span class="mw-page-title-main">Computer cluster</span> Set of computers configured in a distributed computing system

A computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software. The newest manifestation of cluster computing is cloud computing.

Log shipping is the process of automating the backup of transaction log files on a primary (production) database server, and then restoring them onto a standby server. This technique is supported by Microsoft SQL Server, 4D Server, MySQL, and PostgreSQL. Similar to replication, the primary purpose of log shipping is to increase database availability by maintaining a backup server that can replace a production server quickly. Other databases such as Adaptive Server Enterprise and Oracle Database support the technique but require the Database Administrator to write code or scripts to perform the work.

Cluster Shared Volumes (CSV) is a feature of Failover Clustering first introduced in Windows Server 2008 R2 for use with the Hyper-V role. A Cluster Shared Volume is a shared disk containing an NTFS or ReFS volume that is made accessible for read and write operations by all nodes within a Windows Server Failover Cluster.

A cluster-aware application is a software application designed to call cluster APIs in order to determine its running state, in case a manual failover is triggered between cluster nodes for planned technical maintenance, or an automatic failover is required, if a computing cluster node encounters hardware or software failure, to maintain business continuity. A cluster-aware application may be capable of failing over LAN or WAN.

Moose File System (MooseFS) is an open-source, POSIX-compliant distributed file system developed by Core Technology. MooseFS aims to be fault-tolerant, highly available, highly performing, scalable general-purpose network distributed file system for data centers. Initially proprietary software, it was released to the public as open source on May 30, 2008.

Adaptable Modular Storage 2000 is the brand name of Hitachi Data Systems mid-range storage platforms.

Device Mapper Multipath Input Output often shortened to DM-Multipathing and abbreviated as DM-MPIO provides input-output (I/O) fail-over and load-balancing by using multipath I/O within Linux for block devices. By utilizing device-mapper, the multipathd daemon provides the host-side logic to use multiple paths of a redundant network to provide continuous availability and higher-bandwidth connectivity between the host server and the block-level device. DM-MPIO handles the rerouting of block I/O to an alternate path in the event of a path failure. DM-MPIO can also balance the I/O load across all of the available paths that are typically utilized in Fibre Channel (FC) and iSCSI SAN environments. DM-MPIO is based on the device mapper, which provides the basic framework that maps one block device onto another.

In computer science, a heartbeat is a periodic signal generated by hardware or software to indicate normal operation or to synchronize other parts of a computer system. Heartbeat mechanism is one of the common techniques in mission critical systems for providing high availability and fault tolerance of network services by detecting the network or systems failures of nodes or daemons which belongs to a network cluster—administered by a master server—for the purpose of automatic adaptation and rebalancing of the system by using the remaining redundant nodes on the cluster to take over the load of failed nodes for providing constant services. Usually a heartbeat is sent between machines at a regular interval in the order of seconds; a heartbeat message. If the endpoint does not receive a heartbeat for a time—usually a few heartbeat intervals—the machine that should have sent the heartbeat is assumed to have failed. Heartbeat messages are typically sent non-stop on a periodic or recurring basis from the originator's start-up until the originator's shutdown. When the destination identifies a lack of heartbeat messages during an anticipated arrival period, the destination may determine that the originator has failed, shutdown, or is generally no longer available.

References

  1. "Switchovers and Failovers". Microsoft . Retrieved November 9, 2015. A switchover is a scheduled outage of a database or server that's explicitly initiated by an administrator, typically in preparation for performing a maintenance operation