Replication (computing)

Last updated

Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.

Computing Activity that uses computers

Computing is any activity that uses computers. It includes developing hardware and software, and using computers to manage, process, and communicate information for various purposes. Computing is a critically important, integral component of modern industrial technology. Major computing disciplines include computer engineering, software engineering, computer science, information systems, and information technology.

Software non-tangible executable component of a computer

Computer software, or simply software, is a collection of data or computer instructions that tell the computer how to work. This is in contrast to physical hardware, from which the system is built and actually performs the work. In computer science and software engineering, computer software is all information processed by computer systems, programs and data. Computer software includes computer programs, libraries and related non-executable data, such as online documentation or digital media. Computer hardware and software require each other and neither can be realistically used on its own.

Computer hardware physical components of a computer system

Computer hardware includes the physical, tangible parts or components of a computer, such as the cabinet, central processing unit, monitor, keyboard, computer data storage, graphics card, sound card, speakers and motherboard. By contrast, software is instructions that can be stored and run by hardware. Hardware is so-termed because it is "hard" or rigid with respect to changes or modifications; whereas software is "soft" because it is easy to update or change. Intermediate between software and hardware is "firmware", which is software that is strongly coupled to the particular hardware of a computer system and thus the most difficult to change but also among the most stable with respect to consistency of interface. The progression from levels of "hardness" to "softness" in computer systems parallels a progression of layers of abstraction in computing.

Contents

Terminology

Replication in computing can refer to:

Replication in space or in time is often linked to scheduling algorithms. [1]

Access to a replicated entity is typically uniform with access to a single non-replicated entity. The replication itself should be transparent to an external user. In a failure scenario, a failover of replicas should be hidden as much as possible with respect to quality of service (QoS). [2]

In computing and related technologies such as networking, failover is switching to a redundant or standby computer server, system, hardware component or network upon the failure or abnormal termination of the previously active application, server, system, hardware component, or network. Failover and switchover are essentially the same operation, except that failover is automatic and usually operates without warning, while switchover requires human intervention.

Quality of service (QoS) is the description or measurement of the overall performance of a service, such as a telephony or computer network or a cloud computing service, particularly the performance seen by the users of the network. To quantitatively measure quality of service, several related aspects of the network service are often considered, such as packet loss, bit rate, throughput, transmission delay, availability, jitter, etc.

Computer scientists further describe replication as being either:

When one master replica is designated to process all the requests, the system is using a primary-backup or master-slave scheme, which is predominant in high-availability clusters. In comparison, if any replica can process a request and distribute a new state, the system is using a multi-primary or multi-master scheme. In the latter case, some form of distributed concurrency control must be used, such as a distributed lock manager.

High-availability clusters are groups of computers that support server applications that can be reliably utilized with a minimum amount of down-time. They operate by using high availability software to harness redundant computers in groups or clusters that provide continued service when system components fail. Without clustering, if a server running a particular application crashes, the application will be unavailable until the crashed server is fixed. HA clustering remedies this situation by detecting hardware/software faults, and immediately restarting the application on another system without requiring administrative intervention, a process known as failover. As part of this process, clustering software may configure the node before starting the application on it. For example, appropriate file systems may need to be imported and mounted, network hardware may have to be configured, and some supporting applications may need to be running as well.

Multi-master replication is a method of database replication which allows data to be stored by a group of computers, and updated by any member of the group. All members are responsive to client data queries. The multi-master replication system is responsible for propagating the data modifications made by each member to the rest of the group, and resolving any conflicts that might arise between concurrent changes made by different members.

Distributed concurrency control is the concurrency control of a system distributed over a computer network.

Load balancing differs from task replication, since it distributes a load of different computations across machines, and allows a single computation to be dropped in case of failure. Load balancing, however, sometimes uses data replication (especially multi-master replication) internally, to distribute its data among machines.

Load balancing (computing) set of techniques to improve the distribution of workloads across multiple computing resources

In computing, load balancing improves the distribution of workloads across multiple computing resources, such as computers, a computer cluster, network links, central processing units, or disk drives. Load balancing aims to optimize resource use, maximize throughput, minimize response time, and avoid overload of any single resource. Using multiple components with load balancing instead of a single component may increase reliability and availability through redundancy. Load balancing usually involves dedicated software or hardware, such as a multilayer switch or a Domain Name System server process.

Backup differs from replication in that the saved copy of data remains unchanged for a long period of time. [3] Replicas, on the other hand, undergo frequent updates and quickly lose any historical state. Replication is one of the oldest and most important topics in the overall area of distributed systems.

Data replication and computation replication both require processes to handle incoming events. Processes for data replication are passive and operate only to maintain the stored data, reply to read requests and apply updates. Computation replication is usually performed to provide fault-tolerance, and take over an operation if one component fails. In both cases, the underlying needs are to ensure that the replicas see the same events in equivalent orders, so that they stay in consistent states and any replica can respond to queries.

Replication models in distributed systems

Three widely cited models exist for data replication, each having its own properties and performance:

Database replication

Database replication can be used on many database management systems (DBMS), usually with a master-slave relationship between the original and the copies. The master logs the updates, which then ripple through to the slaves. Each slave outputs a message stating that it has received the update successfully, thus allowing the sending of subsequent updates.

In Multi-master replication, updates can be submitted to any database node, and then ripple through to other servers. This is often desired but introduces substantially increased costs and complexity which may make it impractical in some situations. The most common challenge that exists in multi-master replication is transactional conflict prevention or resolution. Most synchronous (or eager) replication solutions perform conflict prevention, while asynchronous (or lazy) solutions have to perform conflict resolution. For instance, if the same record is changed on two nodes simultaneously, an eager replication system would detect the conflict before confirming the commit and abort one of the transactions. A lazy replication system would allow both transactions to commit and run a conflict resolution during resynchronization. [6] The resolution of such a conflict may be based on a timestamp of the transaction, on the hierarchy of the origin nodes or on much more complex logic, which decides consistently across all nodes.

Database replication becomes more complex when it scales up, which is usually described in two dimensions: horizontal and vertical. Horizontal scale-up has more data replicas, while vertical scale-up has data replicas located at greater physical distances. Problems raised by horizontal scale-up can be alleviated by a multi-layer multi-view access protocol. The early problems of vertical scale-up have largely been addressed by improving Internet reliability and performance. [7] . [8]

When data is replicated between database servers, so that the information remains consistent throughout the database system and users cannot tell or even know which server in the DBMS they are using, the system is said to exhibit replication transparency.

Disk storage replication

Storage replication Storage replication.png
Storage replication

Active (real-time) storage replication is usually implemented by distributing updates of a block device to several physical hard disks. This way, any file system supported by the operating system can be replicated without modification, as the file system code works on a level above the block device driver layer. It is implemented either in hardware (in a disk array controller) or in software (in a device driver).

The most basic method is disk mirroring, typical for locally connected disks. The storage industry narrows the definitions, so mirroring is a local (short-distance) operation. A replication is extendable across a computer network, so that the disks can be located in physically distant locations, and the master-slave database replication model is usually applied. The purpose of replication is to prevent damage from failures or disasters that may occur in one location – or in case such events do occur, to improve the ability to recover data. For replication, latency is the key factor because it determines either how far apart the sites can be or the type of replication that can be employed.

The main characteristic of such cross-site replication is how write operations are handled, through either synchronous or asynchronous replication. The main difference is that synchronous replication needs to wait for the destination server in any write operation.

Synchronous replication guarantees "zero data loss" by the means of atomic write operations, where the write operation is not considered complete until acknowledged by both the local and remote storage. Most applications wait for a write transaction to complete before proceeding with further work, hence overall performance decreases considerably. Inherently, performance drops proportionally to distance, as minimum latency is dictated by the speed of light. For 10 km distance, the fastest possible roundtrip takes 67 μs, whereas an entire local cached write completes in about 10–20 μs.

In Asynchronous replication, the write operation is considered complete as soon as local storage acknowledges it. Remote storage is updated with a small lag. Performance is greatly increased, but in case of a local storage failure, the remote storage is not guaranteed to have the current copy of data (the most recent data may be lost).

Semi-synchronous replication typically considers a write operation complete when acknowledged by local storage and received or logged by the remote server. The actual remote write is performed asynchronously, resulting in better performance but remote storage will lag behind the local storage, so that there is no guarantee of durability (i.e.: seamless transparency) in the case of local storage failure.[ citation needed ]

Point-in-time replication produces periodic snapshots which are replicated instead of primary storage. This is intended to replicate only the changed data instead of the entire volume. As less information is replicated using this method, replication can occur over less-expensive bandwidth links such as iSCSI or T1 instead of fiber optic lines.

Implementations

Many distributed filesystems use replication to ensure fault tolerance and avoid a single point of failure.

Many commercial synchronous replication systems do not freeze when the remote replica fails or loses connection – behaviour which guarantees zero data loss – but proceed to operate locally, losing the desired zero recovery point objective.

Techniques of wide-area network (WAN) optimization can be applied to address the limits imposed by latency.

File-based replication

File-based replication conducts data replication at the logical level (i.e.: individual data files) rather than at the storage block level. There are many different ways of performing this, which almost exclusively rely on software.

Capture with a kernel driver

A kernel driver (specifically a filter driver) can be used to intercept calls to the filesystem functions, capturing any activity as it occurs. This utilises the same type of technology that real-time active virus checkers employ. At this level, logical file operations are captured like file open, write, delete, etc. The kernel driver transmits these commands to another process, generally over a network to a different machine, which will mimic the operations of the source machine. Like block-level storage replication, the file-level replication allows both synchronous and asynchronous modes. In synchronous mode, write operations on the source machine are held and not allowed to occur until the destination machine has acknowledged the successful replication. Synchronous mode is less common with file replication products although a few solutions exist.

File-level replication solutions allow for informed decisions about replication based on the location and type of the file. For example, temporary files or parts of a filesystem that hold no business value could be excluded. The data transmitted can also be more granular; if an application writes 100 bytes, only the 100 bytes are transmitted instead of a complete disk block (generally 4096 bytes). This substantially reduces the amount of data sent from the source machine and the storage burden on the destination machine.

Drawbacks of this software-only solution include the requirement for implementation and maintenance on the operating system level, and an increased burden on the machine's processing power.

File system journal replication

Similarly to database transaction logs, many file systems have the ability to journal their activity. The journal can be sent to another machine, either periodically or in real time by streaming. On the replica side, the journal can be used to play back file system modifications.

One of the notable implementations is Microsoft's System Center Data Protection Manager (DPM), released in 2005, which performs periodic updates but does not offer real-time replication.[ citation needed ]

Batch replication

This is the process of comparing the source and destination file systems and ensuring that the destination matches the source. The key benefit is that such solutions are generally free or inexpensive. The downside is that the process of synchronizing them is quite system-intensive, and consequently this process generally runs infrequently.

One of the notable implementations is rsync.

Distributed shared memory replication

Another example of using replication appears in distributed shared memory systems, where many nodes of the system share the same page of memory. This usually means that each node has a separate copy (replica) of this page.

Primary-backup and multi-primary replication

Many classical approaches to replication are based on a primary-backup model where one device or process has unilateral control over one or more other processes or devices. For example, the primary might perform some computation, streaming a log of updates to a backup (standby) process, which can then take over if the primary fails. This approach is common for replicating databases, despite the risk that if a portion of the log is lost during a failure, the backup might not be in a state identical to the primary, and transactions could then be lost.

A weakness of primary-backup schemes is that only one is actually performing operations. Fault-tolerance is gained, but the identical backup system doubles the costs. For this reason, starting c.1985, the distributed systems research community began to explore alternative methods of replicating data. An outgrowth of this work was the emergence of schemes in which a group of replicas could cooperate, with each process acting as a backup while also handling a share of the workload.

Computer scientist Jim Gray analyzed multi-primary replication schemes under the transactional model and published a widely cited paper skeptical of the approach "The Dangers of Replication and a Solution". [9] [10] He argued that unless the data splits in some natural way so that the database can be treated as nn disjoint sub-databases, concurrency control conflicts will result in seriously degraded performance and the group of replicas will probably slow as a function of n. Gray suggested that the most common approaches are likely to result in degradation that scales as O(n³). His solution, which is to partition the data, is only viable in situations where data actually has a natural partitioning key.

In the 1985–1987, the virtual synchrony model was proposed and emerged as a widely adopted standard (it was used in the Isis Toolkit, Horus, Transis, Ensemble, Totem, Spread, C-Ensemble, Phoenix and Quicksilver systems, and is the basis for the CORBA fault-tolerant computing standard). Virtual synchrony permits a multi-primary approach in which a group of processes cooperates to parallelize some aspects of request processing. The scheme can only be used for some forms of in-memory data, but can provide linear speedups in the size of the group.

A number of modern products support similar schemes. For example, the Spread Toolkit supports this same virtual synchrony model and can be used to implement a multi-primary replication scheme; it would also be possible to use C-Ensemble or Quicksilver in this manner. WANdisco permits active replication where every node on a network is an exact copy or replica and hence every node on the network is active at one time; this scheme is optimized for use in a wide area network (WAN).

See also

Related Research Articles

A distributed database is a database in which not all storage devices are attached to a common processor. It may be stored in multiple computers, located in the same physical location; or may be dispersed over a network of interconnected computers. Unlike parallel systems, in which the processors are tightly coupled and constitute a single database system, a distributed database system consists of loosely coupled sites that share no physical components.

Scalability

Scalability is the property of a system to handle a growing amount of work by adding resources to the system.

A transaction symbolizes a unit of work performed within a database management system against a database, and treated in a coherent and reliable way independent of other transactions. A transaction generally represents any change in a database. Transactions in a database environment have two main purposes:

  1. To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure, when execution stops and many operations upon a database remain uncompleted, with unclear status.
  2. To provide isolation between programs accessing a database concurrently. If this isolation is not provided, the programs' outcomes are possibly erroneous.

In computer science, consistency models are used in distributed systems like distributed shared memory systems or distributed data stores. The system is said to support a given model if operations on memory follow specific rules. The data consistency model specifies a contract between programmer and system, wherein the system guarantees that if the programmer follows the rules, memory will be consistent and the results of reading, writing, or updating memory will be predictable. This is different from coherence, which occurs in systems that are cached or cache-less, and is consistency of data with respect to all processors. Coherence deals with maintaining a global order in which writes to a single location or single variable are seen by all processors. Consistency deals with the ordering of operations to multiple locations with respect to all processors.

MySQL Cluster is a technology providing shared-nothing clustering and auto-sharding for the MySQL database management system. It is designed to provide high availability and high throughput with low latency, while allowing for near linear scalability. MySQL Cluster is implemented through the NDB or NDBCLUSTER storage engine for MySQL.

Google File System is a proprietary distributed file system developed by Google to provide efficient, reliable access to data using large clusters of commodity hardware. A new version of Google File System code named Colossus was released in 2010.

Extensible Storage Engine (ESE), also known as JET Blue, is an ISAM data storage technology from Microsoft. ESE is the core of Microsoft Exchange Server, Active Directory, and Windows Search. It's also used by a number of Windows components including Windows Update client and Help and Support Center. Its purpose is to allow applications to store and retrieve data via indexed and sequential access.

Disk mirroring replication of logical disk volumes onto separate physical hard disks in real time to ensure continuous availability

In data storage, disk mirroring is the replication of logical disk volumes onto separate physical hard disks in real time to ensure continuous availability. It is most commonly used in RAID 1. A mirrored volume is a complete logical representation of separate volume copies.

Transaction processing is a way of computing that divides work into individual, indivisible operations, called transactions. A transaction processing system (TPS) is a software system, or software/hardware combination, that supports transaction processing.

A fundamental problem in distributed computing and multi-agent systems is to achieve overall system reliability in the presence of a number of faulty processes. This often requires processes to agree on some data value that is needed during computation. Examples of applications of consensus include whether to commit a transaction to a database, agreeing on the identity of a leader, state machine replication, and atomic broadcasts. The real world applications include clock synchronization, PageRank, opinion formation, smart power grids, state estimation, control of UAVs, load balancing, blockchain and others.

Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Originally designed for computer clusters built from commodity hardware—still the common use—it has also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.

Cellular automata, as with other multi-agent system models, usually treat time as discrete and state updates as occurring synchronously. The state of every cell in the model is updated together, before any of the new states influence other cells. In contrast, an asynchronous cellular automaton is able to update individual cells independently, in such a way that the new state of a cell affects the calculation of states in neighbouring cells.

Optimistic replication is a strategy for replication in which replicas are allowed to diverge.

Versant Object Database (VOD) is an object database software product developed by Versant Corporation.

Data grid

A data grid is an architecture or set of services that gives individuals or groups of users the ability to access, modify and transfer extremely large amounts of geographically distributed data for research purposes. Data grids make this possible through a host of middleware applications and services that pull together data and resources from multiple administrative domains and then present it to users upon request. The data in a data grid can be located at a single site or multiple sites where each site can be its own administrative domain governed by a set of security restrictions as to who may access the data. Likewise, multiple replicas of the data may be distributed throughout the grid outside their original administrative domain and the security restrictions placed on the original data for who may access it must be equally applied to the replicas. Specifically developed data grid middleware is what handles the integration between users and the data they request by controlling access while making it available as efficiently as possible. The adjacent diagram depicts a high level view of a data grid.

Oracle NoSQL Database

Oracle NoSQL Database (OND) is a NoSQL-type distributed key-value database from Oracle Corporation. It provides transactional semantics for data manipulation, horizontal scalability, and simple administration and monitoring.

A distributed file system for cloud is a file system that allows many clients to have access to data and supports operations on that data. Each data file may be partitioned into several parts called chunks. Each chunk may be stored on different remote machines, facilitating the parallel execution of applications. Typically, data is stored in files in a hierarchical tree, where the nodes represent directories. There are several ways to share files in a distributed architecture: each solution must be suitable for a certain type of application, depending on how complex the application is. Meanwhile, the security of the system must be ensured. Confidentiality, availability and integrity are the main keys for a secure system.

In the high-performance computing environment, burst buffer is a fast and intermediate storage layer positioned between the front-end computing processes and the back-end storage systems. It emerges as a timely storage solution to bridge the ever-increasing performance gap between the processing speed of the compute nodes and the Input/output (I/O) bandwidth of the storage systems. Burst buffer is built from arrays of high-performance storage devices, such as NVRAM and SSD. It typically offers from one to two orders of magnitude higher I/O bandwidth than the back-end storage systems.

Apache Ignite

Apache Ignite is an open-source distributed database, caching and processing platform designed to store and compute on large volumes of data across a cluster of nodes.

References

  1. Mansouri, Najme, GholamHosein Dastghaibyfard, and Ehsan Mansouri. "Combination of data replication and scheduling algorithm for improving data availability in Data Grids." Journal of Network and Computer Applications (2013)
  2. V. Andronikou, K. Mamouras, K. Tserpes, D. Kyriazis, T. Varvarigou, Dynamic QoS-aware Data Replication in Grid Environments, Elsevier Future Generation Computer Systems - The International Journal of Grid Computing and eScience, 2012
  3. "Backup and Replication: What is the Difference?". Zerto. February 6, 2012.
  4. Marton Trencseni, Attila Gazso (2009). "Keyspace: A Consistently Replicated, Highly-Available Key-Value Store" . Retrieved 2010-04-18.
  5. Mike Burrows (2006). "The Chubby Lock Service for Loosely-Coupled Distributed Systems". Archived from the original on 2010-02-09. Retrieved 2010-04-18.
  6. "Conflict Resolution". ITTIA. Retrieved 21 October 2016.
  7. Dragan Simic; Srecko Ristic; Slobodan Obradovic (April 2007). "Measurement of the Achieved Performance Levels of the WEB Applications With Distributed Relational Database" (PDF). Electronics and Energetics. Facta Universitatis. p. 31–43. Retrieved 30 January 2014.
  8. Mokadem Riad; Hameurlain Abdelkader (December 2014). "Data Replication Strategies with Performance Objective in Data Grid Systems: A Survey" (PDF). Internal journal of grid and utility computing. Underscience Publisher. p. 30–46. Retrieved 18 December 2014.
  9. The Dangers of Replication and a Solution
  10. Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data: SIGMOD '99, Philadelphia, PA, US; June 1–3, 1999, Volume 28; p. 3.