Backup site

A backup site (also work area recovery site [1] or just recovery site) is a location where an organization can relocate following a disaster, such as fire, flood, terrorist threat, or other disruptive event. This is an integral part of the disaster recovery plan and wider business continuity planning of an organization. [2]

A backup, or alternate, site can be another data center location which is either operated by the organization, or contracted via a company that specializes in disaster recovery services. In some cases, one organization will have an agreement with a second organization to operate a joint backup site. In addition, an organization may have a reciprocal agreement with another organization to set up a site at each of their data centers.

Sites are generally classified by how prepared they are and how quickly they can be brought into operation: "cold" (facility is prepared), "warm" (equipment is in place), and "hot" (operational data is loaded). The cost to implement and maintain a site rises with its "temperature".
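The three tiers can be sketched as a small data structure. This is only an illustrative model: the names `SiteTier`, `Readiness`, and `classify` are hypothetical, and the three yes/no flags simplify the descriptions in this article (a real cold site may hold non-operational equipment, and a real warm site may hold partial backups).

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class SiteTier(IntEnum):
    """Backup site tiers, ordered by increasing readiness and cost."""
    COLD = 1
    WARM = 2
    HOT = 3


@dataclass(frozen=True)
class Readiness:
    """Hypothetical yes/no view of how prepared a site is."""
    facility_prepared: bool
    equipment_in_place: bool
    operational_data_loaded: bool


def classify(site: Readiness) -> Optional[SiteTier]:
    """Map a readiness description to a tier; None if there is no prepared facility."""
    if not site.facility_prepared:
        return None
    if not site.equipment_in_place:
        return SiteTier.COLD
    if not site.operational_data_loaded:
        return SiteTier.WARM
    return SiteTier.HOT
```

Because `SiteTier` is an `IntEnum`, comparisons such as `SiteTier.COLD < SiteTier.HOT` follow the "temperature" ordering.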

Classification

Cold site

A cold site is operational space with basic facilities such as raised floors, air conditioning, power, and communication lines. Following an incident, equipment is brought in and set up to resume operations. A cold site does not include backed-up copies of data and information from the organization's original location, nor does it include hardware that is already set up. The lack of provisioned hardware keeps the start-up costs of a cold site minimal, but more time is needed after the disaster to bring operations back to a capacity similar to that before the disaster. In some cases, a cold site may have equipment available, but it is not operational.

Warm site

A warm site is a compromise between hot and cold. These sites have hardware and connectivity already established, though on a smaller scale. Warm sites might have backups on hand, but they may be incomplete and may be several days to a week old. The recovery of pre-disaster operations will be delayed while more up-to-date backup tapes are delivered to the warm site, or while network connectivity is established to recover data from a remote backup site.

Hot site

A hot site is a near duplicate of the organization's original site, including full computer systems as well as complete backups of user data. Real-time synchronization between the two sites may be used to completely mirror the data environment of the original site using wide-area network links and specialized software. Following a disruption to the original site, the hot site exists so that the organization can relocate with minimal losses to normal operations and in the shortest recovery time. Ideally, a hot site will be up and running within a matter of hours. Personnel may need to be moved to the hot site, but the hot site may be operational from a data-processing perspective before staff have relocated. The capacity of the hot site may or may not match the capacity of the original site, depending on the organization's requirements. This type of backup site is the most expensive to operate. Hot sites are popular with organizations that operate real-time processes, such as financial institutions, government agencies, and e-commerce providers.

The most important feature offered by a hot site is that its production environment runs concurrently with the main data center. This synchronization keeps the impact on business operations, and the downtime, to a minimum. In the event of a significant outage, the hot site can take the place of the affected site immediately. However, this level of redundancy is not cheap, and businesses will need to carry out a cost-benefit analysis (CBA) of hot site utilization.

A backup site that is itself down, or that is not proactively maintained, may not be considered a hot site, depending on how mature the organization is in applying ISO 22301, the international standard for business continuity management.

Alternate sites

Generally, an alternate site is a site to which people and the equipment they need to work are relocated for a period of time, until the normal production environment, whether reconstituted or replaced, is available.

Choosing

The type of backup site to use is decided by an organization based on a cost-benefit analysis. Hot sites are traditionally more expensive than cold sites, since much of the equipment the company needs must be purchased in advance and people are needed to maintain it, which raises operational costs. However, if the organization loses a substantial amount of revenue for each day it is inactive, a hot site may be worth the cost. Another advantage of a hot site is that it can be used for operations before a disaster happens. This load-balanced production processing can be cost-effective, and it gives users the security of minimal downtime during an event that affects one of the data centers.

The advantage of a cold site is simple: cost. A cold site requires fewer resources to operate because no equipment is brought in before the disaster. Some organizations may store older versions of their hardware at the site. This may be appropriate in a server farm environment, where old hardware can be used in many cases. The downside of a cold site is the potential cost of making it effective after a disaster: purchasing equipment on very short notice may cost more, and the disaster itself may make the equipment difficult to obtain.
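The trade-off described in this section can be illustrated with a rough expected-cost comparison. The sketch below is one possible way to frame the analysis, not an established method: the formula, the names (`SiteOption`, `expected_annual_cost`, `cheapest`), and every figure are hypothetical assumptions chosen only to show how the comparison shifts with the revenue lost per day of downtime.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SiteOption:
    """Hypothetical cost profile of one backup site type."""
    name: str
    annual_operating_cost: float  # paid every year, disaster or not
    recovery_days: float          # assumed downtime after a disaster
    activation_cost: float        # e.g. equipment bought at short notice


def expected_annual_cost(option: SiteOption,
                         revenue_loss_per_day: float,
                         disasters_per_year: float) -> float:
    """Operating cost plus the expected yearly cost of disasters."""
    per_disaster = (option.recovery_days * revenue_loss_per_day
                    + option.activation_cost)
    return option.annual_operating_cost + disasters_per_year * per_disaster


def cheapest(options: Iterable[SiteOption],
             revenue_loss_per_day: float,
             disasters_per_year: float) -> SiteOption:
    """Pick the option with the lowest expected annual cost."""
    return min(options, key=lambda o: expected_annual_cost(
        o, revenue_loss_per_day, disasters_per_year))


# Illustrative figures only; none of them come from real sites.
COLD = SiteOption("cold", annual_operating_cost=50_000,
                  recovery_days=14, activation_cost=200_000)
HOT = SiteOption("hot", annual_operating_cost=500_000,
                 recovery_days=0.25, activation_cost=0)
```

With these made-up figures and an assumed one disaster per ten years (`disasters_per_year=0.1`), `cheapest([COLD, HOT], 1_000_000, 0.1)` selects the hot site, while `cheapest([COLD, HOT], 10_000, 0.1)` selects the cold site. A real analysis would also weigh factors this sketch ignores, such as using the hot site for load-balanced production before any disaster.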

Commercial sites

When contracting services from a commercial provider of backup site capability, organizations should take note of the contractual usage provisions and invocation procedures. Providers may sign up more than one organization for a given site or facility, often at different service levels. This is a reasonable proposition because the organizations subscribed to the service are unlikely to all need it at the same time. It also allows the provider to offer the service at an affordable cost. However, in a large-scale incident that affects a wide area, these facilities are likely to become over-subscribed as multiple customers claim the same backup site. To gain priority over other customers, an organization can request a priority service from the provider, which often comes with a higher monthly fee. A commercial site can also be used as a company's secondary production site, with a full-scale mirroring environment for its primary data center. Again, a higher fee will be required, but the cost could be justified by the security and resiliency of the site, which would give the organization the ability to provide its users with uninterrupted access to their data and applications.
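The over-subscription problem can be illustrated with a toy allocation model. This is a sketch under stated assumptions, not a description of how any provider actually works: it assumes capacity is counted in seats, that priority-service customers are served first, and that customers at the same service level are served in order of invocation. The names `Invocation` and `allocate` are hypothetical, and real contracts define their own rules.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Invocation:
    """One customer's claim on a shared backup site (hypothetical model)."""
    customer: str
    seats_requested: int
    priority_service: bool
    order: int  # position in which the customer invoked the service


def allocate(capacity: int, invocations: Iterable[Invocation]) -> Dict[str, int]:
    """Grant seats to priority customers first, then first come, first served."""
    granted: Dict[str, int] = {}
    remaining = capacity
    # False sorts before True, so "not priority_service" puts priority first.
    ranked = sorted(invocations, key=lambda i: (not i.priority_service, i.order))
    for inv in ranked:
        seats = min(inv.seats_requested, remaining)
        granted[inv.customer] = seats
        remaining -= seats
    return granted


# A wide-area incident: three customers claim a 100-seat site at once.
claims = [
    Invocation("A", 60, priority_service=False, order=1),
    Invocation("B", 60, priority_service=True, order=2),
    Invocation("C", 30, priority_service=False, order=3),
]
result = allocate(100, claims)  # B is served first: {'B': 60, 'A': 40, 'C': 0}
```

In this example, customer B invoked second but is fully served because of its priority service, customer A receives only part of its request, and customer C receives nothing.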

References

  1. Kirvan, Paul. "Checklist for work area recovery site planning". SearchDisasterRecovery.com.
  2. Baraniuk, Chris (23 March 2020). "How firms move to secret offices amid Covid-19". BBC.
