Backup validation

Backup validation is the process by which owners of computer data examine how their data was backed up in order to understand their risk of data loss. It also covers the optimization of backup processes, charging for them, and estimating future requirements, sometimes called capacity planning.

History

Over the past several decades (leading up to 2005), organizations (banks, governments, schools, manufacturers and others) have come to rely more on "Open Systems" and less on "Closed Systems". For example, 25 years ago a large bank might have housed most if not all of its critical data in an IBM mainframe computer (a "Closed System"), whereas today that same bank might store a substantially greater portion of its critical data in spreadsheets, databases, or even word processing documents (i.e., "Open Systems"). The primary problem with Open Systems is their unpredictable nature: by definition, an Open System is exposed to potentially thousands if not millions of variables, ranging from network overloads to computer virus attacks to simple software incompatibility. Any one of these factors, or several in combination, may result in lost data, compromised backup attempts, or both. These types of problems do not generally occur on Closed Systems, or at least not in unpredictable ways. In the "old days", backups were a neatly contained affair. Today, because of the ubiquity of, and dependence upon, Open Systems, an entire industry has developed around data protection. Three key elements of such data protection are Validation, Optimization and Chargeback.

Validation

Validation is the process of determining whether a backup attempt succeeded, or whether the data is sufficiently backed up to be considered "protected". This process usually involves the examination of log files, the "smoking gun" often left behind after a backup attempt takes place, as well as media databases, data traffic and even magnetic tapes. Patterns can be detected, key error messages identified and statistics extracted in order to determine which backups worked and which did not. According to the 2014 Veeam Availability Report, organizations test their backups for recoverability on average every eight days. However, each quarter organizations test an average of only 5.26 percent of their backups, meaning that the vast majority of backups are never verified and so could fail and cause downtime.
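
As a minimal illustration of the log-examination step, the sketch below (Python) scans a directory of backup logs for known error strings and tallies a success rate. The directory path and error patterns are hypothetical placeholders; real backup products emit their own log formats and error codes.

```python
import re
from pathlib import Path

# Error patterns that commonly indicate a failed backup attempt.
# These strings are placeholders; the exact wording varies by backup product.
ERROR_PATTERNS = [
    re.compile(r"backup failed", re.IGNORECASE),
    re.compile(r"media error", re.IGNORECASE),
    re.compile(r"aborted", re.IGNORECASE),
]

def scan_backup_logs(log_dir: str) -> dict:
    """Classify each log file in log_dir as a successful or failed backup."""
    results = {"succeeded": [], "failed": []}
    for log_file in Path(log_dir).glob("*.log"):
        text = log_file.read_text(errors="ignore")
        if any(pattern.search(text) for pattern in ERROR_PATTERNS):
            results["failed"].append(log_file.name)
        else:
            results["succeeded"].append(log_file.name)
    return results

if __name__ == "__main__":
    # Hypothetical log directory; substitute the location your backup tool writes to.
    summary = scan_backup_logs("/var/log/backups")
    total = len(summary["succeeded"]) + len(summary["failed"])
    if total:
        rate = 100.0 * len(summary["succeeded"]) / total
        print(f"{total} backup attempts, {rate:.1f}% succeeded")
    print("Failed jobs:", summary["failed"])
```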

In some backup software, validation consists solely of examining the backup file to see whether it can be read by the backup program. That is a useful part of validation, but as an entire validation process it is useless.

A proper validation process consists of at least two checks. Validation of a backup file is of little or no use unless the backup file's data is compared against the data of the source. In addition, a backup cannot be considered validated unless it is known with certainty that the backup file can actually restore the source's data.
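
One way to exercise both checks, sketched below under assumed paths, is to restore the backup to a scratch location and compare it file by file against the source, which tests restorability and data fidelity at the same time. The source and restore-test directories are hypothetical, and the SHA-256 comparison is only an illustration, not a substitute for a vendor's own verification tooling.

```python
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    """Return the SHA-256 digest of a file's contents."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def compare_to_source(source_dir: str, restored_dir: str) -> list:
    """Report files whose restored copy is missing or differs from the source."""
    mismatches = []
    source = Path(source_dir)
    restored = Path(restored_dir)
    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue
        restored_file = restored / src_file.relative_to(source)
        if not restored_file.exists():
            mismatches.append((str(src_file), "missing from restored backup"))
        elif file_digest(src_file) != file_digest(restored_file):
            mismatches.append((str(src_file), "contents differ"))
    return mismatches

if __name__ == "__main__":
    # Hypothetical paths: the backup is first restored to a scratch location,
    # which also exercises the restore path itself.
    problems = compare_to_source("/data/projects", "/restore-test/projects")
    print("Validation passed" if not problems else problems)
```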

Optimization

Optimization is the process of examining productivity patterns in the backup process to determine where improvements can be made and, often, where certain less important backup jobs may be eliminated entirely.
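
A hedged sketch of such an analysis is shown below: given per-job statistics, it flags low-priority jobs that consume a large share of the backup window or fail so often that they add little protection. The fields and thresholds are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class JobStats:
    name: str
    runs: int
    failures: int
    avg_minutes: float   # average duration of one run
    gigabytes: float     # average data volume per run
    priority: str        # "high" or "low", assigned by the data owner

def flag_candidates(jobs: list[JobStats]) -> list[str]:
    """Flag low-priority jobs that are slow or chronically failing."""
    flagged = []
    for job in jobs:
        failure_rate = job.failures / job.runs if job.runs else 1.0
        # Thresholds (60 minutes, 50% failure rate) are illustrative only.
        if job.priority == "low" and (job.avg_minutes > 60 or failure_rate > 0.5):
            flagged.append(job.name)
    return flagged

if __name__ == "__main__":
    history = [
        JobStats("payroll-db", runs=30, failures=1, avg_minutes=45.0,
                 gigabytes=120.0, priority="high"),
        JobStats("temp-scratch", runs=30, failures=18, avg_minutes=95.0,
                 gigabytes=800.0, priority="low"),
    ]
    print("Candidates for elimination or rescheduling:", flag_candidates(history))
```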

Chargeback

Very often, the backing up of data is performed by one person or group on behalf of others, the owners of the data. Increasingly, the cost of those services is charged back to the data owners. A simple fee per backup might be agreed upon or, as is more often the case, a more complex charge based on success rates, speed, size, frequency and retention (how long the copy is kept) is put in place. Usually some form of service level agreement (SLA) is in place between the backup service provider and the data owner, setting out what is to be done and how the service is to be charged for.
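
The arithmetic of such a charge can be illustrated with a small sketch. All pricing terms below (base fee, per-gigabyte rate, retention multiplier, and a discount applied when the SLA success rate is missed) are hypothetical; a real formula would come from the SLA itself.

```python
def monthly_charge(gigabytes: float, backups_per_month: int,
                   retention_months: int, success_rate: float,
                   rate_per_gb: float = 0.05, base_fee: float = 10.0) -> float:
    """Compute an illustrative monthly chargeback figure.

    All pricing parameters are hypothetical placeholders, not an industry
    standard; the actual terms are defined in the SLA between the backup
    service provider and the data owner.
    """
    storage_cost = gigabytes * backups_per_month * rate_per_gb
    # Longer retention keeps more copies on hand, so add 10% per extra month.
    retention_multiplier = 1.0 + 0.1 * max(retention_months - 1, 0)
    # Grant a 10% credit when the agreed 98% success rate is not met.
    sla_discount = 1.0 if success_rate >= 0.98 else 0.9
    return (base_fee + storage_cost * retention_multiplier) * sla_discount

if __name__ == "__main__":
    # 500 GB backed up nightly, kept for 3 months, 99.5% of jobs succeeded.
    print(f"${monthly_charge(500.0, 30, 3, 0.995):.2f}")
```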

Related Research Articles

<span class="mw-page-title-main">Legacy system</span> Old computing technology or system that remains in use

In computing, a legacy system is an old method, technology, computer system, or application program, "of, relating to, or being a previous or outdated computer system", yet still in use. Often referencing a system as "legacy" means that it paved the way for the standards that would follow it. This can also imply that the system is out of date or in need of replacement.

<span class="mw-page-title-main">Software testing</span> Checking software against a standard

Software testing is the act of checking whether software satisfies expectations.

Computerized batch processing is a method of running software programs called jobs in batches automatically. While users are required to submit the jobs, no other interaction by the user is required to process the batch. Batches may automatically be run at scheduled times as well as being run contingent on the availability of computer resources.

<span class="mw-page-title-main">Server (computing)</span> Computer to access a central resource or service on a network

A server is a computer that provides information to other computers, called "clients", on a computer network. This architecture is called the client–server model. Servers can provide various functionalities, often called "services", such as sharing data or resources among multiple clients or performing computations for a client. A single server can serve multiple clients, and a single client can use multiple servers. A client process may run on the same device or may connect over a network to a server on a different device. Typical servers are database servers, file servers, mail servers, print servers, web servers, game servers, and application servers.

Electronic data processing (EDP) or business information processing can refer to the use of automated methods to process commercial data. Typically, this uses relatively simple, repetitive activities to process large volumes of similar information. For example: stock updates applied to an inventory, banking transactions applied to account and customer master files, booking and ticketing transactions to an airline's reservation system, billing for utility services. The modifier "electronic" or "automatic" was used with "data processing" (DP), especially c. 1960, to distinguish human clerical data processing from that done by computer.

In information technology, a backup, or data backup, is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event. The verb form, referring to the process of doing so, is "back up", whereas the noun and adjective form is "backup". Backups can be used to recover data after its loss from data deletion or corruption, or to recover data from an earlier time. Backups provide a simple form of disaster recovery; however, not all backup systems are able to reconstitute a computer system or other complex configuration such as a computer cluster, active directory server, or database server.

Utility software is a program specifically designed to help manage and tune system or application software. It is used to support the computer infrastructure, in contrast to application software, which is aimed at directly performing tasks that benefit ordinary users. However, utilities often form part of application systems. For example, a batch job may run user-written code to update a database and may then include a step that runs a utility to back up the database, or a job may run a utility to compress a disk before copying files.

<span class="mw-page-title-main">Data corruption</span> Errors in computer data that introduce unintended changes to the original data

Data corruption refers to errors in computer data that occur during writing, reading, storage, transmission, or processing, which introduce unintended changes to the original data. Computer, transmission, and storage systems use a number of measures to provide end-to-end data integrity, or lack of errors.

Data loss is an error condition in information systems in which information is destroyed by failures or neglect in storage, transmission, or processing. Information systems implement backup and disaster recovery equipment and processes to prevent data loss or restore lost data. Data loss can also occur if the physical medium containing the data is lost or stolen.

DNS spoofing, also referred to as DNS cache poisoning, is a form of computer security hacking in which corrupt Domain Name System data is introduced into the DNS resolver's cache, causing the name server to return an incorrect result record, e.g. an IP address. This results in traffic being diverted to any computer that the attacker chooses.

A remote, online, or managed backup service, sometimes marketed as cloud backup or backup-as-a-service, is a service that provides users with a system for the backup, storage, and recovery of computer files. Online backup providers are companies that provide this type of service to end users. Such backup services are considered a form of cloud computing.

Utility computing, or computer utility, is a service provisioning model in which a service provider makes computing resources and infrastructure management available to the customer as needed, and charges them for specific usage rather than a flat rate. Like other types of on-demand computing, the utility model seeks to maximize the efficient use of resources and/or minimize associated costs. Utility is the packaging of system resources, such as computation, storage and services, as a metered service. This model has the advantage of a low or no initial cost to acquire computer resources; instead, resources are essentially rented.

A job scheduler is a computer application for controlling unattended background program execution of jobs. This is commonly called batch scheduling, as execution of non-interactive jobs is often called batch processing, though "job" and "batch" are traditionally distinguished and contrasted. Other synonyms include batch system, distributed resource management system (DRMS), distributed resource manager (DRM), and, commonly today, workload automation (WLA). The data structure of jobs to run is known as the job queue.

Open-source software development (OSSD) is the process by which open-source software, or similar software whose source code is publicly available, is developed by an open-source software project. These are software products available with their source code under an open-source license to study, change, and improve their design. Examples of some popular open-source software products are Mozilla Firefox, Google Chromium, Android, LibreOffice and the VLC media player.

Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.

Database administration is the function of managing and maintaining database management systems (DBMS) software. Mainstream DBMS software such as Oracle, IBM Db2 and Microsoft SQL Server need ongoing management. As such, corporations that use DBMS software often hire specialized information technology personnel called database administrators or DBAs.

<span class="mw-page-title-main">Accounting software</span> Computer program that maintains account books

Accounting software is a computer program that maintains account books on computers, including recording transactions and account balances. Depending on the purpose, the software can manage budgets, perform accounting tasks for multiple currencies, perform payroll and customer relationship management, and prepare financial reporting. Work to have accounting functions implemented on computers goes back to the earliest days of electronic data processing. Over time, accounting software has evolved from supporting basic accounting operations to performing real-time accounting and supporting financial processing and reporting. Cloud accounting software was first introduced in 2011, and it allowed the performance of all accounting functions through the internet.

In computing, data deduplication is a technique for eliminating duplicate copies of repeating data. Successful implementation of the technique can improve storage utilization, which may in turn lower capital expenditure by reducing the overall amount of storage media required to meet storage capacity needs. It can also be applied to network data transfers to reduce the number of bytes that must be sent.

<span class="mw-page-title-main">Classes of computers</span>

Computers can be classified, or typed, in many ways. Some common classifications of computers are given below.
