Continuous availability


Continuous availability is an approach to computer system and application design that protects users against downtime, whatever the cause, and ensures that users remain connected to their documents, data files and business applications. Continuous availability describes the information technology methods used to ensure business continuity.[1][citation needed]


In the early days of computing, availability was not considered business critical. With the increasing use of mobile computing, global access to online business transactions and business-to-business communication, continuous availability has become increasingly important because customers need constant access to information systems.[2]

Solutions for continuous availability exist in different forms and implementations, depending on the software and hardware manufacturer. The goal of the discipline is to reduce user or business application downtime, which can severely disrupt business operations. Such downtime can lead to loss of productivity, loss of revenue and customer dissatisfaction, and can ultimately damage a company's reputation.

Degrees of availability

The terms high availability, continuous operation, and continuous availability are generally used to express how available a system is. [3] [4] The following is a definition of each of these terms.

High availability refers to the ability to avoid unplanned outages by eliminating single points of failure. This is a measure of the reliability of the hardware, operating system, middleware, and database manager software. Another measure of high availability is the ability to minimize the effect of an unplanned outage by masking the outage from the end users. This can be accomplished by providing redundancy or quickly restarting failed components.
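Masking an outage through redundancy can be illustrated with a client that tries redundant components in turn. The following is a minimal sketch, not a description of any particular product: the function `call_with_failover`, the exception `AllReplicasFailed` and the example components `primary` and `standby` are hypothetical names, and each "replica" is modelled simply as a Python callable.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllReplicasFailed(Exception):
    """Raised when every redundant component has failed (hypothetical name)."""


def call_with_failover(replicas: Sequence[Callable[[], T]]) -> T:
    """Try each redundant replica in order and return the first successful result.

    The caller sees an error only if *all* replicas fail, so the failure of a
    single component is masked from the end user.
    """
    errors: list[Exception] = []
    for replica in replicas:
        try:
            return replica()
        except Exception as exc:  # a real system would catch narrower error types
            errors.append(exc)
    raise AllReplicasFailed(f"{len(errors)} replica(s) failed: {errors!r}")


# Usage with two hypothetical components: the primary is down, the standby answers.
def primary() -> str:
    raise ConnectionError("primary server down")


def standby() -> str:
    return "served by standby"


print(call_with_failover([primary, standby]))
```

Trying replicas in a fixed order mirrors a primary/standby arrangement; real deployments usually add health checks and timeouts so that a slow component is skipped as well as a failed one.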

Availability is usually expressed as a percentage of uptime in a given year:

Availability    Downtime per year
99.9%           8.76 hours
99.99%          52.56 minutes
99.999%         5.26 minutes
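The downtime figures follow from multiplying the unavailable fraction of time by the number of hours in a year; published tables often round them. A minimal sketch, assuming a 365-day year (8,760 hours) and using a hypothetical helper `downtime_per_year`:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored


def downtime_per_year(availability_percent: float) -> float:
    """Return the maximum downtime, in hours per year, allowed by an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return HOURS_PER_YEAR * unavailable_fraction


for target in (99.9, 99.99, 99.999):
    hours = downtime_per_year(target)
    print(f"{target}%: {hours:.2f} hours = {hours * 60:.2f} minutes per year")
```

Each additional "nine" divides the permitted downtime by ten.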

When quoting such a percentage, it must be specified whether it applies to the hardware, the IT infrastructure or the business application running on top of them.[5]
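One reason the scope matters is that a business application depends on every layer beneath it. The sketch below assumes, as a simplification not stated in the source, that the layers are in series (any layer failing makes the application unavailable) and fail independently; under those assumptions the end-to-end availability is the product of the per-layer availabilities. The helper `serial_availability` and the per-layer figures are hypothetical.

```python
from math import prod


def serial_availability(layer_availabilities: list[float]) -> float:
    """End-to-end availability of layers that must all be up (fractions in 0..1).

    Assumes independent failures: the application is up only when every
    layer is up, so the probabilities multiply.
    """
    return prod(layer_availabilities)


# Hypothetical figures for each layer of a stack.
stack = {"hardware": 0.99999, "infrastructure": 0.9999, "application": 0.999}
overall = serial_availability(list(stack.values()))
print(f"end-to-end availability: {overall:.4%}")
```

With these illustrative numbers, the application as seen by users is less available than any single layer, so a "five nines" hardware figure says little about the availability the business actually experiences.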

Continuous operation refers to the ability to avoid planned outages. For continuous operation, there must be ways to perform necessary administrative work, such as hardware and software maintenance, upgrades and platform refreshes, while the business application remains available to end users. This is accomplished by providing multiple servers and switching end users to an available server whenever one server is taken offline. Note that a system capable of continuous operation does not necessarily have high availability, because an excessive number of unplanned outages could still compromise it.
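Switching users away from a server that is being maintained can be sketched as a server pool that routes requests only to servers not in maintenance, combined with a rolling procedure that takes one server offline at a time. This is a minimal illustration under stated assumptions: `ServerPool`, `rolling_upgrade` and the `upgrade` callback are hypothetical names, servers are represented by strings, and routing is simple round-robin.

```python
from itertools import count
from typing import Callable


class ServerPool:
    """Routes users only to servers that are not down for maintenance."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = list(servers)
        self.in_maintenance: set[str] = set()
        self._counter = count()  # drives round-robin selection

    def available(self) -> list[str]:
        return [s for s in self.servers if s not in self.in_maintenance]

    def start_maintenance(self, server: str) -> None:
        if server not in self.servers:
            raise KeyError(server)
        # Refuse to take the last available server offline: users would see an outage.
        if self.available() == [server]:
            raise RuntimeError(f"cannot take {server} offline: it is the only available server")
        self.in_maintenance.add(server)

    def end_maintenance(self, server: str) -> None:
        self.in_maintenance.discard(server)

    def route(self) -> str:
        """Pick the next available server in round-robin order."""
        candidates = self.available()
        if not candidates:
            raise RuntimeError("no server available")
        return candidates[next(self._counter) % len(candidates)]


def rolling_upgrade(pool: ServerPool, upgrade: Callable[[str], None]) -> None:
    """Upgrade servers one at a time while the others keep serving users."""
    for server in list(pool.servers):
        pool.start_maintenance(server)
        upgrade(server)  # users are routed to the remaining servers meanwhile
        pool.end_maintenance(server)
```

Taking only one server offline at a time keeps the application available throughout the upgrade, provided the remaining servers can carry the load; the guard in `start_maintenance` reflects that a single-server deployment cannot offer continuous operation.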

Continuous availability combines the characteristics of high availability and continuous operation to provide the ability to keep the business application running without any noticeable downtime.

Types of outages

Planned outages are deliberate and are scheduled at a convenient time. These involve such activities as:
- Hardware installation or maintenance
- Software maintenance or upgrades of the operating system, the middleware, the database server or the business application
- Database administration, such as offline backup or offline reorganization

Unplanned outages are unexpected outages that are caused by the failure of any system component. They include hardware failures, software issues, or people and process issues.

History

Various commercially viable examples exist for hardware/software implementations. These include:

See also


References

  1. "Business Continuity: Delivering Data and Applications Through Continuous Availability", META Group white paper, June 2003 (archived 2012-04-25 at the Wayback Machine).
  2. "Gartner Survey Shows IT Availability Remain Top Priorities for U.S. IT Services Buyers", September 2010.
  3. "High availability (again) versus continuous availability", IBM WebSphere Developer Technical Journal, April 14, 2010.
  4. Bob Dickerson, "Service Recovery & Availability", IEEE Computer Society, 2010 meeting.
  5. "The Paradox of the 9s", itSM Solutions Newsletter, December 2006.