Continuous availability

Last updated

Continuous availability is an approach to computer system and application design that protects users against downtime, whatever the cause and ensures that users remain connected to their documents, data files and business applications. Continuous availability describes the information technology methods to ensure business continuity. [1] [ citation needed ]

Contents

In early days of computing, availability was not considered business critical. With the increasing use of mobile computing, global access to online business transactions and business-to-business communication, continuous availability is increasingly important based on the need to support customer access to information systems. [2]

Solutions to continuous availability exists in different forms and implementations depending on the software and hardware manufacturer. The goal of the discipline is to reduce the user or business application downtime, which can have a severe impact on business operations. Inevitably, such downtime can lead to loss of productivity, loss of revenue, customer dissatisfaction and ultimately can damage a company's reputation.

Degrees of availability

The terms high availability, continuous operation, and continuous availability are generally used to express how available a system is. [3] [4] The following is a definition of each of these terms.

High availability refers to the ability to avoid unplanned outages by eliminating single points of failure. This is a measure of the reliability of the hardware, operating system, middleware, and database manager software. Another measure of high availability is the ability to minimize the effect of an unplanned outage by masking the outage from the end users. This can be accomplished by providing redundancy or quickly restarting failed components.

Availability is usually expressed as a percentage of uptime in a given year:

AvailabilityDowntime per year
99.9%8.76 hours
99.99%1 hour
99.999%5 minutes

When defining such a percentage it needs to be specified if it applies to the hardware, the IT infrastructure or the business application on top. [5]

Continuous operation refers to the ability to avoid planned outages. For continuous operation there must be ways to perform necessary administrative work, like hardware and software maintenance, upgrades, and platform refreshes while the business application remains available to the end users. This is accomplished by providing multiple servers and switching end users to an available server at times when one server is made unavailable. Note that a system running in continuous operation is not necessarily operating with high availability because an excessive number of unplanned outages could compromise this.

Continuous availability combines the characteristics of high availability and continuous operation to provide the ability to keep the business application running without any noticeable downtime.

Types of outages

Planned outages are deliberate and are scheduled at a convenient time. These involve such activities as: - Hardware installation or maintenance - Software maintenance or upgrades of the operating system, the middleware, the database server or the business application - Database administration such as offline backup, or offline reorganization

Unplanned outages are unexpected outages that are caused by the failure of any system component. They include hardware failures, software issues, or people and process issues.

History

Various commercially viable examples exist for hardware/software implementations. These include:

See also

Related Research Articles

<span class="mw-page-title-main">Mainframe computer</span> Large computer

A mainframe computer, informally called a mainframe or big iron, is a computer used primarily by large organizations for critical applications like bulk data processing for tasks such as censuses, industry and consumer statistics, enterprise resource planning, and large-scale transaction processing. A mainframe computer is large but not as large as a supercomputer and has more processing power than some other classes of computers, such as minicomputers, servers, workstations, and personal computers. Most large-scale computer-system architectures were established in the 1960s, but they continue to evolve. Mainframe computers are often used as servers.

z/OS 64-bit operating system for IBM mainframes

z/OS is a 64-bit operating system for IBM z/Architecture mainframes, introduced by IBM in October 2000. It derives from and is the successor to OS/390, which in turn was preceded by a string of MVS versions. Like OS/390, z/OS combines a number of formerly separate, related products, some of which are still optional. z/OS has the attributes of modern operating systems, but also retains much of the older functionality originated in the 1960s and still in regular use—z/OS is designed for backward compatibility.

Middleware in the context of distributed applications is software that provides services beyond those provided by the operating system to enable the various components of a distributed system to communicate and manage data. Middleware supports and simplifies complex distributed applications. It includes web servers, application servers, messaging and similar tools that support application development and delivery. Middleware is especially integral to modern information technology based on XML, SOAP, Web services, and service-oriented architecture.

<span class="mw-page-title-main">IBM Db2</span> Relational model database server

Db2 is a family of data management products, including database servers, developed by IBM. It initially supported the relational model, but was extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB/2, then DB2 until 2017 and finally changed to its present form.

<span class="mw-page-title-main">IBM WebSphere</span>

IBM WebSphere refers to a brand of proprietary computer software products in the genre of enterprise software known as "application and integration middleware". These software products are used by end-users to create and integrate applications with other applications. IBM WebSphere has been available to the general market since 1998.

Failover is switching to a redundant or standby computer server, system, hardware component or network upon the failure or abnormal termination of the previously active application, server, system, hardware component, or network in a computer network. Failover and switchover are essentially the same operation, except that failover is automatic and usually operates without warning, while switchover requires human intervention.

The Service Availability Forum is a consortium that develops, publishes, educates on and promotes open specifications for carrier-grade and mission-critical systems. Formed in 2001, it promotes development and deployment of commercial off-the-shelf (COTS) technology.

In computing, a Parallel Sysplex is a cluster of IBM mainframes acting together as a single system image with z/OS. Used for disaster recovery, Parallel Sysplex combines data sharing and parallel computing to allow a cluster of up to 32 systems to share a workload for high performance and high availability.

Reliability, availability and serviceability (RAS), also known as reliability, availability, and maintainability (RAM), is a computer hardware engineering term involving reliability engineering, high availability, and serviceability design. The phrase was originally used by International Business Machines (IBM) as a term to describe the robustness of their mainframe computers.

High availability (HA) is a characteristic of a system which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period.

The term downtime is used to refer to periods when a system is unavailable.

Database administration is the function of managing and maintaining database management systems (DBMS) software. Mainstream DBMS software such as Oracle, IBM Db2 and Microsoft SQL Server need ongoing management. As such, corporations that use DBMS software often hire specialized information technology personnel called database administrators or DBAs.

In software engineering and hardware engineering, serviceability is one of the -ilities or aspects. It refers to the ability of technical support personnel to install, configure, and monitor computer products, identify exceptions or faults, debug or isolate faults to root cause analysis, and provide hardware or software maintenance in pursuit of solving a problem and restoring the product into service. Incorporating serviceability facilitating features typically results in more efficient product maintenance and reduces operational costs and maintains business continuity.

4690 Operating System is a specially designed point of sale (POS) operating system, originally sold by IBM. In 2012, IBM sold its retail business, including this product, to Toshiba, which assumed support. 4690 is widely used by IBM and Toshiba retail customers to run retail systems which run their own applications and others.

<span class="mw-page-title-main">Computer cluster</span> Set of computers configured in a distributed computing system

A computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software.

Dynamic Infrastructure is an information technology concept related to the design of data centers, whereby the underlying hardware and software can respond dynamically and more efficiently to changing levels of demand. In other words, data center assets such as storage and processing power can be provisioned to meet surges in user's needs. The concept has also been referred to as Infrastructure 2.0 and Next Generation Data Center.

Middleware analysts are computer software engineers with a specialization in products that connect two different computer systems together. These products can be open-source or proprietary. As the term implies, the software, tools, and technologies used by Middleware analysts sit "in-the-middle", between two or more systems; the purpose being to enable two systems to communicate and share information.

Linux on IBM Z or Linux on zSystems is the collective term for the Linux operating system compiled to run on IBM mainframes, especially IBM Z / IBM zSystems and IBM LinuxONE servers. Similar terms which imply the same meaning are Linux/390, Linux/390x, etc. The three Linux distributions certified for usage on the IBM Z hardware platform are Red Hat Enterprise Linux, SUSE Linux Enterprise Server, and Ubuntu.

High availability software is software used to ensure that systems are running and available most of the time. High availability is a high percentage of time that the system is functioning. It can be formally defined as *100%. Although the minimum required availability varies by task, systems typically attempt to achieve 99.999% (5-nines) availability. This characteristic is weaker than fault tolerance, which typically seeks to provide 100% availability, albeit with significant price and performance penalties.

Data center management is the collection of tasks performed by those responsible for managing ongoing operation of a data center. This includes Business service management and planning for the future.

References

  1. Business Continuity: Delivering Data and Applications Through Continuous Availability, A META Group White Paper, June 2003 Archived 2012-04-25 at the Wayback Machine
  2. Gartner Survey Shows IT Availability Remain Top Priorities for U.S. IT Services Buyers, September 2010
  3. High availability (again) versus continuous availability, IBM WebSphere Developer Technical Journal, April 14, 2010
  4. Bob Dickerson: Service Recovery & Availability, IEEE Computer Society, 2010 Meeting
  5. itSM Solutions Newsletter December 2006: The Paradox of the 9s