Service-level objective

A service-level objective (SLO), as per the O'Reilly Site Reliability Engineering book, is a "target value or range of values for a service level that is measured by an SLI." [1] An SLO is a key element of a service-level agreement (SLA) between a service provider and a customer. SLOs are agreed upon as a means of measuring the performance of the service provider and are outlined as a way of avoiding disputes between the two parties based on misunderstanding.

Overview

There is often confusion in the use of SLAs and SLOs. The SLA is the entire agreement that specifies what service is to be provided, how it is supported, times, locations, costs, performance, and responsibilities of the parties involved. SLOs are specific measurable characteristics of the SLA such as availability, throughput, frequency, response time, or quality. These SLOs together are meant to define the expected service between the provider and the customer and vary depending on the service's urgency, resources, and budget. SLOs provide a quantitative means to define the level of service a customer can expect from a provider. [2]

SLOs are formed by setting goals for metrics (commonly called service level indicators, SLIs). As an example, an availability SLO may be defined as the expected measured value of an availability SLI over a prescribed duration (e.g. four weeks). The availability SLI used will vary based on the nature and architecture of the service. For example, a simple web service might use the ratio of successful responses served to the total number of valid requests received (total_success / total_valid). [3]
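The ratio described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the request counts, and the 99.95% target are assumptions chosen for the example, not values taken from the cited sources.

```python
def availability_sli(total_success: int, total_valid: int) -> float:
    """Return the availability SLI: total_success / total_valid."""
    if total_valid <= 0:
        raise ValueError("total_valid must be positive")
    if not 0 <= total_success <= total_valid:
        raise ValueError("total_success must be between 0 and total_valid")
    return total_success / total_valid


def meets_slo(sli: float, target: float) -> bool:
    """An SLO is met when the measured SLI reaches the target value."""
    return sli >= target


# Hypothetical counts collected over a four-week window.
sli = availability_sli(total_success=999_620, total_valid=1_000_000)
print(f"availability SLI: {sli:.4%}")
print("SLO met:", meets_slo(sli, target=0.9995))
```

Restricting the denominator to valid requests keeps malformed client requests from counting against the service.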

Examples

Sturm and Morris argue [4] that SLOs must be:

Andrieux et al. define the SLO as "the quality of service aspect of the agreement. Syntactically, it is an assertion over the terms of the agreement as well as such qualities as date and time". [5] Keller and Ludwig more concisely define an SLO as a "commitment to maintain a particular state of the service in a given period" with respect to the state of the SLA parameters. [6] Keller and Ludwig go on to state that, while service providers will most often be the lead entity in taking on SLOs, there is no firm definition as such, and any entity can be responsible for an SLO. In addition, an SLO can be broken down into a number of different components.

Optionally, an EvaluationEvent may be assigned to the SLO; an EvaluationEvent is defined as the measure by which the SLO will be checked to see whether it is meeting the Expression.

SLOs should generally be specified in terms of an achievement value or service level, a target measurement, a measurement period, and where and how they are measured. [2] As an example, "90% of calls to the helpdesk should be answered in less than 20 seconds measured over a one-month period as reported by the ACD system". Results can be reported as the percentage of calls for which the target answer time was achieved and then compared to the desired service level (90%).
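As a rough illustration of the helpdesk example, the sketch below computes the share of calls answered in under 20 seconds and compares it with the 90% service level. The function name and the sample answer times are hypothetical; a real report would draw its data from the ACD system over the full one-month period.

```python
from typing import Sequence


def attainment(answer_times_s: Sequence[float], threshold_s: float = 20.0) -> float:
    """Fraction of calls answered in strictly less than threshold_s seconds."""
    if not answer_times_s:
        raise ValueError("at least one call is required")
    within = sum(1 for t in answer_times_s if t < threshold_s)
    return within / len(answer_times_s)


# Hypothetical answer times (seconds) for ten calls in the measurement period.
sample = [5, 12, 19, 20, 31, 8, 14, 3, 17, 9]

achieved = attainment(sample)
service_level = 0.90  # desired service level from the example SLO

print(f"achieved: {achieved:.0%}, target: {service_level:.0%}")
print("SLO met:", achieved >= service_level)
```

Note that a call answered at exactly 20 seconds does not count here, because the example SLO says "less than 20 seconds".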

| Type of Measure | Example SLO Requirement | Measurement Period |
|---|---|---|
| Availability | The application will be available 99.95% of the time | Over a year |
| Service Desk Response | 75% of help desk calls will be answered in less than a minute; 85% of help desk calls will be answered within two minutes; 100% of help desk calls will be answered within three minutes | Over a month |
| Incident Response Time | 99% of severity 1 tickets will be resolved within three hours; 98% of severity 2 tickets will be resolved within eight hours; 98% of severity 3 tickets will be resolved within three business days; 98% of severity 4 tickets will be resolved within five business days | Over a quarter |
| Response Time | 85% of TCP replies within 1.5 seconds of receiving a request; 99.5% of TCP replies within 4 seconds of receiving a request | Over a month |
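An availability target such as the 99.95% figure above translates directly into a maximum amount of downtime for the measurement period. The sketch below works through that arithmetic; the function name and the choice of a 365-day year are assumptions made for illustration.

```python
from datetime import timedelta


def allowed_downtime(target: float, period: timedelta) -> timedelta:
    """Maximum downtime that still satisfies an availability target.

    target is a fraction (e.g. 0.9995 for 99.95%); period is the
    measurement period over which availability is assessed.
    """
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be between 0 and 1")
    return period * (1.0 - target)


# Assumption: "over a year" is taken as 365 days.
budget = allowed_downtime(0.9995, timedelta(days=365))
print(f"allowed downtime: {budget.total_seconds() / 3600:.2f} hours")
```

Under these assumptions, 99.95% availability over a year leaves roughly 4.38 hours of permitted downtime.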

Term usage

The SLO term is found in various scientific papers, for instance in the reference architecture of the SLA@SOI project, [7] and it is used in the Open Grid Forum document on WS-Agreement. [5]

References

  1. Beyer, Jones, Petoff, Murphy. "Site Reliability Engineering: How Google Runs Production Systems". Google Site Reliability Engineering. O'Reilly. Retrieved 9 June 2023.
  2. Rastegari, Yousef; Shams, Fereidoon (2015-12-29). "Optimal Decomposition of Service Level Objectives into Policy Assertions". The Scientific World Journal. 2015: 465074. doi:10.1155/2015/465074. ISSN 2356-6140. PMC 4709918. PMID 26962544.
  3. Hidalgo, Alex (August 2020). Implementing Service Level Objectives (1 ed.). O'Reilly Media, Inc. ISBN   9781492076766.
  4. Rick Sturm, Wayne Morris "Foundations of Service Level Management", April 2000, Pearson.
  5. Alain Andrieux, Karl Czajkowski, Asit Dan, Kate Keahey, Heiko Ludwig, Toshiyuki Nakata, Jim Pruyne, John Rofrano, Steve Tuecke, Ming Xu "Web Services Agreement Specification (WS-Agreement)", GFD-R-P.107, March 2007, Open Grid Forum.
  6. Alexander Keller, Heiko Ludwig "The WSLA Framework: Specifying and Monitoring Service Level Agreements for Web Services", Journal of Network and Systems Management, Vol 11, n. 1, March 2003.
  7. Jens Happe, Wolfgang Theilmann, Andrew Edmonds, and Keven T. Kearney "A Reference Architecture for Multi-Level SLA Management" in "Service Level Agreements for Cloud Computing", eds. Wieder, Philipp and Butler, Joe M. and Theilmann, Wolfgang and Yahyapour, Ramin, Springer New York, 2011, DOI:10.1007/978-1-4614-1614-2_2