Apdex

Apdex (Application Performance Index) is an open standard developed by an alliance of companies for measuring the performance of software applications in computing. Its purpose is to convert measurements into insights about user satisfaction, by specifying a uniform way to analyze and report on the degree to which measured performance meets user expectations. It is based on counts of "satisfied", "tolerating", and "frustrated" users, given a maximum satisfactory response time t, a maximum tolerable response time of 4t, and the assumption that users are frustrated above 4t. The score is equivalent to a weighted average of these user counts with weights 1, 0.5, and 0, respectively.
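Written as a rule, each measured response time r falls into exactly one of the three zones implied by the thresholds above (the convention that a sample exactly at a threshold counts toward the faster zone follows the usual reading of the definition):

$$\text{zone}(r) = \begin{cases} \text{satisfied}, & r \le t \\ \text{tolerating}, & t < r \le 4t \\ \text{frustrated}, & r > 4t \end{cases}$$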

Problems addressed

When engaging in application performance management, for example in the course of website monitoring, enterprises collect many measurements of the performance of information technology applications. However, this measurement data may not provide a clear and simple picture of how well those applications are performing from a business point of view, a characteristic desired in metrics that are used as key performance indicators. Reporting several different kinds of data can be confusing. Reducing measurement data to a single well-understood metric is a convenient way to track and report on quality of experience.

Measurements of application response times, in particular, may be difficult to evaluate, because raw timing data does not by itself indicate whether users are satisfied.

The Apdex method seeks to address these problems.

Apdex method

Proponents of the Apdex standard believe that it offers a better way to "measure what matters". The Apdex method converts many measurements into one number on a uniform scale of 0 to 1 (0 = no users satisfied, 1 = all users satisfied). The resulting Apdex score is a numerical measure of user satisfaction with the performance of enterprise applications. This metric can be used to report on any source of end-user performance measurements for which a performance objective has been defined.

The Apdex formula is the number of satisfied samples plus half of the tolerating samples plus none of the frustrated samples, divided by all the samples:

$$\mathrm{Apdex}_t = \frac{\mathrm{SatisfiedCount} + \frac{\mathrm{ToleratingCount}}{2}}{\mathrm{TotalSamples}}$$

where the subscript t is the target time, and the tolerable time is assumed to be four times the target time. This ratio is directly related to users' perception of satisfactory application responsiveness.

Example: assuming a performance objective of 3 seconds or better, and a tolerable standard of 12 seconds or better, given a dataset with 100 samples where 60 are below 3 seconds, 30 are between 3 and 12 seconds, and the remaining 10 are above 12 seconds, the Apdex score is:

$$\mathrm{Apdex}_3 = \frac{60 + \frac{30}{2}}{100} = 0.75$$

The Apdex formula is equivalent to a weighted average, where a satisfied user is given a score of 1, a tolerating user is given a score of 0.5, and a frustrated user is given a score of 0.
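As a sketch of how this computation works in practice, the following Python function derives an Apdex score from a list of raw response-time samples. It is a minimal illustration assuming the threshold convention described above (samples exactly at a boundary count toward the faster zone); the function name and interface are hypothetical, not part of the Apdex specification.

```python
def apdex(response_times, target):
    """Compute an Apdex score from raw response-time samples.

    Samples at or below the target count as satisfied (weight 1),
    samples above the target but at or below four times the target
    count as tolerating (weight 0.5), and slower samples count as
    frustrated (weight 0).
    """
    if not response_times:
        raise ValueError("at least one sample is required")
    satisfied = sum(1 for r in response_times if r <= target)
    tolerating = sum(1 for r in response_times if target < r <= 4 * target)
    return (satisfied + tolerating / 2) / len(response_times)

# Reproduces the worked example above: 60 fast, 30 tolerable, 10 slow samples.
samples = [2.0] * 60 + [7.0] * 30 + [15.0] * 10
print(apdex(samples, target=3.0))  # prints 0.75
```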

Apdex Alliance

The Apdex Alliance, headquartered in Charlottesville, Virginia, was founded in 2004 by Peter Sevcik, President of NetForecast, Inc. The Alliance is a group of companies that are collaborating to establish the Apdex standard. These companies have perceived the need for a simple and uniform way to report on application performance, are adopting the Apdex method in their internal operations or software products, and are participating in the work of refining and extending the definition of the Apdex specifications. Alliance contributing members who incorporate the standard into their products may use the Apdex name or logo where the Alliance has certified them as compliant.

In January 2007, the Alliance comprised 11 contributing member companies and over 200 individual members. While the number of contributing companies has remained relatively stable, individual membership grew to over 800 by December 2008, and reached 2,000 in 2010. In 2008 the Alliance began publishing a blog, the Apdex Exchange, and in 2010 began offering educational webinars. These activities address performance management topics, with an emphasis on how to apply the Apdex methodology.
