Perceptual Evaluation of Speech Quality (PESQ) is a family of standards comprising a test methodology for automated assessment of the speech quality as experienced by a user of a telephony system. It was standardized as Recommendation ITU-T P.862 [1] in 2001. PESQ is used for objective voice quality testing by phone manufacturers, network equipment vendors and telecom operators. Its usage requires a license. The first edition of PESQ's successor POLQA (Recommendation ITU-T P.863 [2] ) entered into force in 2011.
PESQ was developed to model subjective tests commonly used in telecommunications (e.g., Recommendation ITU-T P.800) to assess the voice quality perceived by human beings. Consequently, it employs true voice samples as test signals. In order to characterize the listening quality as perceived by users, it is of paramount importance to load modern telecom equipment with speech-like signals. Many systems are optimized for speech and would respond in an unpredictable way to non-speech signals (e.g., tones, noise). Guidelines for proper applications of voice test samples are defined in the PESQ application guide contained in Recommendation ITU-T P.862.3. [3]
ITU-T's family of full reference objective voice quality measurements started in 1997 with Recommendation ITU-T P.861 (PSQM), which was superseded by ITU-T P.862 (PESQ) [1] in 2001. P.862 was later complemented with Recommendations ITU-T P.862.1 [4] (mapping of PESQ scores to a MOS scale), ITU-T P.862.2 [5] (wideband measurements) and ITU-T P.862.3 [3] (application guide). The first edition of ITU-T P.863 (POLQA) [2] entered into force in 2011. An Application guide for Recommendation ITU-T P.863 was approved in 2019 and published as ITU-T P.863.1. [6]
In addition to the above listed full reference methods, the list of ITU-T's objective voice quality measurement standards also includes ITU-T P.563 [7] (no-reference algorithm).
Depending on the information that is made available to an algorithm, voice-quality test algorithms can be divided into two main categories:
PESQ is a full-reference algorithm and analyzes the speech signal sample-by-sample after a temporal alignment of corresponding excerpts of reference and test signal. PESQ can be applied to provide an end-to-end (E2E) quality assessment for a network, or characterize individual network components.
PESQ results principally model mean opinion scores (MOS) that cover a scale from 1 (bad) to 5 (excellent). A mapping function to MOS-LQO is outlined in Recommendation ITU-T P.862.1. [4]
Speech coding is an application of data compression to digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream.
G.723.1 is an audio codec for voice that compresses voice audio in 30 ms frames. An algorithmic look-ahead of 7.5 ms duration means that total algorithmic delay is 37.5 ms. Its official name is Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s. It is sometimes associated with a Truespeech trademark in coprocessors produced by DSP Group.
G.711 is a narrowband audio codec originally designed for use in telephony that provides toll-quality audio at 64 kbit/s. It is an ITU-T standard (Recommendation) for audio encoding, titled Pulse code modulation (PCM) of voice frequencies released for use in 1972.
A signal strength and readability report is a standardized format for reporting the strength of the radio signal and the readability (quality) of the radiotelephone (voice) or radiotelegraph signal transmitted by another station as received at the reporting station's location and by their radio station equipment. These report formats are usually designed for only one communications mode or the other, although a few are used for both telegraph and voice communications. All but one of these signal report formats involve the transmission of numbers.
G.729 is a royalty-free narrow-band vocoder-based audio data compression algorithm using a frame length of 10 milliseconds. It is officially described as Coding of speech at 8 kbit/s using code-excited linear prediction speech coding (CS-ACELP), and was introduced in 1996. The wide-band extension of G.729 is called G.729.1, which equals G.729 Annex J.
G.722 is an ITU-T standard 7 kHz wideband audio codec operating at 48, 56 and 64 kbit/s. It was approved by ITU-T in November 1988. Technology of the codec is based on sub-band ADPCM (SB-ADPCM). The corresponding narrow-band codec based on the same technology is G.726.
G.722.1 is a licensed royalty-free ITU-T standard audio codec providing high quality, moderate bit rate wideband (50 Hz – 7 kHz audio bandwidth, 16 ksps audio coding. It is a partial implementation of Siren 7 audio coding format developed by PictureTel Corp.. Its official name is Low-complexity coding at 24 and 32 kbit/s for hands-free operation in systems with low frame loss. It uses a modified discrete cosine transform audio data compression algorithm.
Mean opinion score (MOS) is a measure used in the domain of Quality of Experience and telecommunications engineering, representing overall quality of a stimulus or system. It is the arithmetic mean over all individual "values on a predefined scale that a subject assigns to his opinion of the performance of a system quality". Such ratings are usually gathered in a subjective quality evaluation test, but they can also be algorithmically estimated.
Perceptual Speech Quality Measure (PSQM) is a computational and modeling algorithm defined in Recommendation ITU-T P.861 that objectively evaluates and quantifies voice quality of voice-band speech codecs. It may be used to rank the performance of these speech codecs with differing speech input levels, talkers, bit rates and transcodings. P.861 was withdrawn and replaced by Recommendation ITU-T P.862 (PESQ), which contains an improved speech assessment algorithm.
Video quality is a characteristic of a video passed through a video transmission or processing system that describes perceived video degradation. Video processing systems may introduce some amount of distortion or artifacts in the video signal that negatively impact the user's perception of the system. For many stakeholders in video production and distribution, ensuring video quality is an important task.
Subjective video quality is video quality as experienced by humans. It is concerned with how video is perceived by a viewer and designates their opinion on a particular video sequence. It is related to the field of Quality of Experience. Measuring subjective video quality is necessary because objective quality assessment algorithms such as PSNR have been shown to correlate poorly with subjective ratings. Subjective ratings may also be used as ground truth to develop new algorithms.
MUSHRA stands for Multiple Stimuli with Hidden Reference and Anchor and is a methodology for conducting a codec listening test to evaluate the perceived quality of the output from lossy audio compression algorithms. It is defined by ITU-R recommendation BS.1534-3. The MUSHRA methodology is recommended for assessing "intermediate audio quality". For very small audio impairments, Recommendation ITU-R BS.1116-3 (ABC/HR) is recommended instead.
G.729.1 is an 8-32 kbit/s embedded speech and audio codec providing bitstream interoperability with G.729, G.729 Annex A and G.729 Annex B. Its official name is G.729-based embedded variable bit rate codec: An 8-32 kbit/s scalable wideband coder bitstream interoperable with G.729. It was introduced in 2006.
Perceptual Evaluation of Audio Quality (PEAQ) is a standardized algorithm for objectively measuring perceived audio quality, developed in 1994–1998 by a joint venture of experts within Task Group 6Q of the International Telecommunication Union's Radiocommunication Sector (ITU-R). It was originally released as ITU-R Recommendation BS.1387 in 1998 and last updated in 2023. It utilizes software to simulate perceptual properties of the human ear and then integrates multiple model output variables into a single metric.
Perceptual Evaluation of Video Quality(PEVQ) is an end-to-end (E2E) measurement algorithm to score the picture quality of a video presentation by means of a 5-point mean opinion score (MOS). It is, therefore, a video quality model. PEVQ was benchmarked by the Video Quality Experts Group (VQEG) in the course of the Multimedia Test Phase 2007–2008. Based on the performance results, in which the accuracy of PEVQ was tested against ratings obtained by human viewers, PEVQ became part of the new International Standard.
G.718 is an ITU-T Recommendation embedded scalable speech and audio codec providing high quality narrowband speech over the lower bit rates and high quality wideband speech over the complete range of bit rates. In addition, G.718 is designed to be highly robust to frame erasures, thereby enhancing the speech quality when used in Internet Protocol (IP) transport applications on fixed, wireless and mobile networks. Despite its embedded nature, the codec also performs well with both narrowband and wideband generic audio signals. The codec has an embedded scalable structure, enabling maximum flexibility in the transport of voice packets through IP networks of today and in future media-aware networks. In addition, the embedded structure of G.718 will easily allow the codec to be extended to provide a superwideband and stereo capability through additional layers which are currently under development in ITU-T Study Group 16. The bitstream may be truncated at the decoder side or by any component of the communication system to instantaneously adjust the bit rate to the desired value without the need for out-of-band signalling. The encoder produces an embedded bitstream structured in five layers corresponding to the five available bit rates: 8, 12, 16, 24 & 32 kbit/s.
VQuad-HD(Objective perceptual multimedia video quality measurement of HDTV) is a video quality testing technology for high definition video signals. It is a full-reference model, meaning that it requires access to the original and the degraded signal to estimate the quality.
Perceptual Objective Listening Quality Analysis (POLQA) was the working title of an ITU-T standard that covers a model to predict speech quality by means of analyzing digital speech signals. The model was standardized as Recommendation ITU-T P.863 in 2011. The second edition of the standard appeared in 2014, and the third, currently in-force edition was adopted in 2018 under the title Perceptual objective listening quality prediction.
The Circuit Merit system is a measurement process designed to assess the voice-to-noise ratio in wired and wireless telephone circuits, especially the AMPS system, and although its reporting scale is sometimes used as input for calculating mean opinion score, the rating system is officially defined relative to given ranges of voice-to-noise ratios.
Hearing-Aid Speech Quality Index (HASQI) is a measure of audio quality originally designed for the evaluation of speech quality for those with a hearing aid,. It has also been shown to be able to gauge audio quality for non-speech sounds and for listeners without a hearing loss.