Hearing-Aid Speech Quality Index

Hearing-Aid Speech Quality Index (HASQI) is a measure of audio quality originally designed to evaluate speech quality for hearing-aid users. [1] [2] It has also been shown to gauge audio quality for non-speech sounds and for listeners without hearing loss. [3]

Background

Audio quality can be measured directly through listening tests, but such tests are time-consuming to run. Consequently, a number of metrics have been developed that allow audio quality to be evaluated without human listeners. Standardized examples include PESQ, POLQA, PEVQ and PEAQ. HASQI was originally developed by Kates and Arehart to evaluate how the distortions introduced by hearing aids degrade quality. [1] They published a revised version in 2014. [2]

Kressner et al. [3] tested HASQI on a speech corpus different from the dataset used to develop it and showed that the index generalizes well to listeners without hearing loss, with performance comparable to that of PESQ. Kendrick et al. [4] showed that HASQI can grade the audio quality of music and of geophonic, biophonic, and anthrophonic quotidian sounds, although their study used a more limited set of degradations.

Method

HASQI and its 2014 revision are double-ended methods: evaluation requires both a clean reference signal and the degraded signal. The index attempts to capture the effects of noise, nonlinear distortion, linear filtering and spectral changes by computing the difference or correlation between key audio features. Short-time signal envelopes are examined to quantify the degradation caused by noise and nonlinear distortion, and long-time signal envelopes to quantify the effects of linear filtering. Version 2 of HASQI includes a model that captures some aspects of the peripheral auditory system for both normal-hearing and hearing-impaired listeners.
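The double-ended idea can be illustrated with a minimal Python sketch. This is not the HASQI algorithm: it omits the auditory model and the published combination of terms, and every function name, frame size and test signal below is an assumption chosen for illustration. It only shows how a short-time envelope correlation responds to nonlinear distortion (here, clipping) and how a long-term spectral difference responds to linear filtering, given a clean reference and a degraded signal.

```python
import numpy as np


def short_time_envelope(x, frame_len=256, hop=128):
    """RMS envelope over overlapping frames (a crude stand-in for an auditory envelope)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])
    return np.sqrt(np.mean(frames ** 2, axis=1))


def envelope_correlation(reference, degraded, frame_len=256, hop=128):
    """Pearson correlation between the short-time envelopes of the two signals."""
    env_ref = short_time_envelope(reference, frame_len, hop)
    env_deg = short_time_envelope(degraded, frame_len, hop)
    return float(np.corrcoef(env_ref, env_deg)[0, 1])


def long_term_spectrum_db(x, n_fft=512):
    """Long-term average power spectrum in dB, with its mean level removed."""
    n_frames = len(x) // n_fft
    frames = x[: n_frames * n_fft].reshape(n_frames, n_fft) * np.hanning(n_fft)
    power = np.mean(np.abs(np.fft.rfft(frames, axis=1)) ** 2, axis=0)
    spec_db = 10.0 * np.log10(power + 1e-12)
    return spec_db - np.mean(spec_db)


def spectral_difference(reference, degraded, n_fft=512):
    """RMS difference (in dB) between the level-normalised long-term spectra."""
    diff = long_term_spectrum_db(reference, n_fft) - long_term_spectrum_db(degraded, n_fft)
    return float(np.sqrt(np.mean(diff ** 2)))


# Hypothetical test signal: one second of amplitude-modulated noise at 16 kHz.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
reference = (1.0 + 0.8 * np.sin(2 * np.pi * 4 * t)) * rng.standard_normal(fs)

# Two illustrative degradations: hard clipping (nonlinear) and a moving average (linear filter).
clipped = np.clip(reference, -0.5, 0.5)
filtered = np.convolve(reference, np.ones(8) / 8, mode="same")

print("envelope correlation, clipped:", envelope_correlation(reference, clipped))
print("spectral difference, filtered (dB):", spectral_difference(reference, filtered))
```

Comparing the reference with itself gives an envelope correlation of 1 and a spectral difference of 0; each degradation moves its corresponding measure away from those values. HASQI itself derives its features from an auditory model rather than from raw RMS frames.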

Kendrick et al. developed a blind (single-ended) method, bHASQI, using machine learning. This enables the audio quality to be evaluated from just the degraded signal without needing the clean reference. [4]

References

  1. Kates, James; Arehart, Kathryn (2010). "The hearing-aid speech quality index (HASQI)". Journal of the Audio Engineering Society. 58 (5): 363–381.
  2. Kates, James; Arehart, Kathryn (2014). "The hearing-aid speech quality index (HASQI) version 2". Journal of the Audio Engineering Society. 62 (3): 99–117. doi:10.17743/jaes.2014.0006.
  3. Kressner, Abigail A.; Anderson, David V.; Rozell, Christopher J. (2013). "Evaluating the Generalization of the Hearing Aid Speech Quality Index (HASQI)". IEEE Transactions on Audio, Speech, and Language Processing. 21 (2): 407. doi:10.1109/TASL.2012.2217132. S2CID 2722337.
  4. Kendrick, Paul; Li, Francis; Fazenda, Bruno; Jackson, Iain; Cox, Trevor (2015). "Perceived Audio Quality of Sounds Degraded by Nonlinear Distortions and Single-Ended Assessment Using HASQI". Journal of the Audio Engineering Society. 63 (9): 698–712. doi:10.17743/jaes.2015.0068.