Speech transmission index

Speech Transmission Index (STI) is a measure of speech transmission quality. The absolute measurement of speech intelligibility is a complex science. The STI measures some physical characteristics of a transmission channel (a room, electro-acoustic equipment, telephone line, etc.), and expresses the ability of the channel to carry across the characteristics of a speech signal. STI is a well-established objective predictor of how the characteristics of the transmission channel affect speech intelligibility.

The influence [1] that a transmission channel has on speech intelligibility is dependent on:

History

The STI was introduced by Tammo Houtgast and Herman Steeneken in 1971, [2] and was accepted by the Acoustical Society of America in 1980. [3] Steeneken and Houtgast decided to develop the Speech Transmission Index because they had been tasked with carrying out a very lengthy series of tedious speech intelligibility measurements for the Netherlands Armed Forces. Instead, they spent the time developing a much quicker objective method (which was actually the predecessor to the STI). [4]

Houtgast and Steeneken developed the Speech Transmission Index while working at the Netherlands Organisation for Applied Scientific Research (TNO). Their team at TNO kept supporting and developing the STI, improving the model and developing hardware and software for measuring the STI, until 2010. In that year, the TNO research group responsible for the STI spun out of TNO and continued its work as a privately owned company named Embedded Acoustics. Embedded Acoustics now continues to support development of the STI, with Herman Steeneken (now formally retired from TNO) still acting as a senior consultant.

In the early years (until approx. 1985) the use of the STI was largely limited to a relatively small international community of speech researchers. The introduction of the RASTI ("Room Acoustics STI") made the STI method available to a larger population of engineers and consultants, especially when Bruel & Kjaer introduced their RASTI measuring device (which was based on the earlier RASTI system developed by Steeneken and Houtgast at TNO). RASTI was designed to be much faster than the original ("full") STI, taking less than 30 seconds instead of 15 minutes for a measuring point. However, RASTI was only intended (as the name says) for pure room acoustics, not electro-acoustics. Application of RASTI to transmission chains featuring electro-acoustic components (such as loudspeakers and microphones) became fairly common, and led to complaints about inaccurate results. The use of RASTI was even specified by some application standards (such as CAA specification 15 for aircraft cabin PA systems) for applications featuring electro-acoustics, simply because it was the only feasible method at the time. The inadequacies of RASTI were sometimes simply accepted for lack of a better alternative. TNO did produce and sell instruments for measuring full STI and various other STI derivatives, but these devices were relatively expensive, large and heavy.

Around the year 2000, the need for an alternative to RASTI that could also be applied safely to Public Address (PA) systems had become fully apparent. At TNO, Jan Verhave and Herman Steeneken started work on a new STI method that would later become known as STIPA (STI for Public Address systems). The first device to include STIPA measurements available for sale to the general public was made by Gold-Line. STIPA measuring instruments are now available from various manufacturers.

RASTI was standardized internationally in 1988, in IEC-60268-16. Since then, IEC-60268-16 has been revised three times, with the latest revision (rev. 4) appearing in 2011. Each revision included updates of the STI methodology that had become accepted in the STI research community over time, such as the inclusion of redundancy between adjacent octave bands (rev. 2), level-dependent auditory masking (rev. 3) and various methods for applying the STI to specific populations such as non-natives and the hearing impaired (rev. 4). An IEC maintenance team is currently working on rev. 5.

RASTI was declared obsolete by the IEC in June 2011, with the appearance of rev. 4 of IEC-60268-16. At that time, this simplified STI derivative was still stipulated as a standard method in some industries. STIPA is now seen as the successor to RASTI for almost every application.

Scale

STI is a numeric measure of communication channel characteristics whose value varies from 0 (bad) to 1 (excellent). [5] On this scale, an STI of at least 0.5 is desirable for most applications.

Barnett (1995, [6] 1999 [7] ) proposed to use a reference scale, the Common Intelligibility Scale (CIS), based on a mathematical relation with STI (CIS = 1 + log (STI)).
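As a small worked illustration of this relation, the following Python sketch converts between the two scales. It assumes the logarithm in the formula is base 10, which the text above does not state explicitly; the function names `sti_to_cis` and `cis_to_sti` are illustrative and not part of any standard.

```python
import math


def sti_to_cis(sti: float) -> float:
    """Convert an STI value to CIS via CIS = 1 + log10(STI).

    Assumption: the logarithm is base 10. STI = 0 is rejected because
    the logarithm is undefined there.
    """
    if not 0.0 < sti <= 1.0:
        raise ValueError("STI must lie in the interval (0, 1]")
    return 1.0 + math.log10(sti)


def cis_to_sti(cis: float) -> float:
    """Invert the relation above: STI = 10 ** (CIS - 1)."""
    return 10.0 ** (cis - 1.0)
```

Under this assumption an STI of 1 maps to a CIS of 1, and an STI of 0.5 maps to a CIS of roughly 0.7.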

[Figure: Speech intelligibility may be expressed by a single number value. Two scales are most commonly used: STI and CIS.]

STI predicts the likelihood of syllables, words and sentences being comprehended. As an example, for native speakers, this likelihood is given by:

| STI value | Quality according to IEC 60268-16 | Intelligibility of syllables in % | Intelligibility of words in % | Intelligibility of sentences in % |
|---|---|---|---|---|
| 0 – 0.3 | bad | 0 – 34 | 0 – 67 | 0 – 89 |
| 0.3 – 0.45 | poor | 34 – 48 | 67 – 78 | 89 – 92 |
| 0.45 – 0.6 | fair | 48 – 67 | 78 – 87 | 92 – 95 |
| 0.6 – 0.75 | good | 67 – 90 | 87 – 94 | 95 – 96 |
| 0.75 – 1 | excellent | 90 – 96 | 94 – 96 | 96 – 100 |

If non-native speakers, people with speech disorders or hard-of-hearing people are involved, other probabilities hold.
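The quality bands for native speakers listed above can be sketched as a simple lookup. This is only an illustration: the table gives shared band edges (for example 0.3 ends "bad" and starts "poor"), so the sketch assumes that a value exactly on an edge belongs to the higher band. The name `sti_quality` is hypothetical.

```python
import bisect

# Upper edges of the "bad", "poor", "fair" and "good" bands from the table.
_UPPER_EDGES = [0.3, 0.45, 0.6, 0.75]
_LABELS = ["bad", "poor", "fair", "good", "excellent"]


def sti_quality(sti: float) -> str:
    """Return the quality label for an STI value.

    Assumption: a value lying exactly on a band edge is assigned to the
    higher band, since the table itself leaves this open.
    """
    if not 0.0 <= sti <= 1.0:
        raise ValueError("STI must lie in the interval [0, 1]")
    return _LABELS[bisect.bisect_right(_UPPER_EDGES, sti)]
```

For instance, `sti_quality(0.5)` yields "fair", in line with the 0.45 – 0.6 row.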

STI prediction is independent of the language spoken. This is not surprising, because what is measured is the ability of the channel to transport the physical patterns of speech.

Another method is defined for computing a physical measure that is highly correlated with the intelligibility of speech as evaluated by speech perception tests given a group of talkers and listeners. This measure is called the Speech Intelligibility Index, or SII. [8]

Nominal qualification bands for STI

The IEC 60268-16 ed4 2011 Standard defines a qualification scale in order to provide flexibility for different applications. The values of this alpha-scale run from "U" to "A+". [9]

[Figure: Nominal qualification bands for STI.]
[Figure: Examples of STI qualification bands and typical applications.]

Standards

STI has gained international acceptance as the quantifier of channel influence on speech intelligibility. The International Electrotechnical Commission standard IEC 60268-16, Objective rating of speech intelligibility by speech transmission index, [9] prepared by the TC 100 Technical Committee, defines the international standard.

Furthermore, the following standards include testing of the STI and the achievement of a minimum speech transmission index among their requirements: ISO 7240-24 [10], NFPA 72 [11], BS 5839-8 [12] and DIN 60849 [13].

STIPA

STIPA (Speech Transmission Index for Public Address Systems) is a version of the STI using a simplified method and test signal. Within the STIPA signal, each octave band is modulated simultaneously with two modulation frequencies. The modulation frequencies are spread among the octave bands in a balanced way, making it possible to obtain a reliable STI measurement based on a sparsely sampled Modulation Transfer Function matrix. Although initially designed for Public Address systems (and similar installations, such as Voice Evacuation Systems and Mass Notification Systems), STIPA can also be used for a variety of other applications. The only situation in which STIPA is currently considered inferior to full STI is in the presence of strong echoes.

A single STIPA measurement generally takes between 15 and 25 seconds, combining the speed of RASTI with (nearly) the wide scope of applicability and reliability of full STI.

Since STIPA has become widely available, and given the fact that RASTI has several disadvantages and no benefits over STIPA, RASTI is now considered obsolete.

Although the STIPA test signal does not resemble speech to the human ear, in terms of frequency content as well as intensity fluctuations it is a signal with speech-like characteristics.

Speech can be described as noise that is intensity-modulated by low-frequency signals. The STIPA signal contains such intensity modulations at 14 different modulation frequencies, spread across 7 octave bands. At the receiving end of the communication system, the depth of modulation of the received signal is measured and compared with that of the test signal in each of a number of frequency bands. Reductions in the modulation depth are associated with loss of intelligibility.
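The principle described above, comparing the modulation depth of the received signal with that of the test signal, can be sketched in Python for a single carrier and a single modulation frequency. This is a simplified illustration under stated assumptions, not the procedure of IEC 60268-16: it uses broadband Gaussian noise rather than octave-band noise, one modulation frequency rather than the 14 of STIPA, and it stops at the modulation ratio without converting to an STI value. All function names are hypothetical.

```python
import cmath
import math
import random


def modulated_noise(depth, mod_freq, fs, seconds, seed=0):
    """Gaussian noise whose intensity follows 1 + depth * cos(2*pi*F*t)."""
    rng = random.Random(seed)
    n_samples = int(fs * seconds)
    return [
        rng.gauss(0.0, 1.0)
        * math.sqrt(1.0 + depth * math.cos(2.0 * math.pi * mod_freq * i / fs))
        for i in range(n_samples)
    ]


def modulation_depth(signal, mod_freq, fs):
    """Estimate the depth of intensity modulation at mod_freq.

    The squared signal (intensity) is correlated with a complex
    exponential at mod_freq and normalised by the mean intensity.
    The signal should span a whole number of modulation periods.
    """
    total = 0.0
    component = 0j
    for i, x in enumerate(signal):
        intensity = x * x
        total += intensity
        component += intensity * cmath.exp(-2j * math.pi * mod_freq * i / fs)
    return 2.0 * abs(component) / total


def modulation_transfer(sent, received, mod_freq, fs):
    """Ratio of received to transmitted modulation depth."""
    return modulation_depth(received, mod_freq, fs) / modulation_depth(
        sent, mod_freq, fs
    )
```

A ratio near 1 means the channel preserved the intensity fluctuations. Adding steady background noise or reverberation to the received signal flattens the envelope and lowers the ratio, which is the reduction in modulation depth that the text associates with loss of intelligibility.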

Indirect method

An alternative impulse response method, also known as the "indirect method," assumes that the channel is linear and requires stricter synchronization of the sound source to the measurement instrument. The main benefit of the indirect method over the direct method (based on modulated test signals) is that the full MTF matrix is measured, covering all relevant modulation frequencies in all octave bands. In very large spaces (such as cathedrals), where echoes are likely to occur, the indirect method is usually preferred over the direct method (e.g. using modulated STIPA signals). In general, the indirect method is often the best option when studying speech intelligibility based on "pure room acoustics," when no electro-acoustic components are present within the transmission path.
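To illustrate how modulation transfer values can be derived from an impulse response, the sketch below uses the relation commonly attributed to Schroeder, in which the modulation transfer at a modulation frequency F is the magnitude of the Fourier transform of the squared impulse response at F, divided by the total energy of the response. This relation is not stated in this article and is introduced here as an assumption. The sketch also ignores octave-band filtering and background noise, and the function name is hypothetical.

```python
import cmath
import math


def mtf_from_impulse_response(h, fs, mod_freqs):
    """Modulation transfer values from a sampled impulse response.

    Assumes a linear, noise-free channel. For each modulation frequency F:
        m(F) = |sum(h[n]**2 * exp(-2j*pi*F*n/fs))| / sum(h[n]**2)
    A full measurement would apply this per octave band after filtering.
    """
    energy = sum(x * x for x in h)
    if energy == 0.0:
        raise ValueError("impulse response has no energy")
    values = []
    for freq in mod_freqs:
        acc = 0j
        for n, x in enumerate(h):
            acc += x * x * cmath.exp(-2j * math.pi * freq * n / fs)
        values.append(abs(acc) / energy)
    return values
```

Because every modulation frequency is evaluated from the same recorded response, the whole matrix can be filled from a single measurement, which is the benefit noted above. The result is only meaningful to the extent that the linearity assumption holds.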

However, the requirement that the channel must be linear implies that the indirect method cannot be used reliably in many real-life applications: whenever the transmission chain features components that might exhibit non-linear behaviour (such as loudspeakers), indirect measurements may yield incorrect results. Also, depending on the type of impulse response measurement that is used, the influence of background noise present during measurements may not be dealt with correctly. This means that the indirect method should only be used with great care when measuring Public Address systems and Voice Evacuation systems. IEC-60268-16 rev. 4 does not disallow the indirect method for such applications, but issues the following words of warning: "Critical analysis is therefore required of how the impulse response is obtained and potentially influenced by non-linearities in the transmission system, particularly as in practice, system components can be operated at the limits of their performance range." In practice, verification of the validity of the linearity assumption is often too complex for everyday use, making the (direct) STIPA method the preferred method whenever loudspeakers are involved.

Although many measuring tools based on the indirect method offer STIPA as well as "full STI" options, the sparse Modulation Transfer Function matrix inherent to STIPA offers no advantages when using the indirect method. Impulse response based STIPA measurements must not be confused with direct STIPA measurements, as the validity of the result still depends on whether or not the channel is linear.

List of manufacturers of STI measuring instruments

STI measuring instruments are (and have been) made by various manufacturers. Below is a list of brands under which STI measuring instruments have been sold, in alphabetical order.

The market for STI measuring solutions is still developing, so the above list is subject to change as manufacturers enter or leave the market. The list does not include software producers that produce STI-capable acoustic measuring and simulation software. Mobile apps for STIPA measurements (such as the ones sold by Studio Six Digital and Embedded Acoustics) are also excluded from the list.

References

  1. Speech Intelligibility Measurement Methods
  2. Houtgast, T. and Steeneken, H. J. M. (1971), "Evaluation of Speech Transmission Channels by Using Artificial Signals", Acustica 25, 355–367.
  3. Steeneken, H. J. M. and Houtgast, T. (1980), "A physical method for measuring speech-transmission quality", J. Acoust. Soc. Am. 67, 318–326.
  4. Sander van Wijngaarden, Jan Verhave and Herman Steeneken (2012). The Speech Transmission Index after four decades of development.
  5. THE MEASUREMENT OF SPEECH INTELLIGIBILITY Herman J.M. Steeneken TNO Human Factors, Soesterberg, the Netherlands
  6. Barnett, P. W. and Knight, R.D. (1995). "The Common Intelligibility Scale", Proc. I.O.A. Vol 17, part 7.
  7. Barnett, P. W. (1999). "Overview of speech intelligibility" Proc. I.O.A Vol 21 Part 5.
  8. Speech Intelligibility Index site created by the Acoustical Society of America (ASA) Working Group S3-79
  9. International Electrotechnical Commission IEC 60268-16: Sound system equipment – Part 16: Objective rating of speech intelligibility by speech transmission index, Fourth edition 2011-06
  10. ISO 7240-24:2010 Fire detection and fire alarm systems -- Part 24: Sound-system loudspeakers
  11. NFPA 72 National Fire Alarm Code (2010 edition)
  12. BS 5839-8 Fire detection and alarm systems for buildings. Code of practice for the design, installation and servicing of voice alarm systems
  13. Deutsches Institut für Normung DIN 60849 System regulation with application regulation DIN VDE 0833-4

Jacob, K., McManus, S., Verhave, J.A., and Steeneken, H., (2002) "Development of an Accurate, Handheld, Simple-to-use Meter for the Prediction of Speech Intelligibility", Past, Present, and Future of the Speech Transmission Index, International Symposium on STI