The term silence suppression is used in telephony to describe the process of not transmitting information over the network when one of the parties involved in a telephone call is not speaking, thereby reducing bandwidth usage.
Telephony is the field of technology involving the development, application, and deployment of telecommunication services for the purpose of electronic transmission of voice, fax, or data between distant parties. The history of telephony is intimately linked to the invention and development of the telephone.
A telecommunications network is a collection of terminal nodes connected by transmission links so as to enable telecommunication between the terminals. The nodes use circuit switching, message switching or packet switching to pass the signal through the correct links and nodes to reach the correct destination terminal.
A telephone call is a connection over a telephone network between the called party and the calling party.
Voice is carried over a digital telephone network by converting the analog signal to a digital signal, which is then packetized and sent electronically over the network. The analog signal is re-created at the receiving end of the network. When one of the parties is not speaking, background noise is still picked up and sent over the network. This is inefficient: the signal carries no useful information, so bandwidth is wasted.
Packet switching is a method of grouping data that is transmitted over a digital network into packets. Packets are made of a header and a payload. Data in the header are used by networking hardware to direct the packet to its destination where the payload is extracted and used by application software. Packet switching is the primary basis for data communications in computer networks worldwide.
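As a minimal sketch of the header/payload split, the Python below cuts a stream of voice bytes into packets whose header carries a sequence number and a payload length. The 4-byte header layout and the names `make_packet`, `parse_packet` and `packetize` are hypothetical, chosen only for illustration; real VoIP traffic uses RTP headers carried over UDP/IP.

```python
import struct

# Hypothetical header for illustration only: 16-bit sequence number and
# 16-bit payload length, both big-endian ("network byte order").
HEADER = struct.Struct("!HH")

def make_packet(seq: int, payload: bytes) -> bytes:
    """Prepend the header to the payload."""
    return HEADER.pack(seq & 0xFFFF, len(payload)) + payload

def parse_packet(packet: bytes) -> tuple[int, bytes]:
    """Split a packet back into (sequence number, payload)."""
    seq, length = HEADER.unpack_from(packet, 0)
    payload = packet[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated packet")
    return seq, payload

def packetize(stream: bytes, payload_size: int) -> list[bytes]:
    """Cut a byte stream into consecutively numbered packets."""
    return [make_packet(i, stream[off:off + payload_size])
            for i, off in enumerate(range(0, len(stream), payload_size))]

packets = packetize(bytes(range(10)), payload_size=4)
print([parse_packet(p) for p in packets])
# [(0, b'\x00\x01\x02\x03'), (1, b'\x04\x05\x06\x07'), (2, b'\x08\t')]
```

The receiver only needs the header to put payloads back in order; the payload itself is opaque to the network.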
Given that typically only one party in a conversation speaks at any one time, silence suppression can achieve overall bandwidth savings on the order of 50% over the duration of a telephone call. (Both parties sometimes speak at the same time, but there are also times when both are silent, and the two effects roughly offset each other.)
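The 50% figure follows from a simple activity-factor argument. The sketch below assumes a 64 kbit/s stream in each direction (as with G.711) and a per-direction voice activity factor of 0.5; both values are illustrative assumptions, and the calculation ignores packet headers and comfort-noise updates, which reduce the savings in practice.

```python
def average_bitrate(active_rate_kbps: float, activity: float,
                    silent_rate_kbps: float = 0.0) -> float:
    """Mean bit rate of one direction when silence suppression sends
    active_rate_kbps during speech and silent_rate_kbps otherwise."""
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity must be between 0 and 1")
    return activity * active_rate_kbps + (1.0 - activity) * silent_rate_kbps

# Assumption: 64 kbit/s per direction, each party talking half the time.
per_direction = average_bitrate(64.0, activity=0.5)
total_without = 2 * 64.0
total_with = 2 * per_direction
print(f"saving: {1 - total_with / total_without:.0%}")  # saving: 50%
```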
Silence suppression is achieved by recognizing the lack of speech through a speech processing mechanism called voice activity detection (VAD), which dynamically monitors background noise and sets a corresponding speech detection threshold. This technique is also known as speech activity detection (SAD).
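A minimal energy-based VAD that tracks the background-noise level and derives its speech threshold from it might look like the sketch below. Measuring frame energy in dB, the 6 dB margin, the smoothing constant `alpha`, and the assumption that the first frame is pure noise are all illustrative choices; practical VADs use richer features and additional smoothing.

```python
import math
import random

def frame_energy_db(frame: list[float]) -> float:
    """Mean-square energy of a frame in dB (floored to avoid log10(0))."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(mean_sq, 1e-12))

class EnergyVad:
    """Energy-based VAD with an adaptive noise floor.
    margin_db and alpha are illustrative values, not standardized ones."""

    def __init__(self, margin_db: float = 6.0, alpha: float = 0.95):
        self.margin_db = margin_db  # dB above the noise floor that counts as speech
        self.alpha = alpha          # smoothing factor for the noise-floor estimate
        self.noise_db = None        # running background-noise estimate in dB

    def is_speech(self, frame: list[float]) -> bool:
        energy = frame_energy_db(frame)
        if self.noise_db is None:
            # Assumption: the first frame contains only background noise.
            self.noise_db = energy
            return False
        speech = energy > self.noise_db + self.margin_db
        if not speech:
            # Track the background noise only while nobody is speaking.
            self.noise_db = self.alpha * self.noise_db + (1 - self.alpha) * energy
        return speech

# Usage: 20 ms frames at 8 kHz (160 samples): noise, then a tone, then noise.
rng = random.Random(0)
noise = [[rng.gauss(0.0, 0.01) for _ in range(160)] for _ in range(20)]
tone = [[0.3 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
        for _ in range(5)]
vad = EnergyVad()
flags = [vad.is_speech(f) for f in noise + tone + noise]
print(sum(flags))  # 5
```

Adapting the noise floor only on non-speech frames keeps loud speech from dragging the threshold upward.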
Speech processing is the study of speech signals and the methods used to process them. The signals are usually processed in a digital representation, so speech processing can be regarded as a special case of digital signal processing, applied to speech signals. Aspects of speech processing include the acquisition, manipulation, storage, transfer and output of speech signals. Processing speech input is called speech recognition, and producing speech output is called speech synthesis.
Voice activity detection (VAD), also known as speech activity detection or speech detection, is a technique used in speech processing in which the presence or absence of human speech is detected. The main uses of VAD are in speech coding and speech recognition. It can facilitate speech processing, and can also be used to deactivate some processes during non-speech sections of an audio session: it can avoid unnecessary coding and transmission of silence packets in Voice over Internet Protocol applications, saving on computation and on network bandwidth.
A similar principle is used for discontinuous reception and discontinuous transmission in GSM mobile telephone systems.
Discontinuous transmission (DTX) is a means by which a mobile telephone's transmitter is temporarily switched off or muted while there is no voice input.
For further bandwidth gains, silence suppression is normally performed after echo cancellation, so that residual echo is not mistaken for speech and transmitted.
Distinguishing speech from background noise may be difficult in some circumstances, for example when the speech level is relatively low or the background noise level is relatively high.
When silence suppression is active, the line appears to have gone dead at the other (egress) end of the call. For this reason, so-called comfort noise needs to be generated to compensate for the lack of background noise. The ingress end must therefore signal the egress end that silence suppression is in effect. For best results, the level of comfort noise being generated on egress should match that of the background noise at the ingress end.
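The level-matching step can be sketched as follows: the ingress side measures the RMS level of the background noise while nobody is speaking, and the egress side synthesizes noise with the same RMS level. Using white Gaussian noise and the helper names `rms` and `comfort_noise` are simplifying assumptions; real comfort-noise schemes (for example the RTP comfort-noise payload in RFC 3389) signal the level in a compact encoded form and may also shape the noise spectrum.

```python
import math
import random

def rms(samples: list[float]) -> float:
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def comfort_noise(level_rms: float, n: int, rng: random.Random) -> list[float]:
    """Generate n samples of white Gaussian noise (an assumption: real
    systems may shape the spectrum), rescaled so their RMS equals level_rms."""
    raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
    scale = level_rms / rms(raw)
    return [x * scale for x in raw]

rng = random.Random(42)
# Ingress: measure background noise during a silent interval.
background = [rng.gauss(0.0, 0.02) for _ in range(800)]
level = rms(background)  # the value signalled to the egress end
# Egress: synthesize matching comfort noise while no voice packets arrive.
cn = comfort_noise(level, 160, rng)
print(round(rms(cn) / level, 6))  # 1.0
```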
Speech activity detection must occur very quickly; otherwise, the beginning of speech may be clipped (cut off).
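To see why detection speed matters, consider a hypothetical detector that declares speech only after it has seen `confirm_frames` consecutive active frames. Without a look-ahead buffer, those first frames of every burst of speech are never transmitted. The sketch below counts the clipped audio, assuming 10 ms frames and a known per-frame activity pattern.

```python
def clipped_ms(activity: list[bool], confirm_frames: int,
               frame_ms: int = 10) -> int:
    """Milliseconds of speech lost at speech onsets when the (hypothetical)
    detector needs `confirm_frames` consecutive active frames before it
    starts transmitting, with no look-ahead buffer."""
    clipped = 0
    run = 0  # length of the current run of active frames
    for active in activity:
        run = run + 1 if active else 0
        if active and run <= confirm_frames:
            # Frame contains speech, but the detector has not switched on yet.
            clipped += 1
    return clipped * frame_ms

# Two bursts of speech (5 and 3 active frames) separated by silence.
pattern = [False, True, True, True, True, True, False, False, True, True, True]
print(clipped_ms(pattern, confirm_frames=0))  # 0
print(clipped_ms(pattern, confirm_frames=2))  # 40
```

Every extra frame of decision delay costs one frame of audio at the start of each utterance, which is why short onsets such as plosives are especially vulnerable.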
Speech activity detection does not work well on non-speech calls (fax or modem communication, for example).
Thus, silence suppression is generally an optional feature on telephony devices. In some cases, it is automatically turned on based on the characteristics of a call.
Synchronous optical networking (SONET) and synchronous digital hierarchy (SDH) are standardized protocols that transfer multiple digital bit streams synchronously over optical fiber using lasers or highly coherent light from light-emitting diodes (LEDs). At low transmission rates, data can also be transferred via an electrical interface. The method was developed to replace the plesiochronous digital hierarchy (PDH) system for transporting large numbers of telephone calls and large volumes of data traffic over the same fiber without synchronization problems.
Speex is an audio compression format specifically tuned for the reproduction of human speech, and also a free software speech codec that may be used in VoIP applications and podcasts. It is based on the CELP speech coding algorithm. Speex claims to be free of any patent restrictions and is licensed under the revised (3-clause) BSD license. It may be used with the Ogg container format or directly transmitted over UDP/RTP. It may also be used with the FLV container format.
In phonetics, the airstream mechanism is the method by which airflow is created in the vocal tract. Along with phonation and articulation, it is one of three main components of speech production. The airstream mechanism is mandatory for sound production and constitutes the first stage of this process, known as initiation.
In cryptography, SIGSALY was a secure speech system used in World War II for the highest-level Allied communications.
VoIP spam, or SPIT (spam over Internet telephony), consists of unsolicited, automatically dialed telephone calls, typically made using voice over Internet Protocol (VoIP) technology.
G.729 is a royalty-free, narrow-band, vocoder-based audio data compression algorithm using a frame length of 10 milliseconds. It is officially described as "Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP)". The wide-band extension of G.729 is called G.729.1, which is equivalent to G.729 Annex J.
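The frame arithmetic implied above can be checked quickly: at 8 kbit/s, one 10 ms frame carries 80 bits, or 10 bytes. The sketch below also estimates the rate on the IP network when frames are bundled into packets, assuming uncompressed IPv4, UDP and RTP headers (20 + 8 + 12 = 40 bytes per packet) and ignoring link-layer overhead; these header assumptions are not part of G.729 itself.

```python
CODEC_BPS = 8_000            # G.729 bit rate
FRAME_MS = 10                # G.729 frame length
HEADER_BYTES = 20 + 8 + 12   # assumed IPv4 + UDP + RTP headers per packet

def frame_bytes(bitrate_bps: int = CODEC_BPS, frame_ms: int = FRAME_MS) -> int:
    """Bytes of codec payload produced per frame."""
    bits = bitrate_bps * frame_ms // 1000
    return bits // 8

def ip_bitrate(packet_ms: int, bitrate_bps: int = CODEC_BPS,
               header_bytes: int = HEADER_BYTES) -> float:
    """Bit rate on the IP network when packet_ms of audio is sent per packet."""
    payload = bitrate_bps * packet_ms // 1000 // 8
    packets_per_second = 1000 / packet_ms
    return (payload + header_bytes) * 8 * packets_per_second

print(frame_bytes())   # 10
print(ip_bitrate(20))  # 24000.0
```

Under these assumptions, headers account for two thirds of the 24 kbit/s, so suppressing packets entirely during silence saves the header overhead as well as the payload.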
Selectable Mode Vocoder (SMV) is a variable-bitrate speech coding standard used in CDMA2000 networks. SMV provides multiple modes of operation that are selected based on input speech characteristics.
In computer networking and telecommunications, TDM over IP (TDMoIP) is the emulation of time-division multiplexing (TDM) over a packet switched network (PSN). TDM refers to a T1, E1, T3 or E3 signal, while the PSN is based either on IP or MPLS or on raw Ethernet. A related technology is circuit emulation, which enables transport of TDM traffic over cell-based (ATM) networks.
Secure voice is a term in cryptography for the encryption of voice communication over a range of communication types such as radio, telephone or IP.
Comfort noise is synthetic background noise used in radio and wireless communications to fill the artificial silence in a transmission resulting from voice activity detection or from the audio clarity of modern digital lines.
Sidetone is audible feedback to someone speaking when using a handset or headset as an indication of an active transmission. The term is often used in the telecommunication field.
Background noise or ambient noise is any sound other than the sound being monitored. Background noise is a form of noise pollution or interference. Background noise is an important concept in setting noise levels. See noise criteria for cinema/home cinema applications.
TDM-to-packet conversion is the process of converting a digital signal in TDM format into packets for carrying over a packet network such as the Internet.
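As an illustration, the sketch below assumes an E1-style TDM stream: 8000 frames per second, each frame holding one byte for each of 32 timeslots. It extracts one timeslot (a 64 kbit/s channel) and groups 20 ms of its samples, 160 bytes, into packet payloads. The frame layout is simplified (framing and signalling timeslots are not interpreted), and the function names are hypothetical.

```python
TIMESLOTS = 32            # assumed E1 frame: 32 one-byte timeslots
FRAMES_PER_SECOND = 8000  # one frame every 125 microseconds

def extract_timeslot(tdm: bytes, slot: int) -> bytes:
    """Collect the byte carried in `slot` from every complete frame."""
    if not 0 <= slot < TIMESLOTS:
        raise ValueError("slot out of range")
    n_frames = len(tdm) // TIMESLOTS
    return bytes(tdm[f * TIMESLOTS + slot] for f in range(n_frames))

def to_packets(channel: bytes, packet_ms: int = 20) -> list[bytes]:
    """Group a 64 kbit/s channel into payloads of packet_ms each
    (160 bytes for 20 ms); a trailing partial payload is kept."""
    per_packet = FRAMES_PER_SECOND * packet_ms // 1000
    return [channel[i:i + per_packet] for i in range(0, len(channel), per_packet)]

# 40 ms of synthetic E1 data in which each byte equals its timeslot number.
tdm = bytes(range(TIMESLOTS)) * 320
payloads = to_packets(extract_timeslot(tdm, slot=5))
print(len(payloads), len(payloads[0]), payloads[0][:3])  # 2 160 b'\x05\x05\x05'
```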
Wideband audio, also known as wideband voice or HD voice, is high definition voice quality for telephony audio, contrasted with standard digital telephony "toll quality". It extends the frequency range of audio signals transmitted over telephone lines, resulting in higher quality speech. The range of the human voice extends from 80 Hz to 14 kHz, but traditional voiceband or narrowband telephone calls limit audio frequencies to the range of 300 Hz to 3.4 kHz. Wideband audio relaxes the bandwidth limitation and transmits in the audio frequency range of 50 Hz to 7 kHz or even up to 22 kHz. In addition, some wideband codecs may use a higher audio bit depth of 16 bits per sample, further improving voice quality.
In digital telephony, a talkspurt is a continuous segment of speech between silent intervals where only background noise can be heard. Segmenting speech streams into talkspurts allows bandwidth to be conserved by not sending excess data in silent intervals, and also allows synchronization, buffering and other parameters of the communications system to be readjusted in the intervals between talkspurts.
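Given per-frame VAD decisions, talkspurts can be found by grouping consecutive active frames. The sketch below also merges talkspurts separated by at most `max_gap` silent frames, so that brief pauses inside a phrase do not split it; this gap parameter is an illustrative assumption. Ranges are half-open `(start, end)` frame indices.

```python
def talkspurts(active: list[bool], max_gap: int = 0) -> list[tuple[int, int]]:
    """Return (start, end) frame ranges of talkspurts, end exclusive.
    A gap of at most `max_gap` inactive frames between active frames
    is absorbed into the surrounding talkspurt."""
    spurts: list[tuple[int, int]] = []
    for i, is_active in enumerate(active):
        if not is_active:
            continue
        if spurts and i - spurts[-1][1] <= max_gap:
            # Close enough to the previous talkspurt: extend it.
            spurts[-1] = (spurts[-1][0], i + 1)
        else:
            # Otherwise a new talkspurt starts at this frame.
            spurts.append((i, i + 1))
    return spurts

flags = [False, True, True, False, True, False, False, False, True]
print(talkspurts(flags))             # [(1, 3), (4, 5), (8, 9)]
print(talkspurts(flags, max_gap=1))  # [(1, 5), (8, 9)]
```

The silent intervals between the returned ranges are where a receiver could, for instance, resize its jitter buffer without disturbing speech.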
Echo suppression and echo cancellation are methods used in telephony to improve voice quality by preventing echo from being created or removing it after it is already present. In addition to improving subjective audio quality, echo suppression increases the capacity achieved through silence suppression by preventing echo from traveling across a network. Echo suppressors were developed in the 1950s in response to the first use of satellites for telecommunications, but they have since been largely supplanted by better performing echo cancellers.
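Echo cancellers are commonly built around an adaptive filter that learns the echo path from the far-end signal and subtracts its estimate from the microphone signal. The sketch below swaps in the normalized least-mean-squares (NLMS) algorithm as one such adaptive filter and runs it on a simulated echo path; the filter length, step size `mu`, and the synthetic echo path are illustrative assumptions. Practical cancellers add double-talk detection and further processing.

```python
import random

def nlms_echo_cancel(far: list[float], mic: list[float], taps: int = 8,
                     mu: float = 0.5, eps: float = 1e-6) -> list[float]:
    """Return the residual mic[n] minus the estimated echo, adapting the
    filter weights w with the NLMS update w += mu * e * x / (eps + ||x||^2)."""
    w = [0.0] * taps
    x = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for far_n, mic_n in zip(far, mic):
        x = [far_n] + x[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, x))
        e = mic_n - echo_estimate
        norm = eps + sum(xi * xi for xi in x)
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        residual.append(e)
    return residual

def power(s: list[float]) -> float:
    """Mean-square power of a signal."""
    return sum(v * v for v in s) / len(s)

rng = random.Random(3)
far = [rng.gauss(0.0, 1.0) for _ in range(4000)]
path = [0.0, 0.6, -0.3, 0.1]  # hypothetical echo path (one-sample delay, decaying)
# The microphone picks up only the echo of the far-end talker (near end silent).
mic = [sum(h * far[n - k] for k, h in enumerate(path) if n - k >= 0)
       for n in range(len(far))]
res = nlms_echo_cancel(far, mic)
print(power(res[-500:]) < 1e-3 * power(mic[-500:]))  # True
```

Once the residual is small, a silence-suppression stage placed after the canceller sees mostly silence rather than echo during far-end speech.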