Secure voice

Last updated November 11, 2024

Secure voice (alternatively secure speech or ciphony) is a term in cryptography for the encryption of voice communication over a range of communication types such as radio, telephone or IP.

History

The implementation of voice encryption dates back to World War II when secure communication was paramount to the US armed forces. During that time, noise was simply added to a voice signal to prevent enemies from listening to the conversations. Noise was added by playing a record of noise in sync with the voice signal and when the voice signal reached the receiver, the noise signal was subtracted out, leaving the original voice signal. In order to subtract out the noise, the receiver needed to have exactly the same noise signal and the noise records were only made in pairs; one for the transmitter and one for the receiver. Having only two copies of records made it impossible for the wrong receiver to decrypt the signal. To implement the system, the army contracted Bell Laboratories and they developed a system called SIGSALY. With SIGSALY, ten channels were used to sample the voice frequency spectrum from 250 Hz to 3 kHz and two channels were allocated to sample voice pitch and background hiss. In the time of SIGSALY, the transistor had not been developed and the digital sampling was done by circuits using the model 2051 Thyratron vacuum tube. Each SIGSALY terminal used 40 racks of equipment weighing 55 tons and filled a large room. This equipment included radio transmitters and receivers and large phonograph turntables. The voice was keyed to two 410-millimetre (16 in) vinyl phonograph records that contained a frequency-shift keying (FSK) audio tone. The records were played on large precise turntables in sync with the voice transmission.

From the introduction of voice encryption to today, encryption techniques have evolved drastically. Digital technology has effectively replaced old analog methods of voice encryption and by using complex algorithms, voice encryption has become much more secure and efficient. One relatively modern voice encryption method is Sub-band coding. With Sub-band Coding, the voice signal is split into multiple frequency bands, using multiple bandpass filters that cover specific frequency ranges of interest. The output signals from the bandpass filters are then lowpass translated to reduce the bandwidth, which reduces the sampling rate. The lowpass signals are then quantized and encoded using special techniques like, pulse-code modulation (PCM). After the encoding stage, the signals are multiplexed and sent out along the communication network. When the signal reaches the receiver, the inverse operations are applied to the signal to get it back to its original state.^[1] A speech scrambling system was developed at Bell Laboratories in the 1970s by Subhash Kak and Nikil Jayant.^[2] In this system permutation matrices were used to scramble coded representations (such as pulse-code modulation and variants) of the speech data. Motorola developed a voice encryption system called Digital Voice Protection (DVP) as part of their first generation of voice encryption techniques. DVP uses a self-synchronizing encryption technique known as cipher feedback (CFB). The extremely high number of possible keys associated with the early DVP algorithm, makes the algorithm very robust and gives a high level of security. As with other symmetric keyed encryption systems, the encryption key is required to decrypt the signal with a special decryption algorithm.

Digital

A digital secure voice usually includes two components, a digitizer to convert between speech and digital signals and an encryption system to provide confidentiality. It is difficult in practice to send the encrypted signal over the same voiceband communication circuits used to transmit unencrypted voice, e.g. analog telephone lines or mobile radios, due to bandwidth expansion.

This has led to the use of Voice Coders (vocoders) to achieve tight bandwidth compression of the speech signals. NSA's STU-III, KY-57 and SCIP are examples of systems that operate over existing voice circuits. The STE system, by contrast, requires wide bandwidth ISDN lines for its normal mode of operation. For encrypting GSM and VoIP, which are natively digital, the standard protocol ZRTP could be used as an end-to-end encryption technology.

Secure voice's robustness greatly benefits from having the voice data compressed into very low bit-rates by special component called speech coding, voice compression or voice coder (also known as vocoder). The old secure voice compression standards include (CVSD, CELP, LPC-10e and MELP, where the latest standard is the state of the art MELPe algorithm.

Digital methods using voice compression: MELP or MELPe

The MELPe or enhanced-MELP (Mixed Excitation Linear Prediction) is a United States Department of Defense speech coding standard used mainly in military applications and satellite communications, secure voice, and secure radio devices. Its development was led and supported by NSA, and NATO. The US government's MELPe secure voice standard is also known as MIL-STD-3005, and the NATO's MELPe secure voice standard is also known as STANAG-4591.

The initial MELP was invented by Alan McCree around 1995.^[3] That initial speech coder was standardized in 1997 and was known as MIL-STD-3005.^[4] It surpassed other candidate vocoders in the US DoD competition, including: (a) Frequency Selective Harmonic Coder (FSHC), (b) Advanced Multi-Band Excitation (AMBE), (c) Enhanced Multiband Excitation (EMBE), (d) Sinusoid Transform Coder (STC), and (e) Subband LPC Coder (SBC). Due to its lower complexity^{[ citation needed ]} than Waveform Interpolative (WI) coder, the MELP vocoder won the DoD competition and was selected for MIL-STD-3005.

Between 1998 and 2001, a new MELP-based vocoder was created at half the rate (i.e. 1200 bit/s) and substantial enhancements were added to the MIL-STD-3005 by SignalCom (later acquired by Microsoft), AT&T Corporation, and Compandent which included (a) additional new vocoder at half the rate (i.e. 1200 bit/s), (b) substantially improved encoding (analysis), (c) substantially improved decoding (synthesis), (d) Noise-Preprocessing for removing background noise, (e) transcoding between the 2400 bit/s and 1200 bit/s bitstreams, and (f) new postfilter. This fairly significant development was aimed to create a new coder at half the rate and have it interoperable with the old MELP standard. This enhanced-MELP (also known as MELPe) was adopted as the new MIL-STD-3005 in 2001 in form of annexes and supplements made to the original MIL-STD-3005, enabling the same quality as the old 2400 bit/s MELP's at half the rate. One of the greatest advantages of the new 2400 bit/s MELPe is that it shares the same bit format as MELP, and hence can interoperate with legacy MELP systems, but would deliver better quality at both ends. MELPe provides much better quality than all older military standards, especially in noisy environments such as battlefield and vehicles and aircraft.

In 2002, following extensive competition and testing, the 2400 and 1200 bit/s US DoD MELPe was adopted also as NATO standard, known as STANAG-4591.^[5] As part of NATO testing for new NATO standard, MELPe was tested against other candidates such as France's HSX (Harmonic Stochastic eXcitation) and Turkey's SB-LPC (Split-Band Linear Predictive Coding), as well as the old secure voice standards such as FS1015 LPC-10e (2.4 kbit/s), FS1016 CELP (4.8 kbit/s) and CVSD (16 kbit/s). Subsequently, the MELPe won also the NATO competition, surpassing the quality of all other candidates as well as the quality of all old secure voice standards (CVSD, CELP and LPC-10e). The NATO competition concluded that MELPe substantially improved performance (in terms of speech quality, intelligibility, and noise immunity), while reducing throughput requirements. The NATO testing also included interoperability tests, used over 200 hours of speech data, and was conducted by three test laboratories worldwide. Compandent Inc, as a part of MELPe-based projects performed for NSA and NATO, provided NSA and NATO with special test-bed platform known as MELCODER device that provided the golden reference for real-time implementation of MELPe. The low-cost FLEXI-232 Data Terminal Equipment (DTE) made by Compandent, which are based on the MELCODER golden reference, are very popular and widely used for evaluating and testing MELPe in real-time, various channels & networks, and field conditions.

The NATO competition concluded that MELPe substantially improved performance (in terms of speech quality, intelligibility, and noise immunity), while reducing throughput requirements. The NATO testing also included interoperability tests, used over 200 hours of speech data, and was conducted by three test laboratories worldwide.

In 2005, a new 600 bit/s rate MELPe variation by Thales Group (France) was added (without extensive competition and testing as performed for the 2400/1200 bit/s MELPe) ^[6] to the NATO standard STANAG-4591, and there are more advanced efforts to lower the bitrates to 300 bit/s and even 150 bit/s.^[7]

In 2010, Lincoln Labs., Compandent, BBN, and General Dynamics also developed for DARPA a 300 bit/s MELP device.^[8] Its quality was better than the 600 bit/s MELPe, but its delay was longer.

Related Research Articles

Speech coding is an application of data compression to digital audio signals containing speech. Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream.

A vocoder is a category of speech coding that analyzes and synthesizes the human voice signal for audio data compression, multiplexing, voice encryption or voice transformation.

Linear predictive coding (LPC) is a method used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital signal of speech in compressed form, using the information of a linear predictive model.

In telecommunications, a scrambler is a device that transposes or inverts signals or otherwise encodes a message at the sender's side to make the message unintelligible at a receiver not equipped with an appropriately set descrambling device. Whereas encryption usually refers to operations carried out in the digital domain, scrambling usually refers to operations carried out in the analog domain. Scrambling is accomplished by the addition of components to the original signal or the changing of some important component of the original signal in order to make extraction of the original signal difficult. Examples of the latter might include removing or changing vertical or horizontal sync pulses in television signals; televisions will not be able to display a picture from such a signal. Some modern scramblers are actually encryption devices, the name remaining due to the similarities in use, as opposed to internal operation.

TEMPEST is a U.S. National Security Agency specification and a NATO certification referring to spying on information systems through leaking emanations, including unintentional radio or electrical signals, sounds, and vibrations. TEMPEST covers both methods to spy upon others and how to shield equipment against such spying. The protection efforts are also known as emission security (EMSEC), which is a subset of communications security (COMSEC). The reception methods fall under the umbrella of radiofrequency MASINT.

Interim Standard 95 (IS-95) was the first digital cellular technology that used code-division multiple access (CDMA). It was developed by Qualcomm and later adopted as a standard by the Telecommunications Industry Association in TIA/EIA/IS-95 release published in 1995. The proprietary name for IS-95 is cdmaOne.

<span class="mw-page-title-main">SIGSALY</span> Secure speech system

SIGSALY was a secure speech system used in World War II for the highest-level Allied communications. It pioneered a number of digital communications concepts, including the first transmission of speech using pulse-code modulation.

The Secure Communications Interoperability Protocol (SCIP) is a US standard for secure voice and data communication, for circuit-switched one-to-one connections, not packet-switched networks. SCIP derived from the US Government Future Narrowband Digital Terminal (FNBDT) project. SCIP supports a number of different modes, including national and multinational modes which employ different cryptography. Many nations and industries develop SCIP devices to support the multinational and national modes of SCIP.

Mixed-excitation linear prediction (MELP) is a United States Department of Defense speech coding standard used mainly in military applications and satellite communications, secure voice, and secure radio devices. Its standardization and later development was led and supported by the NSA and NATO. The current "enhanced" version is known as MELPe.

Harmonic Vector Excitation Coding, abbreviated as HVXC is a speech coding algorithm specified in MPEG-4 Part 3 standard for very low bit rate speech coding. HVXC supports bit rates of 2 and 4 kbit/s in the fixed and variable bit rate mode and sampling frequency of 8 kHz. It also operates at lower bitrates, such as 1.2 - 1.7 kbit/s, using a variable bit rate technique. The total algorithmic delay for the encoder and decoder is 36 ms.

FIPS 137, originally issued as FED-STD-1015, is a secure telephony speech encoding standard for Linear Predictive Coding vocoder developed by the United States Department of Defense and finished on November 28, 1984. It was based on the earlier STANAG 4198 promulgated by NATO on February 13, 1984.

FS-1016 is a deprecated secure telephony speech encoding standard for Code-excited linear prediction (CELP) developed by the United States Department of Defense and finalized February 14, 1991.

MIL-STD-1553 is a military standard published by the United States Department of Defense that defines the mechanical, electrical, and functional characteristics of a serial data bus. It was originally designed as an avionic data bus for use with military avionics, but has also become commonly used in spacecraft on-board data handling (OBDH) subsystems, both military and civil, including use on the James Webb space telescope. It features multiple redundant balanced line physical layers, a (differential) network interface, time-division multiplexing, half-duplex command/response protocol, and can handle up to 31 Remote Terminals (devices); 32 is typically designated for broadcast messages. A version of MIL-STD-1553 using optical cabling in place of electrical is known as MIL-STD-1773.

<span class="mw-page-title-main">Secure telephone</span> Telephone that provides encrypted calls

A secure telephone is a telephone that provides voice security in the form of end-to-end encryption for the telephone call, and in some cases also the mutual authentication of the call parties, protecting them against a man-in-the-middle attack. Concerns about massive growth of telephone tapping incidents led to growing demand for secure telephones.

TADIL-A/Link 11 is a secure half-duplex tactical data link used by NATO to exchange digital data. It was originally developed by a joint committee including members from the Royal Canadian Navy, US Navy and Royal Navy to pass accurate targeting information between ships. The final standard was signed in Ottawa in November 1957, where the British proposed the name "TIDE" for "Tactical International Data Exchange". It was later made part of the NATO STANAG standardization process.

The AN/PRC-150(C) Falcon II Manpack Radio, is a tactical HF-SSB/ VHF-FM manpack radio manufactured by Harris Corporation. It holds an NSA certification for Type 1 encryption. The PRC-150 is the manpack HF radio for the Harris Falcon II family of radios, introduced in the early 2000s.

FASCINATOR is a series of Type 1 encryption modules designed in the late-1980s to be installed in Motorola SECURENET-capable voice radios. These radios were originally built to accept a proprietary Digital Voice Protection (DVP) or a DES-based encryption module that was not approved by NSA for classified communications. The FASCINATOR modules replaced the standard modules and can be used for classified conversations at all levels when used with appropriately classified keys. FASCINATOR operates at 12 kbit/s for encryption and decryption. It is not compatible with other encryption schema.

NXDN stands for Next Generation Digital Narrowband, and is an open standard for public land mobile radio systems; that is, systems of two-way radios (transceivers) for bidirectional person-to-person voice communication. It was developed jointly by Icom Incorporated and Kenwood Corporation as an advanced digital system using FSK modulation that supports encrypted transmission and data as well as voice transmission. Like other land mobile systems, NXDN systems use the VHF and UHF frequency bands. It is also used as a niche mode in amateur radio.

STANAG 4285, is a NATO HF radio technical standard for text-based radio broadcasts. It corresponds to "NATO mode" in the US military standard MIL-STD-188-110B.

References

↑ Owens, F. J. (1993). Signal Processing of Speech. Houndmills: MacMillan Press. ISBN 0-333-51922-1.
↑ Kak, S. and Jayant, N.S., Speech encryption using waveform scrambling. Bell System Technical Journal, vol. 56, pp. 781–808, May–June 1977.
↑ A Mixed Excitation LPC Vocoder Model for Low Bit Rate Speech Coding, Alan V. McCree, Thomas P. Barnweell, 1995 in IEEE Trans. Speech and Audio Processing (Original MELP)
↑ Analog-to-Digital Conversion of Voice by 2,400 Bit/Second Mixed Excitation Linear Prediction (MELP), US DoD (MIL_STD-3005, Original MELP)
↑ THE 1200 AND 2400 BIT/S NATO INTEROPERABLE NARROW BAND VOICE CODER, STANAG-4591, NATO
↑ MELPe VARIATION FOR 600 BIT/S NATO NARROW BAND VOICE CODER, STANAG-4591, NATO
↑ Nichols, Randall K. & Lekkas, Panos C. (2002). "Speech cryptology". Wireless Security: Models, Threats, and Solutions . New York: McGraw-Hill. ISBN 0-07-138038-8.
↑ Alan McCree, “A scalable phonetic vocoder framework using joint predictive vector quantization of MELP parameters,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 2006, pp. I 705–708, Toulouse, France

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] Owens, F. J. (1993). Signal Processing of Speech. Houndmills: MacMillan Press. ISBN 0-333-51922-1.

[2] Kak, S. and Jayant, N.S., Speech encryption using waveform scrambling. Bell System Technical Journal, vol. 56, pp. 781–808, May–June 1977.

[3] A Mixed Excitation LPC Vocoder Model for Low Bit Rate Speech Coding, Alan V. McCree, Thomas P. Barnweell, 1995 in IEEE Trans. Speech and Audio Processing (Original MELP)

[4] Analog-to-Digital Conversion of Voice by 2,400 Bit/Second Mixed Excitation Linear Prediction (MELP), US DoD (MIL_STD-3005, Original MELP)

[5] THE 1200 AND 2400 BIT/S NATO INTEROPERABLE NARROW BAND VOICE CODER, STANAG-4591, NATO

[6] MELPe VARIATION FOR 600 BIT/S NATO NARROW BAND VOICE CODER, STANAG-4591, NATO

[7] Nichols, Randall K. & Lekkas, Panos C. (2002). "Speech cryptology". Wireless Security: Models, Threats, and Solutions . New York: McGraw-Hill. ISBN 0-07-138038-8.

[8] Alan McCree, “A scalable phonetic vocoder framework using joint predictive vector quantization of MELP parameters,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 2006, pp. I 705–708, Toulouse, France

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]