SIGSALY

SIGSALY exhibit at the National Cryptologic Museum

SIGSALY (also known as the X System, Project X, Ciphony I, and the Green Hornet) was a secure speech system used in World War II for the highest-level Allied communications. It pioneered a number of digital communications concepts, including the first transmission of speech using pulse-code modulation.


The name SIGSALY was not an acronym, but a cover name that resembled an acronym: the SIG part was common in Army Signal Corps names (e.g., SIGABA). [1] The prototype was called the "Green Hornet" after the radio show The Green Hornet, because to anyone trying to eavesdrop on the conversation it sounded like a buzzing hornet, resembling the show's theme tune. [2]

Development

At the time of its inception, long-distance telephone communications used the "A-3" voice scrambler developed by Western Electric. It worked on the voice inversion principle. The Germans had a listening station on the Dutch coast which could intercept and break A-3 traffic. [1]

Although telephone scramblers were used by both sides in World War II, they were known to be insecure in general, and each side often cracked the other's scrambled conversations. Inspection of the audio spectrum using a spectrum analyzer often provided significant clues to the scrambling technique. The insecurity of most telephone scrambler schemes led to the development of a more secure alternative, based on the one-time pad principle.

A prototype was developed at Bell Telephone Laboratories, under the direction of A. B. Clark, assisted by British mathematician Alan Turing, [1] [3] and demonstrated to the US Army. The Army was impressed and awarded Bell Labs a contract for two systems in 1942. SIGSALY went into service in 1943 and remained in service until 1946.

Operation

SIGSALY used a random noise mask to encrypt voice conversations which had been encoded by a vocoder. The latter was used to minimize the amount of redundancy (which is high in voice traffic), in order to reduce the amount of information to be encrypted. [2]

The voice encoding used the fact that speech varies fairly slowly as the components of the throat move. The system extracted information about the voice signal 50 times a second (every 20 milliseconds): the vocoder reduced the speech to a set of slowly varying signals, consisting of the amplitude in each of a number of frequency bands together with a pitch signal. [4]

Next, each of these signals was sampled for its amplitude once every 20 milliseconds. [4] For the band amplitude signals, the amplitude was converted into one of six levels, with values from 0 through 5. The amplitude levels were on a nonlinear scale, with the steps between levels wider at high amplitudes and narrower at low amplitudes. This scheme, known as "companding" or "compressing-expanding", exploits the fact that the fidelity of voice signals is more sensitive to errors at low amplitudes than at high amplitudes. The pitch signal, which required greater sensitivity, was encoded by a pair of six-level values (one coarse and one fine), giving thirty-six levels in all.
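
As a rough illustration of this quantization scheme, the following Python sketch maps a normalized sample amplitude to one of the six levels through a logarithmic compression curve; the particular curve and its parameter are assumptions chosen for illustration, not SIGSALY's actual companding characteristic.

    import math

    def compand_quantize(amplitude, levels=6):
        # Quantize a normalized amplitude (0.0 to 1.0) to one of six levels
        # (0 through 5) using a logarithmic, mu-law-like compression curve,
        # so that the quantization steps are narrow at low amplitudes and
        # wide at high amplitudes. The curve and mu value are illustrative
        # assumptions, not SIGSALY's actual characteristic.
        mu = 15.0
        compressed = math.log(1 + mu * amplitude) / math.log(1 + mu)
        return min(int(compressed * levels), levels - 1)

    # A quiet sample still resolves to a distinct level, while a loud one saturates:
    print(compand_quantize(0.05))  # -> 1
    print(compand_quantize(0.90))  # -> 5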

A cryptographic key, consisting of a series of random values drawn from the same set of six levels, was subtracted from each sampled voice amplitude value to encrypt it before transmission. The subtraction was performed using modular arithmetic, in a "wraparound" fashion: if the result was negative, six was added to it to give a positive value. For example, if the voice amplitude value was 3 and the random key value was 5, then the subtraction would work as follows:

3 − 5 = −2, which wraps around to −2 + 6 = 4, giving a transmitted value of 4.

The sampled value was then transmitted, with each sample level transmitted on one of six corresponding frequencies in a frequency band, a scheme known as frequency-shift keying (FSK). The receiving SIGSALY read the frequency values, converted them into samples, and added the key values back to them to decrypt them. The addition was also performed modulo six, with six subtracted from any value over five. To match the example above, if the receiving SIGSALY got a sample value of 4 with a matching key value of 5, then the addition would be as follows:

4 + 5 = 9, which wraps around to 9 − 6 = 3, giving back the correct value of 3.
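
The modulo-6 key subtraction and addition described above can be sketched in a few lines of Python; the tone frequencies at the end are placeholders rather than SIGSALY's actual FSK channel plan.

    def encrypt_sample(voice_level, key_level):
        # Subtract the key level from the voice level modulo 6, i.e. the
        # "wraparound" subtraction described above.
        return (voice_level - key_level) % 6

    def decrypt_sample(cipher_level, key_level):
        # Add the same key level back modulo 6 to recover the voice level.
        return (cipher_level + key_level) % 6

    # Worked example from the text: voice level 3, key level 5.
    cipher = encrypt_sample(3, 5)       # (3 - 5) wraps around to 4
    plain = decrypt_sample(cipher, 5)   # (4 + 5) wraps around to 3
    assert (cipher, plain) == (4, 3)

    # Each level would then be sent as one of six FSK tones; these
    # frequencies are purely illustrative.
    fsk_tones_hz = {level: 1000 + 100 * level for level in range(6)}
    print(fsk_tones_hz[cipher])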

To convert the samples back into a voice waveform, they were first turned back into the dozen low-frequency vocoded signals, which were then used to drive an inversion of the vocoder process that synthesized the output speech.

The noise values used for the encryption key were originally produced by large mercury-vapor rectifying vacuum tubes and stored on a phonograph record. The record was then duplicated, with the records being distributed to SIGSALY systems on both ends of a conversation. The records served as the SIGSALY one-time pad, and distribution was very strictly controlled (although if one had been seized, it would have been of little importance, since only one pair of each was ever produced). For testing and setup purposes, a pseudo-random number generating system made out of relays, known as the "threshing machine", was used.
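
As a purely illustrative sketch, the Python snippet below generates one channel's worth of six-level key values for a single record; the real key material came from recorded tube noise pressed onto discs, not from a software random number generator, and the frame rate and record length are taken from the figures in this article.

    import secrets

    FRAMES_PER_SECOND = 50   # one set of vocoder samples every 20 ms
    RECORD_MINUTES = 12      # each record held roughly 12 minutes of key

    def make_key_record():
        # Generate one channel's worth of six-level key values for a record.
        # Illustrative only: SIGSALY's key came from noise produced by
        # mercury-vapor rectifier tubes and was recorded onto phonograph
        # records, not generated in software.
        n = FRAMES_PER_SECOND * 60 * RECORD_MINUTES
        return [secrets.randbelow(6) for _ in range(n)]

    key_record = make_key_record()
    print(len(key_record), key_record[:10])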

The records were played on turntables, but since the timing – the clock synchronization – between the two SIGSALY terminals had to be precise, the turntables were by no means just ordinary record-players. The rotation rate of the turntables was carefully controlled, and the records were started at highly specific times, based on precision time-of-day clock standards. Since each record only provided 12 minutes of key, each SIGSALY had two turntables, with a second record "queued up" while the first was "playing".

Usage

A SIGSALY terminal in 1943.

The SIGSALY terminal was massive, consisting of 40 racks of equipment. It weighed over 50 tons, and used about 30 kW of power, necessitating an air-conditioned room to hold it. Too big and cumbersome for general use, it was only used for the highest level of voice communications. [5]

A dozen SIGSALY terminal installations were eventually set up all over the world. The first was installed in the Pentagon building rather than the White House, which was served by an extension line, as US President Franklin D. Roosevelt knew of British Prime Minister Winston Churchill's insistence that he be able to call at any time of the day or night. The second was installed 60 metres (200 ft) below street level in the basement of Selfridges department store on Oxford Street, London, close to the US Embassy on Grosvenor Square. The first conference took place on 15 July 1943, and the terminal was used by both General Dwight D. Eisenhower, as commander of SHAEF, and Churchill before extensions were installed to the Embassy, 10 Downing Street and the Cabinet War Rooms. [1] One terminal was installed on a ship and followed General Douglas MacArthur during his South Pacific campaigns. In total, the system supported about 3,000 high-level telephone conferences during World War II.

The installation and maintenance of all SIGSALY machines was undertaken by the specially formed and vetted members of the 805th Signal Service Company of the US Army Signal Corps. The system was cumbersome, but it worked very effectively. When the Allies invaded Germany, an investigative team discovered that the Germans had recorded significant amounts of traffic from the system, but had erroneously concluded that it was a complex telegraphic encoding system. [1]

Significance

SIGSALY has been credited with a number of "firsts"; this list is taken from Bennett (1983):

  1. The first realization of enciphered telephony
  2. The first quantized speech transmission
  3. The first transmission of speech by pulse-code modulation (PCM)
  4. The first use of companded PCM
  5. The first examples of multilevel frequency-shift keying (FSK)
  6. The first useful realization of speech bandwidth compression
  7. The first use of FSK-FDM (Frequency Shift Keying-Frequency Division Multiplex) as a viable transmission method over a fading medium
  8. The first use of a multilevel "eye pattern" to adjust the sampling intervals (a new, and important, instrumentation technique)

See also

  - Amplitude modulation
  - Baseband
  - Delta modulation
  - Eye pattern
  - Frequency modulation
  - Frequency-shift keying
  - Homer Dudley
  - Modulation
  - NXDN
  - Pulse-code modulation
  - Scrambler
  - Secure voice
  - Sideband
  - Single-sideband modulation
  - Vocoder
  - Voice inversion

References

  1. Patrick D. Weadon. The SIGSALY Story. National Security Agency/Central Security Service.
  2. "Vox Ex Machina". 99% Invisible. Retrieved 2022-03-28.
  3. Liat Clark; Ian Steadman. "Turing's achievements: codebreaking, AI and the birth of computer science". Retrieved 12 February 2013.
  4. Jon D. Paul. "Rebuilding a Piece of the First Digital Voice Scrambler". IEEE Spectrum. 2019.
  5. Boone, J. V.; Peterson, R. R. (July 2000). The Start of the Digital Revolution: SIGSALY, Secure Digital Voice Communications in World War II (PDF). National Security Agency.