Code (cryptography)

Last updated
A portion of the "Zimmermann Telegram" as decrypted by British Naval Intelligence codebreakers. The word Arizona was not in the German codebook and had therefore to be split into phonetic syllables. Ztel2.jpg
A portion of the "Zimmermann Telegram" as decrypted by British Naval Intelligence codebreakers. The word Arizona was not in the German codebook and had therefore to be split into phonetic syllables.
Partially burnt pages from a World War II Soviet KGB two-part codebook KGB Victory Code pages at National Cryptologic Museum.agr.jpg
Partially burnt pages from a World War II Soviet KGB two-part codebook

In cryptology, a code is a method used to encrypt a message that operates at the level of meaning; that is, words or phrases are converted into something else. A code might transform "change" into "CVGDK" or "cocktail lounge". The U.S. National Security Agency defined a code as "A substitution cryptosystem in which the plaintext elements are primarily words, phrases, or sentences, and the code equivalents (called "code groups") typically consist of letters or digits (or both) in otherwise meaningless combinations of identical length." [1] :Vol I,p. 12 A codebook is needed to encrypt, and decrypt the phrases or words.

Contents

By contrast, ciphers encrypt messages at the level of individual letters, or small groups of letters, or even, in modern ciphers, individual bits. Messages can be transformed first by a code, and then by a cipher. [2] Such multiple encryption, or "superencryption" aims to make cryptanalysis more difficult.

Another comparison between codes and ciphers is that a code typically represents a letter or groups of letters directly without the use of mathematics. As such the numbers are configured to represent these three values: 1001 = A, 1002 = B, 1003 = C, ... . The resulting message, then would be 1001 1002 1003 to communicate ABC. Ciphers, however, utilize a mathematical formula to represent letters or groups of letters. For example, A = 1, B = 2, C = 3, ... . Thus the message ABC results by multiplying each letter's value by 13. The message ABC, then would be 13 26 39.

Codes have a variety of drawbacks, including susceptibility to cryptanalysis and the difficulty of managing the cumbersome codebooks, so ciphers are now the dominant technique in modern cryptography.

In contrast, because codes are representational, they are not susceptible to mathematical analysis of the individual codebook elements. In the example, the message 13 26 39 can be cracked by dividing each number by 13 and then ranking them alphabetically. However, the focus of codebook cryptanalysis is the comparative frequency of the individual code elements matching the same frequency of letters within the plaintext messages using frequency analysis. In the above example, the code group, 1001, 1002, 1003, might occur more than once and that frequency might match the number of times that ABC occurs in plain text messages.

(In the past, or in non-technical contexts, code and cipher are often used to refer to any form of encryption).

One- and two-part codes

Codes are defined by "codebooks" (physical or notional), which are dictionaries of codegroups listed with their corresponding plaintext. Codes originally had the codegroups assigned in 'plaintext order' for convenience of the code designed, or the encoder. For example, in a code using numeric code groups, a plaintext word starting with "a" would have a low-value group, while one starting with "z" would have a high-value group. The same codebook could be used to "encode" a plaintext message into a coded message or "codetext", and "decode" a codetext back into plaintext message.

In order to make life more difficult for codebreakers, codemakers designed codes with no predictable relationship between the codegroups and the ordering of the matching plaintext. In practice, this meant that two codebooks were now required, one to find codegroups for encoding, the other to look up codegroups to find plaintext for decoding. Such "two-part" codes required more effort to develop, and twice as much effort to distribute (and discard safely when replaced), but they were harder to break. The Zimmermann Telegram in January 1917 used the German diplomatic "0075" two-part code system which contained upwards of 10,000 phrases and individual words. [3]

One-time code

A one-time code is a prearranged word, phrase or symbol that is intended to be used only once to convey a simple message, often the signal to execute or abort some plan or confirm that it has succeeded or failed. One-time codes are often designed to be included in what would appear to be an innocent conversation. Done properly they are almost impossible to detect, though a trained analyst monitoring the communications of someone who has already aroused suspicion might be able to recognize a comment like "Aunt Bertha has gone into labor" as having an ominous meaning. Famous example of one time codes include:

Sometimes messages are not prearranged and rely on shared knowledge hopefully known only to the recipients. An example is the telegram sent to U.S. President Harry Truman, then at the Potsdam Conference to meet with Soviet premier Joseph Stalin, informing Truman of the first successful test of an atomic bomb.

"Operated on this morning. Diagnosis not yet complete but results seem satisfactory and already exceed expectations. Local press release necessary as interest extends great distance. Dr. Groves pleased. He returns tomorrow. I will keep you posted."

See also one-time pad, an unrelated cypher algorithm

Idiot code

An idiot code is a code that is created by the parties using it. This type of communication is akin to the hand signals used by armies in the field.

Example: Any sentence where 'day' and 'night' are used means 'attack'. The location mentioned in the following sentence specifies the location to be attacked.

An early use of the term appears to be by George Perrault, a character in the science fiction book Friday [5] by Robert A. Heinlein:

The simplest sort [of code] and thereby impossible to break. The first ad told the person or persons concerned to carry out number seven or expect number seven or it said something about something designated as seven. This one says the same with respect to code item number ten. But the meaning of the numbers cannot be deduced through statistical analysis because the code can be changed long before a useful statistical universe can be reached. It's an idiot code... and an idiot code can never be broken if the user has the good sense not to go too often to the well.

Terrorism expert Magnus Ranstorp said that the men who carried out the September 11 attacks on the United States used basic e-mail and what he calls "idiot code" to discuss their plans. [6]

Cryptanalysis of codes

While solving a monoalphabetic substitution cipher is easy, solving even a simple code is difficult. Decrypting a coded message is a little like trying to translate a document written in a foreign language, with the task basically amounting to building up a "dictionary" of the codegroups and the plaintext words they represent.

One fingerhold on a simple code is the fact that some words are more common than others, such as "the" or "a" in English. In telegraphic messages, the codegroup for "STOP" (i.e., end of sentence or paragraph) is usually very common. This helps define the structure of the message in terms of sentences, if not their meaning, and this is cryptanalytically useful.

Further progress can be made against a code by collecting many codetexts encrypted with the same code and then using information from other sources

For example, a particular codegroup found almost exclusively in messages from a particular army and nowhere else might very well indicate the commander of that army. A codegroup that appears in messages preceding an attack on a particular location may very well stand for that location.

Cribs can be an immediate giveaway to the definitions of codegroups. As codegroups are determined, they can gradually build up a critical mass, with more and more codegroups revealed from context and educated guesswork. One-part codes are more vulnerable to such educated guesswork than two-part codes, since if the codenumber "26839" of a one-part code is determined to stand for "bulldozer", then the lower codenumber "17598" will likely stand for a plaintext word that starts with "a" or "b". At least, for simple one part codes.

Various tricks can be used to "plant" or "sow" information into a coded message, for example by executing a raid at a particular time and location against an enemy, and then examining code messages sent after the raid. Coding errors are a particularly useful fingerhold into a code; people reliably make errors, sometimes disastrous ones. Planting data and exploiting errors works against ciphers as well.

Constructing a new code is like building a new language and writing a dictionary for it; it was an especially big job before computers. If a code is compromised, the entire task must be done all over again, and that means a lot of work for both cryptographers and the code users. In practice, when codes were in widespread use, they were usually changed on a periodic basis to frustrate codebreakers, and to limit the useful life of stolen or copied codebooks.

Once codes have been created, codebook distribution is logistically clumsy, and increases chances the code will be compromised. There is a saying that "Three people can keep a secret if two of them are dead," (Benjamin Franklin - Wikiquote) and though it may be something of an exaggeration, a secret becomes harder to keep if it is shared among several people. Codes can be thought reasonably secure if they are only used by a few careful people, but if whole armies use the same codebook, security becomes much more difficult.

In contrast, the security of ciphers is generally dependent on protecting the cipher keys. Cipher keys can be stolen and people can betray them, but they are much easier to change and distribute.

Superencipherment

It was common to encipher a message after first encoding it, to increase the difficulty of cryptanalysis. With a numerical code, this was commonly done with an "additive" - simply a long key number which was digit-by-digit added to the code groups, modulo 10. Unlike the codebooks, additives would be changed frequently. The famous Japanese Navy code, JN-25, was of this design.

Related Research Articles

<span class="mw-page-title-main">Cipher</span> Algorithm for encrypting and decrypting information

In cryptography, a cipher is an algorithm for performing encryption or decryption—a series of well-defined steps that can be followed as a procedure. An alternative, less common term is encipherment. To encipher or encode is to convert information into cipher or code. In common parlance, "cipher" is synonymous with "code", as they are both a set of steps that encrypt a message; however, the concepts are distinct in cryptography, especially classical cryptography.

<span class="mw-page-title-main">Cryptanalysis</span> Study of analyzing information systems in order to discover their hidden aspects

Cryptanalysis refers to the process of analyzing information systems in order to understand hidden aspects of the systems. Cryptanalysis is used to breach cryptographic security systems and gain access to the contents of encrypted messages, even if the cryptographic key is unknown.

<span class="mw-page-title-main">One-time pad</span> Encryption technique

In cryptography, the one-time pad (OTP) is an encryption technique that cannot be cracked, but requires the use of a single-use pre-shared key that is larger than or equal to the size of the message being sent. In this technique, a plaintext is paired with a random secret key. Then, each bit or character of the plaintext is encrypted by combining it with the corresponding bit or character from the pad using modular addition.

In cryptography, a substitution cipher is a method of encrypting in which units of plaintext are replaced with the ciphertext, in a defined manner, with the help of a key; the "units" may be single letters, pairs of letters, triplets of letters, mixtures of the above, and so forth. The receiver deciphers the text by performing the inverse substitution process to extract the original message.

<span class="mw-page-title-main">Transposition cipher</span> Method of encryption

In cryptography, a transposition cipher is a method of encryption which scrambles the positions of characters (transposition) without changing the characters themselves. Transposition ciphers reorder units of plaintext according to a regular system to produce a ciphertext which is a permutation of the plaintext. They differ from substitution ciphers, which do not change the position of units of plaintext but instead change the units themselves. Despite the difference between transposition and substitution operations, they are often combined, as in historical ciphers like the ADFGVX cipher or complex high-quality encryption methods like the modern Advanced Encryption Standard (AES).

<span class="mw-page-title-main">Caesar cipher</span> Simple and widely known encryption technique

In cryptography, a Caesar cipher, also known as Caesar's cipher, the shift cipher, Caesar's code, or Caesar shift, is one of the simplest and most widely known encryption techniques. It is a type of substitution cipher in which each letter in the plaintext is replaced by a letter some fixed number of positions down the alphabet. For example, with a left shift of 3, D would be replaced by A, E would become B, and so on. The method is named after Julius Caesar, who used it in his private correspondence.

<span class="mw-page-title-main">Symmetric-key algorithm</span> Algorithm

Symmetric-key algorithms are algorithms for cryptography that use the same cryptographic keys for both the encryption of plaintext and the decryption of ciphertext. The keys may be identical, or there may be a simple transformation to go between the two keys. The keys, in practice, represent a shared secret between two or more parties that can be used to maintain a private information link. The requirement that both parties have access to the secret key is one of the main drawbacks of symmetric-key encryption, in comparison to public-key encryption. However, symmetric-key encryption algorithms are usually better for bulk encryption. With exception of the one-time pad they have a smaller key size, which means less storage space and faster transmission. Due to this, asymmetric-key encryption is often used to exchange the secret key for symmetric-key encryption.

A chosen-plaintext attack (CPA) is an attack model for cryptanalysis which presumes that the attacker can obtain the ciphertexts for arbitrary plaintexts. The goal of the attack is to gain information that reduces the security of the encryption scheme.

<span class="mw-page-title-main">Vigenère cipher</span> Simple type of polyalphabetic encryption system

The Vigenère cipher is a method of encrypting alphabetic text where each letter of the plaintext is encoded with a different Caesar cipher, whose increment is determined by the corresponding letter of another text, the key.

The poem code is a simple and insecure, cryptographic method which was used during World War II by the British Special Operations Executive (SOE) to communicate with their agents in Nazi-occupied Europe.

<span class="mw-page-title-main">Frequency analysis</span> Study of the frequency of letters or groups of letters in a ciphertext

In cryptanalysis, frequency analysis is the study of the frequency of letters or groups of letters in a ciphertext. The method is used as an aid to breaking classical ciphers.

<span class="mw-page-title-main">Playfair cipher</span> Early block substitution cipher

The Playfair cipher or Playfair square or Wheatstone–Playfair cipher is a manual symmetric encryption technique and was the first literal digram substitution cipher. The scheme was invented in 1854 by Charles Wheatstone, but bears the name of Lord Playfair for promoting its use.

<span class="mw-page-title-main">Ciphertext</span> Encrypted information

In cryptography, ciphertext or cyphertext is the result of encryption performed on plaintext using an algorithm, called a cipher. Ciphertext is also known as encrypted or encoded information because it contains a form of the original plaintext that is unreadable by a human or computer without the proper cipher to decrypt it. This process prevents the loss of sensitive information via hacking. Decryption, the inverse of encryption, is the process of turning ciphertext into readable plaintext. Ciphertext is not to be confused with codetext because the latter is a result of a code, not a cipher.

<span class="mw-page-title-main">Book cipher</span> Encryption and decryption method

A book cipher is a cipher in which each word or letter in the plaintext of a message is replaced by some code that locates it in another text, the key.

In classical cryptography, the running key cipher is a type of polyalphabetic substitution cipher in which a text, typically from a book, is used to provide a very long keystream. The earliest description of such a cipher was given in 1892 by French mathematician Arthur Joseph Hermann. Usually, the book to be used would be agreed ahead of time, while the passage to be used would be chosen randomly for each message and secretly indicated somewhere in the message.

The known-plaintext attack (KPA) is an attack model for cryptanalysis where the attacker has access to both the plaintext and its encrypted version (ciphertext). These can be used to reveal secret keys and code books. The term "crib" originated at Bletchley Park, the British World War II decryption operation, where it was defined as:

A plain language passage of any length, usually obtained by solving one or more cipher or code messages, and occurring or believed likely to occur in a different cipher or code message, which it may provide a means of solving.

<span class="mw-page-title-main">Codebook</span> Book for ciphers

A codebook is a type of document used for gathering and storing cryptography codes. Originally, codebooks were often literally books, but today "codebook" is a byword for the complete record of a series of codes, regardless of physical format.

The Two-square cipher, also called double Playfair, is a manual symmetric encryption technique. It was developed to ease the cumbersome nature of the large encryption/decryption matrix used in the four-square cipher while still being slightly stronger than the single-square Playfair cipher.

With the rise of easily-intercepted wireless telegraphy, codes and ciphers were used extensively in World War I. The decoding by British Naval intelligence of the Zimmermann telegram helped bring the United States into the war.

The cipher system that the Uesugi are said to have used is a simple substitution usually known as a Polybius square or "checkerboard." The i-ro-ha alphabet contains forty-eight letters, so a seven-by-seven square is used, with one of the cells left blank. The rows and columns are labeled with a number or a letter. In the table below, the numbers start in the top left, as does the i-ro-ha alphabet. In practice these could start in any corner.

References

  1. A History of U.S. Communications Security; the David G. Boak Lectures Archived 2016-09-18 at the Wayback Machine , National Security Agency (NSA), Volumes I, 1973, Volumes II 1981, partially released 2008, additional portions declassified October 14, 2015
  2. "16.2: Substitution Ciphers". Mathematics LibreTexts. 2020-01-22. Archived from the original on 2021-09-19. Retrieved 2021-09-19.
  3. "Zimmermann Telegram: The Original Draft" Archived 2021-04-27 at the Wayback Machine , 2007, Joachim von zur Gathen, "Cryptologia", Volume 31, Issue 1
  4. I Samuel 20:20-22
  5. Friday (1982) by Robert A. Heinlein
  6. Radio Free Europe / Radio Liberty: "Middle East: Islamic Militants Take Jihad To The Internet" By Jeffrey Donovan Archived 2008-01-15 at the Wayback Machine , 16 June 2004.

Sources

See also