Polygraphic substitution

Last updated

Polygraphic substitution is a cipher in which a uniform substitution is performed on blocks of letters. When the length of the block is specifically known, more precise terms are used: for instance, a cipher in which pairs of letters are substituted is bigraphic.

Contents

As a concept, polygraphic substitution contrasts with monoalphabetic (or simple) substitutions in which individual letters are uniformly substituted, or polyalphabetic substitutions in which individual letters are substituted in different ways depending on their position in the text. In theory, there is some overlap in these definitions; one could conceivably consider a Vigenère cipher with an eight-letter key to be an octographic substitution. In practice, this is not a useful observation since it is far more fruitful to consider it to be a polyalphabetic substitution cipher.

Specific ciphers

In 1563, Giambattista della Porta devised the first biographic substitution. However, it was nothing more than a matrix of symbols. In practice, it would have been all but impossible to memorize, and carrying around the table would lead to risks of falling into enemy hands.

In 1854, Charles Wheatstone came up with the Playfair cipher, a keyword-based system that could be performed on paper in the field. This was followed up over the next fifty years with the closely related four-square and two-square ciphers, which are slightly more cumbersome but offer slightly better security.

In 1929, Lester S. Hill developed the Hill cipher, which uses matrix algebra to encrypt blocks of any desired length. However, encryption is very difficult to perform by hand for any sufficiently large block size, although it has been implemented by machine or computer. This is therefore on the frontier between classical and modern cryptography.

Cryptanalysis of general polygraphic substitutions

Polygraphic systems do provide a significant improvement in security over monoalphabetic substitutions. Given an individual letter 'E' in a message, it could be encrypted using any of 52 instructions depending on its location and neighbors, which can be used to great advantage to mask the frequency of individual letters. However, the security boost is limited; while it generally requires a larger sample of text to crack, it can still be done by hand.

One can identify a polygraphically-encrypted text by performing a frequency chart of polygrams and not merely of individual letters. These can be compared to the frequency of plaintext English. The distribution of digrams is even more stark than individual letters. For example, the six most common letters in English (23%) represent approximately half of English plaintext, but it takes only the most frequent 8% of the 676 digrams to achieve the same potency. In addition, even in a plaintext many thousands of characters long, one would expect that nearly half of the digrams would not occur, or only barely. In addition, looking over the text one would expect to see a fairly regular scattering of repeated text in multiples of the block length and relatively few that are not multiples.

Cracking a code identified as polygraphic is similar to cracking a general monoalphabetic substitution except with a larger 'alphabet'. One identifies the most frequent polygrams, experiments with replacing them with common plaintext polygrams, and attempts to build up common words, phrases, and finally meaning. Naturally, if the investigation led the cryptanalyst to suspect that a code was of a specific type, like a Playfair or order-2 Hill cipher, then they could use a more specific attack.

See also

Related Research Articles

Cipher Algorithm for encrypting and decrypting information

In cryptography, a cipher is an algorithm for performing encryption or decryption—a series of well-defined steps that can be followed as a procedure. An alternative, less common term is encipherment. To encipher or encode is to convert information into cipher or code. In common parlance, "cipher" is synonymous with "code", as they are both a set of steps that encrypt a message; however, the concepts are distinct in cryptography, especially classical cryptography.

Cryptanalysis Study of analyzing information systems in order to discover their hidden aspects

Cryptanalysis refers to the process of analyzing information systems in order to understand hidden aspects of the systems. Cryptanalysis is used to breach cryptographic security systems and gain access to the contents of encrypted messages, even if the cryptographic key is unknown.

In cryptography, a substitution cipher is a method of encrypting in which units of plaintext are replaced with the ciphertext, in a defined manner, with the help of a key; the "units" may be single letters, pairs of letters, triplets of letters, mixtures of the above, and so forth. The receiver deciphers the text by performing the inverse substitution process to extract the original message.

In cryptography, a transposition cipher is a method of encryption which scrambles the positions of characters (transposition) without changing the characters themselves. Transposition ciphers reorder units of plaintext according to a regular system to produce a ciphertext which is a permutation of the plaintext. They differ from substitution ciphers, which do not change the position of units of plaintext but instead change the units themselves. Despite the difference between transposition and substitution operations, they are often combined, as in historical ciphers like the ADFGVX cipher or complex high-quality encryption methods like the modern Advanced Encryption Standard (AES).

Caesar cipher Simple and widely known encryption technique

In cryptography, a Caesar cipher, also known as Caesar's cipher, the shift cipher, Caesar's code or Caesar shift, is one of the simplest and most widely known encryption techniques. It is a type of substitution cipher in which each letter in the plaintext is replaced by a letter some fixed number of positions down the alphabet. For example, with a left shift of 3, D would be replaced by A, E would become B, and so on. The method is named after Julius Caesar, who used it in his private correspondence.

Tabula recta Fundamental tool in cryptography

In cryptography, the tabula recta is a square table of alphabets, each row of which is made by shifting the previous one to the left. The term was invented by the German author and monk Johannes Trithemius in 1508, and used in his Trithemius cipher.

In cryptography, coincidence counting is the technique of putting two texts side-by-side and counting the number of times that identical letters appear in the same position in both texts. This count, either as a ratio of the total or normalized by dividing by the expected count for a random source model, is known as the index of coincidence, or IC for short.

Frequency analysis Study of the frequency of letters or groups of letters in a ciphertext

In cryptanalysis, frequency analysis is the study of the frequency of letters or groups of letters in a ciphertext. The method is used as an aid to breaking classical ciphers.

Playfair cipher Early block substitution cipher

The Playfair cipher or Playfair square or Wheatstone–Playfair cipher is a manual symmetric encryption technique and was the first literal digram substitution cipher. The scheme was invented in 1854 by Charles Wheatstone, but bears the name of Lord Playfair for promoting its use.

Ciphertext Encrypted information

In cryptography, ciphertext or cyphertext is the result of encryption performed on plaintext using an algorithm, called a cipher. Ciphertext is also known as encrypted or encoded information because it contains a form of the original plaintext that is unreadable by a human or computer without the proper cipher to decrypt it. This process prevents the loss of sensitive information via hacking. Decryption, the inverse of encryption, is the process of turning ciphertext into readable plaintext. Ciphertext is not to be confused with codetext because the latter is a result of a code, not a cipher.

Rotor machine

In cryptography, a rotor machine is an electro-mechanical stream cipher device used for encrypting and decrypting messages. Rotor machines were the cryptographic state-of-the-art for a prominent period of history; they were in widespread use in the 1920s–1970s. The most famous example is the German Enigma machine, the output of which was deciphered by the Allies during World War II, producing intelligence code-named Ultra.

In classical cryptography, the running key cipher is a type of polyalphabetic substitution cipher in which a text, typically from a book, is used to provide a very long keystream. Usually, the book to be used would be agreed ahead of time, while the passage to be used would be chosen randomly for each message and secretly indicated somewhere in the message.

The affine cipher is a type of monoalphabetic substitution cipher, where each letter in an alphabet is mapped to its numeric equivalent, encrypted using a simple mathematical function, and converted back to a letter. The formula used means that each letter encrypts to one other letter, and back again, meaning the cipher is essentially a standard substitution cipher with a rule governing which letter goes to which. As such, it has the weaknesses of all substitution ciphers. Each letter is enciphered with the function (ax + b) mod 26, where b is the magnitude of the shift.

Polybius square Type of code

The Polybius square, also known as the Polybius checkerboard, is a device invented by the ancient Greeks Cleoxenus and Democleitus, and made famous by the historian and scholar Polybius. The device is used for fractionating plaintext characters so that they can be represented by a smaller set of symbols, which is useful for telegraphy, steganography, and cryptography. The device was originally used for fire signalling, allowing for the coded transmission of any message, not just a finite amount of predetermined options as was the convention before.

In cryptography, a classical cipher is a type of cipher that was used historically but for the most part, has fallen into disuse. In contrast to modern cryptographic algorithms, most classical ciphers can be practically computed and solved by hand. However, they are also usually very simple to break with modern technology. The term includes the simple systems used since Greek and Roman times, the elaborate Renaissance ciphers, World War II cryptography such as the Enigma machine and beyond.

In cryptanalysis, Kasiski examination is a method of attacking polyalphabetic substitution ciphers, such as the Vigenère cipher. It was first published by Friedrich Kasiski in 1863, but seems to have been independently discovered by Charles Babbage as early as 1846.

Code (cryptography) Method used to encrypt a message

In cryptology, a code is a method used to encrypt a message that operates at the level of meaning; that is, words or phrases are converted into something else. A code might transform "change" into "CVGDK" or "cocktail lounge". The U.S. National Security Agency defined a code as "A substitution cryptosystem in which the plaintext elements are primarily words, phrases, or sentences, and the code equivalents typically consist of letters or digits in otherwise meaningless combinations of identical length." A codebook is needed to encrypt, and decrypt the phrases or words.

The four-square cipher is a manual symmetric encryption technique. It was invented by the French cryptographer Felix Delastelle.

The Two-square cipher, also called double Playfair, is a manual symmetric encryption technique. It was developed to ease the cumbersome nature of the large encryption/decryption matrix used in the four-square cipher while still being slightly stronger than the single-square Playfair cipher.

The following outline is provided as an overview of and topical guide to cryptography: