Unicity distance

Last updated April 22, 2024

In cryptography, unicity distance is the length of ciphertext needed to break a cipher by reducing the number of possible spurious keys to zero in a brute-force attack. That is, after trying every possible key, there should be just one decipherment that makes sense. Put another way, it is the expected amount of ciphertext needed to determine the key completely, assuming the underlying message has redundancy.^[1]

Consider an attack on the ciphertext string "WNAIW" encrypted using a Vigenère cipher with a five-letter key. Conceivably, this string could be deciphered into any other five-letter string: RIVER and WATER are both possibilities for certain keys. This is a general rule of cryptanalysis: with no additional information it is impossible to decode this message.

Of course, even in this case, only a certain number of five-letter keys will result in English words. Trying all possible keys, we will get not only RIVER and WATER, but SXOOS and KHDOP as well. The number of "working" keys will likely be very much smaller than the set of all possible keys. The problem is knowing which of these "working" keys is the right one; the rest are spurious.
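
The claim that any five-letter string is reachable under some key can be checked with a minimal sketch. It assumes the usual Vigenère convention (ciphertext = plaintext + key modulo 26, with A = 0); the helper names `vigenere_decrypt` and `key_for` are illustrative, not taken from any library.

```python
from string import ascii_uppercase as ALPHABET


def vigenere_decrypt(ciphertext: str, key: str) -> str:
    """Decrypt upper-case text, assuming C = P + K (mod 26) with A = 0."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) - ALPHABET.index(k)) % 26]
        for c, k in zip(ciphertext, key)
    )


def key_for(ciphertext: str, plaintext: str) -> str:
    """Return the key that maps `plaintext` to `ciphertext` (K = C - P mod 26)."""
    return vigenere_decrypt(ciphertext, plaintext)


for candidate in ("RIVER", "WATER", "SXOOS", "KHDOP"):
    key = key_for("WNAIW", candidate)
    # Every five-letter candidate is produced by exactly one five-letter key.
    assert vigenere_decrypt("WNAIW", key) == candidate
    print(candidate, "<-", key)
```

Under this convention the key FFFEF yields RIVER and ANHEF yields WATER, so the ciphertext alone cannot tell the two apart.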

Relation with key size and possible plaintexts

In general, given particular assumptions about the size of the key and the number of possible messages, there is an average ciphertext length where there is only one key (on average) that will generate a readable message. In the example above we see only upper case English characters, so if we assume that the plaintext has this form, then there are 26 possible letters for each position in the string. Likewise if we assume five-character upper case keys, there are K=26⁵ possible keys, of which the majority will not "work".

A tremendous number of possible messages, N, can be generated using even this limited set of characters: N = 26^L, where L is the length of the message. However, only a smaller set of them is readable plaintext due to the rules of the language, perhaps M of them, where M is likely to be very much smaller than N. Moreover, assuming a wrong key decrypts the ciphertext to an effectively random string, the fraction of keys that work equals the fraction of messages that are readable, so given K possible keys, only about K × (M/N) of them will "work". One of these is the correct key; the rest are spurious.

Since M/N gets arbitrarily small as the length L of the message increases, there is eventually some L that is large enough to make the number of spurious keys equal to zero. Roughly speaking, this is the L that makes KM/N=1. This L is the unicity distance.
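
As a rough numerical illustration, the sketch below estimates KM/N for the five-letter-key example. It assumes that the number of readable messages grows like M ≈ 2^(1.5 L), using a commonly quoted figure of about 1.5 bits of information per letter of English; that growth model and the names used are assumptions made for illustration.

```python
import math

BITS_PER_LETTER = math.log2(26)   # raw capacity of one upper-case letter
ASSUMED_RATE = 1.5                # assumed information per English letter, in bits
KEY_COUNT = 26 ** 5               # K: five-letter upper-case keys


def expected_working_keys(length: int) -> float:
    """Estimate K * M / N for a message of `length` letters."""
    log2_m = ASSUMED_RATE * length        # assumption: M is about 2^(rate * L)
    log2_n = BITS_PER_LETTER * length     # N = 26^L
    return KEY_COUNT * 2.0 ** (log2_m - log2_n)


def smallest_unique_length(max_length: int = 100) -> int:
    """Return the first L at which the estimate drops to one key or fewer."""
    for length in range(1, max_length + 1):
        if expected_working_keys(length) <= 1.0:
            return length
    raise ValueError("estimate never reached one key within max_length")


for length in (1, 5, smallest_unique_length()):
    print(length, expected_working_keys(length))
```

Under these assumptions the estimate first falls below one key at L = 8, which is the order of magnitude to expect for a key this short.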

Relation with key entropy and plaintext redundancy

The unicity distance can equivalently be defined as the minimum amount of ciphertext required to permit a computationally unlimited adversary to recover the unique encryption key.^[1]

The expected unicity distance can then be shown to be:^[1]

$U = H(k)/D$

where U is the unicity distance, H(k) is the entropy of the key space in bits (e.g. 128 for 2¹²⁸ equiprobable keys, rather less if the key is a memorized pass-phrase), and D is the plaintext redundancy in bits per character.

Now an alphabet of 32 characters can carry 5 bits of information per character (as 32 = 2⁵). In general the number of bits of information per character is $\log_2(N)$, where N is the number of characters in the alphabet and $\log_2$ is the binary logarithm. So for English each character can convey $\log_2(26) \approx 4.7$ bits of information.

However, the average amount of actual information carried per character in meaningful English text is only about 1.5 bits. So the plaintext redundancy is D = 4.7 − 1.5 = 3.2 bits per character.^[1]
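
The key-counting condition KM/N = 1 can be connected to the entropy formula with a short derivation. This is a sketch, assuming that all K keys are equally likely, so that $H(k) = \log_2 K$, and that the number of meaningful messages of length L grows as $M \approx 2^{rL}$. The symbol $r$ is introduced here for the information rate of the language (about 1.5 bits per character for English).

```latex
\begin{align*}
\frac{K M}{N} = 1
  &\iff \log_2 K + \log_2 M - \log_2 N = 0 \\
  &\iff H(k) + rL - L \log_2 26 = 0 \\
  &\iff L = \frac{H(k)}{\log_2 26 - r} = \frac{H(k)}{D}
\end{align*}
```

The denominator $\log_2 26 - r$ is exactly the redundancy D, so under these assumptions both views give the same length.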

In general, the larger the unicity distance, the better. For a one-time pad of unlimited size, given the unbounded entropy of the key space, we have $U=\infty$, which is consistent with the one-time pad being unbreakable.

Unicity distance of substitution cipher

For a simple substitution cipher, the number of possible keys is $26! \approx 4.0329 \times 10^{26} \approx 2^{88.4}$, the number of ways in which the alphabet can be permuted. Assuming all keys are equally likely, $H(k) = \log_2(26!) \approx 88.4$ bits. For English text $D = 3.2$, thus $U = 88.4/3.2 \approx 28$.

So given about 28 characters of ciphertext it should be theoretically possible to work out an English plaintext and hence the key.
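
The arithmetic can be reproduced with a few lines of Python. This is a minimal sketch using the redundancy figure of 3.2 bits per character quoted for English; the function names are illustrative.

```python
import math

ASSUMED_REDUNDANCY = 3.2  # D for English in bits per character, as quoted in the text


def key_entropy_bits(key_count: int) -> float:
    """Entropy of a uniformly chosen key, in bits."""
    return math.log2(key_count)


def unicity_distance(key_bits: float, redundancy: float = ASSUMED_REDUNDANCY) -> float:
    """U = H(k) / D, in characters."""
    return key_bits / redundancy


substitution_bits = key_entropy_bits(math.factorial(26))
print(round(substitution_bits, 1), round(unicity_distance(substitution_bits)))
```

The same helper applied to the 26⁵ five-letter Vigenère keys gives roughly 7.3 characters, which shows how strongly the result depends on the size of the key space.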

Practical application

Unicity distance is a useful theoretical measure, but it does not say much about the security of a block cipher when attacked by an adversary with real-world (limited) resources. Consider a block cipher with a unicity distance of three ciphertext blocks. Although there is clearly enough information for a computationally unbounded adversary to find the right key (simple exhaustive search), this may be computationally infeasible in practice.

The unicity distance can be increased by reducing the plaintext redundancy. One way to do this is to deploy data compression techniques prior to encryption, for example by removing redundant vowels while retaining readability. This is a good idea anyway, as it reduces the amount of data to be encrypted.

Ciphertexts longer than the unicity distance can be assumed to have only one meaningful decryption. Ciphertexts shorter than the unicity distance may have multiple plausible decryptions. Unicity distance is not a measure of how much ciphertext is required for cryptanalysis, but of how much ciphertext is required for there to be only one reasonable solution for cryptanalysis.

References

[1] Alfred J. Menezes; Paul C. van Oorschot; Scott A. Vanstone. "Chapter 7 - Block Ciphers" (PDF). Handbook of Applied Cryptography. p. 246.
[2] Deavours, C.A. (1977). "Unicity Points in Cryptanalysis". Cryptologia. 1 (1): 46–68. doi:10.1080/0161-117791832797. ISSN 0161-1194.

External links

Bruce Schneier: How to Recognize Plaintext (Crypto-Gram Newsletter December 15, 1998)
Unicity Distance computed for common ciphers

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
