Hill cipher

Last updated
Hill's cipher machine, from figure 4 of the patent Hill's message protector.png
Hill's cipher machine, from figure 4 of the patent

In classical cryptography, the Hill cipher is a polygraphic substitution cipher based on linear algebra. Invented by Lester S. Hill in 1929, it was the first polygraphic cipher in which it was practical (though barely) to operate on more than three symbols at once.

Contents

The following discussion assumes an elementary knowledge of matrices.

Encryption

Each letter is represented by a number modulo 26. Though this is not an essential feature of the cipher, this simple scheme is often used:

LetterABCDEFGHIJKLMNOPQRSTUVWXYZ
Number012345678910111213141516171819202122232425

To encrypt a message, each block of n letters (considered as an n-component vector) is multiplied by an invertible n×n matrix, against modulus 26. To decrypt the message, each block is multiplied by the inverse of the matrix used for encryption.

The matrix used for encryption is the cipher key, and it should be chosen randomly from the set of invertible n×n matrices (modulo 26). The cipher can, of course, be adapted to an alphabet with any number of letters; all arithmetic just needs to be done modulo the number of letters instead of modulo 26.

Consider the message 'ACT', and the key below (or GYB/NQK/URP in letters):

Since 'A' is 0, 'C' is 2 and 'T' is 19, the message is the vector:

Thus the enciphered vector is given by:

which corresponds to a ciphertext of 'POH'. Now, suppose that our message is instead 'CAT', or:

This time, the enciphered vector is given by:

which corresponds to a ciphertext of 'FIN'. Every letter has changed. The Hill cipher has achieved Shannon's diffusion, and an n-dimensional Hill cipher can diffuse fully across n symbols at once.

Decryption

In order to decrypt, we turn the ciphertext back into a vector, then simply multiply by the inverse matrix of the key matrix (IFK/VIV/VMI in letters). We find that, modulo 26, the inverse of the matrix used in the previous example is:

Taking the previous example ciphertext of 'POH', we get:

which gets us back to 'ACT', as expected.

One complications exist in picking the encrypting matrix:

  1. Not all matrices have an inverse. The matrix will have an inverse if and only if its determinant is inversible modulo n, where n is the modular base.

Thus, if we work modulo 26 as above, the determinant must be nonzero, and must not be divisible by 2 or 13. If the determinant is 0, or has common factors with the modular base, then the matrix cannot be used in the Hill cipher, and another matrix must be chosen (otherwise it will not be possible to decrypt). Fortunately, matrices which satisfy the conditions to be used in the Hill cipher are fairly common.

For our example key matrix:

So, modulo 26, the determinant is 25. Since and , 25 has no common factors with 26, and this matrix can be used for the Hill cipher.

The risk of the determinant having common factors with the modulus can be eliminated by making the modulus prime. Consequently, a useful variant of the Hill cipher adds 3 extra symbols (such as a space, a period and a question mark) to increase the modulus to 29.

Example

Let

be the key and suppose the plaintext message is 'HELP'. Then this plaintext is represented by two pairs

Then we compute

and

and continue encryption as follows:

The matrix K is invertible, hence exists such that . The inverse of K can be computed by using the formula

This formula still holds after a modular reduction if a modular multiplicative inverse is used to compute . Hence in this case, we compute

Then we compute

and

Therefore,

.

Security

The basic Hill cipher is vulnerable to a known-plaintext attack because it is completely linear. An opponent who intercepts plaintext/ciphertext character pairs can set up a linear system which can (usually) be easily solved; if it happens that this system is indeterminate, it is only necessary to add a few more plaintext/ciphertext pairs. Calculating this solution by standard linear algebra algorithms then takes very little time.

While matrix multiplication alone does not result in a secure cipher it is still a useful step when combined with other non-linear operations, because matrix multiplication can provide diffusion. For example, an appropriately chosen matrix can guarantee that small differences before the matrix multiplication will result in large differences after the matrix multiplication. Indeed, some modern ciphers use a matrix multiplication step to provide diffusion. For example, the MixColumns step in AES is a matrix multiplication. The function g in Twofish is a combination of non-linear S-boxes with a carefully chosen matrix multiplication (MDS).

Key space size

The key space is the set of all possible keys. The key space size is the number of possible keys. The effective key size, in number of bits, is the binary logarithm of the key space size.

There are matrices of dimension n × n. Thus or about is an upper bound on the key size of the Hill cipher using n × n matrices. This is only an upper bound because not every matrix is invertible and thus usable as a key. The number of invertible matrices can be computed via the Chinese Remainder Theorem. I.e., a matrix is invertible modulo 26 if and only if it is invertible both modulo 2 and modulo 13. The number of invertible n × n matrices modulo 2 is equal to the order of the general linear group GL(n,Z2). It is

Equally, the number of invertible matrices modulo 13 (i.e. the order of GL(n,Z13)) is

The number of invertible matrices modulo 26 is the product of those two numbers. Hence it is

Additionally it seems to be prudent to avoid too many zeroes in the key matrix, since they reduce diffusion. The net effect is that the effective keyspace of a basic Hill cipher is about . For a 5 × 5 Hill cipher, that is about 114 bits. Of course, key search is not the most efficient known attack.

Mechanical implementation

When operating on 2 symbols at once, a Hill cipher offers no particular advantage over Playfair or the bifid cipher, and in fact is weaker than either, and slightly more laborious to operate by pencil-and-paper. As the dimension increases, the cipher rapidly becomes infeasible for a human to operate by hand.

A Hill cipher of dimension 6 was implemented mechanically. Hill and a partner were awarded a patent ( U.S. patent 1,845,947 ) for this device, which performed a 6 × 6 matrix multiplication modulo 26 using a system of gears and chains.

Unfortunately the gearing arrangements (and thus the key) were fixed for any given machine, so triple encryption was recommended for security: a secret nonlinear step, followed by the wide diffusive step from the machine, followed by a third secret nonlinear step. (The much later Even–Mansour cipher also uses an unkeyed diffusive middle step). Such a combination was actually very powerful for 1929, and indicates that Hill apparently understood the concepts of a meet-in-the-middle attack as well as confusion and diffusion. Unfortunately, his machine did not sell.[ citation needed ]

See also

Other practical "pencil-and-paper" polygraphic ciphers include:

Related Research Articles

In mathematics, the determinant is a scalar value that is a certain function of the entries of a square matrix. The determinant of a matrix A is commonly denoted det(A), det A, or |A|. Its value characterizes some properties of the matrix and the linear map represented, on a given basis, by the matrix. In particular, the determinant is nonzero if and only if the matrix is invertible and the corresponding linear map is an isomorphism. The determinant of a product of matrices is the product of their determinants.

RSA (Rivest–Shamir–Adleman) is a public-key cryptosystem, one of the oldest widely used for secure data transmission. The initialism "RSA" comes from the surnames of Ron Rivest, Adi Shamir and Leonard Adleman, who publicly described the algorithm in 1977. An equivalent system was developed secretly in 1973 at Government Communications Headquarters (GCHQ), the British signals intelligence agency, by the English mathematician Clifford Cocks. That system was declassified in 1997.

<span class="mw-page-title-main">Matrix multiplication</span> Mathematical operation in linear algebra

In mathematics, particularly in linear algebra, matrix multiplication is a binary operation that produces a matrix from two matrices. For matrix multiplication, the number of columns in the first matrix must be equal to the number of rows in the second matrix. The resulting matrix, known as the matrix product, has the number of rows of the first and the number of columns of the second matrix. The product of matrices A and B is denoted as AB.

<span class="mw-page-title-main">Cayley–Hamilton theorem</span> Every square matrix over a commutative ring satisfies its own characteristic equation

In linear algebra, the Cayley–Hamilton theorem states that every square matrix over a commutative ring satisfies its own characteristic equation.

In modular arithmetic, a number g is a primitive root modulo n if every number a coprime to n is congruent to a power of g modulo n. That is, g is a primitive root modulo n if for every integer a coprime to n, there is some integer k for which gka. Such a value k is called the index or discrete logarithm of a to the base g modulo n. So g is a primitive root modulo n if and only if g is a generator of the multiplicative group of integers modulo n.

In linear algebra, an n-by-n square matrix A is called invertible if there exists an n-by-n square matrix B such that

In mathematics, the conjugate transpose, also known as the Hermitian transpose, of an complex matrix is an matrix obtained by transposing and applying complex conjugation to each entry. There are several notations, such as or , , or .

<span class="mw-page-title-main">Scalar multiplication</span> Algebraic operation

In mathematics, scalar multiplication is one of the basic operations defining a vector space in linear algebra. In common geometrical contexts, scalar multiplication of a real Euclidean vector by a positive real number multiplies the magnitude of the vector without changing its direction. Scalar multiplication is the multiplication of a vector by a scalar, and is to be distinguished from inner product of two vectors.

The Rabin cryptosystem is a family of public-key encryption schemes based on a trapdoor function whose security, like that of RSA, is related to the difficulty of integer factorization.

<span class="mw-page-title-main">Block matrix</span> Matrix defined using smaller matrices called blocks

In mathematics, a block matrix or a partitioned matrix is a matrix that is interpreted as having been broken into sections called blocks or submatrices.

The NTRUEncrypt public key cryptosystem, also known as the NTRU encryption algorithm, is an NTRU lattice-based alternative to RSA and elliptic curve cryptography (ECC) and is based on the shortest vector problem in a lattice.

The quadratic sieve algorithm (QS) is an integer factorization algorithm and, in practice, the second-fastest method known. It is still the fastest for integers under 100 decimal digits or so, and is considerably simpler than the number field sieve. It is a general-purpose factorization algorithm, meaning that its running time depends solely on the size of the integer to be factored, and not on special structure or properties. It was invented by Carl Pomerance in 1981 as an improvement to Schroeppel's linear sieve.

In cryptography, mod n cryptanalysis is an attack applicable to block and stream ciphers. It is a form of partitioning cryptanalysis that exploits unevenness in how the cipher operates over equivalence classes modulo n. The method was first suggested in 1999 by John Kelsey, Bruce Schneier, and David Wagner and applied to RC5P and M6. These attacks used the properties of binary addition and bit rotation modulo a Fermat prime.

Modular exponentiation is exponentiation performed over a modulus. It is useful in computer science, especially in the field of public-key cryptography, where it is used in both Diffie–Hellman key exchange and RSA public/private keys.

In mathematics, the Smith normal form is a normal form that can be defined for any matrix with entries in a principal ideal domain (PID). The Smith normal form of a matrix is diagonal, and can be obtained from the original matrix by multiplying on the left and right by invertible square matrices. In particular, the integers are a PID, so one can always calculate the Smith normal form of an integer matrix. The Smith normal form is very useful for working with finitely generated modules over a PID, and in particular for deducing the structure of a quotient of a free module. It is named after the Irish mathematician Henry John Stephen Smith.

In mathematics, Hensel's lemma, also known as Hensel's lifting lemma, named after Kurt Hensel, is a result in modular arithmetic, stating that if a univariate polynomial has a simple root modulo a prime number p, then this root can be lifted to a unique root modulo any higher power of p. More generally, if a polynomial factors modulo p into two coprime polynomials, this factorization can be lifted to a factorization modulo any higher power of p.

In algebra, the zero-product property states that the product of two nonzero elements is nonzero. In other words,

A Frobenius matrix is a special kind of square matrix from numerical analysis. A matrix is a Frobenius matrix if it has the following three properties:

In mathematics, particularly in the area of arithmetic, a modular multiplicative inverse of an integer a is an integer x such that the product ax is congruent to 1 with respect to the modulus m. In the standard notation of modular arithmetic this congruence is written as

In abstract algebra, a matrix field is a field with matrices as elements. In field theory there are two types of fields: finite fields and infinite fields. There are several examples of matrix fields of different characteristic and cardinality.

References