Integer factorization

Unsolved problem in computer science:

Can integer factorization be solved in polynomial time on a classical computer?


In number theory, integer factorization is the decomposition of a positive integer into a product of integers. Every positive integer greater than 1 is either the product of two or more integer factors greater than 1, in which case it is called a composite number, or it is not, in which case it is called a prime number. For example, 15 is a composite number because 15 = 3·5, but 7 is a prime number because it cannot be decomposed in this way. If one of the factors is composite, it can in turn be written as a product of smaller factors, for example 60 = 3·20 = 3·(5·4). Continuing this process until every factor is prime is called prime factorization; the result is always unique up to the order of the factors by the prime factorization theorem.

To factorize a small integer n using mental or pen-and-paper arithmetic, the simplest method is trial division: checking if the number is divisible by prime numbers 2, 3, 5, and so on, up to the square root of n. For larger numbers, especially when using a computer, various more sophisticated factorization algorithms are more efficient. A prime factorization algorithm typically involves testing whether each factor is prime each time a factor is found.
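
As an illustration, the following Python sketch implements this trial-division procedure. It is a minimal, pen-and-paper-style method, practical only for small n; composite trial divisors never divide n because their prime factors have already been removed.

```python
def trial_division(n: int) -> list[int]:
    """Factor n > 1 into primes by dividing out 2, 3, 4, ... up to sqrt(n)."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:   # divide out d as many times as it occurs
            factors.append(d)
            n //= d
        d += 1
    if n > 1:               # the remaining cofactor is itself prime
        factors.append(n)
    return factors

print(trial_division(60))   # [2, 2, 3, 5]
print(trial_division(7))    # [7]
```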

When the numbers are sufficiently large, no efficient non-quantum integer factorization algorithm is known. However, it has not been proven that such an algorithm does not exist. The presumed difficulty of this problem is important for the algorithms used in cryptography such as RSA public-key encryption and the RSA digital signature. [1] Many areas of mathematics and computer science have been brought to bear on the problem, including elliptic curves, algebraic number theory, and quantum computing.

Not all numbers of a given length are equally hard to factor. The hardest instances of these problems (for currently known techniques) are semiprimes, the product of two prime numbers. When they are both large, for instance more than two thousand bits long, randomly chosen, and about the same size (but not too close, for example, to avoid efficient factorization by Fermat's factorization method), even the fastest prime factorization algorithms on the fastest computers can take enough time to make the search impractical; that is, as the number of digits of the integer being factored increases, the number of operations required to perform the factorization on any computer increases drastically.

Many cryptographic protocols are based on the difficulty of factoring large composite integers or a related problem—for example, the RSA problem. An algorithm that efficiently factors an arbitrary integer would render RSA-based public-key cryptography insecure.

Prime decomposition

Prime decomposition of n = 864 as 2⁵ × 3³

By the fundamental theorem of arithmetic, every positive integer has a unique prime factorization. (By convention, 1 is the empty product.) Testing whether the integer is prime can be done in polynomial time, for example, by the AKS primality test. If composite, however, the polynomial time tests give no insight into how to obtain the factors.

Given a general algorithm for integer factorization, any integer can be factored into its constituent prime factors by repeated application of this algorithm. The situation is more complicated with special-purpose factorization algorithms, whose benefits may not be realized as well or even at all with the factors produced during decomposition. For example, if n = 171 × p × q where p < q are very large primes, trial division will quickly produce the factors 3 and 19 but will take p divisions to find the next factor. As a contrasting example, if n is the product of the primes 13729, 1372933, and 18848997161, where 13729 × 1372933 = 18848997157, Fermat's factorization method will begin with a = ⌈√n⌉ = 18848997159, which immediately yields b = √(a² − n) = √4 = 2 and hence the factors a − b = 18848997157 and a + b = 18848997161. While these are easily recognized as composite and prime respectively, Fermat's method will take much longer to factor the composite number because the starting value of ⌈√18848997157⌉ = 137292 for a is a factor of 10 away from 1372933.
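
The following Python sketch implements Fermat's method as just described and reproduces the example above: applied to the three-prime product, one step suffices because the two largest divisors are so close together.

```python
import math

def fermat_factor(n: int) -> tuple[int, int]:
    """Fermat's method for odd n: try a = ceil(sqrt(n)), a+1, ... until
    a*a - n is a perfect square b*b; then n = (a - b)(a + b)."""
    a = math.isqrt(n)
    if a * a < n:
        a += 1                        # a = ceil(sqrt(n))
    while True:
        b2 = a * a - n
        b = math.isqrt(b2)
        if b * b == b2:
            return a - b, a + b
        a += 1

# The example from the text: one step finds the two nearby divisors.
n = 13729 * 1372933 * 18848997161
print(fermat_factor(n))   # (18848997157, 18848997161)
```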

Current state of the art

Among the b-bit numbers, the most difficult to factor in practice using existing algorithms are those semiprimes whose factors are of similar size. For this reason, these are the integers used in cryptographic applications.

In 2019, Fabrice Boudot, Pierrick Gaudry, Aurore Guillevic, Nadia Heninger, Emmanuel Thomé and Paul Zimmermann factored a 240-digit (795-bit) number (RSA-240) utilizing approximately 900 core-years of computing power. [2] The researchers estimated that a 1024-bit RSA modulus would take about 500 times as long. [3]

The largest such semiprime yet factored was RSA-250, an 829-bit number with 250 decimal digits, in February 2020. The total computation time was roughly 2700 core-years of computing using Intel Xeon Gold 6130 at 2.1 GHz. Like all recent factorization records, this factorization was completed with a highly optimized implementation of the general number field sieve run on hundreds of machines.

Difficulty and complexity

No algorithm has been published that can factor all integers in polynomial time, that is, that can factor a b-bit number n in time O(bᵏ) for some constant k. Neither the existence nor non-existence of such algorithms has been proved, but it is generally suspected that they do not exist and hence that the problem is not in class P. [4] [5] The problem is clearly in class NP, but it is generally suspected that it is not NP-complete, though this has not been proven. [6]

There are published algorithms that are faster than O((1 + ε)ᵇ) for all positive ε; that is, they are sub-exponential. As of 2022, the algorithm with the best theoretical asymptotic running time is the general number field sieve (GNFS), first published in 1993, [7] running on a b-bit number n in time

exp( ((64/9)^(1/3) + o(1)) · (ln n)^(1/3) · (ln ln n)^(2/3) )

For current computers, GNFS is the best published algorithm for large n (more than about 400 bits). For a quantum computer, however, Peter Shor discovered an algorithm in 1994 that solves it in polynomial time. This will have significant implications for cryptography if quantum computation becomes scalable. Shor's algorithm takes only O(b³) time and O(b) space on b-bit number inputs. In 2001, Shor's algorithm was implemented for the first time, by using NMR techniques on molecules that provide seven qubits. [8]
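
The quantum part of Shor's algorithm is the order-finding step; the surrounding reduction is classical. The sketch below illustrates that reduction with a brute-force (exponential-time) order finder standing in for the quantum subroutine. It demonstrates the structure of the reduction, not Shor's speed, and assumes n is an odd composite that is not a prime power.

```python
import math
import random

def multiplicative_order(a: int, n: int) -> int:
    """Order of a modulo n by brute force -- the step Shor's algorithm
    performs in polynomial time on a quantum computer."""
    r, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        r += 1
    return r

def factor_by_order_finding(n: int) -> int:
    """Nontrivial factor of an odd composite n (not a prime power)."""
    while True:
        a = random.randrange(2, n)
        g = math.gcd(a, n)
        if g > 1:
            return g                    # a already shares a factor with n
        r = multiplicative_order(a, n)
        if r % 2 == 0:
            y = pow(a, r // 2, n)       # a square root of 1 modulo n
            if y != n - 1:              # nontrivial root: it splits n
                return math.gcd(y - 1, n)

print(factor_by_order_finding(15))      # 3 or 5
```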

It is not known exactly which complexity classes contain the decision version of the integer factorization problem (that is: does n have a factor smaller than k besides 1?). It is known to be in both NP and co-NP, meaning that both "yes" and "no" answers can be verified in polynomial time. An answer of "yes" can be certified by exhibiting a factorization n = d(n/d) with d ≤ k. An answer of "no" can be certified by exhibiting the factorization of n into distinct primes, all larger than k; one can verify their primality using the AKS primality test, and then multiply them to obtain n. The fundamental theorem of arithmetic guarantees that there is only one possible string of increasing primes that will be accepted, which shows that the problem is in both UP and co-UP. [9] It is known to be in BQP because of Shor's algorithm.
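
A minimal sketch of verifying such a "no" certificate follows, listing the prime factors with multiplicity. It uses sympy's isprime as a stand-in for a deterministic polynomial-time test such as AKS.

```python
from sympy import isprime  # stand-in for a polynomial-time test such as AKS

def verify_no_certificate(n: int, k: int, primes: list[int]) -> bool:
    """Check a certificate that n has no factor d with 1 < d <= k:
    the claimed prime factorization of n, every prime exceeding k."""
    if any(p <= k or not isprime(p) for p in primes):
        return False
    product = 1
    for p in primes:
        product *= p
    return product == n       # the primes must actually multiply to n

# 221 = 13 * 17 has no factor between 2 and 10:
print(verify_no_certificate(221, 10, [13, 17]))   # True
print(verify_no_certificate(221, 15, [13, 17]))   # False: 13 <= 15
```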

The problem is suspected to be outside all three of the complexity classes P, NP-complete, and co-NP-complete. It is therefore a candidate for the NP-intermediate complexity class. If it could be proved to be either NP-complete or co-NP-complete, this would imply NP = co-NP, a very surprising result, and therefore integer factorization is widely suspected to be outside both these classes.

In contrast, the decision problem "Is n a composite number?" (or equivalently: "Is n a prime number?") appears to be much easier than the problem of specifying factors of n. The composite/prime problem can be solved in polynomial time (in the number b of digits of n) with the AKS primality test. In addition, there are several probabilistic algorithms that can test primality very quickly in practice if one is willing to accept a vanishingly small possibility of error. The ease of primality testing is a crucial part of the RSA algorithm, as it is necessary to find large prime numbers to start with.
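
One widely used probabilistic test is Miller–Rabin; a compact sketch follows, with the number of rounds as an adjustable trade-off between speed and the error bound.

```python
import random

def miller_rabin(n: int, rounds: int = 40) -> bool:
    """Probabilistic primality test.  A False answer is always correct;
    a True answer is wrong with probability at most 4**(-rounds)."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:                   # write n - 1 = d * 2^s with d odd
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                # a is a witness that n is composite
    return True

print(miller_rabin(2**61 - 1))   # True: a Mersenne prime
print(miller_rabin(561))         # False: 561 = 3 * 11 * 17
```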

Factoring algorithms

Special-purpose

A special-purpose factoring algorithm's running time depends on the properties of the number to be factored or on one of its unknown factors: size, special form, etc. The parameters which determine the running time vary among algorithms.

An important subclass of special-purpose factoring algorithms is the Category 1 or First Category algorithms, whose running time depends on the size of the smallest prime factor. Given an integer of unknown form, these methods are usually applied before general-purpose methods to remove small factors. [10] For example, naive trial division is a Category 1 algorithm.
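
A sketch of this preprocessing step: bounded trial division strips the small prime factors and hands the remaining cofactor to a general-purpose method. The bound of 10⁶ is an arbitrary illustrative choice.

```python
def strip_small_factors(n: int, bound: int = 10**6) -> tuple[list[int], int]:
    """Category 1 preprocessing: remove all prime factors up to `bound`
    by trial division; return them and the remaining cofactor."""
    small = []
    d = 2
    while d <= bound and d * d <= n:
        while n % d == 0:
            small.append(d)
            n //= d
        d += 1
    return small, n      # hand the cofactor to a general-purpose method

# 1000003 and 1000000007 are primes above the bound, so they stay put:
print(strip_small_factors(2**4 * 3 * 1000003 * 1000000007))
# ([2, 2, 2, 2, 3], 1000003007000021)
```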

General-purpose

A general-purpose factoring algorithm, also known as a Category 2, Second Category, or Kraitchik family algorithm, [10] has a running time which depends solely on the size of the integer to be factored. This is the type of algorithm used to factor RSA numbers. Most general-purpose factoring algorithms are based on the congruence of squares method.
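
The congruence-of-squares idea in miniature: find x and y with x² ≡ y² (mod n) but x ≢ ±y (mod n); then gcd(x − y, n) is a nontrivial factor of n. The sketch below finds such a pair by naive search, whereas real algorithms such as the quadratic sieve construct it from many smooth relations.

```python
import math

def congruence_of_squares(n: int) -> int:
    """Search for x with x^2 mod n a perfect square y^2 and x != +-y
    (mod n); then gcd(x - y, n) is a nontrivial factor of n."""
    for x in range(math.isqrt(n) + 1, n):
        y2 = (x * x) % n
        y = math.isqrt(y2)
        if y * y == y2 and (x - y) % n != 0 and (x + y) % n != 0:
            return math.gcd(x - y, n)
    return n   # no congruence found; n may be prime

print(congruence_of_squares(91))   # 7, since 10^2 = 100 = 9 (mod 91)
```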

Other notable algorithms

Heuristic running time

In number theory, there are many integer factoring algorithms that heuristically have expected running time

L_n[1/2, 1 + o(1)] = exp((1 + o(1)) · √(ln n · ln ln n))

in little-o and L-notation. Some examples of those algorithms are the elliptic curve method and the quadratic sieve. Another such algorithm is the class group relations method proposed by Schnorr, [11] Seysen, [12] and Lenstra, [13] whose running time they proved only assuming the unproved Generalized Riemann Hypothesis (GRH).
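
To make the bound concrete, one can evaluate the L-function numerically with the o(1) term dropped; the figures below are rough scales, not precise operation counts.

```python
import math

def L_bits(bits: int, alpha: float, c: float) -> float:
    """L_n[alpha, c] for an n of `bits` bits, with the o(1) term dropped."""
    ln_n = bits * math.log(2)            # ln n for n around 2^bits
    return math.exp(c * ln_n ** alpha * math.log(ln_n) ** (1 - alpha))

for bits in (256, 512, 1024):
    ops = L_bits(bits, 0.5, 1.0)         # the L_n[1/2, 1] scale from above
    print(f"{bits}-bit n: roughly 2^{math.log2(ops):.0f} operations")
```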

Rigorous running time

The Schnorr–Seysen–Lenstra probabilistic algorithm has been rigorously proven by Lenstra and Pomerance [14] to have expected running time L_n[1/2, 1 + o(1)] by replacing the GRH assumption with the use of multipliers. The algorithm uses the class group of positive binary quadratic forms of discriminant Δ, denoted by GΔ. GΔ is the set of triples of integers (a, b, c) in which those integers are relatively prime.

Schnorr–Seysen–Lenstra algorithm

Let n be the integer to be factored, where n is an odd positive integer greater than a certain constant. In this factoring algorithm the discriminant Δ is chosen as a multiple of n, Δ = −dn, where d is some positive multiplier. The algorithm expects that for one d there exist enough smooth forms in GΔ. Lenstra and Pomerance show that the choice of d can be restricted to a small set to guarantee the smoothness result.

Denote by PΔ the set of all primes q with Kronecker symbol (Δ/q) = 1. By constructing a set of generators of GΔ and prime forms fq of GΔ with q in PΔ, a sequence of relations between the set of generators and the fq is produced. The size of q can be bounded by c₀(log|Δ|)² for some constant c₀.
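
A sketch of computing PΔ follows: a hand-rolled Kronecker symbol restricted to prime arguments (Euler's criterion for odd q, the standard mod-8 rule for q = 2). The toy n = 1711 = 29 × 59 and multiplier d = 4 are illustrative choices, not values the algorithm would necessarily pick.

```python
import math

def primes_up_to(limit: int) -> list[int]:
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(sieve[i * i :: i]))
    return [i for i, flag in enumerate(sieve) if flag]

def kronecker_over_prime(delta: int, q: int) -> int:
    """Kronecker symbol (delta/q) for a prime q."""
    if q == 2:
        if delta % 2 == 0:
            return 0
        return 1 if delta % 8 in (1, 7) else -1
    e = pow(delta % q, (q - 1) // 2, q)   # Euler's criterion
    return -1 if e == q - 1 else e        # map q-1 back to -1

def prime_forms_set(delta: int, limit: int) -> list[int]:
    """P_Delta: primes q up to `limit` with (delta/q) = 1."""
    return [q for q in primes_up_to(limit)
            if kronecker_over_prime(delta, q) == 1]

# Toy example: n = 1711 (= 29 * 59) with multiplier d = 4, delta = -4n.
print(prime_forms_set(-4 * 1711, 50))
```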

The relation that will be used is a relation between the product of powers that is equal to the neutral element of GΔ. These relations will be used to construct a so-called ambiguous form of GΔ, which is an element of GΔ of order dividing 2. By calculating the corresponding factorization of Δ and by taking a gcd, this ambiguous form provides the complete prime factorization of n. This algorithm has these main steps:

Let n be the number to be factored.

  1. Let Δ be a negative integer with Δ = −dn, where d is a multiplier and Δ is the negative discriminant of some quadratic form.
  2. Take the first t primes p1 = 2, p2 = 3, p3 = 5, ..., pt, for some t ∈ ℕ.
  3. Let fq be a random prime form of GΔ with (Δ/q) = 1.
  4. Find a generating set X of GΔ.
  5. Collect a sequence of relations between set X and {fq : q ∈ PΔ} satisfying ( ∏_{x ∈ X} x^r(x) ) · ( ∏_{q ∈ PΔ} fq^t(q) ) = 1, that is, products of powers equal to the neutral element of GΔ.
  6. Construct an ambiguous form (a, b, c) that is an element f ∈ GΔ of order dividing 2 to obtain a coprime factorization of the largest odd divisor of Δ, in which Δ = −4ac or Δ = a(a − 4c) or Δ = (b − 2a)(b + 2a).
  7. If the ambiguous form provides a factorization of n then stop, otherwise find another ambiguous form until the factorization of n is found. In order to prevent useless ambiguous forms from being generated, build up the 2-Sylow group Syl₂(Δ) of G(Δ).

To obtain an algorithm for factoring any positive integer, it is necessary to add a few steps to this algorithm such as trial division and the Jacobi sum test.
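
To illustrate the gcd extraction in steps 6–7 above: the case split on Δ yields an integer sharing a factor with n. The ambiguous form (43, 0, 47), of discriminant −4 · 2021 for n = 2021 = 43 × 47, is hand-picked for this sketch rather than produced by the algorithm itself.

```python
import math

def factor_from_ambiguous_form(a: int, b: int, c: int, n: int) -> int:
    """Given an ambiguous form (a, b, c) of discriminant b^2 - 4ac = -d*n,
    read off a divisor of Delta and take a gcd with n."""
    if b == 0:                      # Delta = -4ac
        candidate = a
    elif a == b:                    # Delta = a(a - 4c)
        candidate = a
    else:                           # a == c: Delta = (b - 2a)(b + 2a)
        candidate = b - 2 * a
    g = math.gcd(candidate, n)
    return g if 1 < g < n else n    # returning n means the form was useless

# (43, 0, 47) has discriminant -4 * 43 * 47 = -4 * 2021, i.e. d = 4:
print(factor_from_ambiguous_form(43, 0, 47, 2021))   # 43
```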

Expected running time

The algorithm as stated is a probabilistic algorithm as it makes random choices. Its expected running time is at most L_n[1/2, 1 + o(1)]. [14]

Notes

  1. Lenstra, Arjen K. (2011), "Integer Factoring", in van Tilborg, Henk C. A.; Jajodia, Sushil (eds.), Encyclopedia of Cryptography and Security, Boston, MA: Springer US, pp. 611–618, doi:10.1007/978-1-4419-5906-5_455, ISBN 978-1-4419-5905-8.
  2. "[Cado-nfs-discuss] 795-bit factoring and discrete logarithms". Archived from the original on 2019-12-02.
  3. Kleinjung; et al. (2010-02-18). "Factorization of a 768-bit RSA modulus" (PDF). International Association for Cryptologic Research. Retrieved 2010-08-09.
  4. Krantz, Steven G. (2011), The Proof is in the Pudding: The Changing Nature of Mathematical Proof, New York: Springer, p. 203, doi:10.1007/978-0-387-48744-1, ISBN 978-0-387-48908-7, MR 2789493.
  5. Arora, Sanjeev; Barak, Boaz (2009), Computational Complexity, Cambridge: Cambridge University Press, p. 230, doi:10.1017/CBO9780511804090, ISBN 978-0-521-42426-4, MR 2500087.
  6. Goldreich, Oded; Wigderson, Avi (2008), "IV.20 Computational Complexity", in Gowers, Timothy; Barrow-Green, June; Leader, Imre (eds.), The Princeton Companion to Mathematics, Princeton, New Jersey: Princeton University Press, pp. 575–604, ISBN 978-0-691-11880-2, MR 2467561. See in particular p. 583.
  7. Buhler, J. P.; Lenstra, H. W. Jr.; Pomerance, Carl (1993). "Factoring integers with the number field sieve". The Development of the Number Field Sieve. Lecture Notes in Mathematics. Vol. 1554. Springer. pp. 50–94. doi:10.1007/BFb0091539. ISBN 978-3-540-57013-4.
  8. Vandersypen, Lieven M. K.; et al. (2001). "Experimental realization of Shor's quantum factoring algorithm using nuclear magnetic resonance". Nature. 414 (6866): 883–887. arXiv:quant-ph/0112176. doi:10.1038/414883a. PMID 11780055.
  9. Fortnow, Lance (2002-09-13). "Computational Complexity Blog: Complexity Class of the Week: Factoring".
  10. Bressoud, David; Wagon, Stan (2000). A Course in Computational Number Theory. Key College Publishing/Springer. pp. 168–169. ISBN 978-1-930190-10-8.
  11. Schnorr, Claus P. (1982). "Refined analysis and improvements on some factoring algorithms". Journal of Algorithms. 3 (2): 101–127. doi:10.1016/0196-6774(82)90012-8. MR 0657269.
  12. Seysen, Martin (1987). "A probabilistic factorization algorithm with quadratic forms of negative discriminant". Mathematics of Computation. 48 (178): 757–780. doi:10.1090/S0025-5718-1987-0878705-X. MR 0878705.
  13. Lenstra, Arjen K. (1988). "Fast and rigorous factorization under the generalized Riemann hypothesis" (PDF). Indagationes Mathematicae. 50 (4): 443–454. doi:10.1016/S1385-7258(88)80022-2.
  14. Lenstra, H. W.; Pomerance, Carl (July 1992). "A Rigorous Time Bound for Factoring Integers" (PDF). Journal of the American Mathematical Society. 5 (3): 483–516. doi:10.1090/S0894-0347-1992-1137100-0. MR 1137100.
