Web Hosting by Netfirms | Free Domain Names by Netfirms

# The PRIMES is in P little FAQ

This FAQ is compiled by Anton Stiglic (anton at instantlogic.net).
Special thanks to (in alphabetic order) Manindra Agrawal, Phil Carmody, Bill Heelan, Piyush P Kurur, Vaughan Pratt, Robert Silverman, Adam Smith.

Initial Version:  August 8th, 2002.
Last Updated:  September 22th, 2008.

Q1.  What is the purpose of this FAQ?
Q2.  What is a prime number?

Q3.  What is complexity theory?
Q4.  What is P?
Q5.  Are there complexity classes other than P?
Q6.  Is there a relation between these different classes?
Q7.  What is PRIMES?
Q8.  What algorithms were known for PRIMES before the publication of [1]?
Q9.  What exactly is the main result presented in [1] ?
Q10. What does the new primality testing algorithm look like?
Q11. Are there any available implementations of the new algorithm?
Q12. Could this result break cryptographic algorithms?
Q13. Does this result have any impact in cryptography at all?
Q14. What are some on-line references?

[1]  Manindra Agrawal, Neeraj Kayal and Nitin Saxena, PRIMES is in P .
http://www.cse.iitk.ac.in/news/primality.html

Q1.  What is the purpose of this FAQ?

This FAQ is intended to answer some of the most common questions related to a recent, very beautiful result in complexity theory that can be found in [1]. That paper proves that the problem of deciding whether or not an integer is a prime can be solved in time polynomial in the number of bits necessary to represent the integer at hand (that is, roughly the logarithm of the integer itself).  This FAQ also clarifies some misconceptions surrounding the result.

Q2.  What is a prime number?

A prime number is an integer, greater than 1, whose only divisors (factors) are one and itself.  For example the prime divisors of 10 are 2 and 5.  The first 7 primes are 2, 3, 5, 7, 11, 13 and 17.

Q3.  What is complexity theory?

Complexity theory is a field in theoretical computer science which attempts to quantify the difficulty of computational tasks and tends to aim at generality while doing so.  "Complexity" is measured by various natural computing resources, such as the amount of memory needed, communication bandwidth, time of execution, etc.  By analyzing several candidate algorithms for a problem, a most efficient one can be easily identified.  For the problem of determining the primality of an integer, the resource we examine is the time of execution.

Q4.  What is P?

P stands for a class of decisional problems (problems for which you can answer "yes" or "no") for which any instance of the problem can be solved by a deterministic algorithm in time that can be bounded by a polynomial on the size of the input of the problem.  A deterministic algorithm is one whose behavior can be completely predicted from the input.   Natural ways of measuring the size of the input is to consider the number of items in the input or the number of bits needed to represent the input.  For the problem of testing the primality of an integer n, we will consider the number of bits needed to represent n, this is lg (n), where lg represents the logarithm base 2. Time here is not defined in seconds, nor in computer cycles of a particular computer processor, but rather in an abstract way as the number of elementary operations (or "steps") needed to execute an algorithm.  The notion of step is defined in a way that is as machine-independent as possible (remember that complexity theory strives for generality), we assume only a single-processor random-access machine. A step is often taken to be a bit operation, however to simplify the analysis we typically assume that simple arithmetic operations such as addition, multiplication and division, take a single time-step each. Thus, an algorithm is represented as pseudocode and we state that each line corresponding to a step takes a constant amount of time to execute.

The class P is interesting because most problems that are in P can be solved efficiently in real life. There are exceptions however, since the constants in the polynomial function can turn out to be very large, but in general you can consider that most problems in P can be solved efficiently.

Q5.  Are there complexity classes other than P?

There are many other classes of problems. For example, BPP (bounded probabilistic polynomial-time) is the class of problems which can be solved using a randomized polynomial-time algorithm with at most exponentially small error of probability on every input.  A randomized algorithm is an algorithm which can make random choices during it's execution.  When a deterministic, non-randomized, algorithm is run multiple times on a given input, it will always execute the same path of instructions and output the same result, while on the other hand a randomized algorithm will typically choose a different path of execution (based on the output of a random number generator, such as the flip of a coin) on each run and sometimes output a different result.

RP (randomized polynomial-time) is a class similar to BPP, except that the algorithms that solve the problems in this class may err only on inputs for which the answer is really supposed to be "yes".  Co-RP is the analogous class of problems for which algorithms that solve the problems only err on inputs which are really suppose to be "no". In other words, a decisional problem is in Co-RP if it's "complementary" problem is in RP.  The problem complementary to that of deciding if an integer is prime is the problem of deciding if an integer is composite.

ZPP (zero-error probabilistic polynomial-time) is the class of problems which can be solved with zero error, but for which the algorithm may, with low probability, run for a long time (it should be that the *expected* time is polynomial in the input length).

Another well known class is NP, the class of problems that can be solved in polynomial time by a non-deterministic algorithm (NP stands for "non-deterministic polynomial"; note that NP *does not* stand for "non-polynomial time"-- a common misunderstanding of the term). A non-deterministic algorithm is one that can magically get some extra input (usually called a "witness") that helps it to find and verify the answer to the problem at hand. Specifically, a decisional problem is in NP if for every possible input for which the correct answer is "yes", there exists some witness which makes the algorithm output "yes". On the other hand, if the correct answer is "no", then there exists no witness which will cause the algorithm to mistakenly say "yes".  An equivalent, and now standard, way of viewing problems that are in NP is to say that a problem is in NP if a solution to the problem can be verified in polynomial time.

Example.  The "subset sum problem" is the following:  given a set of positive integers {a_1, a_2, ..., a_n} and a positive integer s, determine whether or not there is a subset of the a_i that sum to s.  This problem is NP since magically given a guessed subset of the integers which indeed sum to s one can verify that these integers really do sum to s in time which is polynomial in the size of the problem.  Note that there is no deterministic or randomized algorithm that is known to be able to solve the "subset sum problem" on arbitrary inputs in time bounded by a polynomial in the size of the inputs.

Note that a nondeterminstic algorithm does not constitute a reasonable definition of an algorithm in practice, contrary to randomized algorithms which are very reasonable in practice (and often used). A non-deterministic algorithm can, however, be simulated by a deterministic algorithm by trying all possible choices and making sure that it doesn't get stuck prematurely in a non-terminating computation, but this is very inefficient (not bounded by a polynomial in the size of the input).

A long standing problem is to decide whether or not P = NP, that is to determine whether all the problems that are in NP are in fact in P as well. A mathematical proof demonstrating whether P is equal to NP or not is worth 1 million dollars:
http://www.claymath.org/prizeproblems/index.htm

Q6.  Is there a relation between these different classes?

Yes.  Let A <= B denote the fact that A is a subset of the set B. A hierarchy exists among the different complexity classes mentioned:
P <= ZPP <= RP <= NP
We also have
P <= ZPP <= co-RP <= BPP

For example this says that any problem which is in P is also in ZPP and in RP and in co-RP and in NP, however a problem that is in NP is not necessarily  in RP or in co-RP or in ZPP or in P.  There is no known direct relation between BPP and NP, however one can prove for example that if P = NP then BPP = P, thus if we can one day prove that P = NP, the whole hierchary collapses.

A problem is said to be in a certain class if the best known algorithm for solving the problem satisfies the criteria of that class.  Thus through time, more efficient algorithms for problems can be found and the problem may be classified in a different class.

Q7.  What is PRIMES?

PRIMES is the decisional problem of determining whether or not a given integer n is prime.

Q8.  What algorithms were known for PRIMES before the publication of [1]?

The simplest primality test is "trial division".  We take an odd integer n (if n is even than it is only a prime if it is equal to 2) and try to divide it by all odd integers less than n.  In fact, we only need to try the integers up to sqrt(n), since at least one prime factor, if it exists, must be less than this value.  This primality test is very inefficient however: to determine if an integer n is prime one needs to execute sqrt(n) divisions, which is exponential in lg(n) (the size of n), thus not bounded by a polynomial in the size of n.

In 17th century, Fermat came up with a theorem referred to as Fermat`s Little Theorem, which states the following:

If n is a prime integer, than for any integer a which has no common divisors with n,
(1)      a^(n-1) == 1 (mod n)

The theorem can be turned into a randomized algorithm that determines if an integer is not prime: simply choose various integers a which share no common divisor with n, and test for equality in (1), if you find an integer a for which the equality in (1) fails you are guaranteed that n is not prime (because if n was prime (1) would always be satisfied!).  There is a good chance that you will come across such an integer a if n is not prime.

The test is not bulletproof for primality however, there exist integers n which are not prime and for which the equality in (1) will be satisfied for all integers a, thus if the algorithm does not determine that an integer is not prime we cannot conclude with certainty that it is a prime.

In 1975, Pratt published a paper showing that PRIMES is in NP. Using results known informally since the 1960's, he explained how every prime has a proof of its primality of size a polynomial in the size of the prime in question. For a bit of history surrounding this result and the complexity of primes, please see this note.

In 1976, Miller came up with a deterministic polynomial-time algorithm for PRIMES assuming the Extended Riemann Hypothesis (ERH).  Thus Miller proved that PRIMES is in P if ERH is true; ERH is widely believed to be true, but mathematicians have been trying (and failing!) to prove it for over one hundred years.

A year later Solovay and Strassen came up with a randomized algorithm for PRIMES which has a probability of error that can be made arbitrarily small for all inputs, thus proving that PRIMES is in BPP.  More precisely they proved that PRIMES is in co-RP:  given an integer that is actually a composite, the algorithm may determine it to be a prime with a probability of error that can be made arbitrarily small, while given a non composite (a prime) the algorithm will always correctly determine that the input is a non composite.

Rabin later modified Miller`s algorithm to present it as an unconditional but randomized polynomial-time algorithm, which also proves that PRIMES is in co-RP. This last algorithm (with some optimizations from Knuth) is commonly referred to as Miller-Rabin`s primality test, and is probably the primality test the most used in practice.

In 1983, Adleman, Pomerance and Rumely presented a deterministic algorithm for PRIMES (often referred to as the Cyclotomic method) that runs in time bounded by k(lg n)^(c*(lg lg lg n)), for some constants k and c.  This is "almost" polynomial in the size of the input but not exactly.  Contrary to the other algorithms mentioned, when this one determines an integer to be prime there is no mistake about it. Cohen and Lenstra simplified the algorithm both theoretically and algorithmically.  While efficient in practice,  numbers that are several hundred decimal digits long can be tested in just a few minutes on fast workstations, the algorithm is not easy to implement and the possibility of undetected bugs is very likely.

In 1986, Goldwasser and Killian proposed a randomized algorithm, based on elliptic curves, running in expected polynomial-time on almost all inputs, that produces a certificate for primality (a certificate of primality is some sort of evidence that proves that an integer is prime, which can be verified in polynomial time).  The algorithm is inefficient in practice,  Atkin developed a similar algorithm, known as the Elliptic Curve Primality Proving (ECPP) algorithm, which is efficient in practice.    ECPP has been used to prove the primality of numbers
more than 1000 decimal digits long.

Finally, Adleman and Huang, in 1992, modified the Goldwasser-Kilian algorithm to obtain a randomized polynomial time algorithm that always produced a certificate for primality, thus proving that PRIMES is in RP.  And since PRIMES is also in co-RP, this proved at the same time that PRIMES is in ZPP, since ZPP is equal to the intersection of RP and co-RP).

Q9.  What exactly is the main result presented in [1]?

The main result in [1] is a proof that the problem of deciding whether or not an integer is prime can be solved by a deterministic algorithm in time bounded by a polynomial in the size of the input, without using any unproven mathematical assumptions. Previously, algorithms for solving this problem existed which could run in time polynomial in the size of the input, but they where randomized. Specifically, the problem was known to be in ZPP --- no chance of making a mistake, but with low probability the randomized algorithm might take a very long time to run.  The authors of [1] proved that PRIMES is in fact in P.

The most commonly used algorithms for primality testing have been known for about fifteen years.  They run very quickly, but have a small probability of error.  This probability of error is essentially negligible---smaller than, say, the probability that the computer hardware running the algorithm makes an error, while in the same minute you are struck by lightning and win the lottery (an extremely unlikely event).  Nonetheless, that small probability of error was there, and mathematicians had tried for many years to make it go away. The authors of [1] finally succeeded.

Q10. What does the new primality testing algorithm look like?

A description of the algorithm along with a simplified explanation of how it works can be found at
http://primepages.org/prove/prove4_3.html

For more details of course you can read the original paper, [1]:
http://www.cse.iitk.ac.in/news/primality.html

Q11. Are there any available implementations of the new algorithm?

Yes.  You can find a list of implementations at
http://fatphil.org/maths/AKS/#Implementations

Q12. Could this result break cryptographic algorithms?

No. As mentioned above, fast randomized algorithms for primality testing were already known, and in fact they are necessary just to make most known public key cryptosystems feasible!

Some people confuse the factorization problem with the problem of distinguishing prime numbers from composite numbers.  The factorization problem is: given an integer n, try to find the prime numbers which when multiplied together give n.  The fundamental theorem of arithmetic states that every integer n can indeed be factored into prime numbers, and in a unique way, so the problem makes sense for any integer.  If one could efficiently factor large integers, then certain cryptographic algorithms would be broken (such as the famous RSA encryption and signature schemes).  The fact that PRIME has been found to be in P cannot be used to break any cryptographic algorithms.

Q13. Does this result have any impact in cryptography at all?

Not in any obvious ways. Certain algorithms need to generate prime numbers in order to construct cryptographic keys, but algorithms to accomplish this which can be executed very efficiently already existed before the result in [1].  The most commonly used ones have a probability of error, but this error can be made to be arbitrarily small (see question 9) and thus they give us practically the same assurance as the algorithm proposed in P.  These algorithms that are commonly used in practice are actually faster than the ones proposed in [1].  The result in [1] is a very important one in complexity theory, but probably have no (practical) impact in cryptography.

Q14. What are some on-line references?

* [1]  Manindra Agrawal, Neeraj Kayal and Nitin Saxena, PRIMES is in P:
http://www.cse.iitk.ac.in/news/primality.html

* An excellent ressource concerning primes is the Prime Pages:
http://primepages.org

Thre you will find informationa bout
ERH: http://primepages.org/notes/rh.html
Miller's test:   http://primepages.org/prove/prove2_3.html