This FAQ is compiled by Anton Stiglic (anton at instantlogic.net).
Special thanks to (in alphabetic order) Manindra Agrawal, Phil Carmody,
Bill Heelan, Piyush P Kurur, Vaughan Pratt, Robert Silverman, Adam
Smith.
Initial Version: August 8th, 2002.
Last Updated: September 22nd, 2008.
Q1. What is the purpose of this FAQ?
Q2. What is a prime number?
Q3. What is complexity theory?
Q4. What is P?
Q5. Are there complexity classes other than
P?
Q6. Is there a relation between these different
classes?
Q7. What is PRIMES?
Q8. What algorithms were known for PRIMES
before the publication of [1]?
Q9. What exactly is the main result presented
in [1] ?
Q10. What does the new primality testing algorithm
look like?
Q11. Are there any available implementations
of the new algorithm?
Q12. Could this result break cryptographic algorithms?
Q13. Does this result have any impact in cryptography
at all?
Q14. What are some on-line references?
[1] Manindra Agrawal, Neeraj Kayal and Nitin Saxena, PRIMES is
in P.
http://www.cse.iitk.ac.in/news/primality.html
Q1. What is the purpose of this FAQ?
This FAQ is intended to answer some of the most common questions related
to a recent, very beautiful result in complexity theory that can be found
in [1]. That paper proves that the problem of deciding whether or not
an integer is a prime can be solved in time polynomial in the number of
bits necessary to represent the integer at hand (that is, roughly the
logarithm of the integer itself). This FAQ also clarifies some misconceptions
surrounding the result.
Q2. What is a prime number?
A prime number is an integer, greater than 1, whose only divisors (factors)
are one and itself. For example, the prime divisors of 10 are 2 and
5. The first 7 primes are 2, 3, 5, 7, 11, 13 and 17.
Q3. What is complexity theory?
Complexity theory is a field in theoretical computer science which
attempts to quantify the difficulty of computational tasks and tends to
aim at generality while doing so. "Complexity" is measured by various
natural computing resources, such as the amount of memory needed, communication
bandwidth, time of execution, etc. By analyzing several candidate
algorithms for a problem, one can identify which of them is the most efficient.
For the problem of determining the primality of an integer, the resource
we examine is the time of execution.
Q4. What is P?
P stands for a class of decisional problems (problems for which you
can answer "yes" or "no") for which any instance of the problem can be solved
by a deterministic algorithm in time that can be bounded by a polynomial
in the size of the input of the problem. A deterministic algorithm
is one whose behavior can be completely predicted from the input. Natural
ways of measuring the size of the input are to consider the number of items
in the input or the number of bits needed to represent the input. For
the problem of testing the primality of an integer n, we will consider the
number of bits needed to represent n, which is roughly lg(n), where lg denotes
the logarithm base 2. Time here is not defined in seconds, nor in computer
cycles of a particular computer processor, but rather in an abstract way
as the number of elementary operations (or "steps") needed to execute an
algorithm. The notion of a step is defined in a way that is as machine-independent
as possible (remember that complexity theory strives for generality); we assume
only a single-processor random-access machine. A step is often taken to be
a bit operation; however, to simplify the analysis we typically assume that
simple arithmetic operations such as addition, multiplication and division,
take a single time-step each. Thus, an algorithm is represented as pseudocode
and we state that each line corresponding to a step takes a constant amount
of time to execute.
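To make this concrete, here is a small sketch in Python (the language and the helper name are our choices, used only for illustration) showing that the size of the input n, measured in bits, grows like lg(n) and is therefore tiny compared to n itself:

    import math

    def input_size_in_bits(n):
        # Number of bits needed to write n in binary: floor(lg(n)) + 1,
        # i.e. roughly lg(n).
        return n.bit_length()

    n = 10**100 + 267   # an arbitrary 101-digit example number
    print(input_size_in_bits(n))   # 333
    print(math.log2(n))            # approximately 332.19

An algorithm for PRIMES that runs in time polynomial in this bit length (a few hundred for a 100-digit number) is what "polynomial time" means here, as opposed to time polynomial in n itself.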
The class P is interesting because most problems that are in P can
be solved efficiently in real life. There are exceptions, however, since
the constants in the polynomial bound can turn out to be very large,
but as a general rule problems in P can be considered efficiently solvable.
Q5. Are there complexity classes other than P?
There are many other classes of problems. For example, BPP (bounded-error
probabilistic polynomial-time) is the class of problems which can be solved
using a randomized polynomial-time algorithm whose probability of error on
every input is bounded away from 1/2 (and can be made exponentially small
by repeating the algorithm). A randomized algorithm
is an algorithm which can make random choices during its execution.
When a deterministic, non-randomized, algorithm is run multiple times on
a given input, it will always execute the same path of instructions and output
the same result, while on the other hand a randomized algorithm will typically
choose a different path of execution (based on the output of a random number
generator, such as the flip of a coin) on each run and sometimes output a
different result.
RP (randomized polynomial-time) is a class similar to BPP, except that
the algorithms that solve the problems in this class may err only on inputs
for which the answer is really supposed to be "yes". Co-RP is the
analogous class of problems for which the algorithms that solve them may
err only on inputs for which the answer is really supposed to be "no". In other words,
a decisional problem is in co-RP if its "complementary" problem is in RP.
The problem complementary to that of deciding if an integer is prime is
the problem of deciding if an integer is composite.
ZPP (zero-error probabilistic polynomial-time) is the class of problems
which can be solved with zero error, but for which the algorithm may, with
low probability, run for a long time (it should be that the *expected* time
is polynomial in the input length).
Another well known class is NP, the class of problems that can be solved
in polynomial time by a non-deterministic algorithm (NP stands for "non-deterministic
polynomial"; note that NP *does not* stand for "non-polynomial time"--
a common misunderstanding of the term). A non-deterministic algorithm is
one that can magically get some extra input (usually called a "witness")
that helps it to find and verify the answer to the problem at hand. Specifically,
a decisional problem is in NP if for every possible input for which the
correct answer is "yes", there exists some witness which makes the algorithm
output "yes". On the other hand, if the correct answer is "no", then there
exists no witness which will cause the algorithm to mistakenly say "yes".
An equivalent, and now standard, way of viewing problems that are
in NP is to say that a problem is in NP if a solution to the problem can
be verified in polynomial time.
Example. The "subset sum problem" is the following: given
a set of positive integers {a_1, a_2, ..., a_n} and a positive integer s,
determine whether or not there is a subset of the a_i that sum to s.
This problem is NP since magically given a guessed subset of the integers
which indeed sum to s one can verify that these integers really do sum to
s in time which is polynomial in the size of the problem. Note that
there is no deterministic or randomized algorithm that is known to be able
to solve the "subset sum problem" on arbitrary inputs in time bounded by
a polynomial in the size of the inputs.
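To illustrate this asymmetry, here is a minimal sketch in Python (the function name is ours) of the easy direction: verifying a claimed witness, which takes time polynomial in the size of the input, even though no polynomial-time algorithm is known for finding such a witness:

    def verify_subset_sum_witness(a, witness_indices, s):
        # A witness is a list of distinct indices into the set a.
        if len(set(witness_indices)) != len(witness_indices):
            return False
        if any(i < 0 or i >= len(a) for i in witness_indices):
            return False
        # The verification itself: do the chosen integers sum to s?
        return sum(a[i] for i in witness_indices) == s

    a = [3, 34, 4, 12, 5, 2]
    print(verify_subset_sum_witness(a, [0, 2, 4], 12))   # 3 + 4 + 5 = 12 -> True
    print(verify_subset_sum_witness(a, [1], 12))         # 34 != 12      -> False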
Note that a non-deterministic algorithm does not constitute a reasonable
definition of an algorithm in practice, contrary to randomized algorithms
which are very reasonable in practice (and often used). A non-deterministic
algorithm can, however, be simulated by a deterministic algorithm by trying
all possible choices and making sure that it doesn't get stuck prematurely
in a non-terminating computation, but this is very inefficient (not bounded
by a polynomial in the size of the input).
A long-standing problem is to decide whether or not P = NP, that is
to determine whether all the problems that are in NP are in fact in P as
well. A mathematical proof demonstrating whether P is equal to NP or not
is worth 1 million dollars:
http://www.claymath.org/prizeproblems/index.htm
Q6. Is there a relation between these different classes?
Yes. Let A <= B denote the fact that A is a subset of the
set B. A hierarchy exists among the different complexity classes mentioned:
P <= ZPP <= RP <= NP
We also have
P <= ZPP <= co-RP <= BPP
For example this says that any problem which is in P is also in ZPP
and in RP and in co-RP and in NP, however a problem that is in NP is not
necessarily in RP or in co-RP or in ZPP or in P. There is no
known direct relation between BPP and NP, however one can prove for example
that if P = NP then BPP = P; thus, if we can one day prove that P = NP, the
whole hierarchy collapses.
A problem is placed in a certain class when some known algorithm
for solving the problem satisfies the criteria of that class. Thus,
as more efficient algorithms for a problem are found over time,
the problem may be shown to lie in a more restrictive class.
Q7. What is PRIMES?
PRIMES is the decisional problem of determining whether or not a given
integer n is prime.
Q8. What algorithms were known for PRIMES before the
publication of [1]?
The simplest primality test is "trial division". We take an odd
integer n (if n is even then it is prime only if it is equal to 2) and
try to divide it by all odd integers less than n. In fact, we only
need to try the integers up to sqrt(n), since if n is composite then at least
one prime factor must be no greater than this value. This primality test is
very inefficient however: to determine if an integer n is prime one needs
to execute roughly sqrt(n) divisions, which is exponential in lg(n) (the size of
n), thus not bounded by a polynomial in the size of n.
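Here is a minimal sketch of trial division in Python (the function name is ours); the loop performs on the order of sqrt(n)/2 divisions, which quickly becomes hopeless once n has more than a few dozen digits:

    def is_prime_trial_division(n):
        if n < 2:
            return False
        if n == 2:
            return True
        if n % 2 == 0:
            return False
        # Try dividing by every odd integer up to sqrt(n).
        d = 3
        while d * d <= n:
            if n % d == 0:
                return False
            d += 2
        return True

    print(is_prime_trial_division(101))    # True
    print(is_prime_trial_division(1001))   # False (1001 = 7 * 11 * 13)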
In the 17th century, Fermat came up with a theorem referred to as Fermat's
Little Theorem, which states the following:
If n is a prime integer, then for any integer a which has no
common divisors with n,
(1) a^(n-1) == 1 (mod n)
The theorem can be turned into a randomized algorithm that determines
if an integer is not prime: simply choose various integers a which share
no common divisor with n, and test for equality in (1). If you find an integer
a for which the equality in (1) fails, you are guaranteed that n is not prime
(because if n were prime, (1) would always be satisfied!). There is a
good chance that you will come across such an integer a if n is not prime.
The test is not bulletproof for primality, however: there exist composite integers
n (the so-called Carmichael numbers) for which the equality in (1) is satisfied
by every integer a sharing no common divisor with n. Thus, if the algorithm does
not determine that an integer is not prime, we cannot conclude with certainty
that it is a prime.
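Here is a minimal sketch of such a Fermat test in Python (the function name and the number of rounds are our choices). An answer of "not prime" is always correct; an answer of "probably prime" is not a guarantee, for the reason explained above:

    import random

    def fermat_test(n, rounds=20):
        # False -> n is certainly composite.
        # True  -> n passed every round and is "probably prime"
        #          (Carmichael numbers can fool this test).
        if n < 2:
            return False
        if n in (2, 3):
            return True
        for _ in range(rounds):
            a = random.randrange(2, n - 1)
            if pow(a, n - 1, n) != 1:   # equality (1) fails
                return False
        return True

    print(fermat_test(97))   # True: 97 is prime
    print(fermat_test(91))   # almost certainly False: 91 = 7 * 13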
In 1975, Pratt published a paper showing that PRIMES is in NP.
Using results known informally since the 1960's, he explained how every
prime has a proof of its primality whose size is polynomial in the size of the
prime in question.
In 1976, Miller came up with a deterministic polynomial-time algorithm
for PRIMES assuming the Extended Riemann Hypothesis (ERH). Thus
Miller proved that PRIMES is in P if ERH is true; ERH is widely believed
to be true, but mathematicians have been trying (and failing!) to prove
it for over one hundred years.
A year later Solovay and Strassen came up with a randomized algorithm
for PRIMES which has a probability of error that can be made arbitrarily
small for all inputs, thus proving that PRIMES is in BPP. More precisely
they proved that PRIMES is in co-RP: given an integer that is actually
composite, the algorithm may mistakenly declare it prime, but with a probability
of error that can be made arbitrarily small, while given a prime
the algorithm will always correctly determine that the input
is prime.
Rabin later modified Miller's algorithm to present it as an unconditional
but randomized polynomial-time algorithm, which also proves that PRIMES
is in co-RP. This last algorithm (with some optimizations from Knuth) is
commonly referred to as the Miller-Rabin primality test, and is probably
the primality test most used in practice.
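For illustration, here is a minimal sketch of the Miller-Rabin test in Python (the function name and number of rounds are our choices; implementations used in practice add further optimizations, such as trial division by small primes and carefully chosen witness bases):

    import random

    def miller_rabin(n, rounds=40):
        # False -> n is certainly composite; True -> n is probably prime.
        if n < 2:
            return False
        for p in (2, 3, 5, 7):
            if n % p == 0:
                return n == p
        # Write n - 1 = 2^s * d with d odd.
        d, s = n - 1, 0
        while d % 2 == 0:
            d //= 2
            s += 1
        for _ in range(rounds):
            a = random.randrange(2, n - 1)
            x = pow(a, d, n)
            if x in (1, n - 1):
                continue
            for _ in range(s - 1):
                x = pow(x, 2, n)
                if x == n - 1:
                    break
            else:
                return False   # a witnesses that n is composite
        return True

    print(miller_rabin(2**61 - 1))    # True: 2^61 - 1 is prime
    print(miller_rabin(3215031751))   # almost certainly False:
                                      # 3215031751 = 151 * 751 * 28351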
In 1983, Adleman, Pomerance and Rumely presented a deterministic algorithm
for PRIMES (often referred to as the Cyclotomic method) that runs in time
bounded by k(lg n)^(c*(lg lg lg n)), for some constants k and c. This
is "almost" polynomial in the size of the input but not exactly. Contrary
to the other algorithms mentioned, when this one determines an integer to
be prime there is no mistake about it. Cohen and Lenstra simplified the
algorithm both theoretically and algorithmically. While it is efficient in
practice (numbers that are several hundred decimal digits long can
be tested in just a few minutes on fast workstations), the algorithm is not
easy to implement, and the risk of undetected bugs in an implementation is significant.
In 1986, Goldwasser and Kilian proposed a randomized algorithm, based
on elliptic curves, running in expected polynomial-time on almost all
inputs, that produces a certificate for primality (a certificate of primality
is some sort of evidence that proves that an integer is prime, which can
be verified in polynomial time). The algorithm is inefficient in practice.
Atkin developed a similar algorithm, known as the Elliptic Curve Primality
Proving (ECPP) algorithm, which is efficient in practice. ECPP
has been used to prove the primality of numbers
more than 1000 decimal digits long.
Finally, Adleman and Huang, in 1992, modified the Goldwasser-Kilian
algorithm to obtain a randomized polynomial-time algorithm that always produces
a certificate for primality, thus proving that PRIMES is in RP. And
since PRIMES is also in co-RP, this proved at the same time that PRIMES is
in ZPP (since ZPP is equal to the intersection of RP and co-RP).
Q9. What exactly is the main result presented in [1]?
The main result in [1] is a proof that the problem of deciding whether
or not an integer is prime can be solved by a deterministic algorithm in
time bounded by a polynomial in the size of the input, without using any
unproven mathematical assumptions. Previously, algorithms for solving this
problem existed which could run in time polynomial in the size of the input,
but they were randomized. Specifically, the problem was known to be in
ZPP --- no chance of making a mistake, but with low probability the randomized
algorithm might take a very long time to run. The authors of [1] proved
that PRIMES is in fact in P.
The most commonly used algorithms for primality testing have been known
for about fifteen years. They run very quickly, but have a small probability
of error. This probability of error is essentially negligible---smaller
than, say, the probability that the computer hardware running the algorithm
makes an error, while in the same minute you are struck by lightning and
win the lottery (an extremely unlikely event). Nonetheless, that
small probability of error was there, and mathematicians had tried for
many years to make it go away. The authors of [1] finally succeeded.
Q10. What does the new primality testing algorithm look like?
A description of the algorithm along with a simplified explanation of
how it works can be found at
http://primepages.org/prove/prove4_3.html
For more details of course you can read the original paper, [1]:
http://www.cse.iitk.ac.in/news/primality.html
Q11. Are there any available implementations of the new algorithm?
Yes. You can find a list of implementations at
http://fatphil.org/maths/AKS/#Implementations
Q12. Could this result break cryptographic algorithms?
No. As mentioned above, fast randomized algorithms for primality testing
were already known, and in fact they are necessary just to make most known
public key cryptosystems feasible!
Some people confuse the factorization problem with the problem of distinguishing
prime numbers from composite numbers. The factorization problem
is: given an integer n, try to find the prime numbers which when multiplied
together give n. The fundamental theorem of arithmetic states that
every integer n can indeed be factored into prime numbers, and in a unique
way, so the problem makes sense for any integer. If one could efficiently
factor large integers, then certain cryptographic algorithms would be broken
(such as the famous RSA encryption and signature schemes). The fact
that PRIMES has been found to be in P cannot be used to break any cryptographic
algorithms.
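To make the contrast concrete, here is a minimal sketch in Python (the function name is ours) of factoring by trial division; it illustrates the fundamental theorem of arithmetic, but it runs in time exponential in the size of n, and no polynomial-time factoring algorithm is known:

    def factor_trial_division(n):
        # Returns the prime factors of n > 1, with multiplicity.
        factors = []
        d = 2
        while d * d <= n:
            while n % d == 0:
                factors.append(d)
                n //= d
            d += 1
        if n > 1:
            factors.append(n)
        return factors

    print(factor_trial_division(1001))   # [7, 11, 13]
    print(factor_trial_division(97))     # [97]

The point here is precisely that a fast test for whether n has any nontrivial factors does not give a fast way of finding those factors.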
Q13. Does this result have any impact in cryptography at all?
Not in any obvious ways. Certain algorithms need to generate prime
numbers in order to construct cryptographic keys, but algorithms to accomplish
this which can be executed very efficiently already existed before the
result in [1]. The most commonly used ones have a probability of error,
but this error can be made arbitrarily small (see question 9) and
thus they give us practically the same assurance as the algorithm proposed
in [1]. The algorithms that are commonly used in practice are actually
faster than the one proposed in [1]. The result in [1] is a very
important one in complexity theory, but probably has no (practical) impact
on cryptography.
Q14. What are some on-line references?
* [1] Manindra Agrawal, Neeraj Kayal and Nitin Saxena, PRIMES
is in P:
http://www.cse.iitk.ac.in/news/primality.html
* An excellent resource concerning primes is the Prime Pages:
http://primepages.org
There you will find information about
ERH: http://primepages.org/notes/rh.html
Miller's test: http://primepages.org/prove/prove2_3.html
Adleman-Pomerance-Rumely: http://primepages.org/prove/prove4_1.html
Goldwasser-Kilian and Atkin: http://www.utm.edu/research/primes/prove/prove4_2.html
Agrawal-Kayal-Saxena: http://primepages.org/prove/prove4_3.html
+ many more primality algorithms and other prime stuff.
* The Wikipedia on-line encyclopedia also has interesting information
about prime numbers:
http://www.wikipedia.org/wiki/Prime_number
* NIST's dictionary of algorithms and data structures:
http://www.nist.gov/dads/
there you can find definitions for the various complexity classes mentioned
here and much more.
* Phil Carmody has interesting links to things relevant to
the AKS algorithm:
http://fatphil.org/maths/AKS/