This seems like an ordinary puzzle that could take some hours to figure out. However, after giving it some thought, it turns out to be a quite hard problem. In fact it is a version of a well known problem called the “Clique problem”. It is the mother of hard problems and is intricately connected to other hard problems such as prime factorization, the knapsack problem and the travelling salesman. This problem is so hard that the Clay Mathematics Institute has offered a prize of $1 000 000 to anyone who can show that this or similar problems in the same class of problems can be solved in a practical way. Likewise, you win the money for showing that it is impossible to solve these problems without spending an amount of time in comparison to the age of the universe.
Before going into the fascinating details of this problem we should start with discussing what easy and hard problem really are from a computational standpoint. Consider the problem of solving the multiplication of 17 x 13. If you can’t solve this directly in your head you know a method from school on how to multiplicate the digits individually to come to the conclusion that the product is 221. If feels like an easy problem. Consider the reverse, what numbers should you multiply to get 221? This is harder, we probably have to test a lot of numbers to see if they divide 221 without a remainder to finally find that 17 and 13 are the factors of 221.
Now consider the same problem but with the numbers 37975227936943673922808872755445627854565536638199 × 40094690950920881030683735292761468389214899724061. If you’re not Rainman you will probably not do the multiplication in your head but you could do it with a pen and paper and some spare time. A computer would calculate the product in a split second. What about the reverse? Finding what two numbers to multiply to receive the product 1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139 turns out to be extremely hard even with the help of a computer. It is so hard that this is the cornerstone of communication security on the Internet (RSA cryptography).
What is the actual difference? For these kind of problems we have methods also known as algorithms which acts as recipes on how to produce a solution to a specific kind of problem. In the case of multiplication we know from school that we should write the numbers above each other and multiply each digit with all the digits in the other number. The total number of simple operations will in this case be n2 if the numbers have n digits. This is a common way to denote how hard a problem is to solve. If the problem has an input of size n, in this case the number of digits in the numbers to multiply, the solution will require us to perform a number of simple operations proportional to some function of n, in this case n2. If you remember from school, a polynomial is a function where the variable is raised to different powers such as n1 + n2 + n3 + … nx. So the complexity of the multiplication problem which is n2 is a polynomial.
A problem which can be solved with an algorithm that performs something proportional to a polynomial number of operations in relation to the size of the input is considered an easy problem. There is in fact a name for these kind of problems. They belong to the class P of problems where P means that they can be solved in polynomial time.
Now let’s consider the reverse of our math problem where we tried to find the factors of the product. In this case, since we must test numbers smaller than the product for divisibility, the number of simple operations needed in this naive algorithm will grow exponentially with the number of digits in the product. For example, to find the factors of 221 we have to try whether it is divisible by 2, 3, 4, 5 … and so on. A number of one more digit like 5671 would require 10 times as many tries.
Problems like this where the number of operations needed to find the solution grows exponentially with the size of the input are considered hard. An interesting thing about this hard problem is that if you in some way find a solution (the factors of the number), it is an easy thing to verify that the solution is correct. You would simply multiply the factors and see whether they really produce the product. More formally, the task to verify the solution is in the class P of problems.
Now we’re closing in on the $1 000 000 problem. All these problems, which seems very hard to solve, but once you have a solution it is straight forward to verify that the solution is correct, are called NP problems (nondeterministic polynomial time). This means that we don’t really know the complexity to solve them, but a solution can be verified with an algorithm that belongs to class P (an easy task). To win the money you must show that either P = NP or P ≠ NP. That means, either you must prove that you can solve an NP problem in a fast way or show that this is impossible and there can’t exist any algorithm however smart it is that can solve these problems in a fast way.
Do you see the connection with the party planner problem? It is hard to find a group of people which all are friends among all the presumably invited persons. But once you have found this group, it is easy to verify that all these persons really are friends.
Another way to state this is that all these problems are some sort of search problems. You try to search for factors of number, a subset of party people or for all the other NP problems there are similar searches needed. The thing you win the money for is to determine whether you can solve these type of search problems without actually need to search out all the possibilities in a brute force manner which often means an exponential number of operations to test all combinations.
Let’s take a closer look at the party planning problem, or as we will call it from now, the Clique problem. We simplify our example and assume that there are seven people in the village and they can be represented as a graph. Each villager is a node in the graph and paths between two nodes indicates that the corresponding villagers are friends with eachother. The problem now is to find the largest subgraph where each node is connected to all other nodes in the subgraph. Such a subgraph is called a clique.
The naive algorithm would try all combinations to verify whether there is a clique of size 1, 2, 3 and so on until no such clique can be found. In the example below an exhaustive search is made to find a valid clique of size 4.
See how the search complexity would explode in possible permutations when increasing the number of nodes to 8, 9, 10...?
There are many NP problems which on the surface seems very different. Some famous examples include the travelling salesman problem (TSP) where you search for a tour between a set of cities shorter than a given length. The boolean satisfiability problem (SAT) where you try to find a configuration for a set of boolean variables that makes an expression true. For example the expression (a AND NOT b) can be evaluated to true if a = true and b = false. There are no known algorithms to solve these problems in polynomial time.
Now for some mind boggling facts about these problems. The clique problem together with several other NP problems are part of a special subclass in NP called NP-complete problems. Such problems have the property that all other NP problems can be converted to them in polynomial time. This means that for example the problem of factorizing a large number can in a fast way be remade into a graph clique problem. Solve the clique problem and you have the answer for the factorization.
See what this implies? If you can find a fast way of solving the clique problem or any other NP complete problem you have as a side effect solved all other NP problems since they can trivially (in respect to computational time) be converted to that problem.
If this algorithm was discovered, the digital landscape would be completely changed. Internet security would become obsolete over night and many other applications could be revolutionized with faster algorithms.
The NP equals P problem is probably the most famous open question within the field of computational computer science. Solve it and your name will be engraved in various precious metals.
The numbers for the example of large prime factorization described in this article are in fact taken from another challenge given by the RSA company. They posted publically a set of large semi prime numbers in 1991. A semi prime number is the factor of two primes. Each number had a reward attached so that when someone managed to factor the number they could claim prizes from $1,000 up to $200,000 USD. The numbers ranged from 330 to 2048 binary digits. The reason for this was to survey the academic landscape and find the cutting edge algorithms for prime factorization which their cryptographic product is based on. To current date all challenges up to 768 digits ($50,000) are beaten by various teams around the world. The challenge was discontinued in 2007.