Computational complexity theory
Computational complexity theory, as a branch of the theory of computation in computer science, investigates the amount of resources (such as time, memory, randomness) required by algorithms for solving various computational problems.
The origins of computational complexity can be traced back to the early 1970s, when it was realized that certain simple problems would take an inordinate amount of time to solve on any computer, even though these problems are in principle solvable. Moreover, the inherent difficulty of these problems has nothing to do with the computing technology that was available in the 1970s. It appears that nature imposes intrinsic obstacles to performing certain computations, and a central question in complexity theory is to understand why and how these obstacles arise.
Computational problems
Unlike computability theory, which mostly studies problems about computations, much of the focus of computational complexity theory has been on natural problems that are inspired by other scientific disciplines and areas of life (be it mathematics, economics, physics, or sociology). Complexity theory also studies problems arising from computations, but these are often used just as a stepping stone to say something about problems that scientists and engineers from other disciplines might be interested in.
Depending on the type of question asked, computational problems can be classified as decision problems, search problems, optimization problems, or counting problems.
Decision problems
Decision problems can be answered by "yes" or "no". Examples:
- Perfect Matching: Given a graph, does it contain a perfect matching?
- Boolean Formula Satisfiability (SAT): Given a boolean formula in conjunctive normal form (CNF), is there an assignment that satisfies the formula?
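To make the decision flavor concrete, here is an illustrative brute-force sketch in Python (not part of the original article, and certainly not an efficient algorithm): it answers the yes/no question for a CNF formula by trying all 2^n assignments.

```python
from itertools import product

def sat_decide(num_vars, clauses):
    # A clause is a list of literals: literal i > 0 means "variable i is true",
    # literal i < 0 means "variable -i is false".
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True   # "yes": some assignment satisfies every clause
    return False          # "no"

# (x1 or x2) and (not x1 or x2) is satisfiable; x1 and (not x1) is not.
print(sat_decide(2, [[1, 2], [-1, 2]]))  # True
print(sat_decide(1, [[1], [-1]]))        # False
```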
Search problems
In a search problem we want not only to know whether a solution exists, but also to find one. Examples:
- Find a Perfect Matching: Given a graph, find a perfect matching if one exists.
- Find-SAT: Given a boolean formula, find an assignment that satisfies it, if possible.
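As an illustrative brute-force sketch (again exponential time, added here for concreteness), Find-SAT can be solved by trying every assignment and returning the first witness found:

```python
from itertools import product

def find_sat(num_vars, clauses):
    # Literal i > 0 means "variable i is true"; i < 0 means "variable -i is false".
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return list(bits)  # a witness: a truth value for each variable
    return None                # no satisfying assignment exists

print(find_sat(2, [[1, 2], [-1, 2]]))  # [False, True]
print(find_sat(1, [[1], [-1]]))        # None
```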
Optimization problems
Optimization problems ask for the best possible solution to a problem. A decision or search problem can have several optimization variants. Examples:
- Maximum matching: Given a graph, find a maximum matching, i.e. a matching that contains as many edges as possible.
- Maximum SAT: Given a boolean formula, find an assignment that satisfies as many of its clauses as possible.
- Minimum equivalent SAT: Given a boolean formula, find the smallest formula that is equivalent to it, i.e., one that shares the same set of satisfying assignments.
Approximate optimization problems do not ask for the best possible solution, but one that approximates it, for example:
- 90% Approximate SAT: Given a boolean formula, find an assignment that satisfies 90% of the maximum possible number of clauses that can be satisfied.
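An illustrative brute-force sketch of Maximum SAT (not from the article): try every assignment and keep the best clause count.

```python
from itertools import product

def max_sat(num_vars, clauses):
    # Return the largest number of clauses any single assignment satisfies.
    best = 0
    for bits in product([False, True], repeat=num_vars):
        satisfied = sum(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
                        for clause in clauses)
        best = max(best, satisfied)
    return best

# x1 and (not x1): no assignment satisfies both clauses, but either satisfies one.
print(max_sat(1, [[1], [-1]]))  # 1
```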
Counting problems
Counting problems ask for the number of solutions of a given instance. Examples:
- #Matching: Given a graph, count the number of perfect matchings it contains.
- #SAT: Given a boolean formula, count how many satisfying assignments it has.
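A brute-force sketch of #SAT (illustrative only; exact counting is in general even harder than deciding satisfiability):

```python
from itertools import product

def count_sat(num_vars, clauses):
    # Count the assignments that satisfy every clause, by exhaustion.
    return sum(1 for bits in product([False, True], repeat=num_vars)
               if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
                      for clause in clauses))

# (x1 or x2): of the four assignments, only "both false" fails.
print(count_sat(2, [[1, 2]]))  # 3
```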
Complexity measures
The time complexity of a problem is the number of steps that it takes to solve an instance of the problem, as a function of the size of the input (usually measured in bits), using the most efficient algorithm. To understand this intuitively, consider an instance that is n bits long and can be solved in n² steps. In this example we say the problem has a time complexity of n². Of course, the exact number of steps depends on exactly what machine or language is being used. To abstract away from that, Big O notation is generally used (sometimes described as the "order" of the calculation, as in "on the order of"). If a problem has time complexity O(n²) on one typical computer, then it will also have complexity O(n²) on most other computers, so this notation allows us to generalize away from the details of a particular machine.
Example: Mowing grass has linear time complexity because it takes double the time to mow double the area. However, looking up a word in a dictionary has only logarithmic time complexity, because a dictionary of double the size requires only one additional step: open it exactly in the middle, and the problem size is halved.
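The dictionary analogy is binary search. A sketch (the probe counter is added purely for illustration) shows why doubling the input adds only one step:

```python
def dictionary_lookup(sorted_words, target):
    # Binary search: each probe halves the remaining range, so a dictionary
    # twice as large costs only one extra probe (logarithmic time).
    lo, hi, probes = 0, len(sorted_words) - 1, 0
    while lo <= hi:
        probes += 1
        mid = (lo + hi) // 2
        if sorted_words[mid] == target:
            return mid, probes
        if sorted_words[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes  # not found

# 1024 entries need at most 11 probes; 2048 entries would need at most 12.
print(dictionary_lookup(list(range(1024)), 700))  # (700, 10)
```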
The space complexity of a problem is a related concept that measures the amount of space, or memory, required by the algorithm. An informal analogy would be the amount of scratch paper needed while working out a problem with pen and paper. Space complexity is also measured with Big O notation.
A different measure of problem complexity, which is useful in some cases, is circuit complexity. This is a measure of the size of a boolean circuit needed to compute the answer to a problem, in terms of the number of logic gates required to build the circuit. Such a measure is useful, for example, when designing hardware microchips to compute the function instead of software.
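As an illustration (the gate-list encoding below is an assumption made for this sketch, not a standard format), a boolean circuit can be represented as a list of gates over numbered wires, and its circuit size is simply the number of gates:

```python
def eval_circuit(inputs, gates):
    # Wires are numbered: inputs first, then one new wire per gate.
    # Each gate is (op, a, b), where a and b index earlier wires;
    # NOT ignores its second operand.
    wires = list(inputs)
    for op, a, b in gates:
        if op == "AND":
            wires.append(wires[a] and wires[b])
        elif op == "OR":
            wires.append(wires[a] or wires[b])
        elif op == "NOT":
            wires.append(not wires[a])
    return wires[-1]  # the last wire is the circuit's output

# XOR built from 5 AND/OR/NOT gates, so this circuit has size 5:
XOR = [("NOT", 0, 0), ("NOT", 1, 1),   # wires 2, 3
       ("AND", 0, 3), ("AND", 1, 2),   # wires 4, 5
       ("OR", 4, 5)]                   # wire 6: output
print(eval_circuit([True, False], XOR))  # True
print(eval_circuit([True, True], XOR))   # False
```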
An important result in complexity theory is that no matter how hard a problem is (i.e. how much time and space it requires), there will always be even harder problems. For time complexity, this is established by the time hierarchy theorem. A similar space hierarchy theorem can also be derived.
Computational resources
Complexity theory analyzes the difficulty of computational problems in terms of many different computational resources. The same problem can be analyzed in terms of the amounts it requires of several different resources, including time, space, randomness, alternation, and other less intuitive measures.
Perhaps the most well-studied computational resources are deterministic time (DTIME) and deterministic space (DSPACE). These resources represent the amount of computation time and memory space needed on a deterministic computer, like the computers that actually exist. These resources are of great practical interest, and are well-studied.
Some computational problems are easier to analyze in terms of more unusual resources. For example, a nondeterministic Turing machine is a computational model that is allowed to branch out and check many different possibilities at once. The nondeterministic Turing machine has very little to do with how we physically compute algorithms, but its branching exactly captures many of the mathematical models we want to analyze, so nondeterministic time is a very important resource in analyzing computational problems.
Many more unusual computational resources have been used in complexity theory. Technically, any complexity measure can be viewed as a computational resource, and complexity measures are very broadly defined by the Blum complexity axioms.
Complexity classes
A complexity class is the set of all of the computational problems which can be solved using a certain amount of a certain computational resource.
The complexity class P is the set of decision problems that can be solved by a deterministic machine in polynomial time. This class corresponds to an intuitive idea of the problems that can be solved efficiently, even in the worst case.[1]
The complexity class NP is the set of decision problems that can be solved by a non-deterministic machine in polynomial time. This class contains many problems that people would like to be able to solve effectively, including the Boolean satisfiability problem, the Hamiltonian path problem and the vertex cover problem. All the problems in this class have the property that their solutions can be checked efficiently.[1]
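The "checked efficiently" property means each problem in NP has a polynomial-time verifier: given a candidate solution (a certificate), its validity can be confirmed quickly even if finding it is hard. A sketch for vertex cover (illustrative, not from the article):

```python
def verify_vertex_cover(edges, cover, k):
    # Polynomial-time certificate check: does `cover` have at most k vertices
    # and touch every edge? This is the efficient "checking" half of NP.
    return len(cover) <= k and all(u in cover or v in cover for u, v in edges)

# Triangle plus a pendant edge; the two vertices {0, 2} cover every edge.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(verify_vertex_cover(edges, {0, 2}, 2))  # True
print(verify_vertex_cover(edges, {3}, 2))     # False
```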
Many complexity classes can be characterized in terms of the mathematical logic needed to express them – this field is called descriptive complexity.
Open questions
The P = NP question
The question of whether NP is the same set as P (that is, whether problems that can be solved in non-deterministic polynomial time can also be solved in deterministic polynomial time) is one of the most important open questions in theoretical computer science, due to the wide implications a solution would present.[1] If it were true, many important problems would be shown to have "efficient" solutions. These include various types of integer programming in operations research, many problems in logistics, protein structure prediction in biology, and the ability to find formal proofs of pure mathematics theorems efficiently using computers.[2][3] The P = NP problem is one of the Millennium Prize Problems proposed by the Clay Mathematics Institute, which offers a US$1,000,000 prize to the first person to provide a solution.[4]
Questions like this motivate the concepts of hardness and completeness. A set of problems X is hard for a set of problems Y if every problem in Y can be transformed "easily" into some problem in X with the same answer. The definition of "easily" differs by context. In the particular case of P versus NP, the relevant hard set is NP-hard: the set of problems that are not necessarily in NP themselves, but to which any NP problem can be reduced in polynomial time.
Set X is complete for Y if it is hard for Y and is also a subset of Y. Thus, the set of NP-complete problems contains the most "difficult" problems in NP, in the sense that they are the ones most likely not to be in P. Because the problem of P = NP remains unsolved, being able to reduce a known NP-complete problem to a given problem indicates that there is no known polynomial-time solution for the latter. Conversely, because all NP problems can be reduced to any NP-complete problem, finding an NP-complete problem that can be solved in polynomial time would mean that P = NP.[1]
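A classic textbook reduction, chosen here purely for illustration (the article does not single it out), maps Independent Set to Vertex Cover in polynomial time: the complement of an independent set covers every edge, so a graph on n vertices has an independent set of size k exactly when it has a vertex cover of size n − k.

```python
from itertools import combinations

def has_independent_set(n, edges, k):
    # Brute force, used below only to spot-check the reduction.
    return any(all(not (u in s and v in s) for u, v in edges)
               for s in map(set, combinations(range(n), k)))

def has_vertex_cover(n, edges, k):
    return any(all(u in s or v in s for u, v in edges)
               for s in map(set, combinations(range(n), k)))

def reduce_is_to_vc(n, edges, k):
    # The reduction itself is trivial to compute: same graph, new target size.
    return n, edges, n - k

# Spot-check the equivalence on a 4-cycle for every k:
n, edges = 4, [(0, 1), (1, 2), (2, 3), (3, 0)]
for k in range(n + 1):
    assert has_independent_set(n, edges, k) == \
           has_vertex_cover(*reduce_is_to_vc(n, edges, k))
print("reduction verified on the 4-cycle")
```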
Incomplete problems in NP

Incomplete problems, more commonly known as NP-intermediate problems, are problems in NP that are neither NP-complete nor in P. In other words, they cannot be solved in polynomial time, yet they do not belong to the hardest problems of NP. It has been shown (Ladner's theorem) that if P ≠ NP, then NP-intermediate problems exist.[5][6]
An important open problem in this context is whether the graph isomorphism problem, which asks whether two given graphs are isomorphic, is in P, NP-complete, or NP-intermediate. The answer is not known, but there are strong hints that the problem is at least not NP-complete.[7]
NP = co-NP
Here co-NP denotes the set of complement problems (i.e. problems with the yes/no answers reversed) of the problems in NP. It is believed that the two classes are not equal, but this has not yet been proven. It has been shown that if the two classes are not equal, then no NP-complete problem can be in co-NP and no co-NP-complete problem can be in NP.[6]
Intractability
Problems that are solvable in theory, but cannot be solved in practice, are called intractable. As a rule of thumb, problems that lack polynomial-time solutions are regarded as intractable for all but the smallest inputs. Problems known to be intractable in this sense include those that are EXPTIME-complete. If NP is not the same as P, then the NP-complete problems are also intractable in this sense. What this means "in practice" is open to debate.
To see why exponential-time solutions might be unusable in practice, consider a problem that requires 2^n operations to solve (where n is the size of the input). For a relatively small input size of n = 100, and assuming a computer that can perform 10^12 operations per second, a solution would take about 4×10^10 years, much longer than the current age of the universe. On the other hand, a problem that requires n^15 operations would be in P, yet a solution would also take about 4×10^10 years for n = 100. And a problem that requires 2^(0.0000001·n) operations would not be in P, but would be solvable for quite large inputs.
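The figures above can be checked directly (a quick sketch, using the operation counts and machine speed stated in the text):

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

def years_to_solve(operations, ops_per_second=10**12):
    # Wall-clock years on a machine doing 10^12 operations per second.
    return operations / ops_per_second / SECONDS_PER_YEAR

print(f"2^100 ops:  {years_to_solve(2**100):.1e} years")   # ~4.0e10
print(f"100^15 ops: {years_to_solve(100**15):.1e} years")  # ~3.2e10
```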
Finally, saying that a problem is not in P does not imply that all large instances of the problem are hard, or even that most of them are. For example, the decision problem for Presburger arithmetic has been shown not to be in P, yet algorithms have been written that solve it in reasonable time in most cases.
History
The foundations of computational complexity theory were laid by Andrey Kolmogorov in the 1950s. A notable early discovery was the Karatsuba algorithm in 1960, for the multiplication of two n-digit numbers. This algorithm disproved Kolmogorov's 1956 conjecture that the fastest multiplication algorithm must require Ω(n²) elementary operations, and thus helped launch the study of algorithms in earnest. The field was subsequently expanded by many researchers, including:
- Manuel Blum, who developed an axiomatic complexity theory based on his Blum axioms
- Allan Borodin
- Stephen Cook
- Michael R. Garey
- Oded Goldreich
- Juris Hartmanis
- David S. Johnson
- Richard Karp
- Marek Karpinski
- Donald Knuth
- Leonid Levin
- Christos H. Papadimitriou
- Alexander Razborov
- Richard Stearns
- Leslie Valiant
- Andrew Yao
See also
- Complexity
- List of important publications in computational complexity theory
- List of open problems in computational complexity theory
- List of computability and complexity topics
- Game complexity
- The Complexity of Songs
- Run-time analysis
- Cyclomatic complexity
References
- Blum, M. (1967) On the Size of Machines, Information and Control, v. 11, pp. 257-265
- Blum M. (1967) A Machine-independent Theory of Complexity of Recursive Functions, Journal of the ACM, v. 14, No.2, pp. 322-336
- L. Fortnow, Steve Homer (2002/2003). A Short History of Computational Complexity. In D. van Dalen, J. Dawson, and A. Kanamori, editors, The History of Mathematical Logic. North-Holland, Amsterdam.
- Jan van Leeuwen, ed. Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity, The MIT Press/Elsevier, 1990. ISBN 978-0-444-88071-0 (Volume A). QA 76.H279 1990. Huge compendium of information, 1000s of references in the various articles.
- Herbert S. Wilf, Algorithms and Complexity, http://www.math.upenn.edu/~wilf/AlgComp3.html
Footnotes
- ^ a b c d Sipser, Michael (2006). "Time Complexity". Introduction to the Theory of Computation (2nd ed.). USA: Thomson Course Technology. ISBN 0534950973.
- ^ Berger, Bonnie A. (1998). "Protein folding in the hydrophobic-hydrophilic (HP) model is NP-complete". Journal of Computational Biology. 5 (1): 27–40. PMID 9541869.
- ^ Cook, Stephen (2000). "The P versus NP Problem" (PDF). Clay Mathematics Institute. Retrieved 2006-10-18.
- ^ Jaffe, Arthur M. (2006). "The Millennium Grand Challenge in Mathematics" (PDF). Notices of the AMS. 53 (6). Retrieved 2006-10-18.
- ^ a b Ladner, Richard E. (1975). "On the structure of polynomial time reducibility" (PDF). Journal of the ACM. 22 (1): 151–171. doi:10.1145/321864.321877.
- ^ a b Du, Ding-Zhu (2000). Theory of Computational Complexity. John Wiley & Sons. ISBN 978-0-471-34506-0.
- ^ Arvind, Vikraman; Kurur, Piyush P. (2006). "Graph isomorphism is in SPP". Information and Computation. 204 (5): 835–852. doi:10.1016/j.ic.2006.02.002.
External links
- The Complexity Zoo – a Wiki on complexity classes.
- Free (at least for now) book on computational complexity