All of the algorithms we have studied thus far have been **polynomial-time algorithms**: on inputs of size *n*, their worst-case running time is O(*n*^{*k*}) for some constant *k*.

To understand the class of polynomial-time solvable problems, we must first have a formal notion of what a "problem" is. We define an **abstract problem** *Q* to be a binary relation on a set *I* of problem *instances* and a set *S* of problem *solutions*.

This formulation of an abstract problem is more general than is required for our purposes. For simplicity, the theory of NP-completeness restricts attention to ***decision problems***: those having a yes/no solution. In this case, we can view an abstract decision problem as a function that maps the instance set *I* to the solution set {0, 1}.

Many abstract problems are not decision problems, but rather *optimization problems*, in which some value must be minimized or maximized. In order to apply the theory of NP-completeness to optimization problems, we must recast them as decision problems. Typically, an optimization problem can be recast by imposing a bound on the value to be optimized. As an example, in recasting the shortest-path problem as the decision problem PATH, we added a bound *k* to the problem instance.

If a computer program is to solve an abstract problem, problem instances must be represented in a way that the program understands. An **encoding** of a set *S* of abstract objects is a mapping *e* from *S* to the set of binary strings.^{2} For example, we are all familiar with encoding the natural numbers **N** = {0, 1, 2, 3, 4, . . .} as the strings {0, 1, 10, 11, 100, . . .}. Using this encoding, *e*(17) = 10001. Anyone who has looked at computer representations of keyboard characters is familiar with either the ASCII or EBCDIC codes. In the ASCII code, *e*(A) = 1000001. Even a compound object can be encoded as a binary string by combining the representations of its constituent parts. Polygons, graphs, functions, ordered pairs, programs--all can be encoded as binary strings.
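As a concrete illustration, the binary encoding of the natural numbers described above can be sketched in a few lines of Python (the function name `e` mirrors the text's notation; the representation choices are ours):

```python
def e(n):
    """Encode a natural number as a binary string, as in the text:
    e(0) = "0", e(1) = "1", e(2) = "10", ..., e(17) = "10001"."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        bits.append(str(n % 2))  # least-significant bit first
        n //= 2
    return "".join(reversed(bits))
```

Any injective mapping into binary strings would do; this one is simply the standard positional representation.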

Thus, a computer algorithm that "solves" some abstract decision problem actually takes an encoding of a problem instance as input. We call a problem whose instance set is the set of binary strings a *concrete problem*. We say that an algorithm *solves* a concrete problem in time O(*T*(*n*)) if, when it is provided a problem instance *i* of length *n* = |*i*|, the algorithm can produce the solution in at most O(*T*(*n*)) time. A concrete problem is *polynomial-time solvable*, therefore, if there exists an algorithm to solve it in time O(*n*^{*k*}) for some constant *k*.

We can now formally define the **complexity class P** as the set of concrete decision problems that are solvable in polynomial time.

We would like to extend the definition of polynomial-time solvability from concrete problems to abstract problems using encodings as the bridge, but we would like the definition to be independent of any particular encoding. That is, the efficiency of solving a problem should not depend on how the problem is encoded. Unfortunately, it depends quite heavily. For example, suppose that an integer *k* is to be provided as the sole input to an algorithm, and suppose that the running time of the algorithm is Θ(*k*). If the integer *k* is provided in *unary*--a string of *k* 1's--then the running time of the algorithm is O(*n*) on length-*n* inputs, which is polynomial time. If we use the more natural binary representation of the integer *k*, however, then the input length is *n* = ⌊lg *k*⌋ + 1, and the running time of the algorithm is Θ(*k*) = Θ(2^{*n*}), which is exponential in the size of the input.
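The gap between the two encodings is easy to quantify; a small sketch (the function names are ours):

```python
import math

def unary_length(k):
    """Length of the unary encoding of k: a string of k 1's."""
    return k

def binary_length(k):
    """Length of the binary encoding of k >= 1: floor(lg k) + 1 bits."""
    return math.floor(math.log2(k)) + 1
```

For *k* = 2^{20}, the unary input has length 1,048,576 while the binary input has length only 21, so a Θ(*k*)-time algorithm is linear in the first measure but exponential, Θ(2^{*n*}), in the second.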

We say that a function *f* : {0, 1}^{*} → {0, 1}^{*} is *polynomial-time computable* if there exists a polynomial-time algorithm *A* that, given any input *x* ∈ {0, 1}^{*}, produces as output *f*(*x*).

One of the convenient aspects of focusing on decision problems is that they make it easy to use the machinery of formal-language theory. It is worthwhile at this point to review some definitions from that theory. An *alphabet* Σ is a finite set of symbols. A *language L* over Σ is any set of strings made up of symbols from Σ.

There are a variety of operations on languages. Set-theoretic operations, such as *union* and *intersection*, follow directly from the set-theoretic definitions. We define the *complement* of a language *L* by L̄ = Σ^{*} − *L*. The *concatenation* of two languages *L*_{1} and *L*_{2} is the language

*L* = {*x*_{1}*x*_{2} : *x*_{1} ∈ *L*_{1} and *x*_{2} ∈ *L*_{2}} .

The ***closure*** or ***Kleene star*** of a language *L* is the language

*L*^{*} = {ε} ∪ *L* ∪ *L*^{2} ∪ *L*^{3} ∪ · · · ,

where *L*^{*k*} is the language obtained by concatenating *L* to itself *k* times. From the point of view of language theory, the set of instances for any decision problem *Q* is simply the set Σ^{*}, where Σ = {0, 1}. Since *Q* is entirely characterized by those problem instances that produce a 1 (yes) answer, we can view *Q* as a language *L* over Σ = {0, 1}, where

*L* = {*x* ∈ Σ^{*} : *Q*(*x*) = 1} .

For example, the decision problem PATH has the corresponding language

PATH = {⟨*G*, *u*, *v*, *k*⟩ : *G* = (*V*, *E*) is an undirected graph,

*u*, *v* ∈ *V*,

*k* ≥ 0 is an integer, and

there exists a path from *u* to *v* in *G*

whose length is at most *k*} .
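PATH itself is decidable in polynomial time by breadth-first search; a minimal sketch, assuming an adjacency-list representation of our own choosing:

```python
from collections import deque

def path_decider(adj, u, v, k):
    """Return True iff the undirected graph has a path of length at most k
    from u to v.  adj maps each vertex to an iterable of its neighbors.
    BFS computes shortest unweighted distances, so this runs in O(V + E)."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return dist[x] <= k  # BFS distance is the shortest path length
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return False  # v is unreachable from u
```

Since the running time is polynomial in the input size, this decider witnesses PATH ∈ P.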

The formal-language framework allows us to express the relation between decision problems and algorithms that solve them concisely. We say that an algorithm *A* *accepts* a string *x* ∈ {0, 1}^{*} if, given input *x*, the algorithm outputs *A*(*x*) = 1. The language *accepted* by an algorithm *A* is the set *L* = {*x* ∈ {0, 1}^{*} : *A*(*x*) = 1}. An algorithm *A* *rejects* a string *x* if *A*(*x*) = 0.

Even if language *L* is accepted by an algorithm *A*, the algorithm will not necessarily reject a string *x* ∉ *L* provided as input to it. For example, the algorithm may loop forever. A language *L* is *decided* by an algorithm *A* if every binary string in *L* is accepted by *A* and every binary string not in *L* is rejected by *A*.

We can informally define a *complexity class* as a set of languages, membership in which is determined by a *complexity measure*, such as running time, of an algorithm that determines whether a given string *x* belongs to language *L*. Using this framework, we can provide an alternative definition of the complexity class P:

P = {*L* ⊆ {0, 1}^{*} : there exists an algorithm *A*

that decides *L* in polynomial time} .

In fact, P is also the class of languages that can be accepted in polynomial time.

P = {*L* : *L* is accepted by a polynomial-time algorithm} .

Define the optimization problem LONGEST-PATH-LENGTH as the relation that associates each instance of an undirected graph and two vertices with the length of the longest simple path between the two vertices. Define the decision problem LONGEST-PATH = {⟨*G*, *u*, *v*, *k*⟩ : *G* = (*V*, *E*) is an undirected graph, *u*, *v* ∈ *V*, *k* ≥ 0 is an integer, and there exists a simple path from *u* to *v* in *G* whose length is at least *k*}. Show that the optimization problem LONGEST-PATH-LENGTH can be solved in polynomial time if and only if LONGEST-PATH ∈ P.

Give a formal definition for the problem of finding the longest simple cycle in an undirected graph. Give a related decision problem. Give the language corresponding to the decision problem.

Is the dynamic-programming algorithm for the 0-1 knapsack problem that is asked for in Exercise 17.2-2 a polynomial-time algorithm? Explain your answer.

We now look at algorithms that "verify" membership in languages. For example, suppose that for a given instance ⟨*G*, *u*, *v*, *k*⟩ of the decision problem PATH, we are also given a path *p* from *u* to *v*. We can easily check whether the length of *p* is at most *k*, and if so, we can view *p* as a "certificate" that the instance indeed belongs to PATH. For the decision problem PATH, this certificate doesn't seem to buy us much. After all, PATH belongs to P--in fact, PATH can be solved in linear time--and so verifying membership from a given certificate takes as long as solving the problem from scratch. We shall now examine a problem for which we know of no polynomial-time decision algorithm and yet, given a certificate, verification is easy.
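Checking such a certificate is a single pass over the proposed path; a sketch, assuming the same adjacency-list graph representation used in our earlier examples:

```python
def verify_path(adj, u, v, k, p):
    """Return True iff certificate p is a path from u to v of length <= k.
    p is a list of vertices; its length in edges is len(p) - 1."""
    if not p or p[0] != u or p[-1] != v or len(p) - 1 > k:
        return False
    # Every consecutive pair of vertices must be joined by an edge.
    return all(b in adj[a] for a, b in zip(p, p[1:]))
```

The check clearly takes time linear in the size of the graph and the certificate.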

The problem of finding a hamiltonian cycle in an undirected graph has been studied for over a hundred years. Formally, a *hamiltonian cycle* of an undirected graph *G* = (*V*, *E*) is a simple cycle that contains each vertex in *V*. A graph that contains a hamiltonian cycle is said to be *hamiltonian*; otherwise, it is *nonhamiltonian*.

We can define the *hamiltonian-cycle problem*, "Does a graph *G* have a hamiltonian cycle?" as a formal language:

HAM-CYCLE = {⟨*G*⟩ : *G* is a hamiltonian graph} .

We define a *verification algorithm* as being a two-argument algorithm *A*, where one argument is an ordinary input string *x* and the other is a binary string *y* called a *certificate*. A two-argument algorithm *A* *verifies* an input string *x* if there exists a certificate *y* such that *A*(*x*, *y*) = 1. The language *verified* by a verification algorithm *A* is

*L* = {*x* ∈ {0, 1}^{*} : there exists *y* ∈ {0, 1}^{*} such that *A*(*x*, *y*) = 1} .

The **complexity class NP** is the class of languages that can be verified by a polynomial-time algorithm.^{3} More precisely, a language *L* belongs to NP if and only if there exists a two-input polynomial-time algorithm *A* and a constant *c* such that

*L* = {*x* ∈ {0, 1}^{*} : there exists a certificate *y* with |*y*| = O(|*x*|^{*c*})

such that *A*(*x*, *y*) = 1} .

We say that algorithm *A* *verifies* language *L* in polynomial time.

^{3}The name "NP" stands for "nondeterministic polynomial time." The class NP was originally studied in the context of nondeterminism, but this book uses the somewhat simpler yet equivalent notion of verification. Hopcroft and Ullman [104] give a good presentation of NP-completeness in terms of nondeterministic models of computation.
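For HAM-CYCLE, a natural certificate is the sequence of vertices on the proposed cycle, and verification is plainly polynomial. A sketch, assuming an adjacency-dict representation of our own choosing:

```python
def verify_ham_cycle(adj, cert):
    """Return True iff cert (a list of vertices) is a hamiltonian cycle:
    it lists every vertex of the graph exactly once, and consecutive
    vertices, wrapping around at the end, are joined by edges."""
    if sorted(cert) != sorted(adj):
        return False  # must visit each vertex exactly once
    n = len(cert)
    return all(cert[(i + 1) % n] in adj[cert[i]] for i in range(n))
```

The certificate has length polynomial in the graph size, and the check is O(*V*^{2}) with this representation, so this verifier witnesses HAM-CYCLE ∈ NP.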

Many other fundamental questions beyond the P ≠ NP question remain unresolved. Despite much work by many researchers, no one even knows if the class NP is closed under complement. That is, does *L* ∈ NP imply L̄ ∈ NP? We can define the **complexity class co-NP** as the set of languages *L* such that L̄ ∈ NP. The question of whether NP is closed under complement can be rephrased as whether NP = co-NP. Since P is closed under complement (Exercise 36.1-7), it follows that P ⊆ NP ∩ co-NP. Once again, however, it is not known whether P = NP ∩ co-NP or whether there is some language in NP ∩ co-NP − P. Figure 36.2 shows the four possible scenarios.

Consider the language GRAPH-ISOMORPHISM = {⟨*G*_{1}, *G*_{2}⟩ : *G*_{1} and *G*_{2} are isomorphic graphs}. Prove that GRAPH-ISOMORPHISM ∈ NP by describing a polynomial-time algorithm to verify the language.

A **hamiltonian path** in a graph is a simple path that visits every vertex exactly once. Show that the language HAM-PATH = {⟨*G*, *u*, *v*⟩ : there is a hamiltonian path from *u* to *v* in graph *G*} belongs to NP.

Let φ be a boolean formula constructed from the boolean input variables *x*_{1}, *x*_{2}, . . . , *x*_{*k*}, negations (¬), AND's (∧), OR's (∨), and parentheses. The formula φ is a *tautology* if it evaluates to 1 for every assignment of 1's and 0's to the input variables. Define TAUTOLOGY as the language of boolean formulas that are tautologies. Show that TAUTOLOGY ∈ co-NP.

Prove that if NP ≠ co-NP, then P ≠ NP.

Intuitively, a problem *Q* can be reduced to another problem *Q*' if any instance of *Q* can be "easily rephrased" as an instance of *Q*', the solution to which provides a solution to the instance of *Q*. For example, the problem of solving linear equations in an indeterminate *x* reduces to the problem of solving quadratic equations. Given an instance *ax* + *b* = 0, we transform it to 0*x*^{2} + *ax* + *b* = 0, whose solution provides a solution to *ax* + *b* = 0. Thus, if a problem *Q* reduces to another problem *Q*', then *Q* is, in a sense, "no harder to solve" than *Q*'.
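The linear-to-quadratic example can be made concrete. Below, `solve_quadratic` stands in for the assumed quadratic-equation solver (our own sketch; note that it must tolerate a zero leading coefficient), and `solve_linear` performs exactly the rephrasing described in the text:

```python
def solve_quadratic(a, b, c):
    """Return the real roots of a*x**2 + b*x + c = 0, degenerate cases included."""
    if a == 0:
        return [] if b == 0 else [-c / b]  # linear (or trivial) case
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # no real roots
    r = disc ** 0.5
    return sorted({(-b - r) / (2 * a), (-b + r) / (2 * a)})

def solve_linear(a, b):
    # The reduction: rephrase a*x + b = 0 as 0*x**2 + a*x + b = 0.
    return solve_quadratic(0, a, b)
```

The reduction itself does essentially no work; all the solving happens in the target problem, which is the sense in which the source problem is "no harder".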

Returning to our formal-language framework for decision problems, we say that a language *L*_{1} is *polynomial-time reducible* to a language *L*_{2}, written *L*_{1} ≤_{P} *L*_{2}, if there exists a polynomial-time computable function *f* : {0, 1}^{*} → {0, 1}^{*} such that for all *x* ∈ {0, 1}^{*},

*x* ∈ *L*_{1} if and only if *f*(*x*) ∈ *L*_{2} .

We call the function *f* the *reduction function*, and a polynomial-time algorithm *F* that computes *f* is called a *reduction algorithm*.

Polynomial-time reductions give us a powerful tool for proving that various languages belong to P.

If *L*_{1}, *L*_{2} ⊆ {0, 1}^{*} are languages such that *L*_{1} ≤_{P} *L*_{2}, then *L*_{2} ∈ P implies *L*_{1} ∈ P.

Polynomial-time reductions provide a formal means for showing that one problem is at least as hard as another, to within a polynomial-time factor. That is, if *L*_{1} ≤_{P} *L*_{2}, then *L*_{1} is not more than a polynomial factor harder than *L*_{2}, which is why the "less than or equal to" notation for reduction is mnemonic. We can now define the set of NP-complete languages, which are the hardest problems in NP.

A language *L* ⊆ {0, 1}^{*} is *NP-complete* if

1. *L* ∈ NP, and

2. *L*' ≤_{P} *L* for every *L*' ∈ NP.

If a language *L* satisfies property 2, but not necessarily property 1, we say that *L* is * NP-hard*. We also define NPC to be the class of NP-complete languages.

We have defined the notion of an NP-complete problem, but up to this point, we have not actually proved that any problem is NP-complete. Once we prove that at least one problem is NP-complete, we can use polynomial-time reducibility as a tool to prove the NP-completeness of other problems. Thus, we now focus on demonstrating the existence of an NP-complete problem: the circuit-satisfiability problem.

Figure 36.6 shows two boolean combinational circuits, each with three inputs and a single output. A *truth assignment* for a boolean combinational circuit is a set of boolean input values. We say that a one-output boolean combinational circuit is *satisfiable* if it has a *satisfying assignment*: a truth assignment that causes the output of the circuit to be 1. The *circuit-satisfiability problem* is, "Given a boolean combinational circuit composed of AND, OR, and NOT gates, is it satisfiable?" In order to pose this question formally, we define

CIRCUIT-SAT =

{⟨*C*⟩ : *C* is a satisfiable boolean combinational circuit} .

The circuit-satisfiability problem belongs to the class NP.

The circuit-satisfiability problem is NP-hard.

The circuit-satisfiability problem is NP-complete.

*Proof* Immediate from Lemmas 36.5 and 36.6 and the definition of NP-completeness.

A language *L* is *complete* for a language class *C* with respect to polynomial-time reductions if *L* ∈ *C* and *L*' ≤_{P} *L* for every *L*' ∈ *C*.

Show that *L* is complete for NP if and only if L̄ is complete for co-NP.

The NP-completeness of the circuit-satisfiability problem relies on a direct proof that *L* ≤_{P} CIRCUIT-SAT for every language *L* ∈ NP. In this section, we shall show how to prove that languages are NP-complete without directly reducing *every* language in NP to the given language. We shall illustrate this methodology by proving that various formula-satisfiability problems are NP-complete. Section 36.5 provides many more examples of the methodology.

The following lemma is the basis of our method for showing that a language is NP-complete.

1. Prove *L* ∈ NP.

2. Select a known NP-complete language *L*'.

3. Describe an algorithm that computes a function *f* mapping every instance *x* ∈ {0, 1}^{*} of *L*' to an instance *f*(*x*) of *L*.

4. Prove that the function *f* satisfies *x* ∈ *L*' if and only if *f*(*x*) ∈ *L* for all *x* ∈ {0, 1}^{*}.

5. Prove that the algorithm computing *f* runs in polynomial time.

We illustrate the reduction methodology by giving an NP-completeness proof for the problem of determining whether a boolean formula, not a circuit, is satisfiable. This problem has the historical honor of being the first problem ever shown to be NP-complete.

We formulate the **(formula) satisfiability** problem in terms of the language SAT as follows. An instance of SAT is a boolean formula φ composed of

1. boolean variables: *x*_{1}, *x*_{2}, . . . ;

2. boolean connectives: any boolean function with one or two inputs and one output, such as ∧ (AND), ∨ (OR), ¬ (NOT), → (implication), ↔ (if and only if); and

3. parentheses.

As in boolean combinational circuits, a *truth assignment* for a boolean formula φ is a set of values for the variables of φ, and a *satisfying assignment* is a truth assignment that causes it to evaluate to 1. A formula with a satisfying assignment is a *satisfiable formula*. The satisfiability problem asks whether a given boolean formula is satisfiable; in formal-language terms,

SAT = {⟨φ⟩ : φ is a satisfiable boolean formula} .

As an example, the formula

φ = ((*x*_{1} → *x*_{2}) ∨ ¬((¬*x*_{1} ↔ *x*_{3}) ∨ *x*_{4})) ∧ ¬*x*_{2}

has the satisfying assignment *x*_{1 }= 0, *x*_{2 }= 0, *x*_{3 }= 1, *x*_{4 }= 1, since

φ = ((0 → 0) ∨ ¬((¬0 ↔ 1) ∨ 1)) ∧ ¬0

= (1 ∨ ¬(1 ∨ 1)) ∧ 1

= (1 ∨ 0) ∧ 1

= 1 ,

and thus this formula belongs to SAT.
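The evaluation can be checked mechanically by encoding the formula φ = ((*x*_{1} → *x*_{2}) ∨ ¬((¬*x*_{1} ↔ *x*_{3}) ∨ *x*_{4})) ∧ ¬*x*_{2} directly; a sketch:

```python
def phi(x1, x2, x3, x4):
    """Evaluate phi = ((x1 -> x2) OR NOT((NOT x1 <-> x3) OR x4)) AND NOT x2."""
    imp = lambda a, b: (not a) or b        # implication
    iff = lambda a, b: bool(a) == bool(b)  # if and only if
    return int((imp(x1, x2) or not (iff(not x1, x3) or x4)) and not x2)
```

Running it on the assignment from the text confirms that it is satisfying.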

Satisfiability of boolean formulas is NP-complete.

φ = *x*_{10} ∧ (*x*_{4} ↔ ¬*x*_{3})

∧ (*x*_{5} ↔ (*x*_{1} ∨ *x*_{2}))

∧ (*x*_{6} ↔ ¬*x*_{4})

∧ (*x*_{7} ↔ (*x*_{1} ∧ *x*_{2} ∧ *x*_{4}))

∧ (*x*_{8} ↔ (*x*_{5} ∨ *x*_{6}))

∧ (*x*_{9} ↔ (*x*_{6} ∨ *x*_{7}))

∧ (*x*_{10} ↔ (*x*_{7} ∧ *x*_{8} ∧ *x*_{9})) .

Given a circuit *C*, it is straightforward to produce such a formula in polynomial time.

Many problems can be proved NP-complete by reduction from formula satisfiability. The reduction algorithm must handle any input formula, though, and this can lead to a huge number of cases that must be considered. It is often desirable, therefore, to reduce from a restricted language of boolean formulas, so that fewer cases need be considered. Of course, we must not restrict the language so much that it becomes polynomial-time solvable. One convenient language is 3-CNF satisfiability, or 3-CNF-SAT.

We define 3-CNF satisfiability using the following terms. A *literal* in a boolean formula is an occurrence of a variable or its negation. A boolean formula is in *conjunctive normal form*, or *CNF*, if it is expressed as an AND of clauses, each of which is the OR of one or more literals. A boolean formula is in *3-conjunctive normal form*, or *3-CNF*, if each clause has exactly three distinct literals.

For example, the boolean formula

(*x*_{1} ∨ ¬*x*_{1} ∨ ¬*x*_{2}) ∧ (*x*_{3} ∨ *x*_{2} ∨ *x*_{4}) ∧ (¬*x*_{1} ∨ ¬*x*_{3} ∨ ¬*x*_{4})

is in 3-CNF. In 3-CNF-SAT, we are asked whether a given boolean formula φ in 3-CNF is satisfiable.

Satisfiability of boolean formulas in 3-conjunctive normal form is NP-complete.

The first step is similar to the one used to prove CIRCUIT-SAT ≤_{P} SAT in Theorem 36.9. First, we construct a binary "parse" tree for the input formula φ, with literals as leaves and connectives as internal nodes. Figure 36.9 shows such a parse tree for the formula

φ = ((*x*_{1} → *x*_{2}) ∨ ¬((¬*x*_{1} ↔ *x*_{3}) ∨ *x*_{4})) ∧ ¬*x*_{2} .

φ' = *y*_{1} ∧ (*y*_{1} ↔ (*y*_{2} ∧ ¬*x*_{2}))

∧ (*y*_{2} ↔ (*y*_{3} ∨ *y*_{4}))

∧ (*y*_{3} ↔ (*x*_{1} → *x*_{2}))

∧ (*y*_{4} ↔ ¬*y*_{5})

∧ (*y*_{5} ↔ (*y*_{6} ∨ *x*_{4}))

∧ (*y*_{6} ↔ (¬*x*_{1} ↔ *x*_{3})) .

| *y*_{1} | *y*_{2} | *x*_{2} | (*y*_{1} ↔ (*y*_{2} ∧ ¬*x*_{2})) |
|---|---|---|---|
| 1 | 1 | 1 | 0 |
| 1 | 1 | 0 | 1 |
| 1 | 0 | 1 | 0 |
| 1 | 0 | 0 | 0 |
| 0 | 1 | 1 | 1 |
| 0 | 1 | 0 | 0 |
| 0 | 0 | 1 | 1 |
| 0 | 0 | 0 | 1 |

The second step of the reduction converts each clause φ'_{*i*} into conjunctive normal form. We construct a truth table for φ'_{*i*} by evaluating all possible assignments to its variables. Each row of the truth table consists of a possible assignment of the variables of the clause, together with the value of the clause under that assignment. Using the truth-table entries that evaluate to 0, we build a formula in *disjunctive normal form* (or *DNF*)--an OR of ANDs--that is equivalent to ¬φ'_{*i*}. For example, the DNF formula equivalent to the negation of the clause (*y*_{1} ↔ (*y*_{2} ∧ ¬*x*_{2})) is

(*y*_{1} ∧ *y*_{2} ∧ *x*_{2}) ∨ (*y*_{1} ∧ ¬*y*_{2} ∧ *x*_{2}) ∨ (*y*_{1} ∧ ¬*y*_{2} ∧ ¬*x*_{2}) ∨ (¬*y*_{1} ∧ *y*_{2} ∧ ¬*x*_{2}) .

Applying DeMorgan's laws, we get the CNF formula

(¬*y*_{1} ∨ ¬*y*_{2} ∨ ¬*x*_{2}) ∧ (¬*y*_{1} ∨ *y*_{2} ∨ ¬*x*_{2}) ∧ (¬*y*_{1} ∨ *y*_{2} ∨ *x*_{2}) ∧ (*y*_{1} ∨ ¬*y*_{2} ∨ *x*_{2}) ,

which is equivalent to the original clause.
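The truth-table technique (one clause per falsifying row, via DeMorgan on the DNF of the negation) works for any small boolean function; a sketch, with a literal convention of our own (+*i* for variable *i*, −*i* for its negation):

```python
from itertools import product

def to_cnf(f, nvars):
    """Convert a boolean function of nvars inputs into an equivalent CNF.
    For each assignment on which f is 0, emit one clause ruling out exactly
    that row (DeMorgan applied to the DNF of the negation of f)."""
    clauses = []
    for row in product((0, 1), repeat=nvars):
        if not f(*row):
            # Negating the conjunction describing this row flips each literal.
            clauses.append({(i + 1) if v == 0 else -(i + 1)
                            for i, v in enumerate(row)})
    return clauses

def eval_cnf(clauses, row):
    """Evaluate a CNF (list of literal sets) on an assignment tuple."""
    return all(any((row[abs(l) - 1] == 1) == (l > 0) for l in c)
               for c in clauses)
```

Applied to the clause (*y*_{1} ↔ (*y*_{2} ∧ ¬*x*_{2})), it produces the four clauses of the example, one per 0-row of the truth table.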

If *C*_{*i*} has 3 distinct literals, then simply include *C*_{*i*} as a clause of the final 3-CNF formula.

Show the 3-CNF formula that results when we use the method of Theorem 36.10 on the formula (36.3).

Professor Jagger proposes to show that SAT ≤_{P} 3-CNF-SAT by using only the truth-table technique in the proof of Theorem 36.10, and not the other steps. That is, the professor proposes to take the boolean formula φ, form a truth table for its variables, derive from the truth table a formula in 3-DNF that is equivalent to ¬φ, and then negate and apply DeMorgan's laws to produce a 3-CNF formula equivalent to φ. Show that this strategy does not yield a polynomial-time reduction.

Show that the problem of determining whether a boolean formula is a tautology is complete for co-NP. (*Hint:* See Exercise 36.3-6.)

Let 2-CNF-SAT be the set of satisfiable boolean formulas in CNF with exactly 2 literals per clause. Show that 2-CNF-SAT ∈ P. Make your algorithm as efficient as possible. (*Hint:* Observe that *x* ∨ *y* is equivalent to ¬*x* → *y*. Reduce 2-CNF-SAT to a problem on a directed graph that is efficiently solvable.)

A *clique* in an undirected graph *G* = (*V*, *E*) is a subset *V*' ⊆ *V* of vertices, each pair of which is connected by an edge in *E*. The *size* of a clique is the number of vertices it contains. The *clique problem* is the optimization problem of finding a clique of maximum size in a graph. As a decision problem, we ask simply whether a clique of a given size *k* exists in the graph:

CLIQUE = {⟨*G*, *k*⟩ : *G* is a graph with a clique of size *k*} .
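Membership of CLIQUE in NP is witnessed by the clique itself, since verification is just a pairwise adjacency check. A sketch, assuming an edge-set representation of our own:

```python
from itertools import combinations

def verify_clique(edges, cand, k):
    """Return True iff cand is a set of at least k vertices that are
    pairwise adjacent in the undirected edge set `edges` (tuples)."""
    if len(cand) < k:
        return False
    return all((u, v) in edges or (v, u) in edges
               for u, v in combinations(cand, 2))
```

The check examines O(*k*^{2}) pairs, which is polynomial in the input size.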

The clique problem is NP-complete.

are in different triples, that is, *r* ≠ *s*, and

their corresponding literals are *consistent*, that is, *l*_{*i*}^{*r*} is not the negation of *l*_{*j*}^{*s*}.

φ = (*x*_{1} ∨ ¬*x*_{2} ∨ ¬*x*_{3}) ∧ (¬*x*_{1} ∨ *x*_{2} ∨ *x*_{3}) ∧ (*x*_{1} ∨ *x*_{2} ∨ *x*_{3}) ,

then *G* is the graph shown in Figure 36.12.

A *vertex cover* of an undirected graph *G* = (*V*, *E*) is a subset *V*' ⊆ *V* such that if (*u*, *v*) ∈ *E*, then *u* ∈ *V*' or *v* ∈ *V*' (or both). That is, each vertex "covers" its incident edges, and a vertex cover for *G* is a set of vertices that covers all the edges in *E*. The *size* of a vertex cover is the number of vertices in it.

The *vertex-cover problem* is to find a vertex cover of minimum size in a given graph. Restating this optimization problem as a decision problem, we wish to determine whether a graph has a vertex cover of a given size *k*. As a language, we define

VERTEX-COVER = {⟨*G*, *k*⟩ : graph *G* has a vertex cover of size *k*} .

The following theorem shows that this problem is NP-complete.

The vertex-cover problem is NP-complete.

We prove that the vertex-cover problem is NP-hard by showing that CLIQUE ≤_{P} VERTEX-COVER. This reduction is based on the notion of the "complement" of a graph. Given an undirected graph *G* = (*V*, *E*), we define the *complement* of *G* as Ḡ = (*V*, Ē), where Ē = {(*u*, *v*) : *u*, *v* ∈ *V*, *u* ≠ *v*, and (*u*, *v*) ∉ *E*}. In other words, Ḡ is the graph containing exactly those edges that are not in *G*.
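The complement construction, and the fact the reduction exploits (*V*' is a clique of *G* exactly when *V* − *V*' is a vertex cover of Ḡ), can be checked on small graphs. A sketch, with an edge-set representation of our own:

```python
from itertools import combinations

def complement(vertices, edges):
    """Return the edge set of the complement graph: exactly those pairs
    of distinct vertices that are NOT edges of the original graph."""
    present = {frozenset(e) for e in edges}
    return {frozenset(p) for p in combinations(sorted(vertices), 2)
            if frozenset(p) not in present}

def is_vertex_cover(edges, cover):
    # Every edge must have at least one endpoint in the cover.
    return all(any(x in cover for x in e) for e in edges)
```

On the triangle-plus-pendant graph below, {1, 2, 3} is a clique of *G* and its complement set {4} covers every edge of Ḡ.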

The next NP-complete problem we consider is arithmetic. In the *subset-sum problem*, we are given a finite set *S* ⊂ **N** and a target *t* ∈ **N**. We ask whether there is a subset *S*' ⊆ *S* whose elements sum to *t*.

As usual, we define the problem as a language:

SUBSET-SUM =

{⟨*S*, *t*⟩ : there exists a subset *S*' ⊆ *S* such that *t* = Σ_{*s* ∈ *S*'} *s*} .
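A certificate for SUBSET-SUM is the subset itself, and verification is a sum plus a containment check. A sketch, paired with an exhaustive decider for contrast (exponential time, as one expects for an NP-complete problem when no cleverness is applied):

```python
from itertools import combinations

def verify_subset_sum(S, t, cert):
    # cert must be a subset of S whose elements sum exactly to t.
    return set(cert) <= set(S) and sum(cert) == t

def subset_sum_brute_force(S, t):
    """Exhaustive search over all 2**|S| subsets."""
    items = list(S)
    return any(sum(c) == t
               for r in range(len(items) + 1)
               for c in combinations(items, r))
```

Verification is linear in the certificate; the brute-force decider is included only to make the asymmetry between deciding and verifying concrete.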

The subset-sum problem is NP-complete.

At the heart of the reduction is an incidence-matrix representation of *G*. Let *G* = (*V*, *E*) be an undirected graph and let *V* = {*v*_{0}, *v*_{1}, . . . , *v*_{|*V*|−1}} and *E* = {*e*_{0}, *e*_{1}, . . . , *e*_{|*E*|−1}}.

*y*_{*j*} = 4^{*j*} .

The first digit of the target sum *t* is *k*, and all |*E*| lower-order digits are 2's.

*S*' = {*x*_{*i*_{1}}, *x*_{*i*_{2}}, . . . , *x*_{*i*_{*k*}}}

∪ {*y*_{*j*} : *e*_{*j*} is incident on precisely one vertex in *V*'} .

We now return to the hamiltonian-cycle problem defined in Section 36.2.

The hamiltonian-cycle problem is NP-complete.

We now prove that HAM-CYCLE is NP-complete by showing that 3-CNF-SAT ≤_{P} HAM-CYCLE. Given a 3-CNF boolean formula φ over variables *x*_{1}, *x*_{2}, . . . , *x*_{*n*} with clauses *C*_{1}, *C*_{2}, . . . , *C*_{*k*}, each containing exactly 3 distinct literals, we construct a graph *G* that has a hamiltonian cycle if and only if φ is satisfiable.

between (*b*_{2,1}, *b*_{2,2}) and *e*_{1},

between (*b*_{2,2}, *b*_{2,3}) and *e*_{2}, and

between (*b*_{2,3}, *b*_{2,4}) and *e*_{3}.

First, it traverses an edge to go from the top left to the top right.

It next traverses an edge to get back to the left side.

Finally, it traverses the B widgets from bottom to top on the left.

In the *traveling-salesman problem*, which is closely related to the hamiltonian-cycle problem, a salesman must visit *n* cities. Modeling the problem as a complete graph with *n* vertices, we can say that the salesman wishes to make a *tour*, or hamiltonian cycle, visiting each city exactly once and finishing at the city he starts from. There is an integer cost *c*(*i*, *j*) to travel from city *i* to city *j*, and the salesman wishes to make the tour whose total cost is minimum. The formal language for the corresponding decision problem is

TSP = {⟨*G*, *c*, *k*⟩ : *G* = (*V*, *E*) is a complete graph,

*c* is a function from *V* × *V* → **Z**,

*k* ∈ **Z**, and

*G* has a traveling-salesman tour with cost at most *k*} .

The traveling-salesman problem is NP-complete.

The instance of TSP is then (*G*', *c*, 0), which is easily formed in polynomial time.

The *subgraph-isomorphism problem* takes two graphs *G*_{1} and *G*_{2} and asks whether *G*_{1} is isomorphic to a subgraph of *G*_{2}. Show that the subgraph-isomorphism problem is NP-complete.

Given an integer *m*-by-*n* matrix *A* and an integer *m*-vector *b*, the *0-1 integer-programming problem* asks whether there is an integer *n*-vector *x* with elements in the set {0, 1} such that *Ax* ≤ *b*. Prove that the 0-1 integer-programming problem is NP-complete. (*Hint:* Reduce from 3-CNF-SAT.)

Show that the subset-sum problem is solvable in polynomial time if the target value *t* is expressed in unary.

The *set-partition problem* takes as input a set *S* of numbers. The question is whether the numbers can be partitioned into two sets *A* and *Ā* = *S* − *A* such that Σ_{*x* ∈ *A*} *x* = Σ_{*x* ∈ *Ā*} *x*. Show that the set-partition problem is NP-complete.

Show that the hamiltonian-path problem is NP-complete.

The *longest-simple-cycle problem* is the problem of determining a simple cycle (no repeated vertices) of maximum length in a graph. Show that this problem is NP-complete.

An *independent set* of a graph *G* = (*V*, *E*) is a subset *V*' ⊆ *V* of vertices such that each edge in *E* is incident on at most one vertex in *V*'. The *independent-set problem* is to find a maximum-size independent set in *G*.

A *k-coloring* of an undirected graph *G* = (*V*, *E*) is a function *c* : *V* → {1, 2, . . . , *k*} such that *c*(*u*) ≠ *c*(*v*) for every edge (*u*, *v*) ∈ *E*. In other words, the numbers 1, 2, . . . , *k* represent the *k* colors, and adjacent vertices must have different colors. The *graph-coloring problem* is to determine the minimum number of colors needed to color a given graph.

*a.* Give an efficient algorithm to determine a 2-coloring of a graph if one exists.

*c.* Let the language 3-COLOR be the set of graphs that can be 3-colored. Show that if 3-COLOR is NP-complete, then your decision problem from part (b) is NP-complete.

*f.* Complete the proof that 3-COLOR is NP-complete.

The proof of Theorem 36.14 was adapted from Papadimitriou and Steiglitz [154].