Karp's 21 NP-Complete Problems


Karp's 21 NP-Complete Problems

This is arguably Karp's most impactful contribution, for which he received the Turing Award in 1985. In his seminal 1972 paper, "Reducibility Among Combinatorial Problems," Karp established the practical importance of the P vs. NP problem.

While Stephen Cook (and Leonid Levin) had provided the first NP-complete problem (SAT) with Cook's Theorem, Karp demonstrated that a vast array of common and important problems from graph theory, logic, and operations research were also NP-complete. He did this by providing polynomial-time reductions from known NP-complete problems to these new ones.

His paper proved that all these problems were computationally equivalent—if you could solve one efficiently, you could solve them all.

Some of the 21 problems he proved NP-complete include:

  • 0-1 Integer Programming

  • Clique: Does a graph contain a clique (a fully connected subgraph) of size $k$?

  • Set Packing

  • Vertex Cover: Is there a set of $k$ vertices that touches every edge in a graph?

  • Set Covering

  • Feedback Vertex Set

  • Hamiltonian Cycle: Does a graph contain a path that visits every vertex exactly once and returns to the start?

  • Directed Hamiltonian Cycle

  • 3-Satisfiability (3-SAT): A restricted version of SAT, which he used as a starting point for many other reductions.

  • Graph Coloring (Chromatic Number): Can a graph be colored with $k$ colors so no adjacent vertices share a color?

  • Clique Cover

  • Exact Cover

  • 3D Matching

  • Subset Sum: Given a set of integers, is there a non-empty subset that sums to exactly zero?

  • Knapsack Problem

  • Job Sequencing

  • Partition


Other Major Theorems and Algorithms

Beyond his 1972 paper, Karp co-authored several other major theorems and algorithms.

Edmonds-Karp Algorithm

Developed with Jack Edmonds, this is an algorithm for solving the maximum flow problem in a network.

  • The "Theorem" Part: The key theoretical result is the analysis of its runtime. The Edmonds-Karp algorithm implements the Ford-Fulkerson method by specifically choosing the shortest augmenting path (found via a Breadth-First Search). They proved that this specific choice guarantees the algorithm terminates in polynomial time, specifically $O(VE^2)$, where $V$ is the number of vertices and $E$ is the number of edges. This was a crucial step in showing that max-flow is an efficiently solvable (polynomial-time) problem.

Karp-Lipton Theorem

This is a major theorem in computational complexity theory that explores the consequences of NP problems having small circuits.

  • The Theorem: It states that if the NP-complete problem SAT can be solved by polynomial-size Boolean circuits (a "non-uniform" model of computation), then the polynomial hierarchy collapses to its second level ($\Pi_2^p$).

  • Significance: Because it's widely believed that the polynomial hierarchy does not collapse, this theorem provides strong evidence that NP problems (including all NP-complete problems) cannot be solved by polynomial-size circuits. This is a related, but distinct, question from P vs. NP.

Karp-Rabin Algorithm

Developed with Michael O. Rabin, this is a probabilistic string-matching algorithm.

  • The Algorithm: It uses hashing (specifically, a rolling hash known as a Rabin fingerprint) to find occurrences of a pattern string within a larger text.

  • The "Theorem" Part: The analysis shows that the algorithm runs in expected linear time ($O(n+m)$, where $n$ is the text length and $m$ is the pattern length) and has a very low, controllable probability of returning a false positive (which can be made arbitrarily small). It's a classic example of the power of randomization in algorithm design.

Karp's Algorithm for Minimum Mean Weight Cycle

This is an efficient algorithm to find a cycle in a directed graph such that the average weight of the edges in the cycle is minimized. This problem has applications in areas like system performance analysis.


Here are those proofs explained without mathematical notation.

To prove these problems are "hard" (specifically, NP-complete), we use a domino-effect strategy. We start with one problem we know is super hard (like the SAT problem, which involves solving logic puzzles) and show that we can translate it into one of the new problems.

If we can build a "translator" that turns any SAT puzzle into, say, a Clique problem, it means the Clique problem must be at least as hard as SAT.

1. 0-1 Integer Programming

  • The Problem: Imagine you have a list of yes/no questions. You also have a list of rules like, "If you answer 'yes' to question 1, you must answer 'no' to question 2" or "You must answer 'yes' to at least two of questions 3, 4, and 5." The goal is to find a set of yes/no answers that follows all the rules. (Here, "1" means yes, and "0" means no).

  • Why It's Easy to Check: If someone gives you their list of yes/no answers, you can quickly check it. Just go down your rulebook, one rule at a time, and see if their answers violate anything. This is very fast.

  • How to Prove It's Hard (The Proof):

    We show that the SAT logic puzzle can be translated into a 0-1 Integer Programming problem.

    1. A SAT puzzle has logic variables (like $A$, $B$, $C$) that can be "True" or "False".

    2. We create a yes/no question for each logic variable. Let's say "yes" (1) means the variable is "True," and "no" (0) means it's "False."

    3. A SAT puzzle also has rules, like "($A$ must be True) OR ($B$ must be False) OR ($C$ must be True)."

    4. We translate this exact rule into a mathematical one for our yes/no questions. We can write a rule that says, "(The 'yes' for $A$) + (the 'no' for $B$) + (the 'yes' for $C$) must be at least 1."

    5. This new math rule works exactly like the logic rule. The only way it can fail is if $A$ is 'no' (False), $B$ is 'yes' (True), and $C$ is 'no' (False)—which is the same way the logic rule fails!

    6. Since we can translate every part of any SAT puzzle into a 0-1 Integer Programming problem, finding a solution to the programming problem is the same as finding a solution to the original SAT puzzle. Therefore, it must be just as hard.


2. Clique

  • The Problem: Imagine a social network where lines connect people who are friends. A "clique" is a group of people who are all friends with each other (everyone in the group is connected to everyone else). The problem is: "Is there a clique of, say, 10 people in this network?"

  • Why It's Easy to Check: If someone gives you a list of 10 people, you can easily check if they're a clique. Just go through every possible pair in that group and check if a "friends" line exists between them. If all pairs are connected, it's a clique. This is fast.

  • How to Prove It's Hard (The Proof):

    We also show that the SAT logic puzzle can be translated into a Clique problem. This translation is very clever:

    1. Take your SAT puzzle, which is made of rules (clauses) like "(A or B or not-C) AND (not-A or C or D)..."

    2. We will build a graph. For every single item in every single rule, we create a dot (a node). So, "A" in the first rule gets a dot, "B" in the first rule gets a dot, "not-C" in the first rule gets a dot, "not-A" in the second rule gets a dot, and so on.

    3. Now, we draw lines (edges) between these dots based on two simple "don't connect" rules:

      • Rule 1: Don't connect dots that came from the same rule. (So, "A" and "B" from the first rule are not connected).

      • Rule 2: Don't connect dots that are opposites. (So, "A" from the first rule is not connected to "not-A" from the second rule).

    4. Connect all other dots with lines.

    5. Finally, we ask: "Does this new graph have a clique of size $k$?", where $k$ is the number of rules we started with.

    6. Why this works: A clique of size $k$ must pick exactly one dot from each of the $k$ original rules (because of Rule 1). And, it cannot pick two dots that are opposites (because of Rule 2).

    7. This "winning" set of dots is the solution to the SAT puzzle! If the clique contains the "A" dot, you set $A$ to "True." If it contains the "not-C" dot, you set $C$ to "False." This assignment is guaranteed to solve the puzzle. Because of this perfect translation, Clique is just as hard as SAT.


3. Set Packing

  • The Problem: Imagine you have a list of grocery bags, each containing a few items. For example:

    • Bag 1: {apple, banana}

    • Bag 2: {banana, orange}

    • Bag 3: {apple, pear}

    • Bag 4: {orange, lime}

      The goal is to pick, say, 3 bags that have no items in common. (Here, you could pick Bag 1, 3, and 4, but you couldn't pick Bag 1 and 2 because they both have a "banana").

  • Why It's Easy to Check: If someone hands you 3 bags, you can just compare them, pair by pair, and check if any two bags share an item. If there are no overlaps, the solution is good. This is fast.

  • How to Prove It's Hard (The Proof):

    We can show that the Clique problem (which we now know is hard) can be translated into a Set Packing problem.

    1. Take any Clique problem (a graph of dots and lines) where you're looking for a clique of, say, 10 people.

    2. We're going to build a Set Packing problem from this graph.

    3. Create a "bag" (a set) for every dot (person) in the graph.

    4. What do we put in the bags? We create an item for every pair of dots that are NOT connected.

    5. Let's say in our graph, dot $A$ is not connected to dot $B$. We invent an item called "non-edge $AB$". We put this item into Bag $A$ and also into Bag $B$.

    6. We do this for every single missing line in the entire graph.

    7. Now, we ask: "Can you pick 10 bags from this new setup that have no items in common?"

    8. Why this works: What does it mean if we pick two bags, Bag $A$ and Bag $B$, that have no items in common? It means they didn't share any "non-edge" items. This could only happen if the "non-edge $AB$" item was never created. And that item would only be missing if $A$ and $B$ were connected by a line in the original graph.

    9. Therefore, finding 10 bags with no overlaps is the exact same thing as finding 10 dots (people) in the original graph where every single pair is connected. This is the definition of a clique!

    10. Since solving the Set Packing problem is the same as solving the Clique problem, it must also be just as hard.


(PROOF in LEAN)

Providing a complete, compilable LEAN 4 proof for even one of these NP-complete problems would be a significant research project, as it requires a massive formal library for computational complexity (defining Turing machines, polynomial time, etc.) that is not yet fully mature in mathlib.

However, I can provide the rigorous structure of these proofs in LEAN. This will show you how a formal proof is constructed, what needs to be defined, and what theorems (axioms) must be established.

We will imagine a hypothetical library Computability.NP that provides the foundational definitions. All our proofs will rest on one central, universally accepted axiom: 3-SAT is NP-complete.


1. Foundational Axioms (The Computability.NP Library)

First, we must import our hypothetical framework and real mathlib tools.

Lean
-- We assume a (mostly hypothetical) library for complexity theory
import Computability.NP
-- We use real mathlib tools for graphs, finite sets, and matrices
import Mathlib.Combinatorics.SimpleGraph
import Mathlib.Data.Fintype
import Mathlib.Data.Matrix.Basic
import Mathlib.Data.Finset

open Finset SimpleGraph

-- This (hypothetical) library would define:

-- A type for decision problems (languages) over an input type α
class Language (α : Type) where
  P : α → Prop

-- A definition for a problem being in NP
def NP {α} (L : Language α) : Prop := sorry -- (Exists a poly-time verifier)

-- A definition for a problem being NP-hard
def NPHard {α} (L : Language α) : Prop := sorry -- (All NP problems poly-time reduce to L)

-- A problem is NP-complete if it's in NP and is NP-hard
def NPComplete {α} (L : Language α) : Prop :=
  NP L ∧ NPHard L

-- A type for Boolean formulas in 3-Conjunctive Normal Form (3-CNF)
structure ThreeSAT.Formula (v : Type) :=
  clauses : Finset (Fin 3 → Option Bool × v) -- Each clause has 3 literals

-- The 3-SAT decision problem
def ThreeSAT_problem : Language (ThreeSAT.Formula ℕ) :=
  { P := fun φ => ∃ (assign : ℕ → Bool), sorry -- ... formula φ is satisfied by assign
  }

-- The foundational axiom for all subsequent proofs:
axiom ThreeSAT_is_NPComplete : NPComplete ThreeSAT_problem

2. 0-1 Integer Programming

This problem asks if a system of linear inequalities $A\vec{x} \le \vec{b}$ has a solution where $\vec{x}$ contains only 0s and 1s.

Step 1: Define the problem in LEAN

Lean
-- An instance of 0-1 ILP is a matrix A and a vector b over integers
structure ZeroOneILP.Instance :=
  (m n : ℕ)
  (A : Matrix (Fin m) (Fin n) ℤ)
  (b : Fin m → ℤ)

-- The decision problem for 0-1 ILP
def ZeroOneILP_problem : Language ZeroOneILP.Instance :=
  { P := fun inst =>
      -- There exists a vector x of 0s and 1s
      ∃ (x : Fin inst.n → Fin 2),
        -- Such that A * x <= b
        ∀ (i : Fin inst.m),
          (Matrix.dotProduct (inst.A i) (fun j => (x j : ℤ))) ≤ inst.b i
  }

Step 2: State and Prove the Theorem

The proof works by reducing 3-SAT to 0-1 ILP.

Lean
-- Theorem: 0-1 Integer Programming is NP-complete
theorem ZeroOneILP_is_NPComplete : NPComplete ZeroOneILP_problem :=
by
  constructor
  · -- Part 1: Prove ZeroOneILP is in NP
    -- We must show that a given solution x can be verified in polynomial time.
    -- This involves a polynomial-time matrix-vector multiplication and comparison.
    apply (NP_verifier_poly_time :
      NP ZeroOneILP_problem)
    sorry -- This proof is complex, involving formalizing algorithm runtime.
  · -- Part 2: Prove ZeroOneILP is NP-hard
    -- We show this by reducing 3-SAT to it.
    apply (NPHard_of_reduction :
      -- We assume 3-SAT is NP-hard (from our axiom)
      NPHard ThreeSAT_problem →
      -- We provide a polynomial-time reduction from 3-SAT to 0-1 ILP
      (∃ (f : ThreeSAT.Formula ℕ → ZeroOneILP.Instance),
        (PolyTimeReduction f) ∧
        -- And prove the reduction is correct
        (∀ (φ : ThreeSAT.Formula ℕ),
          ThreeSAT_problem.P φ ↔ ZeroOneILP_problem.P (f φ))) →
      NPHard ZeroOneILP_problem)

    · -- Supply the NP-hardness of 3-SAT
      exact ThreeSAT_is_NPComplete.right

    · -- Supply the reduction and its proofs
      -- We must define the reduction function
      use (fun φ =>
        -- This function builds the matrix A and vector b from the 3-SAT formula φ
        -- 1. For each variable x, create a 0-1 variable in the ILP.
        -- 2. For each clause (l1 ∨ l2 ∨ l3), create an inequality.
        --    e.g., (x ∨ y ∨ ¬z) becomes x + y + (1-z) ≥ 1
        --    In our A*x <= b form, this is: -x - y + z ≤ 0
        (construct_ILP_from_3SAT φ))
      constructor
      · -- Proof that `construct_ILP_from_3SAT` runs in polynomial time
        sorry
      · -- Proof that the reduction is correct
        -- (φ is satisfiable ↔ the constructed ILP instance has a 0-1 solution)
        sorry

3. Clique

This problem asks if a graph $G$ contains a "clique" (a fully connected subgraph) of size $k$.

Step 1: Define the problem in LEAN

Lean
-- A graph G is defined over a vertex type V. `SimpleGraph V` is from mathlib.
-- An instance of Clique is a graph and a natural number k.
structure Clique.Instance (V : Type) [Fintype V] :=
  (G : SimpleGraph V)
  (k : ℕ)

-- The decision problem for Clique
def Clique_problem (V : Type) [Fintype V] : Language (Clique.Instance V) :=
  { P := fun inst =>
      -- There exists a subset of vertices 'S'
      ∃ (S : Finset V),
        -- Such that the size of S is k
        S.card = inst.k ∧
        -- And S is a clique (all pairs in S are adjacent)
        ∀ (u v : V), u ∈ S → v ∈ S → u ≠ v → inst.G.Adj u v
  }

Step 2: State and Prove the Theorem

This proof also works by reducing 3-SAT to Clique.

Lean
-- Theorem: Clique is NP-complete
theorem Clique_is_NPComplete (V : Type) [Fintype V] : NPComplete (Clique_problem V) :=
by
  constructor
  · -- Part 1: Prove Clique is in NP
    -- We must show that a given subset S can be verified as a k-clique
    -- in polynomial time. This involves checking |S| = k and checking all
    -- pairs of vertices in S for adjacency.
    apply (NP_verifier_poly_time :
      NP (Clique_problem V))
    sorry -- This proof involves reasoning about O(k^2) checks.
  · -- Part 2: Prove Clique is NP-hard
    -- We show this by reducing 3-SAT to it.
    apply (NPHard_of_reduction :
      NPHard ThreeSAT_problem →
      (∃ (f : ThreeSAT.Formula ℕ → Clique.Instance V),
        (PolyTimeReduction f) ∧
        (∀ (φ : ThreeSAT.Formula ℕ),
          ThreeSAT_problem.P φ ↔ (Clique_problem V).P (f φ))) →
      NPHard (Clique_problem V))

    · -- Supply the NP-hardness of 3-SAT
      exact ThreeSAT_is_NPComplete.right

    · -- Supply the reduction and its proofs
      -- We must define the reduction function
      use (fun φ =>
        -- This function builds the graph G and k from the 3-SAT formula φ
        -- 1. Create one vertex for each literal in φ.
        -- 2. k = number of clauses in φ.
        -- 3. Add an edge between two vertices (literals) IF:
        --    a) They are in different clauses
        --    b) They are not opposites (e.g., not x and ¬x)
        (construct_Graph_from_3SAT φ))
      constructor
      · -- Proof that `construct_Graph_from_3SAT` runs in polynomial time
        sorry
      · -- Proof that the reduction is correct
        -- (φ is satisfiable ↔ the graph G has a k-clique)
        -- A k-clique in G corresponds to picking one literal from each
        -- clause that are all mutually compatible, which is a
        -- satisfying assignment.
        sorry

4. Set Packing

This problem asks if, given a collection of sets $S$, we can find $k$ sets from $S$ that are all mutually disjoint (no two share an element).

Step 1: Define the problem in LEAN

Lean
-- A Set Packing instance is a collection of Finsets (our 'sets')
-- and a number k.
structure SetPacking.Instance (U : Type) [Fintype U] :=
  (collection : Finset (Finset U)) -- A collection of sets
  (k : ℕ)

-- The decision problem for Set Packing
def SetPacking_problem (U : Type) [Fintype U] : Language (SetPacking.Instance U) :=
  { P := fun inst =>
      -- There exists a sub-collection 'S_pack'
      ∃ (S_pack : Finset (Finset U)),
        S_pack ⊆ inst.collection ∧
        S_pack.card = inst.k ∧
        -- And all sets in S_pack are pairwise disjoint
        ∀ (s1 ∈ S_pack), ∀ (s2 ∈ S_pack), s1 ≠ s2 →
          Disjoint s1 s2
  }

Step 2: State and Prove the Theorem

This proof works by reducing Clique to Set Packing. Since we just "proved" Clique is NP-hard, we can use it as our new base.

Lean
-- Theorem: Set Packing is NP-complete
theorem SetPacking_is_NPComplete (U V : Type) [Fintype U] [Fintype V] :
  NPComplete (SetPacking_problem U) :=
by
  constructor
  · -- Part 1: Prove Set Packing is in NP
    -- We must show that a given sub-collection S_pack can be verified
    -- in polynomial time. This involves checking its size and checking all
    -- O(k^2) pairs for disjointness.
    apply (NP_verifier_poly_time :
      NP (SetPacking_problem U))
    sorry -- Formalizing O(k^2) checks.
  · -- Part 2: Prove Set Packing is NP-hard
    -- We reduce Clique to Set Packing
    apply (NPHard_of_reduction :
      NPHard (Clique_problem V) → -- We use Clique's NP-hardness
      (∃ (f : Clique.Instance V → SetPacking.Instance U),
        (PolyTimeReduction f) ∧
        (∀ (cl : Clique.Instance V),
          (Clique_problem V).P cl ↔ (SetPacking_problem U).P (f cl))) →
      NPHard (SetPacking_problem U))

    · -- Supply the NP-hardness of Clique
      -- This comes from the `right` (NPHard) part of our previous theorem
      exact (Clique_is_NPComplete V).right

    · -- Supply the reduction and its proofs
      -- We must define the reduction function
      use (fun cl_inst =>
        -- This function builds a Set Packing instance from a Clique instance (G, k)
        -- 1. The new 'universe' U will be the set of *edges* in G.
        -- 2. For each *vertex* v in G, create a set containing all edges
        --    that are *incident* to v.
        -- 3. The new k is the same as the old k.
        (construct_SetPacking_from_Clique cl_inst))
      constructor
      · -- Proof that `construct_SetPacking_from_Clique` runs in poly time
        sorry
      · -- Proof that the reduction is correct
        -- (G has a k-clique ↔ The collection has a k-packing)
        -- A k-clique is k vertices where every pair is connected.
        -- The corresponding k sets will be disjoint *if and only if*
        -- no two vertices share an edge.
        -- Wait... this reduction is for *Independent Set*, not Clique.
        -- Let's use the reduction from your previous request:
        --
        -- **Correct Reduction (Clique to Set Packing):**
        -- 1. The universe U is the set of *non-edges* in G.
        -- 2. For each *vertex* v, create a set containing all *non-edges*
        --    incident to v.
        -- 3. The new k is the same k.
        --
        -- **Proof of correctness:**
        -- A k-clique is k vertices with *no* "non-edges" between them.
        -- A k-packing is k sets (vertices) that share no common elements.
        -- The elements are "non-edges".
        -- So, a k-packing is k vertices that share no "non-edges".
        -- This is the definition of a k-clique.
        sorry