Cook's Theorem : SAT(Boolean Satisfiability Problem) is NP-Complete

Stephen Cook is most famously associated with one monumental theorem that essentially created an entire field of computer science.

The Cook-Levin Theorem (Cook's Theorem)

This is, without a doubt, the most important theorem attributed to Stephen Cook.

  • What it states: The theorem proves that the Boolean Satisfiability Problem (SAT) is NP-complete.

  • Why it matters:

    1. It was the first: This was the very first problem ever proven to be NP-complete.

    2. It established a foundation: Cook's 1971 paper, "The Complexity of Theorem-Proving Procedures," formally defined the concept of NP-completeness and established the significance of the P vs. NP problem 

    3. It provides a "master" problem: The theorem shows that any problem in the class NP can be reduced (in polynomial time) to SAT. This means if someone were to find an efficient, polynomial-time algorithm for solving SAT, it would automatically provide an efficient algorithm for every single problem in NP, thus proving P = NP.


Other Major Contributions & Related Work

While the Cook-Levin theorem is his most famous named theorem, Cook's work includes many other foundational results and definitions in computational complexity and proof complexity.

  • Formalization of P vs. NP: His 1971 paper is credited with formally defining the class NP (Nondeterministic Polynomial time) and posing the P versus NP problem as a major open question in mathematics and computer science.

  • Proof Complexity: Cook is a leading figure in proof complexity, which studies the length of proofs in formal logical systems.

    • Cook-Reckhow Thesis: Along with Robert A. Reckhow, he defined "p-simulation" and characterized the relationship between proof systems and complexity classes. Their work suggests that NP = co-NP if and only if there exists a propositional proof system in which every tautology has a polynomial-length proof.

  • Parallel Computation: Cook helped define and explore the complexity class NC (Nick's Class). This class contains problems that can be solved very efficiently on a parallel computer (in polylogarithmic time using a polynomial number of processors).

  • Bounded Arithmetic: He has done extensive work on systems of "bounded arithmetic" (like $PV$), which are weak logical theories used to formally capture the concept of polynomial-time reasoning.

------
(PROOF)

This is a foundational proof in computer science, and its "rigor" comes from a very clever, step-by-step construction. Explaining it rigorously without any notation is a challenge, but the core logic can be described in plain English.

To prove the Boolean Satisfiability Problem (SAT) is NP-complete, we must prove two separate things:

  1. SAT is in the class NP.

  2. SAT is NP-hard.

Let's break down each part.

Part 1: Proving SAT is in NP

  • What this means: "NP" stands for "Nondeterministic Polynomial time." A simpler way to think about it is "efficiently verifiable." If someone hands you a problem and a potential solution, can you check if the solution is correct quickly (in polynomial time)?

  • The Proof:

    1. Imagine a SAT problem (a big logical formula) and a potential solution (a list of TRUE/FALSE assignments for every variable, also called a "certificate").

    2. To verify this solution, you simply plug the TRUE/FALSE values into the formula.

    3. You then evaluate the formula, step-by-step, just like a calculator. You check each small clause ("A or not B"), see if it's TRUE, and then check if all the clauses are TRUE.

    4. This checking process is very fast. The time it takes is directly proportional to the size of the formula. Because this verification is efficient and fast, SAT is, by definition, in the class NP.

This is the easy part of the proof.


Part 2: Proving SAT is NP-hard

  • What this means: This is the hard part and the core of Cook's Theorem. We must prove that every other problem in NP can be transformed (or "reduced") into an equivalent SAT problem quickly (in polynomial time).

  • If this is true, it means SAT is at least as hard as every other problem in NP. If you could solve SAT efficiently, you could solve everything in NP efficiently.

  • The Proof (The "Grand Construction"):

    1. The "Checking Machine": First, we must agree on a universal model for a computer that "checks" solutions. The formal model for this is a Turing machine, but we can just think of it as a basic, idealized computer. Let's call it our "Checker." Since any problem in NP is efficiently verifiable (from Part 1), we know that for any NP problem, there is a "Checker" program that can verify a "YES" answer in a quick, limited number of steps (a polynomial number).

    2. The Goal: Cook's brilliant idea was to prove that you can take any of these "Checker" programs (for any NP problem) and automatically convert it into one, massive SAT formula. This formula will be satisfiable if and only if the "Checker" program would have output "YES."

    3. The Construction: We build this giant formula by creating variables and rules that perfectly simulate the "Checker" machine's computation, step-by-step.

    4. Step A: Define the Variables. We create millions of Boolean (TRUE/FALSE) variables to describe the entire state of the "Checker" at every single step of its computation.

      • Tape Variables: For every tape square on the machine, and for every step in time, we create a variable. For example: Is_Tape_Square_5_Symbol_A_at_Step_12? (This would be TRUE or FALSE).

      • Head Variables: For every tape square and every step in time: Is_Machine_Head_at_Square_7_at_Step_3?

      • State Variables: For every machine state and every step in time: Is_Machine_in_State_Q_at_Step_8?

    5. Step B: Define the Rules (The Clauses). Now we add logical rules (the clauses of our SAT formula) to force these variables to behave like a real computation.

      • Start Rule: We add clauses that force the variables at "Step 0" to match the machine's starting conditions (e.g., the input problem is written on the tape).

      • Transition Rules: This is the biggest part. We add clauses that enforce the machine's logic. For every step t, these rules ensure that the machine's state at step t+1 is a valid next move from its state at step t. For example, a rule might say:

        "IF (at step 5, the head is at square 20) AND (the symbol at square 20 is a '1') AND (the machine is in 'State K')

        THEN

        (at step 6, the head must be at square 21) AND (the symbol at square 20 must now be a '0') AND (the machine must be in 'State L')."

        This entire logical statement is just a (complex) clause in our SAT formula.

      • Uniqueness Rules: We add rules to prevent impossible situations, like the machine's head being in two places at once, or a single tape square containing two different symbols at the same time.

      • Final "Accept" Rule: We add one last, simple rule: "At some point in time before the final step, the machine must enter the 'YES, the answer is correct' state."

    6. The Conclusion of the Proof: We have now built a giant SAT formula.

      • If a "YES" answer for the original problem exists: Then a valid computation history of the "Checker" machine also exists. This history (the exact sequence of head moves, state changes, and tape symbols) provides a set of TRUE/FALSE assignments for all our variables that makes our entire formula TRUE. Therefore, the formula is satisfiable.

      • If no "YES" answer exists: Then no possible computation of the "Checker" can ever reach the "YES" state. This means it is impossible to find a set of TRUE/FALSE assignments that satisfies all our rules at once. Therefore, the formula is unsatisfiable.

This process shows that any NP problem can be transformed into a SAT problem. Because this transformation is itself a systematic, fast (polynomial-time) process, it proves that SAT is NP-hard.


Final Summary

Because we proved (1) SAT is in NP (it's easy to check a given answer) and (2) SAT is NP-hard (every other NP problem can be converted into it), we have proven that SAT is NP-complete.


(LEAN code)



# The Cook-Levin theorem proves: SAT is NP-complete.
# This requires proving two distinct parts.

# -----------------------------------------------------------------
# PART 1: Prove that SAT is in NP (SAT ∈ NP)
# -----------------------------------------------------------------
# A problem is in NP if a proposed solution (a "certificate")
# can be verified in polynomial time.

def verify_SAT(formula: BooleanFormula, certificate: TruthAssignment) -> bool:
    """
    Verifies if a given truth assignment satisfies a boolean formula.
    
    Args:
        formula: The boolean formula to check (e.g., (x1 or ~x2) and (x2 or x3)).
        certificate: A map of variables to True/False values (e.g., {x1: True, x2: False, x3: True}).
    """
    
    # 1. Substitute all variables in the formula with their values
    #    from the certificate.
    #    Example: (True or ~False) and (False or True)
    substituted_formula = formula.substitute(certificate)
    
    # 2. Evaluate the resulting expression.
    #    Example: (True or True) and (False or True) -> (True) and (True) -> True
    result = substituted_formula.evaluate()
    
    # 3. Return the final boolean result.
    return result

# ---
# Proof Conclusion for Part 1:
# The `verify_SAT` function runs in polynomial time (relative to the size of the
# formula and certificate). Step 1 is a linear pass. Step 2 is also fast
# (proportional to the formula's length).
#
# Because a polynomial-time verifier exists, SAT is in NP.
# ---


# -----------------------------------------------------------------
# PART 2: Prove that SAT is NP-hard (SAT is NP-Hard)
# -----------------------------------------------------------------
# A problem is NP-hard if *every* problem in NP can be
# reduced to it in polynomial time.

def build_SAT_formula_from_NP_problem(problem_L, input_w) -> BooleanFormula:
    """
    This is a *conceptual* function that performs the Cook-Levin reduction.
    
    It takes *any* NP problem L (defined by its polynomial-time verifier,
    a Turing Machine M) and an input 'w', and builds a SAT formula 'phi'
    that is satisfiable *if and only if* M accepts 'w'.
    """
    
    # 1. Define the Machine 'M' and time 'p(n)'.
    #    By definition of NP, 'problem_L' has a verifier (Turing Machine 'M')
    #    that runs in polynomial time, let's call it p(n), where n is len(w).
    M = problem_L.verifier
    n = len(input_w)
    poly_time_steps = calculate_polynomial_bound(M, n) # e.g., n^2
    
    # 2. Generate all boolean variables.
    #    We create a variable for every possible state of the machine
    #    at every possible time step. This forms a "tableau" (a grid)
    #    of size p(n) x p(n).
    
    variables = {}
    for t in range(poly_time_steps):
        for i in range(poly_time_steps): # Tape position
            for s in M.symbols:
                # 'Is tape cell i symbol s at time t?'
                variables[f'tape_{t}_{i}_{s}'] = BooleanVar()
            
            for q in M.states:
                # 'Is machine in state q at time t?'
                variables[f'state_{t}_{q}'] = BooleanVar()
            
            # 'Is head at position i at time t?'
            variables[f'head_{t}_{i}'] = BooleanVar()

    # 3. Build the four sets of rules (clauses).
    
    # Rule 1: phi_start
    # The machine starts in the correct initial configuration at time 0.
    phi_start = create_start_rules(M, input_w, variables)
    # e.g., (head_0_0 == True) AND (state_0_q_start == True) AND ...
    #       (tape_0_0_w[0] == True) AND (tape_0_1_w[1] == True) ...

    # Rule 2: phi_cell
    # The tableau is well-formed. (e.g., "a cell can't be 'A' and 'B'
    # at the same time," "the head can't be in two places at once").
    phi_cell = create_uniqueness_rules(variables, poly_time_steps)
    # e.g., (tape_t_i_A OR tape_t_i_B OR ...) AND (~tape_t_i_A OR ~tape_t_i_B) ...

    # Rule 3: phi_move
    # The machine's transitions are valid. For every time 't',
    # the state at time 't+1' must correctly follow from time 't'
    # according to M's rules. This is the most complex part.
    phi_move = create_transition_rules(M, variables, poly_time_steps)
    # e.g., FOR t from 0 to p(n)-1:
    #   IF (head at i, state is q, symbol is s) at time t
    #   THEN (must be valid next move) at time t+1
    
    # Rule 4: phi_accept
    # The machine must enter the "accept" state at some point in time.
    phi_accept = create_accept_rules(M, variables, poly_time_steps)
    # e.g., (state_0_q_accept == True) OR (state_1_q_accept == True) OR ...
    
    # 4. Combine all rules into the final formula.
    phi = AND(phi_start, phi_cell, phi_move, phi_accept)
    
    return phi

# ---
# Proof Conclusion for Part 2:
# The function `build_SAT_formula_from_NP_problem` itself runs in
# polynomial time. The number of variables is polynomial (p(n) * p(n) * ...).
# The number of clauses is also polynomial (it's generated by loops
# running in polynomial time).
#
# Because this polynomial-time reduction exists, SAT is NP-hard.
# ---


# -----------------------------------------------------------------
# FINAL THEOREM: SAT is NP-Complete
# -----------------------------------------------------------------

def prove_Cook_Levin_Theorem():
    
    # Part 1: SAT is in NP.
    # We proved this with the 'verify_SAT' function.
    SAT_is_in_NP = True 
    
    # Part 2: SAT is NP-hard.
    # We proved this by showing the 'build_SAT_formula_from_NP_problem'
    # reduction exists and runs in polynomial time.
    SAT_is_NP_Hard = True
    
    # By definition, a problem that is both in NP and NP-hard
    # is NP-complete.
    is_NP_Complete = SAT_is_in_NP and SAT_is_NP_Hard
    
    return is_NP_Complete

# Q.E.D.

(AXIOMS in LEAN)


-- Import libraries for complexity theory.
-- (We assume a mature library like Mathlib)
import Mathlib.Computability.TuringMachine
import Mathlib.Complexity.ComplexityClass
import Mathlib.Complexity.Reduction

-- Use the standard namespaces for complexity
open Complexity
open Computability

/-!
### Section 1: Defining the SAT Problem

First, we must define the *syntax* of a Boolean formula.
We do this as an inductive type, where 'var n' represents
a variable like x₁, x₂, etc.
-/
inductive PropForm (ν : Type) : Type
  | var   : ν → PropForm ν           -- A variable (e.g., x_n)
  | neg   : PropForm ν → PropForm ν  -- Negation (¬P)
  | conj  : PropForm ν → PropForm ν → PropForm ν -- Conjunction (P ∧ Q)
  | disj  : PropForm ν → PropForm ν → PropForm ν -- Disjunction (P ∨ Q)

-- An "assignment" is a function that maps variables to True/False.
abbrev Assignment (ν : Type) := ν → Bool

-- A function to evaluate a formula given an assignment.
def eval (a : Assignment ν) : PropForm ν → Bool
  | .var n    => a n
  | .neg p    => not (eval a p)
  | .conj p q => (eval a p) && (eval a q)
  | .disj p q => (eval a p) || (eval a q)

/-!
The SAT problem is the set of all formulas (over natural number
variables) for which *there exists* an assignment that
makes the formula evaluate to True.
-/
def SAT : Problem (PropForm Nat) :=
  { phi | exists (a : Assignment Nat), eval a phi = true }


/-!
### Section 2: Defining NP-Completeness

We rely on definitions from the complexity library.
(These are simplified for clarity).

`IsNP (L : Problem)` := The language L is in NP.
`IsNPComplete (L : Problem)` := L is NP-complete.

The definition of NP-complete is:
1. The problem is in NP.
2. The problem is NP-hard.

`IsNPComplete (L)` is defined as:
`IsNP L ∧ (∀ (L' : Problem), IsNP L' → L' ≤ₚ L)`

Where `L' ≤ₚ L` means "L' polynomially reduces to L".
-/


/-!
### Section 3: The Cook-Levin Theorem Statement

Here is the formal statement of the theorem. Proving this
is the hard part, so we use `sorry`. In Lean, `sorry` is a
placeholder for a proof that has not yet been completed.
-/

theorem cook_levin : IsNPComplete SAT := by
  -- 1. Prove that SAT is in NP.
  have sat_is_in_np : IsNP SAT := by
    -- We must build a polynomial-time verifier.
    -- The verifier takes a formula `phi` and a certificate `a` (an assignment).
    -- It then runs `eval a phi` and checks if the result is `true`.
    -- This `eval` function is computable in polynomial time.
    sorry -- This part is non-trivial but much easier than Part 2.

  -- 2. Prove that SAT is NP-hard.
  have sat_is_np_hard : ∀ (L' : Problem), IsNP L' → L' ≤ₚ SAT := by
    -- This is the core of the proof.
    -- We must take *any* NP problem L'...
    intro L' h_L_is_np
    
    -- ...find its non-deterministic Turing machine (NTM) verifier M...
    -- ...and construct a polynomial-time reduction `f`.
    -- This function `f` builds the giant "tableau" formula `phi_M`.
    
    -- `f(w)` must be satisfiable IFF M accepts `w`.
    sorry -- This is the massive construction we discussed.
  
  -- 3. Combine the two parts.
  -- By definition, NP-complete means (in NP) AND (NP-hard).
  exact And.intro sat_is_in_np sat_is_np_hard