From Busy Beaver to Modern CPUs: How a 1936 Abstraction Became Real Silicon

Published on May 12, 2026 by Remy

Computer science has two famous Turing-related stories. One is the 1936 paper that defined computability. The other, less told, is how that abstract machine became the warm rectangle in your laptop. Between them sits a small but striking detour: the Busy Beaver function, which makes uncomputability tangible.

This article walks the full arc — from the most “impractical” object in computability theory to the very practical thing humming under your keyboard.

Part 1 — The Busy Beaver: Uncomputability You Can Touch

Where it came from

In 1962, Hungarian-born mathematician Tibor Radó, then at Ohio State, published “On non-computable functions”. He had a pedagogical problem: the undecidability results before him — the halting problem, Rice’s theorem — rested on diagonalization and reductions to it, arguments students could follow but not feel. Radó wanted a concrete, finite, intuitive function that was nonetheless not computable.

His move was almost playful. Consider every Turing machine that has n states and uses 2 tape symbols (0 and 1) and that eventually halts on a blank tape. There are only finitely many such machines (up to renaming states), so two quantities are well-defined:

  • Σ(n) — the maximum number of 1s any such machine leaves on the tape when it halts.
  • S(n) — the maximum number of steps any such machine takes before halting.

Picture a tiny beaver scurrying across the tape, “busily” punching as many marks as possible before stopping. Hence the name.
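
Both quantities are concrete enough to execute. Here is a minimal simulator sketch in Python — the dict-as-tape encoding and the state names are my own illustration, not Radó's notation — run on the known 2-state, 2-symbol champion, which achieves Σ(2) = 4 in S(2) = 6 steps:

    # Minimal Turing machine simulator: the tape is a dict (blank cells read 0),
    # delta maps (state, symbol) -> (write, move, next_state).
    def run(delta, max_steps=10_000):
        tape, pos, state, steps = {}, 0, "A", 0
        while state != "HALT" and steps < max_steps:
            write, move, state = delta[(state, tape.get(pos, 0))]
            tape[pos] = write
            pos += 1 if move == "R" else -1
            steps += 1
        return steps, sum(tape.values())   # (step count, number of 1s left)

    # The 2-state Busy Beaver champion.
    champion = {
        ("A", 0): (1, "R", "B"), ("A", 1): (1, "L", "B"),
        ("B", 0): (1, "L", "A"), ("B", 1): (1, "R", "HALT"),
    }

    print(run(champion))   # -> (6, 4): S(2) = 6 steps, Σ(2) = 4 ones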

Why it isn’t computable

Suppose, for contradiction, that S were a computable total function. Then given any Turing machine M with n states, we could compute S(n), simulate M on a blank tape for S(n) steps, and know the answer: if M hasn’t halted by then, it never will. That would solve the halting problem, which we know is undecidable. (A companion argument, which Radó also gave, rules out Σ.)
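
To see the contradiction as code: if S were computable, the function below would decide blank-tape halting. Both busy_beaver_S and simulate are hypothetical names — the whole point is that the first cannot exist:

    # Hypothetical — this cannot actually be written.
    def halts_on_blank(machine, n_states):
        bound = busy_beaver_S(n_states)        # hypothetical oracle for S(n)
        return simulate(machine, steps=bound)  # True iff it halted within S(n) steps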

In fact Σ grows so fast that it eventually dominates every computable function: whatever total function f a program can compute, Σ(n) > f(n) for all large enough n. It is the textbook example of “non-computable but well-defined.”

The numbers that we know

Pinning down small values is brutally hard — each new n multiplies the search space by orders of magnitude:

  n    Σ(n)        Year settled
  1    1           1962
  2    4           1962
  3    6           1965
  4    13          1983
  5    4098        2024 (formal proof in Coq, bbchallenge)
  6    ≥ 10↑↑15    open

The jump from Σ(5) to Σ(6) is not gradual — it’s a cliff into Knuth’s up-arrows. Σ(6) is already known to be at least 10↑↑15, far larger than any number with a reasonable physical interpretation in our universe.
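
For scale, Knuth’s double arrow as code — a↑↑b is a tower of b copies of a. A tiny sketch (only toy inputs are feasible):

    def tetration(a, b):
        """a↑↑b: a tower of b copies of a."""
        result = 1
        for _ in range(b):
            result = a ** result
        return result

    print(tetration(2, 4))   # 2^(2^(2^2)) = 65536
    # 10↑↑15, the lower bound on Σ(6), is a tower of fifteen 10s —
    # already 10↑↑3 = 10^(10^10) has ten billion digits.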

The lesson Radó delivered

You don’t need exotic constructions to escape computability. A finite-state machine with a tape — the very object Turing invented — already hides functions no algorithm can pin down. The limits of computation aren’t out at infinity. They show up at n = 5.

Hold this in mind: a Turing machine is both the simplest non-trivial computer and a vessel for the deepest impossibility results in the field. Now let’s ask how that same object became your CPU.

Part 2 — A Common Misconception: “How Many States Is a Modern Chip?”

A natural question once you know Busy Beaver: if we measure Turing machines by state count, how many states is, say, an M3 or a Ryzen? It sounds answerable. It isn’t — and the reasons are illuminating.

Mistake 1: “States” is two-dimensional

Busy Beaver fixes the alphabet at 2 symbols and varies n, the state count. But a Turing machine has both states n and symbols k. There’s a separate “two-symbol Busy Beaver,” “three-symbol Busy Beaver,” and so on, with their own tables. The mapping between (n, k) combinations is non-trivial; “state count” alone is not a universal complexity yardstick.

Mistake 2: A CPU isn’t a Turing machine

A real CPU has finite memory. With B bits of state across registers, caches, and RAM, the entire machine has at most 2^B possible configurations. It is technically a colossal finite-state machine — or, if you count the bounded disk and RAM as a bounded tape, something closer to a linear bounded automaton (LBA) — but not a Turing machine. The infinite tape is precisely what’s missing.

For most purposes we pretend a modern computer is Turing-complete, because the bound is so far beyond practice that ignoring it costs nothing. But theoretically: no, your laptop is not a Turing machine.
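
To make “so far beyond practice” concrete, a back-of-the-envelope sketch (16 GB is an arbitrary example size):

    import math

    bits = 16 * 2**30 * 8                    # 16 GB of RAM = B bits of state
    digits = int(bits * math.log10(2)) + 1   # decimal digits of 2^B
    print(f"{digits:.2e}")                   # ~4.1e10: the *digit count* of 2^B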

Mistake 3: Bits of state ≠ controller states

Even if you accept the finite-state framing, the “states” of Busy Beaver correspond to the control unit of the machine — a tiny switch-case selecting the next move. The vast 2^B configuration space of a real chip lives mostly on the tape side (registers and memory), not the control side. Conflating them is like measuring a book’s complexity by counting every possible arrangement of its ink.

What’s the good question?

If you actually want a number, pick the dimension you care about:

  • Controller complexity? → Microcode ROMs on modern x86 hold ~10⁴–10⁶ micro-operations. That’s a useful “controller state” estimate.
  • Transistor count? → ~10¹¹ on a flagship die.
  • Simulation cost? → How many Turing-machine steps to simulate one second of CPU? Roughly 10¹⁰–10¹² steps, depending on workload. This is the most theoretically meaningful version.

Each answer measures something real. “How many states is my CPU?” sounds crisp but collapses on inspection. The mapping from theory to hardware is structural, not numerical.

Part 3 — How Turing Machine Ideas Actually Live in a CPU

Now the core question. The Turing machine has four primitives. Each one has a clean correspondence in modern hardware — and a clean deviation that you should understand.

The four-element mapping

  Turing machine             Modern computer
  Infinite tape              RAM + disk + network storage
  Read/write head position   Memory address (PC, stack pointer, MMU translations)
  Finite-state controller    CPU control unit + microcode ROM
  Transition function δ      Instruction set architecture (ISA) + decoder

This table is the skeleton. The interesting parts are where engineering diverges from the abstraction.

The key conceptual leap: the Universal Turing Machine

Turing’s 1936 paper has two big ideas — the machine model itself and the universal machine — and the one that matters most for hardware is the second: the Universal Turing Machine (UTM). A UTM is a Turing machine that takes, on its tape, a description of another Turing machine — plus that machine’s input — and simulates it.

This is the moment “machine” becomes “data.” A description of a computation is itself a string of symbols, indistinguishable in kind from the data the computation processes. It’s the conceptual seed of the stored-program computer.
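
A toy version of the leap, reusing the run() simulator sketched in Part 1 — the one-rule-per-line description format is invented here for illustration:

    # "Machine as data": the champion from Part 1, written as a string of
    # symbols and parsed back into a transition table before execution.
    description = """
    A 0 1 R B
    A 1 1 L B
    B 0 1 L A
    B 1 1 R HALT
    """

    def parse(desc):
        delta = {}
        for line in desc.strip().splitlines():
            state, read, write, move, nxt = line.split()
            delta[(state, int(read))] = (int(write), move, nxt)
        return delta

    print(run(parse(description)))   # -> (6, 4): same machine, loaded as data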

Von Neumann’s 1945 First Draft of a Report on the EDVAC turned this into engineering: store the program and the data in the same memory. That single design choice — heretical at the time, when computers like ENIAC were “programmed” by literally rewiring them — is what makes a modern computer a general-purpose device. Every CPU you’ve ever used is a thinly disguised UTM.

Tape — the infinite ribbon, faked beautifully

The Turing machine’s tape is infinite. A real computer’s memory is not. The engineering solution is the memory hierarchy:

  • Registers (~32 × 64-bit on x86-64) — sub-nanosecond access
  • L1 cache (~32 KB) — ~1 ns
  • L2 cache (~1 MB) — ~3 ns
  • L3 cache (~30 MB) — ~10 ns
  • DRAM (8–128 GB) — ~100 ns
  • NVMe SSD (1–8 TB) — ~100 µs
  • Network / cloud storage — milliseconds to seconds, effectively boundless

Each layer is roughly 10× bigger and 10× slower than the one above. The illusion of “infinite tape” is maintained by the fact that useful programs hit smaller working sets that fit higher in the hierarchy, and the system pages in larger backing stores on demand.

Head — random access, not sliding

A Turing machine’s head moves one cell at a time. Reaching cell N takes N steps. A CPU sends an address down the bus and the memory subsystem returns the byte in (roughly) constant time.

This is a speed improvement, not a power improvement. A Turing machine that moves cell-by-cell can simulate random access — it just takes longer (a polynomial slowdown, not an exponential one). The two models are computationally equivalent; one is simply far friendlier to program against. The “address” — what we casually call a pointer — is just a binary encoding of “where the head should be.”
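
A sketch of the cost difference, using the dict-as-tape convention from Part 1 (the step counting is the illustrative part):

    # A RAM reads address N in one operation; a TM head walks there cell by cell.
    def tm_read(tape, target):
        pos, steps = 0, 0
        while pos != target:
            pos += 1 if target > pos else -1   # one cell per step
            steps += 1
        return tape.get(pos, 0), steps

    print(tm_read({1000: 1}, 1000))   # -> (1, 1000): the value, after 1000 moves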

State — split into macro and micro

The Turing machine’s “state” is a single label like q_7. A CPU has, in effect, two layers of state:

  • Macro state — the program counter and the contents of registers. This is what the programmer thinks about. It’s the “logical” position of the head and the controller’s current rule.
  • Micro state — pipeline stage, branch predictor entries, cache line MESI status, reorder buffer contents, speculative execution frontier. These don’t exist in the abstract Turing machine. They’re engineering machinery introduced to extract more useful work per second.

Importantly: micro state is invisible to correct programs. The whole architectural discipline of “as-if serial semantics” means that, however wild the speculation and reordering, the result you observe is as if the CPU executed your instructions one at a time. The micro state is a performance optimization that preserves the macro abstraction.

(Spectre and Meltdown were exactly the failure of this invisibility — micro state leaked into observable channels.)

Transition function δ — the instruction cycle

A Turing machine’s δ says: given current state and current symbol, write a symbol, move the head, switch state. A CPU does the same thing, just at a much coarser grain:

  1. Fetch — read the instruction at PC from memory
  2. Decode — translate the bit pattern into a control signal
  3. Execute — perform the ALU / load / store / branch operation
  4. Writeback — update registers, possibly update PC

Each x86 or ARM instruction is essentially a macro for a few dozen Turing-machine-style transitions. A complex instruction like x86’s REP MOVSB doesn’t fit in one δ — it expands to a microprogram of many simpler steps, implemented via the microcode ROM. The instruction set is the API; the microcode is the implementation.
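
Here is the cycle as a toy interpreter — a hedged sketch, with a three-instruction ISA invented purely for illustration (real decoders emit control signals and µops, not Python branches):

    def cpu(program):
        pc, regs, mem = 0, [0] * 4, list(program)
        while True:
            op, a, b = mem[pc]       # 1. fetch the instruction at PC
            pc += 1                  # default transition: advance PC
            if op == "LOADI":        # 2. decode, 3. execute, 4. writeback
                regs[a] = b
            elif op == "ADD":
                regs[a] += regs[b]
            elif op == "HALT":
                return regs

    prog = [("LOADI", 0, 2), ("LOADI", 1, 3), ("ADD", 0, 1), ("HALT", 0, 0)]
    print(cpu(prog))   # -> [5, 3, 0, 0]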

This is why we can say a CPU “is” a UTM in a meaningful sense: the architectural state machine (PC + registers) plays the role of the universal controller, and the program in memory plays the role of the simulated machine’s transition table.

Part 4 — What the Turing Machine Doesn’t Account For

A faithful map should mark its blind spots. Several features of a real computer have no analogue in the original Turing machine, and you should know what they buy you.

Parallelism

Modern chips have many cores; GPUs have thousands. A multi-tape Turing machine (several heads on several tapes) is provably equivalent in power to a single-tape one — anything n tapes can compute, one tape can too, with at most polynomial slowdown. So parallelism doesn’t add computability, only speed. Importantly, there are problems where multi-core gives near-linear speedup and others where it barely helps — a distinction complexity theory anticipated, and one you can estimate with Amdahl’s law, sketched below.
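
Amdahl’s law: with parallel fraction p of the work and n cores, speedup is 1 / ((1 − p) + p/n). A quick illustration:

    def speedup(p, n):
        """Amdahl's law: parallel fraction p of the work, n cores."""
        return 1 / ((1 - p) + p / n)

    print(round(speedup(0.95, 16), 1))   # 9.1  — mostly-parallel work scales well
    print(round(speedup(0.10, 16), 2))   # 1.1  — mostly-serial work barely moves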

Pipelining and out-of-order execution

These are scheduling tricks for the transition function — applying multiple δ steps concurrently when the dependency graph allows it. They’re below the level of the abstraction; from the programmer’s view, semantics are unchanged.

Interrupts and I/O

The original Turing machine has no notion of “the outside world.” It runs in a closed loop until it halts. A real CPU receives interrupts — asynchronous external events that yank execution into a handler. This is a genuine extension of the model, and it’s why operating systems exist. Without interrupts, the CPU could only run one program to completion. With them, it can multiplex, respond to keyboards, handle network packets — in short, be useful.

Virtual memory

The MMU implements a virtual tape on top of the physical one. Each process sees its own private address space (its own “tape”), while the physical memory is shared. This is the OS pulling the UTM trick one level higher: just as a UTM simulates other machines, a kernel with virtual memory simulates separate computers for each process.
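
A toy translation step (4 KB pages match common hardware; the table contents here are invented):

    PAGE = 4096
    page_table = {0: 7, 1: 3}                 # virtual page -> physical frame

    def translate(vaddr):
        vpage, offset = divmod(vaddr, PAGE)
        if vpage not in page_table:
            raise RuntimeError("page fault")  # the OS maps a page or kills us
        return page_table[vpage] * PAGE + offset

    print(hex(translate(0x1234)))             # virtual 0x1234 -> physical 0x3234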

Caching and prefetching

These are predictions about the head’s future position. The Turing machine doesn’t predict; it just follows δ. A CPU guesses where the program is going and pre-loads, betting that locality of reference will pay off. It usually does — which is why “cache-friendly code” is a real, measurable concern that has no analogue in the abstract model.
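
A rough way to feel this from Python — interpreter overhead muddies the numbers and the effect is far starker in compiled code, but the strided walk below is typically measurably slower than the sequential one:

    import time

    N = 1 << 22
    data = list(range(N))

    def walk(step):                 # visit all N entries with a given stride
        total, idx = 0, 0
        for _ in range(N):
            total += data[idx]
            idx = (idx + step) % N  # stride 4099 is prime, so it still hits every cell
        return total

    t0 = time.perf_counter(); walk(1);    t1 = time.perf_counter()
    t2 = time.perf_counter(); walk(4099); t3 = time.perf_counter()
    print(f"stride 1: {t1 - t0:.3f}s   stride 4099: {t3 - t2:.3f}s")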

Part 5 — A Useful Mental Model

If you remember nothing else, remember this picture:

Your CPU is a highly optimized Universal Turing Machine.

  • The program stored in memory is the description of another Turing machine.
  • The CPU hardware is the universal simulator running that description.
  • The OS is a UTM running on top of the CPU’s UTM, simulating one Turing machine per process.
  • Each container or VM is yet another UTM-on-UTM layer.

Every time you launch a piece of software, you are telling a stack of nested universal machines: “please simulate this other machine for me.” That’s literally what’s happening, all the way down to the silicon. The abstraction nests because Turing’s original insight — that machines can be encoded as data — composes cleanly.

Closing thought

The Turing machine is sometimes dismissed as “just a theoretical model.” It isn’t. It’s the organizational principle of every general-purpose computer ever built. The tape is your RAM. The head is your address bus. The states are your microcode. The transitions are your ISA. And the universality — the ability of the machine to read another machine’s description and become it — is the reason you can install software at all.

The 1936 paper described the architecture. The 1945 EDVAC report named the engineering form. The 90 years since have been an exercise in making it faster, smaller, and harder to break — without changing the underlying idea.

And every once in a while, when you want a reminder that this whole edifice has hard limits baked in, there’s a small beaver still scurrying across a tape somewhere, making the impossible visible.

Further reading

  • Tibor Radó, On non-computable functions, Bell System Technical Journal, 1962.
  • Alan Turing, On Computable Numbers, with an Application to the Entscheidungsproblem, 1936.
  • John von Neumann, First Draft of a Report on the EDVAC, 1945.
  • The Busy Beaver Challenge (bbchallenge.org) — the collaborative project that settled Σ(5) in 2024.
  • Patterson & Hennessy, Computer Organization and Design — the standard text on how the mapping is engineered.
