Phase 1 · the bench · 10 stations · S1 + C1

Foundations of intelligence.

Phase 1 teaches what the machine fundamentally is, before any maths, hardware, or architecture context lands. You sit at the bench. You think before you build.

Lessons: L1–L10 + S1 + C1
Time: ~4 weeks
Builds: B1 tokenizer explorer (numpy)
Core laws established here: representation (L7), optimisation (L5), constraints (L1, L10)

The transformation

What changes between entering the bench and leaving it.

The reader arrives at the bench with whatever picture of AI public discourse has installed: a single mysterious thing, sometimes magical, sometimes dismissed. Phase 1 replaces that picture with a working noun.

An intelligence system is a thing with inputs, internal state, outputs, and a learning signal. It runs as optimisation against an objective. Its capability is shaped by what signal it was trained on, what representation it built, and what constraints it lives under. None of these facts requires mathematics yet. They require the right vocabulary and the right mental model.

By the end of Phase 1, the reader can describe any AI system as: this input → this representation → this objective → this output, trained with this signal under these constraints. The maths, the hardware, the architecture, and the training stack all get layered onto that skeleton in the phases that follow.
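One way to make the skeleton concrete is a small record type. The sketch below is illustrative only: the `SystemSketch` class, its field names, and the spam-filter example are assumptions made here, not part of the course material.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemSketch:
    """Hypothetical record for the Phase 1 skeleton (names are illustrative)."""
    inputs: str
    representation: str
    objective: str
    output: str
    signal: str
    constraints: tuple

    def describe(self) -> str:
        # Mirrors the sentence shape: input -> representation -> objective -> output.
        chain = " -> ".join(
            [self.inputs, self.representation, self.objective, self.output]
        )
        return f"{chain}, trained with {self.signal} under {', '.join(self.constraints)}"


# An assumed example system, chosen only because it is easy to picture.
spam_filter = SystemSketch(
    inputs="email text",
    representation="bag-of-words counts",
    objective="cross-entropy on spam labels",
    output="spam probability",
    signal="supervised labels",
    constraints=("labelled data volume", "inference latency"),
)

print(spam_filter.describe())
```

If a system cannot be filled into all six fields, that gap is usually the first question worth asking about it.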

Phase 1 in one line

Phase 1 teaches what the machine fundamentally is: optimisation against an objective, shaped by signal and constraints, surfacing as representation. Mechanism first.

The systems loop

The shape that recurs through every later phase.

The diagram below is the seed diagram of the course. Phase 4's transformer block is the same loop instantiated with attention and feed-forward layers. Phase 5's training loop is the same loop with feedback to the parameters. The progressive diagram evolution starts here.

[fig 1 · the intelligence system loop (Phase 1 seed). Forward path: input (tokens · pixels · audio · sensor stream) → representation (embedding · feature; L7 · L8 · L9) → internal state (parameters · memory; L1 · L7) → output (prediction · action; L2 · L6). Feedback: target · reward (supervised signal; L5 · L6) → learning signal (gradient; L5 · L6) → internal state. Bounds: data constraints (distribution · quantity · quality; L3 · L10) and compute constraints (memory · bandwidth · power · latency; referenced through every later phase). Note: capability emerges from this loop running at scale (L4). Legend: forward · learning signal · constraint.]
Fig 1 · The Phase 1 systems loop. Forward arrows (amber) carry input through representation, internal state, and output. The learning signal (green, dashed) closes the loop by feeding back from target or reward into the internal state. Constraints (red, dotted) bound every block. This shape recurs at higher fidelity in the transformer block (L43), the training loop (L49), and the deployment stack (L58–L67).
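
The loop in Fig 1 can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the course's implementation: the function names (`represent`, `forward`, `run_loop`), the two-parameter linear model, the squared-error objective, and the data rule `target = 3 * x + 1` are all chosen here for illustration.

```python
def represent(x):
    """Representation: turn a raw input into features (value plus a bias term)."""
    return (x, 1.0)


def forward(state, features):
    """Output: a prediction computed from internal state and features."""
    return sum(w * f for w, f in zip(state, features))


def run_loop(data, steps=200, lr=0.05):
    """Run the loop under a fixed compute budget of `steps` passes over `data`."""
    state = [0.0, 0.0]  # internal state: two parameters
    for _ in range(steps):  # compute constraint: a hard budget
        for x, target in data:  # data constraint: only these examples exist
            features = represent(x)
            error = forward(state, features) - target  # compare output to target
            # Learning signal: gradient of 0.5 * error**2 for each parameter.
            state = [w - lr * error * f for w, f in zip(state, features)]
    return state


# Toy data generated from target = 3 * x + 1 (an assumption for illustration).
data = [(x, 3.0 * x + 1.0) for x in (-1.0, 0.0, 1.0, 2.0)]
state = run_loop(data)
print([round(w, 3) for w in state])  # parameters settle near [3.0, 1.0]
```

Nothing in the code knows the rule it recovers. The parameters end up near 3 and 1 only because the learning signal keeps pushing the internal state toward whatever reduces the error.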
The 10 stations

The bench, left to right.

Each station is a physical object on the bench, anchored to one concept. The route is the spine of Phase 1. Walking the bench is the consolidation step that turns the lessons into structural memory.

Phase 1 themes

What Phase 1 reinforces and what it refuses.

Four themes thread the bench. Each one cuts against a specific tendency in how AI is talked about elsewhere.

Theme · 1

Mechanism over mysticism

"The model just understands" is not an explanation. Phase 1 names objective, representation, signal, and constraint instead. Where a behaviour is currently unexplained, the lesson says so and references current interpretability work.

Theme · 2

Optimisation over magic

Capability is a function of what was optimised against, not a property the system "wants" to have. L5 makes the paradigms explicit; L6 makes the temporal credit assignment problem of RL explicit; the whole phase resists language that hides the optimisation.
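
A small illustration of the point, assuming a toy setting invented here: the same data and the same one-number "model", optimised against two different objectives, end up with two different behaviours. The data values and the grid of candidates are arbitrary choices for the sketch.

```python
# Five observations, one of them an outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]


def squared_loss(c):
    """Objective A: sum of squared errors for a constant prediction c."""
    return sum((x - c) ** 2 for x in data)


def absolute_loss(c):
    """Objective B: sum of absolute errors for a constant prediction c."""
    return sum(abs(x - c) for x in data)


# Candidate predictions 0.0, 0.1, ..., 100.0 (a coarse grid, for simplicity).
candidates = [i / 10 for i in range(0, 1001)]

best_under_squared = min(candidates, key=squared_loss)
best_under_absolute = min(candidates, key=absolute_loss)

print(best_under_squared)   # 22.0, the mean: dragged towards the outlier
print(best_under_absolute)  # 3.0, the median: ignores how far the outlier is
```

Neither answer is what the system "wants". Each is what its objective rewards.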

Theme · 3

Representation over anthropomorphism

The system operates on its representation of the world, not on the world. Choice of representation often matters more than the algorithm running on top. L7 lands this as a core law (representation shapes computation) that recurs through P3, P4, and P6.
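
The XOR pattern is a standard way to see this. As a minimal sketch, assuming a brute-force search over a small grid of integer weights (the helper names `raw`, `with_product`, and `find_linear_rule` are hypothetical), the same linear rule fails on one representation and succeeds on another.

```python
from itertools import product

# XOR: the label is 1 when exactly one of the two inputs is 1.
points = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def raw(p):
    """Representation A: the two inputs as given."""
    return (p[0], p[1])


def with_product(p):
    """Representation B: the two inputs plus their product."""
    return (p[0], p[1], p[0] * p[1])


def find_linear_rule(represent, grid=(-2, -1, 0, 1, 2)):
    """Search the grid for weights w and bias b with (w . features + b > 0) == label."""
    dim = len(represent(points[0][0]))
    for *w, b in product(grid, repeat=dim + 1):
        if all(
            (sum(wi * fi for wi, fi in zip(w, represent(p))) + b > 0) == bool(y)
            for p, y in points
        ):
            return tuple(w), b
    return None


# The grid search only rules out these weights; it is a known result that
# no linear threshold rule of any weights separates XOR on the raw inputs.
print(find_linear_rule(raw))           # None
print(find_linear_rule(with_product))  # some (weights, bias) pair that works
```

The algorithm did not change between the two calls. Only the representation did.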

Theme · 4

Constraints over hype

The capability perimeter (L10) is the operational consequence of constraints: data, compute, signal, deployment. The honest perimeter is what separates engineering from press release.

The capability perimeter (L10)

Three honest categories.

By the end of Phase 1, the reader can sort claimed AI capabilities into the three categories below, with the mechanism that puts each one there. This sorting is the operational habit Phase 1 installs.

stable capability

The system is reliably useful at this. The mechanism is well-understood; the failure modes are bounded. Example: text classification on in-distribution data.

brittle trick

The system can do this on benchmarks. On real inputs slightly outside the training distribution, it falls over. The brittleness traces back to a specific representation or signal limit.

confidently wrong

The system produces output that looks like a capability but isn't. The output is fluent; the underlying claim is false. The mechanism is overconfidence in low-evidence regions of the input space.
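
One hedged way to see the mechanism: a toy logistic classifier, trained here on four inputs that all lie in [-1, 1], reports near-total confidence at an input it has no evidence about. The data, learning rate, and step count are assumptions picked for the sketch, and real systems fail in more varied ways than this.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Toy training set: every input lies in [-1, 1].
train = [(-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1)]

w, b = 0.0, 0.0
for _ in range(500):
    # Gradient of the summed log loss with respect to w and b.
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in train)
    grad_b = sum(sigmoid(w * x + b) - y for x, y in train)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b


def confidence(x):
    """Probability the model assigns to whichever class it predicts."""
    p = sigmoid(w * x + b)
    return max(p, 1.0 - p)


near = confidence(0.5)   # inside the region the training data covers
far = confidence(50.0)   # far outside it: no evidence, yet more confident
print(near, far)
```

The reported confidence grows with distance from the decision boundary, not with the amount of evidence near the input.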

Core laws established in Phase 1

What lands here · what recurs later

  • Representation shapes computation. Established at L7. Recurs through Phase 3 (the chip is designed for matmul on representations), Phase 4 (architectures encode different representation choices), Phase 6 (embeddings and retrieval live or die on representation quality).
  • Optimisation shapes capability. Threaded across L5 and L6, and revisited every time Phase 5 runs an objective on a model. The system gets what its objective rewards.
  • Constraints shape systems. The compute-spectrum lens lands lightly at L1 (intelligence systems exist across tiers) and at L10 (the perimeter is shaped by what hardware you have). This law turns into the whole of Phase 3.
  • Geometry enables generalisation. Teased at L9 with embeddings as direction. The full geometric treatment lives in Phase 2 (L11–L13) and is then used as scaffolding through the rest of the course.
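
The "embeddings as direction" idea in the last law can be previewed with cosine similarity, which compares where two vectors point and ignores how long they are. The three vectors below are made-up toy values, not real embeddings, and the `cosine` helper is written here for illustration.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


a = (1.0, 2.0, 0.0)
b = (2.0, 4.0, 0.0)    # same direction as a, twice the length
c = (-2.0, 1.0, 0.0)   # at a right angle to a

print(cosine(a, b))  # close to 1: same direction despite different lengths
print(cosine(a, c))  # 0.0: unrelated directions
```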
Bridge to Phase 2

From vocabulary to apparatus.

Phase 1 leaves the reader with a working mental model of intelligence-as-optimisation. The model is durable but unquantified. You can talk about representation; you can't yet say what a representation looks like as a 768-dimensional vector. You can name optimisation; you can't yet describe what a gradient is or which direction to step.

Phase 2 is the apparatus. The whiteboard wall sketches the maths the rest of the course depends on: vectors, matrices, gradients, probability, entropy, parallelism, and scaling intuition. None of it is heavier than it has to be. Each piece earns its place because it shows up later.

The S1 synthesis runs the bench in one breath and names the bridge explicitly. The C1 calibration gates the move. If C1 doesn't stick, you walk the bench again before crossing the workshop to the wall.