# Phase dependencies and cross-phase reinforcement (v2.1)

The course is layered, not strictly linear. This document maps what each phase assumes about prior phases, and how concepts get re-applied later so they harden into long-term memory.

v2.1 updates: split Phase 7 into 7A (research literacy) and 7B (frontier). Added RL, discriminative architectures, emergence, and research literacy as first-class dependencies. Lesson numbers updated to match the v2.1 syllabus.

---

## The high-level dependency graph

```
                  P1 Foundations of intelligence
                            |
                            v
                P2 Mathematical & computational intuition
                            |
            ----------------+----------------
            |               |                |
            v               v                v
        P3 Hardware    P4 Architectures  (P2 keeps feeding into both)
            |               |
            +-------+-------+
                    |
                    v
              P5 Training & scaling
                    |
                    v
            P6 Engineering & deployment
                    |
                    v
            P7A Research literacy  (must precede P7B)
                    |
                    v
              P7B Frontier intelligence
```

Read it as: **P1 and P2 are unconditional foundations. P3 (hardware) and P4 (architectures) lean on P2, and lean on each other in both directions. P5 needs everything before it. P6 builds on P5. P7A teaches you how to read research; P7B uses that skill on the frontier.**
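The diagram can also be written down as data, which makes it easy to check a proposed study order against it. The sketch below is illustrative only: the `PREREQS` mapping copies the arrows in the diagram, the helper names are hypothetical, and the two-way lean between P3 and P4 is deliberately not encoded as an edge, because a mutual edge would form a cycle and no ordering would exist.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each phase maps to the phases it depends on directly (the arrows in the diagram).
PREREQS = {
    "P1": set(),
    "P2": {"P1"},
    "P3": {"P2"},
    "P4": {"P2"},
    "P5": {"P3", "P4"},
    "P6": {"P5"},
    "P7A": {"P6"},
    "P7B": {"P7A"},
}

# The strict order recommended in the order-of-study section of this document.
STRICT_ORDER = ["P1", "P2", "P3", "P4", "P5", "P6", "P7A", "P7B"]


def valid_order(prereqs):
    """Return one study order in which every phase follows its prerequisites."""
    return list(TopologicalSorter(prereqs).static_order())


def respects(order, prereqs):
    """True if each phase appears after all of its direct prerequisites."""
    position = {phase: i for i, phase in enumerate(order)}
    return all(
        position[dep] < position[phase]
        for phase, deps in prereqs.items()
        for dep in deps
    )


print(valid_order(PREREQS))
print(respects(STRICT_ORDER, PREREQS))
```

The graph alone leaves the relative order of P3 and P4 open. The strict order used by the course is one valid choice among the orderings the graph allows.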

**v3 additions to the graph.**

- **Lesson 0** (Orientation) has no prerequisites and feeds into L1 plus the entire course operationally. It introduces the palace, retrieval practice, glossary, synthesis lessons, calibration assessments, build track, and compute-spectrum lens. Read once at the start.
- **Synthesis lessons (S1-S7B)** depend on every concept lesson in their phase. They do not feed forward into any specific later lesson; they feed into the next phase's first lesson as a transition (S2 prepares L22, S3 prepares L34, etc.).
- **Calibration assessments (C1-C7B)** depend on the corresponding synthesis lesson plus all phase content. P3+ calibrations may pull questions across earlier phases. A calibration is a gate: failure means revisit named lessons before starting the next phase.
- **Compute-spectrum lens** does not add new graph edges; it adds a recurring annotation to existing lessons where the lens earns its place.

---

## What each phase assumes

**Phase 1 (Foundations of intelligence)** assumes nothing. This is the entry point. A reader with no prior AI exposure can start here.

**Phase 2 (Maths and computational intuition)** assumes phase 1 vocabulary (pattern, prediction, learning paradigms). No prior maths beyond high-school algebra and geometry.

**Phase 3 (Hardware)** assumes phase 2 parallelism and compute-scaling intuition (lessons 20, 21). The reader should already think of compute as something you parallelise, not just a number to multiply.

**Phase 4 (Neural architectures)** assumes phase 2 (vectors, matrices, gradients, dot products, distributions, entropy) and phase 3 (hardware constraints: matmul, VRAM, bandwidth). The architecture lessons keep referencing "this maps onto tensor cores" or "this hits the memory wall here." L47 (discriminative architectures) closes out the phase and sets up L61 in P6.

**Phase 5 (Training and scaling)** assumes all of phases 1-4. You can't talk about pretraining without already knowing what a transformer is (P4), what an optimiser does (P2), how data parallelism spreads training across devices (P3), and what generalisation means (P1). RLHF (L53) explicitly depends on L6 (RL fundamentals from P1).

**Phase 6 (Engineering and deployment)** assumes phases 1-5 conceptually. Practically it leans hardest on P4 (what you're serving), P5 (how it was made, so you can reason about its failure modes), and P3 (what hardware you're serving on). L61 (classical ML in production) is a second-pass treatment of L47.

**Phase 7A (Research literacy)** assumes phases 1-6. The reader needs concrete technical material (transformers, scaling laws, benchmarks they've already met) to apply the research-reading skills to.

**Phase 7B (Frontier intelligence)** assumes P7A. You don't read frontier claims without first knowing how to read them. Reasoning models, world models, and robotics lessons all build on L6 (RL fundamentals).

---

## Cross-phase reinforcement (the interleaving design)

Each lesson's interleaved retrieval question is deliberate. It keeps specific earlier concepts in working memory while you're learning new ones.

The rules:

**Phase 2 lessons pull back to phase 1.** Every maths lesson has one retrieval question that re-anchors the maths to an intelligence/learning concept. E.g., the distance & similarity lesson (L12) pulls back to embeddings intuition (L9).

**Phase 3 lessons pull back to phase 2.** Every hardware lesson reaches into a phase 2 concept. E.g., the tensor cores lesson (L26) pulls back to matrices and linear transforms (L14), reinforcing that the operation tensor cores accelerate is exactly the matrix multiply you saw geometrically two phases ago.

**Phase 4 lessons pull back to phase 3 and (occasionally) phase 2.** Every architecture lesson interleaves a hardware concept. Multi-head attention (L41) pulls back to tensor cores (L26) and parallelism (L20). The transformer block (L43) pulls back to memory hierarchies (L27), since residual streams and KV cache live in specific levels of that hierarchy. The transformer block also pulls back to emergence (L4): the internal representations the model develops were never explicitly programmed.

**Phase 5 lessons pull back to phase 3 (mostly) and phase 4.** Training is where hardware and architecture meet. Pretraining (L49) pulls back to distributed compute patterns (L33). RLHF (L53) pulls back to L6 (RL fundamentals) and to the architecture being trained (L43). Scaling laws (L51) pulls back to compute scaling intuition (L21) and to emergence (L4).

**Phase 6 lessons pull back to whatever they're built on.** RAG (L60) pulls back to embeddings (L9 and L59). Classical ML in production (L61) pulls back to discriminative architectures (L47) and to evaluation (L65). Inference engines (L66) pulls back to VRAM (L25) and KV cache (memory hierarchies, L27).

**Phase 7A lessons pull back to a recent technical lesson and ask you to critique it.** L68 (Reading an AI paper) pulls back to the transformer block (L43) and asks the reader to identify its load-bearing claim. L69 (Benchmarks and how they lie) pulls back to scaling laws (L51) and asks for the 4 questions to ask of a SOTA claim. L70 (Reading scaling graphs) pulls back to L51 with a specific axis-choice critique exercise.

**Phase 7B lessons pull back to phase 1.** This is deliberate symmetry. The frontier questions are intelligence-system questions in modern clothes. Reasoning models (L71), world models (L72), and embodiment (L73) all pull back to L6 (RL fundamentals). World models (L72) also pulls back to representation (L7). Emergence revisited (L75) pulls back to L4 (first pass) and to scaling laws (L51). AGI hypotheses (L78) pulls back to "what current AI can and can't do" (L10).

The result: by lesson 79, the reader has retrieved every foundational concept many times across many contexts. Not as drilled flashcards (those happen too, separately) but as load-bearing pieces of new arguments. The 4 most-pulled-back-to concepts are L4 (emergence), L6 (RL), L27 (memory hierarchies), and L43 (transformer block). They get heavy reinforcement by design.
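A pull-back only works if its target has already been studied. The sketch below lists the example pull-backs named in the rules above as `(lesson, target)` pairs and reports any whose target is not an earlier lesson. It assumes lesson numbers run in study order; the names `PULL_BACKS` and `forward_references` are hypothetical, and the list covers only the examples in this section, not the full syllabus.

```python
# (lesson, pulled-back-to lesson) pairs, copied from the examples in this section.
PULL_BACKS = [
    (12, 9),
    (26, 14),
    (41, 26), (41, 20),
    (43, 27), (43, 4),
    (49, 33),
    (53, 6), (53, 43),
    (51, 21), (51, 4),
    (60, 9), (60, 59),
    (61, 47), (61, 65),
    (66, 25), (66, 27),
    (68, 43), (69, 51), (70, 51),
    (71, 6), (72, 6), (73, 6),
    (72, 7),
    (75, 4), (75, 51),
    (78, 10),
]


def forward_references(pairs):
    """Return the pairs whose target is not an earlier lesson than the source."""
    return [(src, dst) for src, dst in pairs if dst >= src]


print(forward_references(PULL_BACKS))  # -> [(61, 65)]
```

On these examples the only pair reported is L61 pulling back to L65 (evaluation), so that lesson number is worth confirming against the syllabus.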

---

## Lessons that should be revisited as new context arrives

Some early lessons get **lighter treatment first time around** and **deeper treatment later**. This is deliberate. The first pass is intuition; the second is mechanism.

| Concept                      | First pass (intuition) | Second pass (mechanism)            |
|------------------------------|------------------------|------------------------------------|
| Embeddings                   | L9 (P1, geometric)     | L59 (P6, applied)                  |
| Compute scaling              | L21 (P2, intuition)    | L51 (P5, scaling laws)             |
| Tokens                       | L8 (P1, what they are) | L66 (P6, KV cache implications)    |
| Backprop                     | L36 (P4, chain rule)   | L49 (P5, training loop)            |
| Quantisation                 | L32 (P3, what changes) | L66 (P6, inference deployment)     |
| Memory hierarchies           | L27 (P3, the ladder)   | L66 (P6, why KV cache matters)     |
| Reinforcement learning       | L6 (P1, paradigm)      | L53 (P5, applied in training)      |
| Discriminative architectures | L47 (P4, concept)      | L61 (P6, production deployment)    |
| Emergence                    | L4 (P1, intuition)     | L75 (P7B, mechanism after context) |

When generating these lessons, the second-pass lesson should explicitly reference the first-pass lesson and reframe the concept in the new context. The reader feels the layering.
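The table can double as a lookup for checking a draft. The sketch below is a minimal illustration, assuming drafts cite lessons in the `L<number>` form used throughout this document; `FIRST_PASS` and `missing_first_pass_links` are hypothetical names, and the sample draft text is invented for the example.

```python
import re

# Second-pass lesson -> first-pass lessons it should cite (from the table above).
FIRST_PASS = {
    59: {9},          # embeddings
    51: {21},         # compute scaling
    66: {8, 32, 27},  # tokens, quantisation, memory hierarchies
    49: {36},         # backprop
    53: {6},          # reinforcement learning
    61: {47},         # discriminative architectures
    75: {4},          # emergence
}

# Matches lesson references such as "L8" or "(L27)".
LESSON_REF = re.compile(r"\bL(\d+)\b")


def missing_first_pass_links(lesson, text):
    """Return the first-pass lessons that the draft text fails to cite."""
    cited = {int(n) for n in LESSON_REF.findall(text)}
    return sorted(FIRST_PASS.get(lesson, set()) - cited)


draft = "KV cache grows with every token (L8) and sits in the memory ladder (L27)."
print(missing_first_pass_links(66, draft))  # -> [32]
```

A lesson that is not a second pass of anything returns an empty list, so the same call can be run over every draft without special cases.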

---

## Order-of-study recommendation

**Strict order**: phases 1 → 2 → 3 → 4 → 5 → 6 → 7A → 7B.

Within a phase, do lessons in numerical order. There are no skip-ahead shortcuts. Specifically, P7A must be completed before any P7B lesson begins. The frontier lessons assume research literacy.

**Don't batch.** 1 lesson per sitting. Spaced repetition expects retrieval at intervals, not bulk consumption.

**Schedule for 2-3 lessons per week**, sideline-hours pace. Faster than that and flashcard review starts to lag. Slower than that and the route gets fuzzy between sittings.

**Re-walk the palace every Sunday.** Whichever lessons you've reached, walk the route from station 1 to the current station. Name each concept out loud. This is the consolidation step that turns the work into structural memory rather than recent memory.

**Build track milestones** (see `build-track.md`) slot in after specific lessons. Plan one build evening per week alongside the lesson cadence.
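The 2-3 lessons per week cadence implies a rough overall duration. The arithmetic below is a sketch under one loud assumption: it counts 79 numbered lessons (taken from "by lesson 79" earlier in this document) and excludes Lesson 0, synthesis lessons, calibration assessments, and build evenings, so the real calendar will be longer.

```python
import math

# Assumption: numbered concept lessons only; synthesis, calibration,
# Lesson 0, and build evenings are not counted.
CONCEPT_LESSONS = 79


def weeks_needed(lessons, per_week):
    """Whole weeks required to finish `lessons` at `per_week` lessons a week."""
    return math.ceil(lessons / per_week)


fastest = weeks_needed(CONCEPT_LESSONS, 3)
slowest = weeks_needed(CONCEPT_LESSONS, 2)
print(fastest, slowest)  # -> 27 40
```

So the concept lessons alone take roughly 27 to 40 weeks at the recommended pace.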

---

## How dependencies show up in lesson generation

When ChatGPT drafts a lesson and you bring it to me, I check 4 things against this document before producing the HTML:

1. **Does the lesson assume only what its phase is allowed to assume?** No referencing phase 4 attention in a phase 2 maths lesson.
2. **Does the interleaved retrieval question pull back to the right lesson?** I set this based on the rules above, not based on whatever ChatGPT picked.
3. **For second-pass concepts, does the lesson explicitly reference the first-pass lesson?** If not, I add the link.
4. **For lessons that touch emergence (L43, L45, L51, L71, L72, L75), does the lesson name it as such?** Emergence is a through-line. The reader should feel it threading the rooms.
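Checks 1 and 4 are mechanical enough to sketch in code. The version below is illustrative, not the actual review process: the function names are hypothetical, check 1 is simplified to compare phases only (it does not catch a reference to a later lesson within the same phase), and check 4 is reduced to a keyword test on the draft text.

```python
PHASE_ORDER = ["P1", "P2", "P3", "P4", "P5", "P6", "P7A", "P7B"]

# Lessons listed in check 4 as touching emergence.
EMERGENCE_LESSONS = {43, 45, 51, 71, 72, 75}


def assumes_too_much(lesson_phase, referenced_phase):
    """Check 1: True if a lesson leans on a phase that comes after its own."""
    return PHASE_ORDER.index(referenced_phase) > PHASE_ORDER.index(lesson_phase)


def names_emergence(lesson, text):
    """Check 4: lessons on the emergence list must use the word itself."""
    if lesson not in EMERGENCE_LESSONS:
        return True  # nothing to enforce for other lessons
    lowered = text.lower()
    return "emergence" in lowered or "emergent" in lowered


# A phase 2 maths lesson referencing phase 4 attention fails check 1.
print(assumes_too_much("P2", "P4"))  # -> True
# A scaling-laws draft (L51) that never names emergence fails check 4.
print(names_emergence(51, "Loss falls smoothly as compute grows."))  # -> False
```

Checks 2 and 3 need the pull-back rules and the first-pass table as data, so they are better handled against those sections directly.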

These checks are why I'm a useful intermediary between ChatGPT and the finished course. ChatGPT doesn't have the dependency graph in its head. I do.