Entering the stack

Lesson 0. Orientation, before Phase 1. ~30 min read. Durability tier 1 (bedrock; the doorway doesn't move).

🚪

Memory palace · Workshop doorway · station 0

The doorway, with the workshop schematic pinned beside it. You stand at the entrance, read the schematic, see the route before you walk it. The orientation lesson lives here.

Core idea. This course teaches AI as a stack of mechanisms (representation, optimisation, geometry, hardware, constraints, systems tradeoffs) and the engineering habits needed to think about them clearly across the spectrum from microcontroller to hyperscale.

Why this lesson exists

Public AI discourse swings between mystification and dismissal. Both are wrong because both skip the mechanism. The mechanism is the only part worth your time: how the system is built, what objective it was optimised against, what hardware it runs on, what representations it learned, where it falls over.

This course is engineered to teach those mechanisms in the order that makes them stick, and to keep them stuck six months later. Lesson 0 is the doorway. You read it once at the start so the rest of the course knows where you're standing.

The shape of the course

The course is a workshop you walk through in your head. Seven rooms, an external staircase, a roof. Each room belongs to a phase. Each phase teaches one layer of the modern AI stack.

The bench is where you sit and think before you build. The whiteboard wall is where you sketch the maths you'll need. The server bay is the noisy room with the racks, where the hardware lives. The drafting table is where neural architectures get drawn out in chronological order, each one a response to the previous one's limits. The foundry is where models get made: training, scaling, fine-tuning. The lab bench is the wiring rig where production systems get assembled. The staircase is where you learn to read primary research. The roof is where you survey the frontier.

Seventy-eight numbered concept lessons across those rooms, plus this lesson at the doorway, plus a synthesis lesson at the end of each phase, plus a calibration assessment between phases. Roughly eight months at two to three lessons a week.

The order matters. You can't read the frontier without research literacy. You can't have research literacy without the architectures it talks about. You can't have the architectures without the maths and the hardware that shaped them. You can't have any of that without a clear definition of what an intelligence system actually is. The whole structure is downstream of where you'd want to start if you were learning the field carefully from scratch.

Mechanism first

The hardest part of learning modern AI is not the mathematics. It is the public discourse, which swings between two failure modes. One frames every new model as a step toward something quasi-mystical. The other dismisses the whole field as statistical autocomplete. Both are wrong because both skip the mechanism.

mechanism · The mechanism is the only part worth your time. If a claim about AI doesn't trace back to objective, representation, training data, architecture, or hardware, it isn't engineering yet.

The mechanism is what this course teaches. Why does this architecture take the shape it does? Which constraint was it responding to? What does the training objective actually reward? Where does the model fail, and why does that failure trace back to the mechanism rather than the headline?

A reader who finishes the course should be able to look at a new architecture or training scheme and ask, quickly and accurately, what hardware fact, what data fact, and what objective fact produced it. That's the habit the course is engineered to install.

From silicon upward

Hardware sits underneath every choice in this course. Most AI material treats compute as a budget line and silicon as someone else's problem. That framing is misleading. Modern AI took the shape it has because matrix multiplies are fast on specific silicon, memory bandwidth caps what models can be served, and quantisation shrinks a model enough to move it between tiers of the compute spectrum.

mechanism · The shape of the field is downstream of the shape of the chip.

So the course is built bottom-up. The maths supports the architectures. The architectures fit the silicon. The training scales because the hardware allows it. The deployment lives within the memory and latency budgets the substrate enforces. By the time you reach the frontier, you can read a new system as a response to a stack of constraints rather than as a mystery.
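The claim that memory bandwidth caps what can be served can be made concrete with back-of-envelope arithmetic. The sketch below assumes single-stream token-by-token generation in which every weight is read from memory once per generated token, so the token rate is bounded above by bandwidth divided by the size of the weights in bytes. The parameter count, the bandwidth figure, and the function names are illustrative assumptions, not measurements of any real chip or model, and the bound ignores cache traffic, batching, and compute limits.

```python
def model_bytes(n_params: float, bits_per_weight: int) -> float:
    """Bytes needed to hold the weights at a given precision."""
    return n_params * bits_per_weight / 8


def max_tokens_per_second(n_params: float, bits_per_weight: int,
                          bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode rate if each token requires reading all weights once."""
    return bandwidth_bytes_per_s / model_bytes(n_params, bits_per_weight)


if __name__ == "__main__":
    n_params = 7e9        # hypothetical 7-billion-parameter model (assumption)
    bandwidth = 100e9     # hypothetical 100 GB/s memory system (assumption)
    for bits in (16, 8, 4):
        size_gb = model_bytes(n_params, bits) / 1e9
        rate = max_tokens_per_second(n_params, bits, bandwidth)
        print(f"{bits:>2}-bit: {size_gb:5.1f} GB of weights, <= {rate:5.1f} tokens/s")
```

Under these assumptions, halving the bits per weight halves the memory footprint and doubles the bandwidth-bound token rate, which is the sense in which quantisation moves a model down the spectrum.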

Across the full compute spectrum

AI runs across the full compute spectrum. The same mechanisms (representation, optimisation, geometry) appear on microcontrollers with kilobytes of RAM, on phones, on workstations, on home labs, on server clusters, on hyperscale data centres. The principles stay; the constraint set shifts at each tier.

The course teaches AI as a constraint-aware engineering discipline. A recurring question, applied wherever it earns its place, is: how does this change under severe constraints? That question lands differently at each tier, and the answers are part of the field.

The learning architecture

The course is engineered for long-term retention rather than short-term information exposure. The apparatus that delivers that is load-bearing, not garnish.

You'll see five things working together.

A memory palace: the workshop above, with each lesson anchored to a physical object in a room you walk in your head. The route turns lesson sequence into spatial memory; walk the route weekly, and the order becomes muscle.
Retrieval practice: three open-ended questions at the end of each lesson, answered without looking, then checked. The brain consolidates what it has to reach for; re-reading alone doesn't trigger that consolidation.
Glossary tooltips: dashed-underlined terms in the lesson text (like the ones you've been hovering over in this lesson) expand to definitions. The glossary accumulates across phases as a single living reference.
Synthesis lessons: at the end of each phase, a compression lesson that reconnects everything in the room before you leave it.
Calibration assessments: a short self-test of mechanism (not trivia) before you start the next phase. If it doesn't stick yet, you go back rather than forward.

Two structural devices recur throughout. Recurring core laws: five short statements that thread the whole course. You'll meet each one where it first lands, and see it called back in every synthesis lesson.

compression · the 5 core laws

Representation shapes computation.
Optimisation shapes capability.
Hardware shapes architecture.
Geometry enables generalisation.
Constraints shape systems.

Progressive diagrams: the first system loop you'll see in Lesson 1 will evolve through every phase, the same shape instantiated at higher layers.

A parallel build track sits alongside the lessons. Fifteen core milestones (NumPy first, framework second), plus optional extensions to tier-0 microcontroller and tier-3 distributed inference. Coding skill is not a hard prerequisite for conceptual progress. The builds are depth-by-choice; they make the mechanisms physical for learners who want that.

The discipline

This is a slower course than the ones that fit in a weekend. Eight months, two to three lessons a week, weekly palace walks, daily flashcard review. What you learn this way will still be there a year from now, when whatever framework is hot today is no longer hot.

You're being taught to think like a systems engineer who happens to work in AI: ask which constraint produced this, ask what the tradeoff was, ask how this would change at a different tier of the spectrum. Those habits transfer across whatever frameworks come and go.

What Lesson 1 does

Lesson 1 walks you to the first station on the bench and asks: what is an intelligence system, actually? You'll get a working definition: inputs, internal state, outputs, learning signal. From there the course unrolls, layer by layer, in the order the constraints actually produced each layer. The doorway is here.
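As a preview of that four-part definition, here is a minimal sketch of a system with an input, an internal state, an output, and a learning signal. It is not the course's own definition or code: the class name `TinySystem`, the one-weight model, the toy task, and the learning rate are all assumptions chosen to keep the loop visible.

```python
class TinySystem:
    """One-weight learner (hypothetical): the weight is the internal state."""

    def __init__(self, learning_rate: float = 0.1) -> None:
        self.weight = 0.0                    # internal state
        self.learning_rate = learning_rate   # assumed step size

    def act(self, x: float) -> float:
        """Map an input to an output using the current state."""
        return self.weight * x

    def learn(self, x: float, target: float) -> float:
        """Use the prediction error as the learning signal and update the state."""
        error = self.act(x) - target         # learning signal
        # Gradient step on the squared error 0.5 * error**2 with respect to the weight.
        self.weight -= self.learning_rate * error * x
        return error


if __name__ == "__main__":
    system = TinySystem()
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # toy task: target = 2 * input
    for _ in range(50):
        for x, target in data:
            system.learn(x, target)
    print(round(system.weight, 3))
```

The weight starts at zero and is pulled toward 2 by the error on each example; nothing else in the system "knows" the rule.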

The workshop, in plan view

Figure 0.1 is the schematic pinned beside the doorway. Eight phases laid out as rooms in a single connected building. One dashed path links them in the order you'll walk. The central anvil is the calibration stop you return to between phases. Stations are dots on the room walls; synthesis lessons are the closing walk through each room.

FIG 0.1. The workshop, floor plan. 1 doorway, 7 rooms, 1 staircase, 1 roof. The dashed amber path is the course route through phases in order. Dots on each room's wall are the stations of the memory palace; S markers indicate the closing synthesis walk at the end of each room. The central anvil between the rows is the calibration stop you return to between phases. Phase 7A climbs the east wall; Phase 7B sits above the building.

Flashcards

Click a card to flip. Rate yourself: Again resets, Hard shortens the interval, Good lengthens it. State persists in this browser.
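The rating rule above can be sketched as an interval update. The lesson states only the direction of each rating's effect, so the multipliers, the one-day floor, and the names `Card` and `rate` below are assumptions for illustration, not the scheduler this page actually runs.

```python
from dataclasses import dataclass

# Assumed constants: the lesson gives directions, not numbers.
HARD_FACTOR = 0.5          # Hard shortens the interval
GOOD_FACTOR = 2.0          # Good lengthens the interval
MIN_INTERVAL_DAYS = 1.0    # Again resets to this floor


@dataclass
class Card:
    prompt: str
    interval_days: float = MIN_INTERVAL_DAYS


def rate(card: Card, rating: str) -> Card:
    """Return a new card whose review interval reflects the self-rating."""
    if rating == "again":
        new_interval = MIN_INTERVAL_DAYS
    elif rating == "hard":
        new_interval = max(MIN_INTERVAL_DAYS, card.interval_days * HARD_FACTOR)
    elif rating == "good":
        new_interval = card.interval_days * GOOD_FACTOR
    else:
        raise ValueError(f"unknown rating: {rating!r}")
    return Card(card.prompt, new_interval)


if __name__ == "__main__":
    card = Card("Name the five core laws.")
    for rating in ("good", "good", "hard", "again"):
        card = rate(card, rating)
        print(rating, card.interval_days)
```

Returning a new `Card` rather than mutating the old one keeps each review step easy to log and replay, which is one simple way to persist state between sessions.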

Retrieval practice

Write your answer first. Then reveal. Don't peek. Getting it wrong is how the memory forms.

L0 Name the 7 rooms (and the staircase) of the workshop palace in order. For each one, describe in a sentence what it contains and why it sits where it sits in the course sequence.

Bench (Phase 1): foundations of intelligence. A working definition of what an intelligence system is, before any maths or hardware. Comes first because you need a clear noun before you can analyse anything.
Whiteboard wall (Phase 2): mathematical and computational intuition. Vectors, gradients, matrices, probability, entropy, parallelism, scaling. Comes second because the maths is the apparatus the rest depends on.
Server bay (Phase 3): hardware and systems. CPU, GPU, VRAM, memory hierarchies, the roofline, accelerators, quantisation, interconnects. Comes third because modern AI took its shape from this silicon.
Drafting table (Phase 4): neural architectures. Perceptron through transformer through diffusion through MoE through multimodal through discriminative. Comes fourth because architectures are downstream of the substrate and the maths.
Foundry (Phase 5): training and scaling. Datasets, pretraining, scaling laws, post-training (SFT, RLHF, DPO), distillation, distributed training. Comes fifth because trained models depend on architectures and hardware.
Lab bench (Phase 6): engineering and deployment. APIs, RAG, classical ML in production, agents, evaluation, inference engines, on-prem AI. Comes sixth because deployment depends on having something trained to deploy.
Stairs (Phase 7A): research literacy. Reading a paper, reading benchmarks honestly, reading scaling graphs critically. Comes seventh because you need technical content from earlier phases to apply the skill to.
Roof (Phase 7B): frontier intelligence. Reasoning models, world models, embodiment, alignment, AGI views, future hardware. Comes last because reading the frontier needs the literacy of 7A and the substrate of everything before.

The ordering is not arbitrary; it is the order in which the field actually became possible.

L0 You read about a new model architecture that's getting attention online. From the worldview this course teaches, what 3 questions should you ask about it before forming an opinion of whether it's interesting or hype? Why those 3?

(1) What constraint was it responding to? Every architecture comes from a limit in the previous architecture. If you can't name the limit, you're reading marketing rather than engineering.
(2) What hardware fact does it depend on? If the architecture only runs because of specific silicon (tensor cores, NVLink topology, a particular memory pattern), that's where the engineering value sits and also where the deployment limits will appear.
(3) Where does the training objective fail to align with the deployment task? Every failure mode in modern AI traces back to a gap between the training signal and what you actually want. Naming that gap up front is half of evaluating the architecture honestly.

Together: these 3 questions cover the substrate (hardware), the mechanism (architecture as constraint response), and the failure surface (objective-deployment gap). If you can answer all 3 from the announcement, the architecture has been honestly explained. If you can't, the announcement is selling you something.

↳ L1 Someone asks why you're "wasting time" with a course that takes 8 months when you could play with an API over a weekend and learn "the same things". From the course's stated worldview, write a serious answer about what is actually different about learning AI mechanistically.

The API teaches what one specific vendor's current model can do today. It teaches nothing about why it can do that, what hardware enables it, what training signal shaped it, where it breaks, or what comes next. In a year the API will be replaced or its limits will change; what was learned by playing with it will be obsolete. Mechanistic understanding transfers across the vendor churn. After 8 months of this course, you will be able to read a new model release and predict what's hype and what's real from the architecture and training story rather than from the marketing. You will be able to evaluate whether a deployment claim is credible given the hardware it's running on. You will know which earlier architectural decision is being revisited when a new paper appears. That ability is what 48 hours of prompt tinkering cannot produce. API tinkering has its place; specific tasks today often benefit from it. The two activities teach different things, and only one of them survives the next product cycle.

Next station

Lesson 1 walks you to the first station on the bench, the reading lamp, where the course's most basic question gets a careful answer: what is an intelligence system, actually?
