Build a 2D loss surface, derive its gradient by hand, and write the gradient-descent update rule yourself. Then watch the optimiser walk: drop a starting point on the surface, run the loop, and plot the trajectory over a contour map. Sweep the learning rate to see crawling, converging, and diverging, and toggle momentum to see it smooth the path through a valley.
You need numpy and matplotlib. If you do not have them, see Installing packages. New to running a script? Python setup and Running Python cover it, and Reading errors helps when something throws. The numpy and plotting one-liners you need are in Python basics and the Python cheatsheet. There is no data file and no model to download: the surface is math, so the whole build is self-contained.
You build a 2D loss surface, derive its gradient, and write the gradient-descent update yourself. Then you run the optimiser from a fixed start and plot the path it takes over a contour map of the surface. You sweep the learning rate to watch a small rate crawl, a mid rate converge, and a large rate diverge, and you turn momentum on and off to see it cut a straighter path through a narrow valley. The surface is a few lines of numpy, the optimiser is your code, and the plot is where the behaviour becomes obvious.
About 2 to 2.5 hours: roughly half an hour on the surface and its gradient, an hour on the update rule and the trajectory plot, and the rest on the learning-rate and momentum sweeps.
descent.py: the surface, the hand-written gradient, the update rule, the trajectory loop, and the contour plot.
README.md: the update rule in your own words, the learning rate that converged, and what momentum changed.

builds/B4/
  descent.py      # surface + gradient + step + loop + contour/trajectory plot
  trajectory.png  # the optimiser path over the contours
  lr_sweep.png    # trajectories or loss curves for several learning rates
  README.md       # update rule, the lr that worked, what momentum did
One script is plenty. There is no data file: unlike B3, B4 is fully self-contained because the surface is a function.
The surface: f(x, y) = 0.5 * (A*x**2 + B*y**2) with A and B far apart (say 1 and 20). Evaluate it on a grid with np.meshgrid and draw a contour plot so you can see the valley.
The gradient: [A*x, B*y]. Sanity-check that it points away from the minimum.
The update, step(p, v, lr, momentum): compute the gradient, update the velocity, step downhill, return the new point and velocity.

Two functions carry the optimiser and are left for you to write: gradient and step. Everything else is scaffolding. Writing those two yourself is the milestone.
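Written out as math, the hand derivation and the update rule look like the following. This is a sketch of the notation only: the symbols \mu (momentum) and \eta (learning rate, lr in the code) and the step index t are naming assumptions made here, not names the scaffold uses.

```latex
\begin{aligned}
f(x, y) &= \tfrac{1}{2}\left(A x^{2} + B y^{2}\right) \\
\frac{\partial f}{\partial x} &= A x, \qquad
\frac{\partial f}{\partial y} = B y \\
\nabla f(x, y) &= \begin{bmatrix} A x \\ B y \end{bmatrix} \\[4pt]
v_{t+1} &= \mu\, v_{t} - \eta\, \nabla f(p_{t}) \\
p_{t+1} &= p_{t} + v_{t+1}
\end{aligned}
```

Setting \mu = 0 drops the velocity term and leaves plain gradient descent, p_{t+1} = p_t - \eta \nabla f(p_t).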
import numpy as np
import matplotlib.pyplot as plt

A, B = 1.0, 20.0  # an elongated bowl: a narrow valley along x

def f(p):  # scaffolding: the surface
    x, y = p
    return 0.5 * (A * x**2 + B * y**2)

def gradient(p):
    # TODO (you write this): the slope vector at p.
    # for this bowl the gradient is [A*x, B*y]
    ...

def step(p, v, lr, momentum):
    # TODO (you write this): one optimiser update.
    #   velocity = momentum * v - lr * gradient(p)
    #   new_p = p + velocity
    #   return new_p, velocity   (momentum = 0 gives plain gradient descent)
    ...

def run(start, lr, steps=200, momentum=0.0):  # scaffolding: the loop + recording
    p = np.array(start, dtype=float)
    v = np.zeros(2)
    traj = [p.copy()]  # traj[0] is the start point, before any step
    for _ in range(steps):
        p, v = step(p, v, lr, momentum)
        traj.append(p.copy())
        if not np.isfinite(f(p)):  # divergence guard
            # traj includes the start point, so steps taken = len(traj) - 1
            print("diverged at step", len(traj) - 1)
            break
    return np.array(traj)

def plot_contours(traj):  # scaffolding: meshgrid + contour + path
    xs = np.linspace(-3, 3, 200)
    ys = np.linspace(-3, 3, 200)
    X, Y = np.meshgrid(xs, ys)
    Z = 0.5 * (A * X**2 + B * Y**2)
    plt.contour(X, Y, Z, levels=30)
    plt.plot(traj[:, 0], traj[:, 1], marker=".")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.title("descent trajectory")
    plt.savefig("trajectory.png")
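Once your gradient is written, the sanity check from the steps above can be automated by comparing it with a finite-difference estimate. The block below is a minimal sketch under stated assumptions: numerical_gradient is a hypothetical helper that is not part of the scaffold, the test point and step size h are arbitrary choices, and the analytic line inlines the formula from the scaffold comment so the block runs on its own. In descent.py you would call your own gradient(p) on that line instead.

```python
import numpy as np

A, B = 1.0, 20.0  # same constants as the scaffold


def f(p):
    x, y = p
    return 0.5 * (A * x**2 + B * y**2)


def numerical_gradient(func, p, h=1e-6):
    """Central-difference estimate of the gradient (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    grad = np.zeros_like(p)
    for i in range(p.size):
        e = np.zeros_like(p)
        e[i] = h  # nudge one coordinate at a time
        grad[i] = (func(p + e) - func(p - e)) / (2 * h)
    return grad


p = np.array([2.0, 1.5])                   # arbitrary test point (assumption)
analytic = np.array([A * p[0], B * p[1]])  # replace with gradient(p) in your script
numeric = numerical_gradient(f, p)

print("analytic:", analytic)
print("numeric: ", numeric)
print("match:", np.allclose(analytic, numeric, rtol=1e-4))

# The minimum is at the origin, so "points away from the minimum" means the
# gradient has a positive dot product with the position vector p.
print("points away from minimum:", float(np.dot(analytic, p)) > 0)
```

If match prints False, the usual suspects are a swapped A and B or a stray factor of 0.5 left over from the surface definition.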
For the optional saddle demonstration, swap in a second surface and its gradient. Keep it brief: it demonstrates a saddle point and is not the main convergence exercise.
# a saddle: down in y, up in x. start just off-centre and watch it slide away.
def f_saddle(p):
    x, y = p
    return x**2 - y**2

def grad_saddle(p):
    x, y = p
    return np.array([2*x, -2*y])  # flat at the origin, but y still goes downhill
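To see the slide without touching the main script, a few steps of plain gradient descent on the saddle are enough. This is a minimal sketch: the start point, the learning rate of 0.1, and the 30-step count are assumptions chosen for illustration, and the update is written inline rather than through your step function.

```python
import numpy as np


def grad_saddle(p):
    x, y = p
    return np.array([2 * x, -2 * y])


p = np.array([1.0, 0.01])  # just off the y = 0 line (assumed start)
lr = 0.1                   # assumed learning rate

for _ in range(30):
    p = p - lr * grad_saddle(p)  # plain gradient descent, no momentum

# x shrinks toward 0 while |y| keeps growing: the path leaves the saddle.
print("x:", p[0], "y:", p[1])
```

Because x**2 - y**2 has no minimum, this run never converges. The point is only that a tiny offset in y is amplified every step while x is pulled toward zero.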
A results table makes the sweep legible. Your exact numbers depend on the surface constants and the start point; these are illustrative, not targets:
| lr | momentum | converged | steps | final loss |
|---|---|---|---|---|
| 0.001 | 0.0 | no (crawl) | 200 | 4.83 |
| 0.02 | 0.0 | yes | 138 | 0.001 |
| 0.02 | 0.9 | yes | 41 | 0.000 |
| 0.20 | 0.0 | no (diverged) | 7 | inf |
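A table of this shape can be produced with a short sweep loop. The block below is a sketch, not the reference solution: gradient and step are stand-ins so it runs on its own (swap in your own), and sweep_row, the start point, the convergence tolerance, and the 1e12 blow-up threshold are all assumptions made here. Your numbers will differ from the illustrative table.

```python
import numpy as np

A, B = 1.0, 20.0


def f(p):
    x, y = p
    return 0.5 * (A * x**2 + B * y**2)


def gradient(p):  # stand-in: replace with your own
    x, y = p
    return np.array([A * x, B * y])


def step(p, v, lr, momentum):  # stand-in: replace with your own
    v = momentum * v - lr * gradient(p)
    return p + v, v


def sweep_row(lr, momentum, start=(2.0, 1.0), max_steps=200, tol=1e-3):
    """Run one configuration; return (steps taken, final loss, status)."""
    p = np.array(start, dtype=float)
    v = np.zeros(2)
    loss = f(p)
    for n in range(1, max_steps + 1):
        p, v = step(p, v, lr, momentum)
        loss = f(p)
        if not np.isfinite(loss) or loss > 1e12:  # assumed blow-up threshold
            return n, loss, "no (diverged)"
        if loss < tol:                            # assumed convergence test
            return n, loss, "yes"
    return max_steps, loss, "no (ran out of steps)"


print(f"{'lr':>6} {'momentum':>8} {'converged':>22} {'steps':>5} {'final loss':>12}")
for lr, momentum in [(0.001, 0.0), (0.02, 0.0), (0.02, 0.9), (0.20, 0.0)]:
    steps, loss, status = sweep_row(lr, momentum)
    print(f"{lr:>6} {momentum:>8} {status:>22} {steps:>5} {loss:>12.4g}")
```

Stopping at the first step where the loss drops below tol is a design choice: with momentum the loss can oscillate, so a stricter test might require it to stay below the tolerance for several steps in a row.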
Assess against the Build Track Validation Standard. The bar is understanding, not a converged number.
scipy.optimize) rather than your numpy. The milestone is to build the optimiser, so reframe it to your own code before marking complete. Such libraries belong only in the optional extensions.
These are conceptual traps, distinct from code symptoms.

These are code symptoms and their likely causes, distinct from the conceptual pitfalls above.

| Symptom | Likely cause |
|---|---|
| trajectory shoots to inf or NaN | Learning rate too large, or the gradient sign is wrong. Lower the rate and check the step subtracts the gradient. |
| the path climbs uphill | The step adds the gradient instead of subtracting it. |
| the path barely moves | Rate far too small, or the gradient is miscomputed as near-zero. |
| contour and trajectory do not line up | x and y swapped between the surface grid and the trajectory plot. |
| momentum path oscillates forever | Momentum too high for the learning rate. Lower one of them. |
| the contour plot is empty | Forgot meshgrid, or evaluated the surface on 1D arrays instead of the grid. |
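The "learning rate too large" row can be made concrete for this particular bowl. With momentum 0, each update multiplies x by (1 - lr*A) and y by (1 - lr*B), so the path shrinks toward the minimum only while both factors are below 1 in magnitude. The sketch below prints those factors for the learning rates used in the sweep; contraction_factors is a hypothetical helper name, and the analysis applies to plain gradient descent on this quadratic only, not to the momentum runs.

```python
A, B = 1.0, 20.0  # same constants as the scaffold


def contraction_factors(lr):
    """Per-step multipliers on x and y for plain gradient descent (momentum = 0)."""
    return 1.0 - lr * A, 1.0 - lr * B


for lr in (0.001, 0.02, 0.20):
    fx, fy = contraction_factors(lr)
    stable = abs(fx) < 1 and abs(fy) < 1
    print(f"lr={lr:<6} x factor={fx:+.3f}  y factor={fy:+.3f}  stable={stable}")

# Both factors stay inside (-1, 1) only while lr < 2 / max(A, B).
print("stability limit for lr:", 2.0 / max(A, B))
```

A factor close to 1 means crawling along that axis, a negative factor means the path hops across the valley floor each step, and a factor below -1 means each hop is larger than the last.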