# %% [markdown]
# # Gradient descent, step by step
#
# The Python behind the "gradient descent" explainer in the *Intro to neural
# networks* chapter. We minimize a simple loss valley, `L(w) = 0.5 * (w - 3)^2`,
# by repeatedly stepping downhill. Change the learning rate and watch the
# steps converge, crawl, or diverge, just as the website's slider shows.
#
# Requirements: `numpy`, `matplotlib`.

# %%
import numpy as np
import matplotlib.pyplot as plt

# %% [markdown]
# ## 1. The loss and its gradient
#
# The minimum is at `w = 3`. The derivative of `0.5*(w-3)^2` is just `(w - 3)`,
# which tells us which way is downhill and how steep it is.

# %%
def loss(w):
    return 0.5 * (w - 3.0) ** 2

def grad(w):
    return w - 3.0
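
# %% [markdown]
# A quick sanity check, offered as a sketch: compare the analytic gradient with
# a central finite difference at a few points. The names `loss_check`,
# `grad_check`, and `numerical_grad` are illustrative additions, not part of
# the original explainer; the loss is redefined so the cell runs on its own.

# %%
import numpy as np

def loss_check(w):
    # Same valley as above: 0.5 * (w - 3)^2
    return 0.5 * (w - 3.0) ** 2

def grad_check(w):
    # Analytic derivative of loss_check
    return w - 3.0

def numerical_grad(f, w, eps=1e-6):
    # Central difference: (f(w + eps) - f(w - eps)) / (2 * eps)
    return (f(w + eps) - f(w - eps)) / (2.0 * eps)

test_points = np.array([-1.0, 0.3, 3.0, 5.5])
max_gap = np.max(
    np.abs(numerical_grad(loss_check, test_points) - grad_check(test_points))
)
print(f"largest gap between analytic and numerical gradient: {max_gap:.2e}")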

# %% [markdown]
# ## 2. The update rule
#
# `w <- w - learning_rate * gradient`. Try `learning_rate = 0.1` (steady),
# `learning_rate = 1.9` (overshoots on every step but still converges), and
# `learning_rate = 2.1` (diverges: the step is too big).

# %%
learning_rate = 0.1
w = 0.3                # starting guess
history = [w]

for step in range(25):
    w = w - learning_rate * grad(w)
    history.append(w)

print(f"final w = {w:.4f}  (target 3.0),  final loss = {loss(w):.5f}")
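
# %% [markdown]
# Why do different learning rates behave so differently? For this particular
# loss the update can be rewritten as `w_new - 3 = (1 - lr) * (w - 3)`, so each
# step multiplies the distance to the minimum by `1 - lr`. That factor shrinks
# the distance only when `0 < lr < 2`, and it flips sign (overshoots) once
# `lr > 1`. The sketch below checks this numerically; `run_descent` is a
# hypothetical helper introduced here, and it hard-codes the gradient `w - 3`
# so that it does not disturb the notebook's own `w` or `learning_rate`.

# %%
def run_descent(lr, w0=0.3, steps=25):
    """Run plain gradient descent on 0.5 * (w - 3)^2 and return the last w."""
    w_cur = w0
    for _ in range(steps):
        w_cur = w_cur - lr * (w_cur - 3.0)  # gradient of the loss is (w - 3)
    return w_cur

for lr in (0.1, 0.9, 1.9, 2.1):
    w_end = run_descent(lr)
    print(
        f"lr = {lr:<4} factor (1 - lr) = {1 - lr:+.1f}  "
        f"final w = {w_end:.4f}  distance to 3 = {abs(w_end - 3.0):.4f}"
    )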

# %% [markdown]
# ## 3. Watch each step roll down the curve

# %%
ws = np.linspace(-1, 7, 200)
plt.figure(figsize=(6, 4))
plt.plot(ws, loss(ws), color="#94a3b8", label="loss L(w)")
hist = np.array(history)
plt.plot(hist, loss(hist), "o-", color="#b42318", label="descent steps")
plt.scatter([3], [0], color="#2457c5", zorder=5, label="minimum")
plt.xlabel("w")
plt.ylabel("loss")
plt.title(f"Gradient descent (lr = {learning_rate})")
plt.legend()
plt.show()

# %% [markdown]
# ## Your turn
#
# - Increase `learning_rate` past ~2.0 and watch the steps fly apart.
# - Replace `loss`/`grad` with a more realistic pair, e.g. the mean-squared
#   error of a linear fit `y = w*x`, and descend on that instead.
# - This single-weight loop uses the same basic update rule that trains a deep
#   network, just with millions of weights and gradients computed by
#   backpropagation.
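
# %% [markdown]
# One possible take on the linear-fit exercise, as a sketch rather than a
# reference solution. The data are synthetic (an assumed true slope of 2.0 plus
# small noise), and `mse_loss`, `mse_grad`, `w_fit`, and the step size 0.5 are
# choices made here for illustration. The closed-form least-squares slope is
# printed alongside as a cross-check.

# %%
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=50)
y = 2.0 * x + 0.1 * rng.normal(size=50)   # assumed ground truth: slope 2.0

def mse_loss(w_val):
    # Mean-squared error of the fit y = w * x
    return np.mean((w_val * x - y) ** 2)

def mse_grad(w_val):
    # d/dw mean((w*x - y)^2) = 2 * mean((w*x - y) * x)
    return 2.0 * np.mean((w_val * x - y) * x)

w_fit = 0.0
for _ in range(200):
    w_fit = w_fit - 0.5 * mse_grad(w_fit)

w_exact = np.sum(x * y) / np.sum(x * x)   # closed-form least-squares slope
print(f"gradient descent: w = {w_fit:.4f}, loss = {mse_loss(w_fit):.5f}")
print(f"closed form:      w = {w_exact:.4f}")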
