# %% [markdown]
# # Decision boundaries: k-NN vs. a decision tree
#
# This is the Python behind the interactive "decision boundary" explainer in
# the *Algorithm families* chapter. Run it top to bottom, then change `k` and
# `max_depth` and re-run to reproduce what the sliders do on the website.
#
# Requirements: `numpy`, `matplotlib`, `scikit-learn` (all preinstalled on Colab).

# %%
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# %% [markdown]
# ## 1. The training data
#
# Two clusters in a 2-D feature space (the same points shown on the site).
# Class 0 sits top-left and class 1 bottom-right, with one point from each
# class near the origin, so the classes overlap along feature 1 and the
# boundary between them is less obvious.

# %%
X = np.array([
    [-0.9, 0.7], [-0.7, 0.9], [-0.5, 0.6], [-0.8, 0.4], [-0.3, 0.8],
    [0.6, -0.5], [0.8, -0.7], [0.5, -0.3], [0.9, -0.6], [0.4, -0.8],
    [0.2, 0.2], [-0.1, -0.1],
])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1])

# %% [markdown]
# ## 2. A helper that paints the decision region
#
# We classify a dense grid of points and colour each cell by the predicted
# class — exactly how the website fills the background behind the dots.

# %%
def plot_boundary(model, title):
    """Fit `model` on the global X, y and shade its predicted class regions."""
    model.fit(X, y)

    # Classify every point of a dense 300 x 300 grid covering the data range.
    lo, hi = -1.1, 1.1
    xx, yy = np.meshgrid(np.linspace(lo, hi, 300), np.linspace(lo, hi, 300))
    grid = np.c_[xx.ravel(), yy.ravel()]
    zz = model.predict(grid).reshape(xx.shape)

    # One colour per class: blue for class 0, red for class 1.
    cmap = ListedColormap(["#2457c5", "#b42318"])

    plt.figure(figsize=(5, 5))
    plt.contourf(xx, yy, zz, alpha=0.25, cmap=cmap)  # shaded regions
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap,    # training points
                edgecolors="k", s=80)
    plt.title(title)
    plt.xlabel("feature 1")
    plt.ylabel("feature 2")
    plt.show()
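
# %% [markdown]
# To see what the grid step inside `plot_boundary` does, here is a minimal
# sketch on a 3 x 2 grid instead of 300 x 300. The names `gx`, `gy`, `cells`,
# `labels` and `toy_predict` are made up for this illustration, and
# `toy_predict` is a stand-in rule, not one of the models used in this notebook.

# %%
import numpy as np

# Two 1-D axes expand into two 2-D coordinate arrays of shape (rows, cols).
gx, gy = np.meshgrid(np.linspace(-1, 1, 3), np.linspace(-1, 1, 2))

# Flatten and pair them up: one (feature 1, feature 2) row per grid cell.
cells = np.c_[gx.ravel(), gy.ravel()]


def toy_predict(points):
    """Hypothetical classifier: class 1 wherever feature 1 > feature 2."""
    return (points[:, 0] > points[:, 1]).astype(int)


# Predict every cell, then fold the flat labels back into the grid shape so
# each label lines up with its (gx, gy) coordinate, as contourf expects.
labels = toy_predict(cells).reshape(gx.shape)

print(cells.shape)  # one row per grid cell
print(labels)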

# %% [markdown]
# ## 3. k-nearest neighbors
#
# Try `k = 1` (jagged: the boundary follows every point, which overfits) up
# through `k = 11` (smooth, but blurs local detail). `k` cannot exceed the 12
# training points. This is the website's "k" slider.

# %%
k = 5
plot_boundary(KNeighborsClassifier(n_neighbors=k), f"k-NN (k = {k})")
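
# %% [markdown]
# What the `k` setting changes can be sketched by hand: find the `k` closest
# training points and take a majority vote. This is a simplified illustration,
# not scikit-learn's implementation, and `knn_vote`, `X_vote`, `y_vote`,
# `query` and `k_toy` are hypothetical names on made-up data.

# %%
import numpy as np


def knn_vote(X_train, y_train, point, k):
    """Majority class among the k training points nearest to `point`."""
    dists = np.linalg.norm(X_train - point, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                  # indices of the k closest
    # bincount counts votes per label; argmax breaks ties toward the lower label.
    return int(np.bincount(y_train[nearest]).argmax())


X_vote = np.array([[0.0, 0.0], [1.0, 0.0], [1.2, 0.0], [0.9, 0.3]])
y_vote = np.array([0, 1, 1, 1])
query = np.array([0.3, 0.0])

# With k = 1 the single closest point decides; with k = 3 it can be outvoted.
for k_toy in (1, 3):
    print(f"k = {k_toy}: class {knn_vote(X_vote, y_vote, query, k_toy)}")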

# %% [markdown]
# ## 4. A decision tree
#
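# A tree carves the space into axis-aligned rectangles. Raising `max_depth`
# from 1 to 5 allows more splits: the rules stay readable, but deep trees
# overfit small, noisy sets. `max_depth` is only an upper bound, though. On
# these 12 points a single split on feature 2 already separates the classes,
# so the boundary stays the same at every depth until you add overlapping
# points. This is the website's "max depth" slider.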

# %%
max_depth = 2
plot_boundary(DecisionTreeClassifier(max_depth=max_depth, random_state=0),
              f"Decision tree (max_depth = {max_depth})")
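
# %% [markdown]
# The "readable rules" of a tree can be printed as text with scikit-learn's
# `export_text`. A minimal sketch on a made-up four-point set (`X_rules`,
# `y_rules`, `toy_tree` and `rules` are hypothetical names). It also shows
# that `max_depth` is only an upper bound: the tree stops splitting once
# every leaf is pure.

# %%
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: the class depends only on the sign of feature 2.
X_rules = np.array([[0.0, 1.0], [1.0, 1.0], [0.0, -1.0], [1.0, -1.0]])
y_rules = np.array([0, 0, 1, 1])

toy_tree = DecisionTreeClassifier(max_depth=5, random_state=0)
toy_tree.fit(X_rules, y_rules)

# One axis-aligned threshold per internal node, one predicted class per leaf.
rules = export_text(toy_tree, feature_names=["feature 1", "feature 2"])
print(rules)
print("depth actually used:", toy_tree.get_depth())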

# %% [markdown]
# ## Your turn
#
# - Swap in your own `X` / `y` (two columns of features, integer class labels).
# - Sweep `k` and `max_depth` and watch the boundary change.
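# - Add a third class and re-run: both models handle it unchanged, but add a
#   third colour to the `ListedColormap` lists in `plot_boundary`, otherwise
#   classes 1 and 2 are drawn in the same colour.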
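
# %% [markdown]
# The third bullet can be checked without plotting. A minimal sketch on a
# made-up three-class set (`X3`, `y3`, `probes`, `model3` and `preds3` are
# hypothetical names): both estimators are built as in sections 3 and 4 and
# return all three labels.

# %%
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Three tight, well-separated clusters, three points each.
X3 = np.array([
    [-0.8, 0.8], [-0.7, 0.9], [-0.9, 0.7],   # class 0, top-left
    [0.8, -0.8], [0.7, -0.9], [0.9, -0.7],   # class 1, bottom-right
    [0.8, 0.8], [0.7, 0.9], [0.9, 0.7],      # class 2, top-right
])
y3 = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

# One probe point at the centre of each cluster.
probes = np.array([[-0.8, 0.8], [0.8, -0.8], [0.8, 0.8]])

preds3 = {}
for model3 in (KNeighborsClassifier(n_neighbors=3),
               DecisionTreeClassifier(max_depth=2, random_state=0)):
    preds3[type(model3).__name__] = model3.fit(X3, y3).predict(probes)
    print(type(model3).__name__, preds3[type(model3).__name__])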
