# %% [markdown]
# # Feature scaling: why standardization matters
#
# The Python behind the "feature scaling" explainer in the *Feature
# engineering* chapter. Two weld measurements live on very different scales:
# bead width (~6–14 mm) and an IR signal (0–1). Distance-based models such as
# nearest neighbors are dominated by the larger-range feature until you scale.
#
# Requirements: `numpy`, `matplotlib`, `scikit-learn`.

# %%
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# %% [markdown]
# ## 1. Raw features on different scales

# %%
# columns: bead width (mm), IR signal (0-1)
X = np.array([
    [6.2, 0.82], [6.8, 0.78], [7.1, 0.85], [5.9, 0.76],
    [13.4, 0.24], [12.8, 0.28], [14.1, 0.22], [13.0, 0.26],
])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
probe = np.array([12.5, 0.75])   # a new point we want to classify

# %% [markdown]
# ## 2. The problem: Euclidean distance ignores the small-range feature
#
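# Width spans ~8 mm across the data while the IR signal spans only ~0.6, so raw
# Euclidean distance is driven almost entirely by width. The new point has a
# class-1 width (12.5 mm) but a class-0-like IR signal (0.75). On raw features,
# that IR signal barely counts.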

# %%
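def nearest_label(Xs, probe_s):
    """1-nearest-neighbor: label of the row of Xs closest to probe_s, plus all distances."""
    d = np.linalg.norm(Xs - probe_s, axis=1)  # Euclidean distance to every row
    return y[d.argmin()], d                    # y is the global label array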

raw_label, raw_d = nearest_label(X, probe)
print("Raw nearest-neighbor class:", raw_label)
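
# %% [markdown]
# A quick numerical check of the claim above, as a minimal sketch. The cell
# repeats the weld data under new names (`X_weld`, `probe_weld`) so it runs on
# its own. It prints each feature's range and, for every stored weld, the
# fraction of the squared raw distance to the new point that comes from width.
# For the four class-0 welds that fraction is above 0.99: a ~6 mm width gap
# swamps any IR difference, since the whole IR range is only ~0.63. IR
# differences only matter among welds whose width is already close to 12.5 mm.

# %%
import numpy as np

# Same data as above: bead width (mm), IR signal (0-1)
X_weld = np.array([
    [6.2, 0.82], [6.8, 0.78], [7.1, 0.85], [5.9, 0.76],
    [13.4, 0.24], [12.8, 0.28], [14.1, 0.22], [13.0, 0.26],
])
probe_weld = np.array([12.5, 0.75])

# Range (max - min) of each column
feature_range = X_weld.max(axis=0) - X_weld.min(axis=0)
print("range per feature:", feature_range)

# Squared per-feature differences to the new point, then width's share of the total
sq_diff = (X_weld - probe_weld) ** 2
width_share = sq_diff[:, 0] / sq_diff.sum(axis=1)
print("width share of squared distance:", width_share.round(3))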

# %% [markdown]
# ## 3. Standardize: subtract the mean, divide by the std
#
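# After `StandardScaler`, every feature has mean 0 and standard deviation 1, so
# a one-unit difference means "one standard deviation" for both features and
# neither dominates the distance just because of its units. (The website also
# shows min-max scaling, which instead maps each feature's minimum and maximum
# to 0 and 1.)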

# %%
scaler = StandardScaler().fit(X)
Xs = scaler.transform(X)
probe_s = scaler.transform(probe.reshape(1, -1))[0]

std_label, std_d = nearest_label(Xs, probe_s)
print("Standardized nearest-neighbor class:", std_label)
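
# %% [markdown]
# The same two rescalings written out by hand, as a minimal sketch to check
# what the scikit-learn classes compute (data repeated as `X_weld` so the cell
# is self-contained). `StandardScaler` uses the population standard deviation
# (`ddof=0`), and `MinMaxScaler` maps each column's fitted min and max to 0 and
# 1. The last lines show that a value outside the fitted range, here a
# hypothetical 15.0 mm bead, lands outside [0, 1] after min-max scaling.

# %%
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Same data as above: bead width (mm), IR signal (0-1)
X_weld = np.array([
    [6.2, 0.82], [6.8, 0.78], [7.1, 0.85], [5.9, 0.76],
    [13.4, 0.24], [12.8, 0.28], [14.1, 0.22], [13.0, 0.26],
])

# z-score: subtract the column mean, divide by the population std (ddof=0)
z_manual = (X_weld - X_weld.mean(axis=0)) / X_weld.std(axis=0)
z_sklearn = StandardScaler().fit_transform(X_weld)
print("z-scores match StandardScaler:", np.allclose(z_manual, z_sklearn))

# min-max: subtract the column min, divide by the column range
col_min, col_max = X_weld.min(axis=0), X_weld.max(axis=0)
mm_manual = (X_weld - col_min) / (col_max - col_min)
mm_scaler = MinMaxScaler().fit(X_weld)
print("min-max matches MinMaxScaler:", np.allclose(mm_manual, mm_scaler.transform(X_weld)))

# A hypothetical new weld wider than any fitted weld falls outside [0, 1]
new_weld = np.array([[15.0, 0.50]])
print("scaled new weld:", mm_scaler.transform(new_weld))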

# %% [markdown]
# ## 4. See it: before vs. after

# %%
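fig, axes = plt.subplots(1, 2, figsize=(10, 4))
panels = [
    (axes[0], X, probe, "Raw (width dominates)", "bead width (mm)", "IR signal"),
    (axes[1], Xs, probe_s, "Standardized (balanced)", "bead width (z)", "IR signal (z)"),
]
for ax, data, p, title, xlabel, ylabel in panels:
    ax.scatter(data[y == 0, 0], data[y == 0, 1], c="#2457c5", s=70, label="class 0")
    ax.scatter(data[y == 1, 0], data[y == 1, 1], c="#b42318", s=70, label="class 1")
    ax.scatter(*p, c="black", marker="*", s=220, label="new point")
    # Equal aspect so on-screen distances match the Euclidean distances used above
    ax.set_aspect("equal", adjustable="datalim")
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.legend()
plt.tight_layout()
plt.show()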
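
# %% [markdown]
# A cross-check with scikit-learn's own 1-nearest-neighbor classifier, as a
# sketch (data repeated as `X_weld`, `y_weld`, `probe_weld`). Putting
# `StandardScaler` in front of `KNeighborsClassifier` with `make_pipeline`
# should agree with the hand-rolled `nearest_label`: on this data the raw
# features give class 1 and the standardized features give class 0.

# %%
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same data as above: bead width (mm), IR signal (0-1)
X_weld = np.array([
    [6.2, 0.82], [6.8, 0.78], [7.1, 0.85], [5.9, 0.76],
    [13.4, 0.24], [12.8, 0.28], [14.1, 0.22], [13.0, 0.26],
])
y_weld = np.array([0, 0, 0, 0, 1, 1, 1, 1])
probe_weld = np.array([[12.5, 0.75]])  # 2-D: one sample, two features

# Default metric is Euclidean (Minkowski with p=2)
raw_knn = KNeighborsClassifier(n_neighbors=1).fit(X_weld, y_weld)
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=1))
scaled_knn.fit(X_weld, y_weld)  # the scaler is fit inside the pipeline

print("raw 1-NN class:", raw_knn.predict(probe_weld)[0])
print("standardized 1-NN class:", scaled_knn.predict(probe_weld)[0])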

# %% [markdown]
# ## Your turn
#
# - Swap `StandardScaler` for `MinMaxScaler` and compare the nearest neighbor.
# - Add a feature on yet another scale and confirm scaling still balances it.
# - Fit the scaler on a *train* split only, then use it to transform the test
#   data. Fitting on the full set leaks the test set's mean and std into training.
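
# %% [markdown]
# A minimal sketch of the leak-free pattern from the last bullet, assuming a
# stratified 75/25 split via `train_test_split` (the split size and
# `random_state=0` are illustrative choices, not from the chapter). The scaler
# learns its mean and std from the training rows only, and the same fitted
# scaler then transforms the held-out rows.

# %%
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same data as above: bead width (mm), IR signal (0-1)
X_weld = np.array([
    [6.2, 0.82], [6.8, 0.78], [7.1, 0.85], [5.9, 0.76],
    [13.4, 0.24], [12.8, 0.28], [14.1, 0.22], [13.0, 0.26],
])
y_weld = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# 6 training rows, 2 test rows, one test row per class
X_train, X_test, y_train, y_test = train_test_split(
    X_weld, y_weld, test_size=0.25, stratify=y_weld, random_state=0
)

scaler_train = StandardScaler().fit(X_train)  # statistics from train rows only
X_train_s = scaler_train.transform(X_train)
X_test_s = scaler_train.transform(X_test)     # reuse them; never refit on test

print("train mean used for scaling:", scaler_train.mean_)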
