Algorithm Families - Code Snippets

BWXT Data Science Workforce Training Pilot
Companion notebook for Chapter_Algorithm_Families.md

This notebook collects the code and text snippets from the Algorithm Families chapter in chapter order.

Text snippets appear as markdown examples because they are pseudocode or plain-language sketches of a model, not runnable code. Python snippets appear as runnable code cells. The setup cell below creates a small DataFrame, df, that the model examples use.

Setup - run this first

The chapter snippets use small placeholder variables such as df, X, and y. This setup cell creates a compact welding inspection table so the Python examples have data to use.

In [ ]:
import pandas as pd

# Small example dataset used by the chapter snippets.
df = pd.DataFrame({
    'voltage': [22.1, 19.5, 24.3, 23.9, 20.1, 21.8, 22.5, 24.8, 23.0, 20.7],
    'travel_speed': [4.8, 5.5, 4.1, 4.0, 5.2, 4.9, 4.7, 3.9, 4.3, 5.1],
    'heat_input': [1.15, 0.92, 1.35, 1.42, 0.98, 1.10, 1.18, 1.48, 1.30, 1.02],
    'mean_pixel': [82.4, 61.7, 91.2, 96.5, 70.5, 79.2, 84.1, 98.0, 88.3, 73.4],
    'std_pixel': [11.2, 18.5, 13.4, 21.0, 15.8, 12.0, 10.9, 22.5, 14.1, 16.2],
    'defect_area': [0.0, 3.0, 0.0, 11.0, 1.0, 0.0, 0.5, 14.0, 4.2, 0.0],
    'tensile_strength_mpa': [512, 455, 530, 438, 470, 505, 498, 420, 462, 500],
    'is_defective': [0, 1, 0, 1, 1, 0, 1, 1, 1, 0],
    'defect_class': ['no_defect', 'porosity', 'no_defect', 'crack', 'porosity',
                     'no_defect', 'porosity', 'crack', 'porosity', 'no_defect'],
})

df

Regression vs. Classification

tensile_strength_mpa = 485.2
inspection_result = 'fail'

When to Use Regression

In [ ]:
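# Regression target: a continuous measurement.
# defect_length_mm is the chapter's example name; the setup table has no such column.
target = 'defect_length_mm'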

When to Use Classification

In [ ]:
# Classification target: a category label (a column in the setup table).
target = 'defect_class'

Linear Regression

predicted_strength = intercept + slope * weld_length_mm
predicted_strength = intercept
                     + weight_1 * voltage
                     + weight_2 * travel_speed
                     + weight_3 * heat_input
In [ ]:
from sklearn.linear_model import LinearRegression

X = df[['voltage', 'travel_speed', 'heat_input']]
y = df['tensile_strength_mpa']

model = LinearRegression()
model.fit(X, y)

predictions = model.predict(X)
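The intercept and weights in the formula above correspond to the fitted model's `intercept_` and `coef_` attributes. The cell below is a minimal sketch that rebuilds the predictions by hand to show the correspondence. It repeats the relevant setup columns so it can run on its own; the names `features`, `weights`, and `manual_predictions` are illustrative and do not come from the chapter.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same values as the setup cell, repeated so this cell runs on its own.
df = pd.DataFrame({
    'voltage': [22.1, 19.5, 24.3, 23.9, 20.1, 21.8, 22.5, 24.8, 23.0, 20.7],
    'travel_speed': [4.8, 5.5, 4.1, 4.0, 5.2, 4.9, 4.7, 3.9, 4.3, 5.1],
    'heat_input': [1.15, 0.92, 1.35, 1.42, 0.98, 1.10, 1.18, 1.48, 1.30, 1.02],
    'tensile_strength_mpa': [512, 455, 530, 438, 470, 505, 498, 420, 462, 500],
})

features = ['voltage', 'travel_speed', 'heat_input']
X = df[features]
y = df['tensile_strength_mpa']

model = LinearRegression().fit(X, y)

# One learned weight per feature, plus a single intercept.
weights = dict(zip(features, model.coef_))
print('intercept:', model.intercept_)
print('weights:', weights)

# Rebuild the predictions by hand: intercept + sum(weight_i * feature_i).
manual_predictions = model.intercept_ + X.to_numpy() @ model.coef_
```

With only ten rows and three closely related features, the individual weights should not be read as reliable process effects; the point is only that `predict` is the weighted sum from the formula.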

Logistic Regression

probability_of_defect = sigmoid(score)
if probability_of_defect >= 0.5:
    predict 'defective'
else:
    predict 'not_defective'
In [ ]:
from sklearn.linear_model import LogisticRegression

X = df[['mean_pixel', 'std_pixel', 'defect_area']]
y = df['is_defective']

model = LogisticRegression()
model.fit(X, y)

predicted_classes = model.predict(X)
predicted_probabilities = model.predict_proba(X)
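The pseudocode above (score, then sigmoid, then a 0.5 threshold) can be traced through a fitted binary model. The sketch below assumes the score is what scikit-learn returns from `decision_function`. The `sigmoid` helper and the variable names are written here for illustration, and `max_iter=1000` is an assumption added to give the solver extra room on unscaled features.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Same values as the setup cell, repeated so this cell runs on its own.
df = pd.DataFrame({
    'mean_pixel': [82.4, 61.7, 91.2, 96.5, 70.5, 79.2, 84.1, 98.0, 88.3, 73.4],
    'std_pixel': [11.2, 18.5, 13.4, 21.0, 15.8, 12.0, 10.9, 22.5, 14.1, 16.2],
    'defect_area': [0.0, 3.0, 0.0, 11.0, 1.0, 0.0, 0.5, 14.0, 4.2, 0.0],
    'is_defective': [0, 1, 0, 1, 1, 0, 1, 1, 1, 0],
})

X = df[['mean_pixel', 'std_pixel', 'defect_area']]
y = df['is_defective']

model = LogisticRegression(max_iter=1000).fit(X, y)


def sigmoid(score):
    """Map any real-valued score to a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-score))


# Linear score for each weld: intercept + weighted sum of the features.
scores = model.decision_function(X)

# Probability of the positive class (is_defective == 1).
probability_of_defect = sigmoid(scores)

# Apply the 0.5 threshold from the pseudocode.
manual_classes = (probability_of_defect >= 0.5).astype(int)
```

The second column of `predict_proba` holds the same probabilities, and `predict` returns the same classes as the manual threshold.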

Decision Trees

Is mean_pixel < 70?
    yes -> Is defect_area > 12?
        yes -> predict 'defective'
        no  -> predict 'not_defective'
    no  -> predict 'not_defective'
In [ ]:
from sklearn.tree import DecisionTreeClassifier

X = df[['mean_pixel', 'std_pixel', 'defect_area']]
y = df['defect_class']

model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X, y)

predictions = model.predict(X)
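A fitted tree can be printed as nested yes/no questions, much like the text snippet above. The sketch below uses scikit-learn's `export_text` for this. The thresholds it prints are learned from the ten-row example table, so they will not match the illustrative numbers in the text snippet.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Same values as the setup cell, repeated so this cell runs on its own.
df = pd.DataFrame({
    'mean_pixel': [82.4, 61.7, 91.2, 96.5, 70.5, 79.2, 84.1, 98.0, 88.3, 73.4],
    'std_pixel': [11.2, 18.5, 13.4, 21.0, 15.8, 12.0, 10.9, 22.5, 14.1, 16.2],
    'defect_area': [0.0, 3.0, 0.0, 11.0, 1.0, 0.0, 0.5, 14.0, 4.2, 0.0],
    'defect_class': ['no_defect', 'porosity', 'no_defect', 'crack', 'porosity',
                     'no_defect', 'porosity', 'crack', 'porosity', 'no_defect'],
})

features = ['mean_pixel', 'std_pixel', 'defect_area']
X = df[features]
y = df['defect_class']

model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X, y)

# Render the learned splits as indented text rules.
rules = export_text(model, feature_names=features)
print(rules)
```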

Random Forest

tree_1 predicts 'porosity'
tree_2 predicts 'crack'
tree_3 predicts 'porosity'

forest prediction = 'porosity'
In [ ]:
from sklearn.ensemble import RandomForestClassifier

X = df[['mean_pixel', 'std_pixel', 'defect_area']]
y = df['defect_class']

model = RandomForestClassifier(
    n_estimators=100,
    random_state=42,
)
model.fit(X, y)

predictions = model.predict(X)
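The voting idea in the text snippet can be inspected on a fitted forest by asking each tree for its own prediction. The sketch below counts those votes for a single weld. One caveat: scikit-learn's forest averages the trees' class probabilities rather than counting hard votes, so the simple majority shown here usually agrees with `model.predict` but is not guaranteed to. The names `sample`, `tree_votes`, and `majority_class` are illustrative.

```python
from collections import Counter

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Same values as the setup cell, repeated so this cell runs on its own.
df = pd.DataFrame({
    'mean_pixel': [82.4, 61.7, 91.2, 96.5, 70.5, 79.2, 84.1, 98.0, 88.3, 73.4],
    'std_pixel': [11.2, 18.5, 13.4, 21.0, 15.8, 12.0, 10.9, 22.5, 14.1, 16.2],
    'defect_area': [0.0, 3.0, 0.0, 11.0, 1.0, 0.0, 0.5, 14.0, 4.2, 0.0],
    'defect_class': ['no_defect', 'porosity', 'no_defect', 'crack', 'porosity',
                     'no_defect', 'porosity', 'crack', 'porosity', 'no_defect'],
})

X = df[['mean_pixel', 'std_pixel', 'defect_area']]
y = df['defect_class']

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# Look at one weld (the second row of the table).
sample = X.iloc[[1]].to_numpy()

# Each tree inside the forest predicts an encoded class index,
# so map it back to a label through model.classes_.
tree_votes = [
    model.classes_[int(tree.predict(sample)[0])]
    for tree in model.estimators_
]

vote_counts = Counter(tree_votes)
majority_class = vote_counts.most_common(1)[0][0]
print(vote_counts)
print('majority vote:', majority_class)
```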

Support Vector Machines

In [ ]:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = df[['mean_pixel', 'std_pixel', 'defect_area']]
y = df['is_defective']

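# Scale first: the RBF kernel works on distances, so features with
# larger numeric ranges would otherwise dominate.
model = make_pipeline(
    StandardScaler(),
    SVC(kernel='rbf')
)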
model.fit(X, y)

predictions = model.predict(X)

Naive Bayes

"visible crack near edge"
In [ ]:
from sklearn.naive_bayes import GaussianNB

X = df[['mean_pixel', 'std_pixel', 'defect_area']]
y = df['defect_class']

model = GaussianNB()
model.fit(X, y)

predictions = model.predict(X)
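The code cell above applies Gaussian Naive Bayes to numeric columns, while the text snippet is a short inspection note. As a rough sketch of how a note like that could be classified, the cell below pairs a bag-of-words `CountVectorizer` with `MultinomialNB`. The notes and labels are hypothetical, invented purely for illustration; they are not part of the chapter's dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical inspector notes, invented for this illustration only.
notes = [
    'visible crack near edge',
    'hairline crack along toe',
    'small pores on surface',
    'cluster of pores near start',
    'clean bead no issues',
    'smooth bead uniform width',
]
labels = ['crack', 'crack', 'porosity', 'porosity', 'no_defect', 'no_defect']

# CountVectorizer turns each note into word counts;
# MultinomialNB treats each word as independent evidence for a class.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(notes, labels)

new_note = 'crack near the edge of the plate'
prediction = model.predict([new_note])[0]
print(prediction)
```

Words the model has never seen (here "the" and "plate") are ignored, so the prediction rests on the known words "crack", "near", "edge", and "of".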

Neural Networks

voltage, travel_speed, heat_input -> hidden layers -> defect probability
weld image -> image filters -> learned features -> defect class
In [ ]:
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = df[['mean_pixel', 'std_pixel', 'defect_area']]
y = df['defect_class']

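model = make_pipeline(
    StandardScaler(),  # neural networks train more reliably on scaled inputs
    # max_iter raised from the default of 200, which is likely too few
    # iterations for the optimizer to converge on this tiny table.
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=42)
)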
model.fit(X, y)

predictions = model.predict(X)

Gradient Boosting

model_1 makes initial predictions
model_2 learns from model_1's errors
model_3 learns from remaining errors
final prediction combines all models
In [ ]:
from sklearn.ensemble import GradientBoostingClassifier

X = df[['mean_pixel', 'std_pixel', 'defect_area']]
y = df['defect_class']

model = GradientBoostingClassifier(random_state=42)
model.fit(X, y)

predictions = model.predict(X)
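The four steps in the text snippet can be written out by hand for the simplest case: regression with squared error, where "learning from errors" means fitting the next small tree to the current residuals. The cell below is a simplified sketch of that idea on the tensile strength column. It is not how `GradientBoostingClassifier` is implemented internally (the classifier applies the same idea to a classification loss), and the learning rate, tree depth, and number of rounds are arbitrary choices made for the illustration.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Same values as the setup cell, repeated so this cell runs on its own.
df = pd.DataFrame({
    'voltage': [22.1, 19.5, 24.3, 23.9, 20.1, 21.8, 22.5, 24.8, 23.0, 20.7],
    'travel_speed': [4.8, 5.5, 4.1, 4.0, 5.2, 4.9, 4.7, 3.9, 4.3, 5.1],
    'heat_input': [1.15, 0.92, 1.35, 1.42, 0.98, 1.10, 1.18, 1.48, 1.30, 1.02],
    'tensile_strength_mpa': [512, 455, 530, 438, 470, 505, 498, 420, 462, 500],
})

X = df[['voltage', 'travel_speed', 'heat_input']].to_numpy()
y = df['tensile_strength_mpa'].to_numpy(dtype=float)

learning_rate = 0.5  # arbitrary value chosen for this sketch
n_rounds = 3         # arbitrary value chosen for this sketch

# model_1: the simplest initial prediction, the mean of the target.
prediction = np.full(len(y), y.mean())
mse_per_round = [np.mean((y - prediction) ** 2)]

for _ in range(n_rounds):
    # Errors left over from the models built so far.
    residuals = y - prediction

    # The next model is a one-split tree trained on those errors.
    stump = DecisionTreeRegressor(max_depth=1, random_state=42)
    stump.fit(X, residuals)

    # The combined prediction adds a damped correction from the new tree.
    prediction = prediction + learning_rate * stump.predict(X)
    mse_per_round.append(np.mean((y - prediction) ** 2))

print(mse_per_round)
```

The training error shrinks after each round, which is the behavior the text snippet describes. On real data the number of rounds and the learning rate would be tuned against held-out data rather than training error.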

Clustering

cluster 0 -> stable process measurements
cluster 1 -> unusually high heat input
cluster 2 -> low brightness images
In [ ]:
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = df[['voltage', 'travel_speed', 'mean_pixel']]

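# Scale first: k-means is distance-based. n_init is set explicitly
# because its default has changed between scikit-learn releases.
model = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=3, n_init=10, random_state=42)
)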
cluster_ids = model.fit_predict(X)

df['cluster_id'] = cluster_ids
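K-means returns only numeric cluster IDs; descriptions like those in the text snippet come from a person looking at what each cluster contains. One common way to do that, sketched below, is to compare the average feature values per cluster. The names `cluster_summary` and `cluster_sizes` are illustrative, and `n_init=10` is set explicitly as an assumption so the result does not depend on the library version's default.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same values as the setup cell, repeated so this cell runs on its own.
df = pd.DataFrame({
    'voltage': [22.1, 19.5, 24.3, 23.9, 20.1, 21.8, 22.5, 24.8, 23.0, 20.7],
    'travel_speed': [4.8, 5.5, 4.1, 4.0, 5.2, 4.9, 4.7, 3.9, 4.3, 5.1],
    'mean_pixel': [82.4, 61.7, 91.2, 96.5, 70.5, 79.2, 84.1, 98.0, 88.3, 73.4],
})

features = ['voltage', 'travel_speed', 'mean_pixel']
X = df[features]

model = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=3, n_init=10, random_state=42),
)
df['cluster_id'] = model.fit_predict(X)

# Average of each feature within each cluster, in the original units.
cluster_summary = df.groupby('cluster_id')[features].mean()

# Number of welds assigned to each cluster.
cluster_sizes = df['cluster_id'].value_counts().sort_index()

print(cluster_summary)
print(cluster_sizes)
```

The cluster numbers themselves carry no meaning and can change between runs with a different `random_state`; only the grouping matters.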