Algorithm Families - Code Snippets
BWXT Data Science Workforce Training Pilot
Companion notebook for Chapter_Algorithm_Families.md
This notebook collects the code and text snippets from the Algorithm Families chapter in chapter order.
The text snippets are included as markdown examples because they are pseudocode or plain-language model ideas. The Python snippets are included as runnable code cells. The setup cell below creates a small df table used by the model examples.
Setup - run this first
The chapter snippets use small placeholder variables such as df, X, and y. This setup cell creates a compact welding inspection table so the Python examples have data to use.
In [ ]:
import pandas as pd
# Small example dataset used by the chapter snippets.
df = pd.DataFrame({
    'voltage': [22.1, 19.5, 24.3, 23.9, 20.1, 21.8, 22.5, 24.8, 23.0, 20.7],
    'travel_speed': [4.8, 5.5, 4.1, 4.0, 5.2, 4.9, 4.7, 3.9, 4.3, 5.1],
    'heat_input': [1.15, 0.92, 1.35, 1.42, 0.98, 1.10, 1.18, 1.48, 1.30, 1.02],
    'mean_pixel': [82.4, 61.7, 91.2, 96.5, 70.5, 79.2, 84.1, 98.0, 88.3, 73.4],
    'std_pixel': [11.2, 18.5, 13.4, 21.0, 15.8, 12.0, 10.9, 22.5, 14.1, 16.2],
    'defect_area': [0.0, 3.0, 0.0, 11.0, 1.0, 0.0, 0.5, 14.0, 4.2, 0.0],
    'tensile_strength_mpa': [512, 455, 530, 438, 470, 505, 498, 420, 462, 500],
    'is_defective': [0, 1, 0, 1, 1, 0, 1, 1, 1, 0],
    'defect_class': ['no_defect', 'porosity', 'no_defect', 'crack', 'porosity',
                     'no_defect', 'porosity', 'crack', 'porosity', 'no_defect'],
})
df
Regression vs. Classification
tensile_strength_mpa = 485.2
inspection_result = 'fail'
When to Use Regression
In [ ]:
target = 'defect_length_mm'
When to Use Classification
In [ ]:
target = 'defect_class'
Linear Regression
predicted_strength = intercept + slope * weld_length_mm
predicted_strength = intercept
    + weight_1 * voltage
    + weight_2 * travel_speed
    + weight_3 * heat_input
In [ ]:
from sklearn.linear_model import LinearRegression
X = df[['voltage', 'travel_speed', 'heat_input']]
y = df['tensile_strength_mpa']
model = LinearRegression()
model.fit(X, y)
predictions = model.predict(X)
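The fitted model stores the intercept and one weight per feature, which correspond to the terms in the weighted-sum formula above. The sketch below is a minimal illustration that rebuilds the predictions by hand. It recreates only the setup columns it needs so it can run on its own; the names `features`, `weights`, and `manual` are introduced here for illustration and are not from the chapter.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same values as the setup table, restricted to the columns used here.
df = pd.DataFrame({
    'voltage': [22.1, 19.5, 24.3, 23.9, 20.1, 21.8, 22.5, 24.8, 23.0, 20.7],
    'travel_speed': [4.8, 5.5, 4.1, 4.0, 5.2, 4.9, 4.7, 3.9, 4.3, 5.1],
    'heat_input': [1.15, 0.92, 1.35, 1.42, 0.98, 1.10, 1.18, 1.48, 1.30, 1.02],
    'tensile_strength_mpa': [512, 455, 530, 438, 470, 505, 498, 420, 462, 500],
})
features = ['voltage', 'travel_speed', 'heat_input']
X = df[features]
y = df['tensile_strength_mpa']

model = LinearRegression().fit(X, y)

# One learned weight per feature, in the same order as the columns of X.
weights = dict(zip(features, model.coef_))
print('intercept:', model.intercept_)
print('weights:', weights)

# Rebuild the predictions from intercept + weight_i * feature_i.
manual = model.intercept_ + (X * model.coef_).sum(axis=1)
print(manual.round(1).tolist())
```

With only ten rows and closely related features, the individual weights should be read as an illustration of the formula rather than as stable process insight.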
Logistic Regression
probability_of_defect = sigmoid(score)
if probability_of_defect >= 0.5:
    predict 'defective'
else:
    predict 'not_defective'
In [ ]:
from sklearn.linear_model import LogisticRegression
X = df[['mean_pixel', 'std_pixel', 'defect_area']]
y = df['is_defective']
model = LogisticRegression()
model.fit(X, y)
predicted_classes = model.predict(X)
predicted_probabilities = model.predict_proba(X)
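The sigmoid-and-threshold idea above can be checked against the fitted model. The following is a sketch for the two-class case with scikit-learn's default settings: it takes the raw score from `decision_function`, applies the sigmoid by hand, and thresholds at 0.5. The `max_iter=1000` argument is an assumption added here as a precaution because the features are unscaled; it is not part of the chapter snippet.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Same values as the setup table, restricted to the columns used here.
df = pd.DataFrame({
    'mean_pixel': [82.4, 61.7, 91.2, 96.5, 70.5, 79.2, 84.1, 98.0, 88.3, 73.4],
    'std_pixel': [11.2, 18.5, 13.4, 21.0, 15.8, 12.0, 10.9, 22.5, 14.1, 16.2],
    'defect_area': [0.0, 3.0, 0.0, 11.0, 1.0, 0.0, 0.5, 14.0, 4.2, 0.0],
    'is_defective': [0, 1, 0, 1, 1, 0, 1, 1, 1, 0],
})
X = df[['mean_pixel', 'std_pixel', 'defect_area']]
y = df['is_defective']

# max_iter is raised above the default as a precaution (assumption, see lead-in).
model = LogisticRegression(max_iter=1000).fit(X, y)

# score = intercept + weighted sum of the features
score = model.decision_function(X)

# Apply the sigmoid by hand to turn the score into a probability of class 1.
probability_of_defect = 1.0 / (1.0 + np.exp(-score))

# Threshold at 0.5, as in the pseudocode.
thresholded = (probability_of_defect >= 0.5).astype(int)
print(np.round(probability_of_defect, 3))
print(thresholded)
```

In `predict_proba`, column 1 holds the probability of the class labelled 1 (defective) and column 0 holds the probability of class 0.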
Decision Trees
Is mean_pixel < 70?
    yes -> Is defect_area > 12?
        yes -> predict 'defective'
        no -> predict 'not_defective'
    no -> predict 'not_defective'
In [ ]:
from sklearn.tree import DecisionTreeClassifier
X = df[['mean_pixel', 'std_pixel', 'defect_area']]
y = df['defect_class']
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X, y)
predictions = model.predict(X)
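A fitted tree can be printed as nested questions, similar in shape to the yes/no sketch above. The example below uses scikit-learn's `export_text` for this; the thresholds it prints are whatever the tree learned from the ten-row table, so they will not match the illustrative numbers in the text snippet. The name `rules` is introduced here for illustration.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Same values as the setup table, restricted to the columns used here.
df = pd.DataFrame({
    'mean_pixel': [82.4, 61.7, 91.2, 96.5, 70.5, 79.2, 84.1, 98.0, 88.3, 73.4],
    'std_pixel': [11.2, 18.5, 13.4, 21.0, 15.8, 12.0, 10.9, 22.5, 14.1, 16.2],
    'defect_area': [0.0, 3.0, 0.0, 11.0, 1.0, 0.0, 0.5, 14.0, 4.2, 0.0],
    'defect_class': ['no_defect', 'porosity', 'no_defect', 'crack', 'porosity',
                     'no_defect', 'porosity', 'crack', 'porosity', 'no_defect'],
})
features = ['mean_pixel', 'std_pixel', 'defect_area']
X = df[features]
y = df['defect_class']

model = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)

# Render the learned splits as indented text, one question per line.
rules = export_text(model, feature_names=features)
print(rules)
```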
Random Forest
tree_1 predicts 'porosity'
tree_2 predicts 'crack'
tree_3 predicts 'porosity'
forest prediction = 'porosity'
In [ ]:
from sklearn.ensemble import RandomForestClassifier
X = df[['mean_pixel', 'std_pixel', 'defect_area']]
y = df['defect_class']
model = RandomForestClassifier(
    n_estimators=100,
    random_state=42,
)
model.fit(X, y)
predictions = model.predict(X)
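To see the individual trees behind a forest prediction, the sketch below collects each tree's prediction for a single weld and counts them. Two caveats: scikit-learn's forest averages the trees' class probabilities rather than counting hard votes, so the tally is an illustration of the voting idea and not the exact internal calculation; and the sub-trees predict encoded class indices, which are mapped back through `model.classes_`. The names `sample`, `votes`, and `vote_counts` are introduced here for illustration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Same values as the setup table, restricted to the columns used here.
df = pd.DataFrame({
    'mean_pixel': [82.4, 61.7, 91.2, 96.5, 70.5, 79.2, 84.1, 98.0, 88.3, 73.4],
    'std_pixel': [11.2, 18.5, 13.4, 21.0, 15.8, 12.0, 10.9, 22.5, 14.1, 16.2],
    'defect_area': [0.0, 3.0, 0.0, 11.0, 1.0, 0.0, 0.5, 14.0, 4.2, 0.0],
    'defect_class': ['no_defect', 'porosity', 'no_defect', 'crack', 'porosity',
                     'no_defect', 'porosity', 'crack', 'porosity', 'no_defect'],
})
X = df[['mean_pixel', 'std_pixel', 'defect_area']]
y = df['defect_class']

model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# One weld (row 1), kept two-dimensional and passed as a plain array
# because the sub-trees were fitted without column names.
sample = X.iloc[[1]].to_numpy()

# Each sub-tree returns an encoded class index; map it back to the label.
votes = [model.classes_[int(tree.predict(sample)[0])]
         for tree in model.estimators_]
vote_counts = pd.Series(votes).value_counts()
print(vote_counts)
print('forest prediction:', model.predict(X.iloc[[1]])[0])
```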
Support Vector Machines
In [ ]:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
X = df[['mean_pixel', 'std_pixel', 'defect_area']]
y = df['is_defective']
model = make_pipeline(
    StandardScaler(),
    SVC(kernel='rbf')
)
model.fit(X, y)
predictions = model.predict(X)
Naive Bayes
"visible crack near edge"
In [ ]:
from sklearn.naive_bayes import GaussianNB
X = df[['mean_pixel', 'std_pixel', 'defect_area']]
y = df['defect_class']
model = GaussianNB()
model.fit(X, y)
predictions = model.predict(X)
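`GaussianNB` summarizes each class by per-feature statistics and uses them to score new rows. As a rough illustration, the sketch below reads the learned per-class means from `theta_` and looks at the class probabilities the model assigns. The names `features`, `class_means`, and `probabilities` are introduced here for illustration.

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB

# Same values as the setup table, restricted to the columns used here.
df = pd.DataFrame({
    'mean_pixel': [82.4, 61.7, 91.2, 96.5, 70.5, 79.2, 84.1, 98.0, 88.3, 73.4],
    'std_pixel': [11.2, 18.5, 13.4, 21.0, 15.8, 12.0, 10.9, 22.5, 14.1, 16.2],
    'defect_area': [0.0, 3.0, 0.0, 11.0, 1.0, 0.0, 0.5, 14.0, 4.2, 0.0],
    'defect_class': ['no_defect', 'porosity', 'no_defect', 'crack', 'porosity',
                     'no_defect', 'porosity', 'crack', 'porosity', 'no_defect'],
})
features = ['mean_pixel', 'std_pixel', 'defect_area']
X = df[features]
y = df['defect_class']

model = GaussianNB().fit(X, y)

# theta_ holds the mean of each feature within each class
# (rows follow model.classes_, columns follow the feature order).
class_means = pd.DataFrame(model.theta_, index=model.classes_, columns=features)
print(class_means.round(2))

# One probability per class for every row; each row sums to 1.
probabilities = model.predict_proba(X)
print(pd.DataFrame(probabilities, columns=model.classes_).round(3))
```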
Neural Networks
voltage, travel_speed, heat_input -> hidden layers -> defect probability
weld image -> image filters -> learned features -> defect class
In [ ]:
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
X = df[['mean_pixel', 'std_pixel', 'defect_area']]
y = df['defect_class']
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), random_state=42)
)
model.fit(X, y)
predictions = model.predict(X)
Gradient Boosting
model_1 makes initial predictions
model_2 learns from model_1's errors
model_3 learns from remaining errors
final prediction combines all models
In [ ]:
from sklearn.ensemble import GradientBoostingClassifier
X = df[['mean_pixel', 'std_pixel', 'defect_area']]
y = df['defect_class']
model = GradientBoostingClassifier(random_state=42)
model.fit(X, y)
predictions = model.predict(X)
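The stage-by-stage idea above can be observed with `staged_predict`, which yields the ensemble's prediction after each boosting stage. The sketch below tracks training accuracy as stages are added. Accuracy is measured on the same ten rows the model was trained on, so it shows how boosting accumulates corrections and says nothing about performance on new welds. The name `staged_accuracy` is introduced here for illustration.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Same values as the setup table, restricted to the columns used here.
df = pd.DataFrame({
    'mean_pixel': [82.4, 61.7, 91.2, 96.5, 70.5, 79.2, 84.1, 98.0, 88.3, 73.4],
    'std_pixel': [11.2, 18.5, 13.4, 21.0, 15.8, 12.0, 10.9, 22.5, 14.1, 16.2],
    'defect_area': [0.0, 3.0, 0.0, 11.0, 1.0, 0.0, 0.5, 14.0, 4.2, 0.0],
    'defect_class': ['no_defect', 'porosity', 'no_defect', 'crack', 'porosity',
                     'no_defect', 'porosity', 'crack', 'porosity', 'no_defect'],
})
X = df[['mean_pixel', 'std_pixel', 'defect_area']]
y = df['defect_class']

model = GradientBoostingClassifier(random_state=42).fit(X, y)

# Training accuracy of the combined model after each boosting stage.
y_true = y.to_numpy()
staged_accuracy = [(stage_pred == y_true).mean()
                   for stage_pred in model.staged_predict(X)]

print('stages:', len(staged_accuracy))
print('accuracy after first stage:', staged_accuracy[0])
print('accuracy after last stage:', staged_accuracy[-1])
```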
Clustering
cluster 0 -> stable process measurements
cluster 1 -> unusually high heat input
cluster 2 -> low brightness images
In [ ]:
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
X = df[['voltage', 'travel_speed', 'mean_pixel']]
model = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=3, random_state=42)
)
cluster_ids = model.fit_predict(X)
df['cluster_id'] = cluster_ids
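Cluster ids are arbitrary numbers, so descriptions like the ones in the text snippet above have to come from inspecting each group after the fact. One possible way to do that is to count the rows in each cluster and average the original, unscaled features per cluster, as sketched below. The `n_init=10` argument is an assumption added here to keep behaviour consistent across scikit-learn versions; `features`, `cluster_sizes`, and `cluster_means` are illustrative names.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same values as the setup table, restricted to the columns used here.
df = pd.DataFrame({
    'voltage': [22.1, 19.5, 24.3, 23.9, 20.1, 21.8, 22.5, 24.8, 23.0, 20.7],
    'travel_speed': [4.8, 5.5, 4.1, 4.0, 5.2, 4.9, 4.7, 3.9, 4.3, 5.1],
    'mean_pixel': [82.4, 61.7, 91.2, 96.5, 70.5, 79.2, 84.1, 98.0, 88.3, 73.4],
})
features = ['voltage', 'travel_speed', 'mean_pixel']
X = df[features]

# n_init is set explicitly (assumption, see lead-in).
model = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=3, n_init=10, random_state=42)
)
df['cluster_id'] = model.fit_predict(X)

# How many welds landed in each cluster.
cluster_sizes = df['cluster_id'].value_counts().sort_index()
print(cluster_sizes)

# Average of the original (unscaled) measurements within each cluster.
cluster_means = df.groupby('cluster_id')[features].mean()
print(cluster_means.round(2))
```

The meaning of each cluster is an interpretation supplied by the analyst; the algorithm only groups rows that are close together in the scaled feature space.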