Chapter - Model Deployment and Monitoring

Supplementary chapter prepared for the BWXT Data Science Workforce Training Pilot.

Outline in development. This chapter is scaffolded from the maturity-model objectives. The BWXT-specific parts (target infrastructure, serving platform, monitoring stack, and retraining policy) should be filled in together with the program's subject-matter experts. The conceptual outline below is ready to teach from.

About this chapter

A model that only runs in a notebook delivers no value. Deployment is the work of moving a trained model into a place where it makes predictions on real data; monitoring is making sure it keeps working after it gets there. These are the Tier 4 capabilities the maturity model calls Solutions Architecture and Algorithm & Pipeline Maintenance.

The lifecycle does not end at training

Production machine learning is a loop, not a finish line. A model is deployed, watched, and retrained as the world changes.

The production loop: train, deploy, monitor, and retrain when performance drifts. Most of a model's life is spent in the monitor-and-retrain part of this cycle.

What this chapter will cover

Packaging a model

Saving and versioning model weights and the exact preprocessing steps.
Pinning dependencies so the model runs the same everywhere.
(SME input: BWXT's model registry / artifact storage.)
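A minimal packaging sketch in Python, using only the standard library. It assumes the model and its preprocessing settings are ordinary picklable Python objects. The file names model.pkl and manifest.json, the function names, and the example preprocessing keys are hypothetical, not a BWXT convention. The sketch keeps weights and preprocessing in one artifact so they cannot drift apart, derives a version ID from a SHA-256 hash of the saved bytes, and records the Python and package versions alongside it.

```python
import hashlib
import json
import pickle
import platform
from importlib import metadata
from pathlib import Path


def package_model(model, preprocessing: dict, out_dir: str,
                  packages=("numpy", "scikit-learn")) -> dict:
    """Save model + preprocessing together and write a manifest.
    The directory layout and file names are hypothetical."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # One artifact holds both the weights and the exact preprocessing settings.
    payload = pickle.dumps({"model": model, "preprocessing": preprocessing})
    (out / "model.pkl").write_bytes(payload)

    # Record the environment so the model can be reproduced elsewhere.
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment

    manifest = {
        "sha256": hashlib.sha256(payload).hexdigest(),  # content-based version ID
        "python": platform.python_version(),
        "packages": versions,
    }
    (out / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest


def load_model(out_dir: str):
    """Load the bundle, refusing it if the bytes no longer match the manifest."""
    out = Path(out_dir)
    payload = (out / "model.pkl").read_bytes()
    manifest = json.loads((out / "manifest.json").read_text())
    if hashlib.sha256(payload).hexdigest() != manifest["sha256"]:
        raise ValueError("model.pkl does not match manifest.json")
    bundle = pickle.loads(payload)
    return bundle["model"], bundle["preprocessing"]
```

In a scikit-learn project the same idea usually means saving a fitted Pipeline (preprocessing steps plus estimator) with joblib.dump and pinning exact versions in a requirements file, for example the output of pip freeze. Only load pickled artifacts from a trusted store: unpickling can execute arbitrary code.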

Serving predictions

Batch scoring versus a real-time inference service.
A simple prediction API; where inference runs (edge vs. server, CPU vs. GPU).
(SME input: BWXT's target serving platform and hardware.)
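The two serving styles can be contrasted in a short standard-library sketch. The "model" here is a hypothetical logistic scorer over two precomputed image features (edge_density and mean_intensity are made-up names), standing in for whatever packaged model is actually deployed. The 0.5 decision threshold and port 8080 are placeholders.

```python
import csv
import io
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a trained weld-defect classifier.
WEIGHTS = {"edge_density": 4.0, "mean_intensity": -2.0}
BIAS = -1.0
THRESHOLD = 0.5


def predict_proba(features: dict) -> float:
    """Probability of a defect from a logistic score."""
    z = BIAS + sum(w * features[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def score_batch(csv_text: str) -> list:
    """Batch scoring: score every row of a file in one scheduled run."""
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        p = predict_proba({k: float(row[k]) for k in WEIGHTS})
        results.append({"image_id": row["image_id"],
                        "p_defect": round(p, 4),
                        "defect": p >= THRESHOLD})
    return results


def handle_request(body: bytes) -> tuple:
    """Real-time inference: one JSON request in, one JSON answer out."""
    try:
        features = json.loads(body)
        p = predict_proba({k: float(features[k]) for k in WEIGHTS})
    except (ValueError, KeyError, TypeError, OverflowError) as exc:
        return 400, json.dumps({"error": str(exc)}).encode()
    answer = {"p_defect": round(p, 4), "defect": p >= THRESHOLD}
    return 200, json.dumps(answer).encode()


class PredictHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper around handle_request."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_request(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Start the service; blocks until interrupted."""
    HTTPServer(("", port), PredictHandler).serve_forever()
```

score_batch would run on a schedule over a file of queued images; handle_request answers one image at a time, which is what an inspection station waiting on a verdict would call. Keeping the prediction logic in plain functions, separate from the HTTP layer, means both paths share exactly the same code and can be tested without starting a server. The Python documentation warns that http.server is not recommended for production, so a real service would sit behind a production-grade web server or framework.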

Monitoring in production

Operational metrics (latency, errors, throughput).
Model metrics on live data, and data drift: incoming images that no longer resemble the training set.
Alerting and a human-in-the-loop review for flagged predictions.
(SME input: BWXT's monitoring/observability stack.)
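One common way to quantify data drift is the population stability index (PSI), which compares how a feature is distributed in live data against the training (reference) data. A minimal sketch, assuming each image is summarized by a single number such as its mean pixel intensity. The choice of statistic, the 10 bins, the 0.25 alert threshold (a widely quoted rule of thumb, not a BWXT standard), and the 0.3 to 0.7 human-review band are all illustrative assumptions.

```python
import math
from bisect import bisect_right


def quantile_edges(reference, n_bins=10):
    """Interior bin edges at the reference data's quantiles."""
    ref = sorted(reference)
    return [ref[int(len(ref) * i / n_bins)] for i in range(1, n_bins)]


def bin_fractions(values, edges, eps=1e-6):
    """Fraction of values in each bin; eps avoids log(0) for empty bins."""
    if not values:
        raise ValueError("no values to bin")
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [max(c / len(values), eps) for c in counts]


def psi(reference, live, n_bins=10):
    """PSI = sum over bins of (live - ref) * ln(live / ref)."""
    edges = quantile_edges(reference, n_bins)
    ref_frac = bin_fractions(reference, edges)
    live_frac = bin_fractions(live, edges)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_frac, live_frac))


def drift_check(reference, live, threshold=0.25):
    """Return (score, alert) for one monitoring window."""
    score = psi(reference, live)
    return score, score > threshold


def needs_review(p_defect, low=0.3, high=0.7):
    """Route uncertain predictions to an inspector instead of auto-deciding."""
    return low <= p_defect <= high
```

A PSI near 0 means live inputs look like the training data. A new camera that brightens every image would push the score up before anyone has labeled a single defect, which is why input drift is worth watching alongside accuracy on labeled samples.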

Retraining

Deciding when to retrain (scheduled vs. drift-triggered).
Keeping a labeled feedback loop from inspectors.
Safe rollout: shadow testing and rollback.
(SME input: BWXT's retraining cadence and approval process.)
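The retraining decisions above can be sketched as a few small functions. This assumes a drift score such as PSI is already computed for each monitoring window, that inspectors' confirmed verdicts are available as True/False labels, and that the model registry is a simple ordered list of versions. The 90-day schedule, the thresholds, the Registry class, and the recall-only promotion rule are hypothetical placeholders for BWXT's actual policy.

```python
from dataclasses import dataclass, field
from datetime import date


def retrain_reason(last_trained: date, today: date, drift_psi: float,
                   max_age_days: int = 90, psi_threshold: float = 0.25):
    """Return 'drift', 'schedule', or None. Thresholds are placeholders."""
    if drift_psi > psi_threshold:
        return "drift"
    if (today - last_trained).days >= max_age_days:
        return "schedule"
    return None


def recall(preds, labels):
    """Share of inspector-confirmed defects (label True) the model flagged."""
    hits = sum(1 for p, y in zip(preds, labels) if p and y)
    positives = sum(1 for y in labels if y)
    return hits / positives if positives else 1.0


def shadow_decision(prod_preds, cand_preds, labels, min_gain=0.0):
    """The candidate ran in shadow: it saw the same images, but its outputs
    were only logged. Promote it if it catches at least as many confirmed
    defects as production, plus an optional margin."""
    return recall(cand_preds, labels) >= recall(prod_preds, labels) + min_gain


@dataclass
class Registry:
    """Hypothetical registry: ordered version history; the last entry is live."""
    history: list = field(default_factory=list)

    def promote(self, version: str) -> None:
        self.history.append(version)

    @property
    def live(self) -> str:
        return self.history[-1]

    def rollback(self) -> str:
        if len(self.history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.history.pop()
        return self.live
```

A real promotion rule would also bound false alarms, not just missed defects. Rollback stays cheap because the previous version is never deleted; only the live pointer moves back to it.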

Why it matters

A weld-defect model trained on last year's images can quietly degrade when a new camera, lighting rig, or material is introduced. Without monitoring you would not notice until defects slipped through. Deployment and monitoring are what turn a good model into a dependable part of the inspection line.

Practice Questions

Why is production machine learning described as a loop rather than a one-time task?
What is data drift, and how could it appear on a weld inspection line?
Name two things you must package alongside the model weights for it to run reliably.
What is the difference between batch scoring and real-time inference?
Give one signal that should trigger retraining a deployed model.

Go deeper

scikit-learn documentation (open access): user guide and API reference for classic ML in Python.

scikit-learn MOOC (Inria) (open access): a full course taught by the library's core developers, with notebooks.

Google Machine Learning Crash Course (open access): concepts plus interactive exercises; a good companion to our ML chapters.
