What you'll be able to do

  • Describe the train, deploy, monitor, and retrain production loop
  • Explain data drift and why monitoring is required
  • Outline how a model is packaged and served
Competencies you'll build
  • Identify what must be packaged with a model to run reliably
  • Recognize signals that should trigger retraining
  • Distinguish batch scoring from real-time inference

Chapter - Model Deployment and Monitoring

Supplementary chapter prepared for the BWXT Data Science Workforce Training Pilot.

Outline in development. This chapter is scaffolded from the maturity-model objectives. The BWXT-specific parts — target infrastructure, serving platform, monitoring stack, and retraining policy — should be filled in with the program's subject-matter experts. The conceptual outline below is ready to teach from.

About this chapter

A model that only runs in a notebook delivers no value. Deployment is the work of moving a trained model into a place where it makes predictions on real data; monitoring is the work of making sure it keeps working after it gets there. These are the Tier 4 capabilities the maturity model calls Solutions Architecture and Algorithm & Pipeline Maintenance.

The lifecycle does not end at training

Production machine learning is a loop, not a finish line. A model is deployed, watched, and retrained as the world changes.

Figure: The production loop (Train → Deploy → Monitor → Retrain): train, deploy, monitor, and retrain when performance drifts. Most of a model's life is spent in the monitor-and-retrain part of this cycle.

What this chapter will cover

Packaging a model

  • Saving and versioning model weights and the exact preprocessing steps.
  • Pinning dependencies so the model runs the same everywhere.
  • (SME input: BWXT's model registry / artifact storage.)
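
Here is a minimal sketch of the first two bullets. It assumes a toy model whose weights fit in a JSON file. The Python below writes three things into one versioned folder: the weights, the preprocessing parameters, and a manifest holding a checksum and exact dependency pins. It then refuses to load a bundle whose weights no longer match the manifest. The function names package_model and load_model, the file layout, and the pinned package name are hypothetical. A real pipeline would use BWXT's model registry and the training framework's own serialization format.

```python
import hashlib
import json
import platform
import tempfile
from pathlib import Path


def package_model(weights: dict, preprocessing: dict, dependencies: dict,
                  out_dir: Path, version: str) -> Path:
    """Write a versioned bundle: weights, preprocessing, and a manifest."""
    bundle = out_dir / f"model-{version}"
    # exist_ok=False: a released version is never silently overwritten.
    bundle.mkdir(parents=True, exist_ok=False)

    weights_bytes = json.dumps(weights, sort_keys=True).encode("utf-8")
    (bundle / "weights.json").write_bytes(weights_bytes)
    # The exact preprocessing travels with the weights, so serving matches training.
    (bundle / "preprocessing.json").write_text(json.dumps(preprocessing, sort_keys=True))

    manifest = {
        "version": version,
        "weights_sha256": hashlib.sha256(weights_bytes).hexdigest(),
        "python": platform.python_version(),
        "dependencies": dependencies,  # exact pins, e.g. {"some-lib": "1.2.3"}
    }
    (bundle / "manifest.json").write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return bundle


def load_model(bundle: Path) -> tuple[dict, dict]:
    """Load a bundle, refusing to run if the weights do not match the manifest."""
    manifest = json.loads((bundle / "manifest.json").read_text())
    weights_bytes = (bundle / "weights.json").read_bytes()
    if hashlib.sha256(weights_bytes).hexdigest() != manifest["weights_sha256"]:
        raise ValueError("weights do not match the manifest checksum")
    preprocessing = json.loads((bundle / "preprocessing.json").read_text())
    return json.loads(weights_bytes), preprocessing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical toy model and illustrative dependency pin.
        bundle = package_model(
            weights={"w": [0.2, -0.5], "b": 0.1},
            preprocessing={"resize": [224, 224], "mean": 0.45, "std": 0.22},
            dependencies={"example-inference-lib": "1.2.3"},
            out_dir=Path(tmp),
            version="1.0.0",
        )
        weights, preprocessing = load_model(bundle)
        print(bundle.name, weights, preprocessing)
```

Because the checksum and pins sit next to the weights, any machine can load the same bundle and verify it before serving a single prediction.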

Serving predictions

  • Batch scoring versus a real-time inference service.
  • A simple prediction API; where inference runs (edge vs. server, CPU vs. GPU).
  • (SME input: BWXT's target serving platform and hardware.)
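
The difference between the two serving modes can be sketched around a single prediction function. The sketch assumes a hypothetical linear "model" over two made-up image features, mean_intensity and edge_density. batch_score scores a stored collection in one pass, as a scheduled job would. handle_request shows the request/response shape of a real-time endpoint and records per-request latency, one of the operational metrics listed under monitoring. The weights and threshold are assumptions, and the web framework that would expose handle_request over HTTP is left out.

```python
import json
import time
from typing import Iterable

# Hypothetical toy "model": a linear score over two made-up image features.
WEIGHTS = {"mean_intensity": 2.0, "edge_density": -1.5}
BIAS = 0.1
THRESHOLD = 0.5  # hypothetical decision threshold


def predict(features: dict) -> dict:
    """Score one item; both serving modes call this same function."""
    score = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return {"score": round(score, 4), "defect": score > THRESHOLD}


def batch_score(records: Iterable[dict]) -> list[dict]:
    """Batch mode: score a stored collection in one pass, e.g. a nightly job."""
    return [{"id": r["id"], **predict(r["features"])} for r in records]


def handle_request(body: str) -> tuple[int, str]:
    """Real-time mode: one JSON request in, one (status, JSON body) response out.

    A web framework would route HTTP requests to this function.
    """
    start = time.perf_counter()
    try:
        payload = json.loads(body)
        result = predict(payload["features"])
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        return 400, json.dumps({"error": repr(exc)})
    # Per-request latency is an operational metric worth logging.
    result["latency_ms"] = round((time.perf_counter() - start) * 1000, 3)
    return 200, json.dumps(result)
```

Sharing one predict function between both paths is the point of the design. Batch and real-time results stay consistent because they cannot diverge in preprocessing or thresholds.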

Monitoring in production

  • Operational metrics (latency, errors, throughput).
  • Model metrics on live data; data drift — when incoming images stop resembling the training set.
  • Alerting and a human-in-the-loop review for flagged predictions.
  • (SME input: BWXT's monitoring/observability stack.)
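
One common way to quantify data drift is the Population Stability Index (PSI). It compares how a feature was distributed at training time against a window of live data. The sketch below assumes a single scalar per image, for example mean brightness, which would shift if the lighting rig changed. It uses an alert threshold of 0.2, a widely quoted rule of thumb rather than a BWXT standard. The names psi and needs_review are hypothetical. A flagged window would feed the alerting and human-in-the-loop review described above.

```python
import math
from typing import Sequence

# Assumption: a common rule of thumb; the right level should be set per feature.
DRIFT_ALERT_THRESHOLD = 0.2


def _bin_proportions(values: Sequence[float], lo: float, width: float, bins: int) -> list[float]:
    counts = [0] * bins
    for v in values:
        # Clamp so live values outside the training range fall into the end bins.
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1
    # Floor each proportion so empty bins do not produce log(0).
    return [max(c / len(values), 1e-4) for c in counts]


def psi(reference: Sequence[float], live: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index of `live` relative to training-time `reference` data."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference feature
    ref_p = _bin_proportions(reference, lo, width, bins)
    live_p = _bin_proportions(live, lo, width, bins)
    return sum((a - e) * math.log(a / e) for e, a in zip(ref_p, live_p))


def needs_review(reference: Sequence[float], live: Sequence[float]) -> bool:
    """True when the live window has drifted enough to alert a human reviewer."""
    return psi(reference, live) > DRIFT_ALERT_THRESHOLD
```

PSI needs only the incoming images, not labels. It can therefore raise an alert well before enough inspector labels arrive to measure accuracy on live data directly.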

Retraining

  • Deciding when to retrain (scheduled vs. drift-triggered).
  • Keeping a feedback loop in which inspectors' verdicts become labeled training data.
  • Safe rollout: shadow testing and rollback.
  • (SME input: BWXT's retraining cadence and approval process.)
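
The retraining bullets can be combined into a small decision sketch. It assumes a 90-day scheduled cadence, a drift score such as a PSI value, and inspector labels collected through the feedback loop. should_retrain reports which trigger fired. shadow_compare scores a candidate model that ran in shadow (its predictions logged but not acted on) against the live model on the same inspector-labeled items. ModelRegistry keeps a version history so a promotion can be undone with one rollback. All names, the cadence, the drift threshold, and the use of plain accuracy as the comparison metric are hypothetical placeholders. BWXT's actual retraining policy and approval process would replace them.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Sequence


@dataclass
class RetrainPolicy:
    max_age: timedelta = timedelta(days=90)  # hypothetical scheduled cadence
    drift_threshold: float = 0.2             # hypothetical drift alert level


def should_retrain(policy: RetrainPolicy, trained_on: date, today: date,
                   drift_score: float) -> Optional[str]:
    """Return the trigger that fired, or None if the model can stay as is."""
    if drift_score > policy.drift_threshold:
        return "drift-triggered"
    if today - trained_on >= policy.max_age:
        return "scheduled"
    return None


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def shadow_compare(live_preds: Sequence[int], candidate_preds: Sequence[int],
                   inspector_labels: Sequence[int]) -> dict:
    """Compare live and shadow models on the same inspector-labeled items."""
    live_acc = accuracy(live_preds, inspector_labels)
    cand_acc = accuracy(candidate_preds, inspector_labels)
    # Promote only on a strict improvement; ties keep the proven model.
    return {"live": live_acc, "candidate": cand_acc, "promote": cand_acc > live_acc}


class ModelRegistry:
    """Hypothetical minimal registry: the last entry in history is the live version."""

    def __init__(self, live_version: str) -> None:
        self.history: list[str] = [live_version]

    @property
    def live(self) -> str:
        return self.history[-1]

    def promote(self, version: str) -> None:
        self.history.append(version)

    def rollback(self) -> str:
        if len(self.history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.history.pop()
        return self.live
```

In use, the steps chain together as follows:

1. If should_retrain returns a trigger, a candidate model is trained and run in shadow.
2. The candidate is promoted with registry.promote only when shadow_compare reports "promote" as True and the approval process signs off.
3. If live monitoring later shows degradation, registry.rollback restores the previous version.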

Why it matters

A weld-defect model trained on last year's images can quietly degrade when a new camera, lighting rig, or material is introduced. Without monitoring you would not notice until defects slipped through. Deployment and monitoring are what turn a good model into a dependable part of the inspection line.

Practice Questions

  1. Why is production machine learning described as a loop rather than a one-time task?
  2. What is data drift, and how could it appear on a weld inspection line?
  3. Name two things you must package alongside the model weights for it to run reliably.
  4. What is the difference between batch scoring and real-time inference?
  5. Give one signal that should trigger retraining a deployed model.

Check your understanding

Tier 4 depth · Architecture & tradeoffs

  1. What is 'data drift', and why does it matter in production?

  2. You're architecting how predictions are served. When is BATCH scoring the better choice over a real-time inference service?

  3. Why pin dependencies and version both the weights AND the exact preprocessing steps when packaging a model?

  4. Which set are OPERATIONAL metrics (as opposed to model-quality metrics) you'd monitor in production?

  5. You're rolling out a retrained model. Which practice lets you de-risk the change before it fully takes over?
