Glossary

The program's common language: 70 key terms across Python, data and statistics, machine learning, deep learning, LLMs, MLOps, and the program itself. Each entry links to the chapter where it's covered.

A

Activation Function Deep Learning & Vision: A nonlinearity (e.g. ReLU, sigmoid) applied to a neuron's output, letting networks model nonlinear relationships. Covered in: Introduction to Neural Networks
Agile Toolkit & Process: An iterative way of delivering work in short increments with frequent feedback and adaptation. Covered in: Agile Development Methodology
Attention NLP & LLMs: A mechanism that lets a model weigh the relevance of other tokens when representing each token. Covered in: Transformers, RAG Models, and LLMs
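As a rough illustration of the Activation Function entry, the sketch below implements the two nonlinearities it names, ReLU and sigmoid, using only the standard library. The function names and sample inputs are chosen for this example and are not taken from the course material.

```python
import math


def relu(x: float) -> float:
    """Rectified linear unit: keeps positive inputs, maps the rest to zero."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Logistic sigmoid: squashes a real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical pre-activation values produced by three neurons.
pre_activations = [-2.0, 0.0, 3.0]

relu_outputs = [relu(z) for z in pre_activations]
sigmoid_outputs = [sigmoid(z) for z in pre_activations]

print(relu_outputs)
print(sigmoid_outputs)
```

ReLU leaves positive values unchanged and clips negatives to zero, while sigmoid maps every input to a value between 0 and 1, with 0 landing exactly on 0.5.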

B

Backpropagation Deep Learning & Vision: The algorithm that applies the chain rule backward from the output, computing the loss gradient for each layer so a network's weights can be updated. Covered in: Introduction to Neural Networks

C

Capability Tier Program: One of the four IALR Data Science Capability Maturity Model levels describing what a learner can do, from interpreting results to designing solutions. Covered in: Capability Tiers
Capstone Project Program: The course-long weld-defect image-analysis project that integrates the program's skills end to end. Covered in: Capstone Project
Classification Machine Learning: A supervised task that predicts a discrete category label for each input. Covered in: Algorithm Families, Model Evaluation
Clustering Machine Learning: Grouping similar data points together without predefined labels. Covered in: Algorithm Families
Conditional (if/else) Python: A control-flow construct that runs different code depending on whether a condition evaluates to true or false. Covered in: if-else and Flow Control
Confusion Matrix Machine Learning: A table of true vs. predicted classes (TP, FP, FN, TN) used to analyze classification errors. Covered in: Model Evaluation
Convolution Deep Learning & Vision: Sliding a small learnable filter over an input to produce a feature map highlighting local patterns. Covered in: CNNs and Computer Vision
Convolutional Neural Network (CNN) Deep Learning & Vision: A neural network that uses convolutional filters to detect spatial features, well suited to images. Covered in: CNNs and Computer Vision
Correlation Data & Statistics: A measure of how strongly two variables move together; it indicates association, not causation. Covered in: Dataset Statistics and Visualization
Cross-Validation Machine Learning: Estimating performance by repeatedly splitting data into train/validation folds and averaging the results. Covered in: Model Evaluation
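To make the Confusion Matrix entry concrete, here is a minimal sketch that tallies TP, FP, FN, and TN for a binary classifier. The helper name `confusion_counts` and the label lists are invented for illustration; a library such as scikit-learn offers its own equivalent.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical ground-truth labels and model predictions (1 = positive class).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 3}
```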

D

Data Drift MLOps: When the statistical properties of incoming data shift away from those of the training data, which can degrade model accuracy. Covered in: Deployment and Monitoring
Data Visualization Data & Statistics: Representing data graphically (charts, plots) to reveal patterns, trends, and outliers. Covered in: Communicating with Visualization
DataFrame Data & Statistics: A two-dimensional, labeled table (rows and named columns), the core structure for tabular analysis in pandas. Covered in: Dataset Statistics and Visualization, Communicating with Visualization
Debugging Python: The process of locating and fixing defects, often using error messages, print statements, and a step-through debugger. Covered in: Debugging
Deployment MLOps: Making a trained model available to serve predictions in a production system. Covered in: Deployment and Monitoring
Descriptive Statistics Data & Statistics: Summary measures (mean, median, standard deviation, etc.) that describe the central tendency and spread of a dataset. Covered in: Dataset Statistics and Visualization
Dictionary Python: A mutable collection of key–value pairs offering fast lookup by key. Covered in: Dictionaries and Structuring Data
Dimensionality Reduction Machine Learning: Reducing the number of input features while preserving structure, to fight noise, redundancy, and the curse of dimensionality. Covered in: Dimensionality Reduction
Distribution Data & Statistics: How the values of a variable are spread across their possible range, e.g. normal, skewed, or bimodal. Covered in: Dataset Statistics and Visualization
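The summary measures named in the Descriptive Statistics entry can be sketched with Python's built-in `statistics` module. The sample values are made up for this example, and `pstdev` is the population standard deviation (use `stdev` for the sample version).

```python
import statistics

# Hypothetical measurements; not data from the course.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = statistics.mean(values)      # central tendency: arithmetic average
median_value = statistics.median(values)  # central tendency: middle value
spread = statistics.pstdev(values)        # spread: population standard deviation

print(mean_value, median_value, spread)  # 5 4.5 2.0
```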

E

Embedding NLP & LLMs: A dense numeric vector representing a word, sentence, or item so that similar meanings sit close together. Covered in: Transformers, RAG Models, and LLMs
ETL / Data Engineering Data & Statistics: Extract, Transform, Load — building pipelines that move and reshape data from sources into an analysis-ready form. Covered in: Data Engineering and Pipelines

F

F1 Score Machine Learning: The harmonic mean of precision and recall, balancing the two into a single metric. Covered in: Model Evaluation
Feature Machine Learning: An individual measurable input variable used by a model to make predictions. Covered in: Feature Engineering
Feature Engineering Machine Learning: Creating, transforming, and selecting input variables to improve a model's predictive performance. Covered in: Feature Engineering
Feature Scaling Machine Learning: Putting numeric features on a comparable range (e.g. standardization or min–max normalization) so no feature dominates by magnitude. Covered in: Feature Engineering
Function Python: A reusable, named block of code that takes inputs (arguments), performs work, and optionally returns a value. Covered in: Functions
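The two techniques mentioned in the Feature Scaling entry, min-max normalization and standardization, might look like this in plain Python. The helper names and the sample heights are assumptions for illustration, and both helpers assume the values are not all identical.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant feature")
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mean) / std for v in values]


# Hypothetical feature column (heights in centimeters).
heights = [150.0, 160.0, 170.0, 180.0, 190.0]

scaled = min_max_scale(heights)
standardized = standardize(heights)

print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standardized)
```

After either transformation the feature no longer carries its original units, so a column measured in centimeters cannot outweigh one measured in meters purely by magnitude.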

G

Grad-CAM Deep Learning & Vision: An interpretability method that highlights image regions most responsible for a CNN's prediction. Covered in: CNNs and Computer Vision
Gradient Descent Deep Learning & Vision: An optimization algorithm that iteratively adjusts parameters in the direction of the negative gradient, the direction in which the loss locally decreases fastest. Covered in: Introduction to Neural Networks
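As a toy illustration of the Gradient Descent entry, the sketch below minimizes a one-parameter loss, (w - 3)^2, whose minimum is at w = 3. The loss, learning rate, and step count are all chosen for this example; real networks have many parameters and obtain gradients via backpropagation.

```python
def loss(w: float) -> float:
    """Toy loss with a single minimum at w = 3."""
    return (w - 3.0) ** 2


def gradient(w: float) -> float:
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)


w = 0.0              # arbitrary starting parameter
learning_rate = 0.1  # step size (a hyperparameter)

for _ in range(100):
    # Step against the gradient, i.e. downhill on the loss surface.
    w -= learning_rate * gradient(w)

print(round(w, 6), round(loss(w), 6))
```

Each step shrinks the distance to the minimum by a constant factor here, so after 100 steps `w` is very close to 3 and the loss is very close to 0.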

H

Hyperparameter Machine Learning: A configuration value set before training (e.g. learning rate, tree depth) that is tuned rather than learned from data. Covered in: AI/ML Model Training

I

Inference MLOps: Using a trained model to generate predictions on new inputs. Covered in: Deployment and Monitoring

L

Large Language Model (LLM) NLP & LLMs: A transformer trained on large text corpora to predict tokens, enabling generation, summarization, and Q&A. Covered in: Transformers, RAG Models, and LLMs
List Python: An ordered, mutable collection of values accessed by integer index. Covered in: Lists
Loop Python: A construct (for/while) that repeats a block of code, typically over a sequence or until a condition stops holding. Covered in: Loops
Loss Function Deep Learning & Vision: A function measuring how far predictions are from the truth; training minimizes it. Covered in: Introduction to Neural Networks
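One common example of the Loss Function entry is mean squared error. The sketch below is a minimal version under that assumption; the source does not say which loss the course uses, and the numbers are invented.

```python
def mean_squared_error(targets, predictions):
    """Average of the squared differences between predictions and targets."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    squared_errors = [(p - t) ** 2 for t, p in zip(targets, predictions)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical true values and model outputs.
targets = [3.0, -0.5, 2.0]
predictions = [2.5, 0.0, 2.0]

mse = mean_squared_error(targets, predictions)
print(mse)
```

The squared errors here are 0.25, 0.25, and 0, so the loss is their average, 1/6. A perfect model would score exactly 0.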

M

Model Monitoring MLOps: Tracking a deployed model's inputs, predictions, and performance over time to catch degradation. Covered in: Deployment and Monitoring
Module Program: One of the nine capability areas the 12-week program is organized around, from Agile through Model Deployment & Monitoring. Covered in: 12-Week Roadmap

N

Neural Network Deep Learning & Vision: A model of connected layers of weighted units (neurons) that learns complex mappings from data. Covered in: Introduction to Neural Networks

O

Object-Oriented Programming (OOP) Python: A paradigm that organizes code into objects bundling data (attributes) with behavior (methods), built from classes. Covered in: Object-Oriented Programming
OCR (Optical Character Recognition) Python: Converting images of text (scans, photos) into machine-readable characters. Covered in: Text in Images (OCR)
One-Hot Encoding Machine Learning: Representing a categorical variable as a set of binary columns, one per category. Covered in: Feature Engineering
Outlier Data & Statistics: A data point that lies far from the rest of the distribution and can distort summaries or models. Covered in: Dataset Statistics and Visualization
Overfitting Machine Learning: When a model learns noise and specifics of the training data and fails to generalize to new data. Covered in: Model Evaluation
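The One-Hot Encoding entry can be sketched in a few lines of plain Python. The helper name `one_hot` and the defect labels are hypothetical; in practice a library routine (for example in pandas or scikit-learn) would usually do this.

```python
def one_hot(values):
    """Return the sorted category list and one binary row per input value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


# Hypothetical categorical column.
defects = ["crack", "porosity", "crack", "none"]

categories, encoded = one_hot(defects)

print(categories)  # ['crack', 'none', 'porosity']
for row in encoded:
    print(row)
```

Each row contains exactly one 1, in the column belonging to that row's category.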

P

Precision Machine Learning: Of the items predicted positive, the fraction that are actually positive — TP / (TP + FP). Covered in: Model Evaluation
Principal Component Analysis (PCA) Machine Learning: A linear technique that projects data onto new axes (principal components) capturing the most variance. Covered in: Dimensionality Reduction

R

Recall Machine Learning: Of the actual positives, the fraction the model correctly identifies — TP / (TP + FN). Covered in: Model Evaluation
Regression Machine Learning: A supervised task that predicts a continuous numeric value. Covered in: Algorithm Families
Regular Expression (Regex) Python: A pattern language for matching, searching, and extracting structured substrings from text. Covered in: Regular Expressions
Retrieval-Augmented Generation (RAG) NLP & LLMs: Grounding an LLM's answers by retrieving relevant documents and supplying them as context at generation time. Covered in: Transformers, RAG Models, and LLMs
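The formulas in the Precision and Recall entries, together with the harmonic mean described under F1 Score, can be checked with a small worked example. The counts below are invented, and the function names are illustrative only.

```python
def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP): share of predicted positives that are correct."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """TP / (TP + FN): share of actual positives that were found."""
    return tp / (tp + fn)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
tp, fp, fn = 8, 2, 4

p = precision(tp, fp)  # 8 / 10
r = recall(tp, fn)     # 8 / 12
f1 = f1_score(p, r)    # equals 2*TP / (2*TP + FP + FN) = 16 / 22

print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.667 0.727
```

These helpers assume at least one predicted positive and one actual positive; otherwise the denominators are zero and the metric is undefined.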

S

SQL Data & Statistics: Structured Query Language, the standard language for querying and manipulating relational databases. Covered in: SQL for Data Science, SQLite Databases
String Python: An immutable sequence of characters used to represent text. Covered in: Strings and Text Editing
Supervised Learning Machine Learning: Training a model on labeled examples to predict a target for new inputs (classification or regression). Covered in: Algorithm Families

T

Tier 1 — Data Science Foundations Program: Understands the concepts, interprets results, and runs existing notebooks. No coding required. Covered in: Capability Tiers
Tier 2 — Data Science Practitioner Program: Writes code to prepare data, build, and evaluate models with guidance. Covered in: Capability Tiers
Tier 3 — Data Science Associate Program: Works independently across the modeling workflow and makes sound methodological choices. Covered in: Capability Tiers
Tier 4 — Data Science Professional Program: Designs solutions, weighs trade-offs, and owns architecture and deployment decisions. Covered in: Capability Tiers
Token NLP & LLMs: The atomic unit of text (a word, word piece, or character) an LLM reads and generates; text is split into tokens before processing. Covered in: Transformers, RAG Models, and LLMs
Train/Test Split Machine Learning: Holding out part of the data for evaluation so model performance is measured on examples it did not learn from. Covered in: Model Evaluation
Transfer Learning Deep Learning & Vision: Reusing a model pre-trained on a large dataset and adapting it to a related task with less data. Covered in: CNNs and Computer Vision
Transformer NLP & LLMs: A neural architecture built on self-attention that processes sequences in parallel; the basis of modern LLMs. Covered in: Transformers, RAG Models, and LLMs
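The idea behind the Train/Test Split entry can be sketched with the standard library alone. The helper `split_train_test` is a hypothetical name for this example (it is not scikit-learn's `train_test_split`), and the 25% test fraction and seed are arbitrary choices.

```python
import random


def split_train_test(items, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data, then hold out a fraction for evaluation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset: 20 example IDs.
samples = list(range(20))

train, test = split_train_test(samples)

print(len(train), len(test))  # 15 5
```

The two parts never overlap, so performance measured on `test` reflects examples the model did not see during training.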

U

Underfitting Machine Learning: When a model is too simple to capture the underlying pattern, performing poorly even on training data. Covered in: Model Evaluation
Unit Testing Toolkit & Process: Writing small automated checks that verify individual pieces of code behave as expected. Covered in: Testing and Unit Tests
Unsupervised Learning Machine Learning: Finding structure in unlabeled data, e.g. clustering or dimensionality reduction. Covered in: Algorithm Families
Use Case Toolkit & Process: A specific business problem framed to assess whether — and how — AI/ML can deliver value. Covered in: Framing AI Use Cases

V

Variable Python: A name bound to a value so the value can be reused, and the name reassigned, later in a program. Covered in: Python Basics
Version Control (Git) Toolkit & Process: Tracking changes to code over time, enabling history, branching, and collaboration. Covered in: Git and Version Control

W

Web Scraping Python: Programmatically extracting data from web pages, typically by fetching HTML and parsing the elements of interest. Covered in: Web Scraping