Glossary
The program's common language: 70 key terms across Python, data and statistics, machine learning, deep learning, LLMs, MLOps, and the program itself. Each entry links to the chapter where it's covered.
A
- Activation Function Deep Learning & Vision
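- A nonlinear function (e.g. ReLU, sigmoid) applied to a neuron's weighted input sum to produce its output. Without it, stacked layers would reduce to a single linear transformation, so the network could not model nonlinear relationships. Covered in: Introduction to Neural Networks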
- Agile Toolkit & Process
- An iterative way of delivering work in short increments with frequent feedback and adaptation. Covered in: Agile Development Methodology
- Attention NLP & LLMs
- A mechanism that lets a model weigh the relevance of other tokens when representing each token. Covered in: Transformers, RAG Models, and LLMs
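A minimal pure-Python sketch of two entries above, using made-up vectors: ReLU and sigmoid activations, and scaled dot-product attention for a single query. Each key's score is its dot product with the query divided by √d, and softmax turns the scores into weights for averaging the values. The helper names are illustrative, not taken from the chapters.

```python
import math

def relu(x):
    # Passes positive inputs through unchanged and zeroes out negatives.
    return max(0.0, x)

def sigmoid(x):
    # Squashes any real input into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Hypothetical helper: score each key against the query, scaled by sqrt(d).
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # The output is the attention-weighted average of the value vectors.
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return weights, out

# Made-up 2-D query, keys, and values: the query matches the first key best.
weights, out = attend([1.0, 0.0], keys=[[1.0, 0.0], [0.0, 1.0]], values=[[10.0, 0.0], [0.0, 10.0]])
print([round(w, 3) for w in weights], out)
```

Real transformers compute this for every token at once with matrix operations and learned query, key, and value projections; the loop form here only shows the arithmetic.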
B
- Backpropagation Deep Learning & Vision
- The algorithm that applies the chain rule backward, from the loss through each layer in turn, to compute the gradient of the loss with respect to every weight so the weights can be updated. Covered in: Introduction to Neural Networks
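A minimal sketch of backpropagation for one sigmoid neuron with squared-error loss; the weight, bias, input, and target values are made up. The gradient is built with the chain rule from the loss back to the weight and bias, then checked against a central finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x, target):
    # One neuron: z = w*x + b, a = sigmoid(z), loss = (a - target)^2.
    a = sigmoid(w * x + b)
    return (a - target) ** 2

def backward(w, b, x, target):
    # Forward pass, keeping the intermediate values the chain rule needs.
    z = w * x + b
    a = sigmoid(z)
    # Walk back from the loss: dL/da, then da/dz, then dz/dw and dz/db.
    dL_da = 2.0 * (a - target)
    da_dz = a * (1.0 - a)
    dL_dz = dL_da * da_dz
    return dL_dz * x, dL_dz  # (dL/dw, dL/db)

w, b, x, t = 0.5, -0.2, 1.5, 1.0  # made-up values
grad_w, grad_b = backward(w, b, x, t)

# Sanity check: nudge w slightly in both directions and measure the loss change.
eps = 1e-6
num_w = (forward(w + eps, b, x, t) - forward(w - eps, b, x, t)) / (2 * eps)
print(grad_w, num_w)
```

In a multi-layer network the same pattern repeats: each layer receives dL/d(its output) from the layer above and passes dL/d(its input) to the layer below.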
C
- Capability Tier Program
- One of the four IALR Data Science Capability Maturity Model levels describing what a learner can do, from interpreting results to designing solutions. Covered in: Capability Tiers
- Capstone Project Program
- The course-long weld-defect image-analysis project that integrates the program's skills end to end. Covered in: Capstone Project
- Classification Machine Learning
- A supervised task that predicts a discrete category label for each input. Covered in: Algorithm Families, Model Evaluation
- Clustering Machine Learning
- Grouping similar data points together without predefined labels. Covered in: Algorithm Families
- Conditional (if/else) Python
- A control-flow construct that runs different code depending on whether a condition evaluates to true or false. Covered in: if-else and Flow Control
- Confusion Matrix Machine Learning
- A table of true vs. predicted classes (TP, FP, FN, TN) used to analyze classification errors. Covered in: Model Evaluation
- Convolution Deep Learning & Vision
- Sliding a small learnable filter over an input to produce a feature map highlighting local patterns. Covered in: CNNs and Computer Vision
- Convolutional Neural Network (CNN) Deep Learning & Vision
- A neural network that uses convolutional filters to detect spatial features, well suited to images. Covered in: CNNs and Computer Vision
- Correlation Data & Statistics
- A measure of how strongly two variables move together; it indicates association, not causation. Covered in: Dataset Statistics and Visualization
- Cross-Validation Machine Learning
- Estimating performance by splitting data into folds, letting each fold serve in turn as the validation set while the others train the model (as in k-fold), and averaging the results. Covered in: Model Evaluation
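A minimal sketch of two entries above for binary labels: tallying a confusion matrix, and generating k-fold cross-validation splits. For simplicity the folds are contiguous and unshuffled (an assumption; shuffling first is common), and the train-and-score step whose results would be averaged is left out. The helper names are illustrative; scikit-learn's `confusion_matrix` and `KFold` cover this in real projects.

```python
def confusion_counts(y_true, y_pred, positive=1):
    # Hypothetical helper: tally the four cells of a binary confusion matrix.
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

def k_fold_indices(n, k):
    # Hypothetical helper: yield (train, validation) index lists so that
    # every sample lands in the validation fold exactly once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, val
        start += size

# Made-up labels and predictions.
counts = confusion_counts([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(counts)
for train_idx, val_idx in k_fold_indices(10, k=3):
    print(len(train_idx), val_idx)
```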
D
- Data Drift MLOps
- When the statistical properties of incoming data shift away from the training data, degrading model accuracy. Covered in: Deployment and Monitoring
- Data Visualization Data & Statistics
- Representing data graphically (charts, plots) to reveal patterns, trends, and outliers. Covered in: Communicating with Visualization
- DataFrame Data & Statistics
- A two-dimensional, labeled table (rows and named columns); the core structure for tabular analysis in pandas. Covered in: Dataset Statistics and Visualization, Communicating with Visualization
- Debugging Python
- The process of locating and fixing defects, often using error messages, print statements, and a step-through debugger. Covered in: Debugging
- Deployment MLOps
- Making a trained model available to serve predictions in a production system. Covered in: Deployment and Monitoring
- Descriptive Statistics Data & Statistics
- Summary measures (mean, median, standard deviation, etc.) that describe the central tendency and spread of a dataset. Covered in: Dataset Statistics and Visualization
- Dictionary Python
- A mutable collection of key–value pairs offering fast lookup by key. Covered in: Dictionaries and Structuring Data
- Dimensionality Reduction Machine Learning
- Reducing the number of input features while preserving structure, to fight noise, redundancy, and the curse of dimensionality. Covered in: Dimensionality Reduction
- Distribution Data & Statistics
- How the values of a variable are spread across their possible range, e.g. normal, skewed, or bimodal. Covered in: Dataset Statistics and Visualization
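A quick sketch of descriptive statistics using Python's built-in `statistics` module, stored in a dictionary. The sample is made up and deliberately includes one unusually large value.

```python
import statistics

values = [2.1, 2.4, 2.4, 2.7, 3.0, 3.3, 9.8]  # made-up sample with one large value

summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "stdev": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
    "min": min(values),
    "max": max(values),
}
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

The mean lands above the median because the single large value pulls it upward. That gap is a simple hint that the distribution is right-skewed or contains an outlier.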
E
- Embedding NLP & LLMs
- A dense numeric vector representing a word, sentence, or item so that similar meanings sit close together. Covered in: Transformers, RAG Models, and LLMs
- ETL / Data Engineering Data & Statistics
- Extract, Transform, Load — building pipelines that move and reshape data from sources into an analysis-ready form. Covered in: Data Engineering and Pipelines
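A minimal sketch of how embeddings are compared, assuming toy 3-dimensional vectors invented for illustration (real embedding models output many more dimensions). Cosine similarity measures the angle between two vectors, so items with related meanings should score close to 1.

```python
import math

def cosine_similarity(u, v):
    # Dot product divided by the product of the vector lengths; 1.0 means same direction.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up "embeddings": the first two point in nearly the same direction.
embeddings = {
    "crack": [0.9, 0.1, 0.2],
    "fracture": [0.85, 0.15, 0.25],
    "invoice": [0.1, 0.9, 0.3],
}
query = embeddings["crack"]
ranked = sorted(embeddings, key=lambda word: cosine_similarity(query, embeddings[word]), reverse=True)
print(ranked)
```

Ranking stored vectors by similarity to a query vector is the basic retrieval step that RAG systems build on.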
F
- F1 Score Machine Learning
- The harmonic mean of precision and recall, balancing the two into a single metric. Covered in: Model Evaluation
- Feature Machine Learning
- An individual measurable input variable used by a model to make predictions. Covered in: Feature Engineering
- Feature Engineering Machine Learning
- Creating, transforming, and selecting input variables to improve a model's predictive performance. Covered in: Feature Engineering
- Feature Scaling Machine Learning
- Putting numeric features on a comparable range (e.g. standardization or min–max normalization) so no feature dominates by magnitude. Covered in: Feature Engineering
- Function Python
- A reusable, named block of code that takes inputs (arguments), performs work, and optionally returns a value. Covered in: Functions
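A minimal sketch of the two scaling methods named under Feature Scaling, applied to made-up values: standardization (subtract the mean, divide by the standard deviation) and min-max normalization to the range [0, 1].

```python
import statistics

def standardize(xs):
    # z-score: subtract the mean, divide by the population standard deviation.
    mu = statistics.mean(xs)
    sigma = statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]

def min_max(xs):
    # Rescale linearly so the smallest value maps to 0 and the largest to 1.
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

ages = [22, 35, 47, 58]                  # made-up feature with a small range
incomes = [28000, 52000, 61000, 90000]   # made-up feature with a much larger magnitude
print(min_max(ages))
print(min_max(incomes))
print(standardize(incomes))
```

In practice, compute the mean, standard deviation, minimum, and maximum on the training split only and reuse them on test data, so no information leaks from the evaluation set. Both functions assume the feature is not constant; otherwise they divide by zero.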
G
- Grad-CAM Deep Learning & Vision
- An interpretability method that highlights image regions most responsible for a CNN's prediction. Covered in: CNNs and Computer Vision
- Gradient Descent Deep Learning & Vision
- An optimization algorithm that iteratively updates parameters by stepping against the gradient of the loss (the direction of steepest local decrease), with the step size set by a learning rate. Covered in: Introduction to Neural Networks
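A minimal sketch of gradient descent on a one-parameter loss chosen for illustration, L(w) = (w - 3)², whose gradient is 2(w - 3) and whose minimum is at w = 3. The learning rate of 0.1 and the 100 steps are arbitrary choices; the learning rate is itself a hyperparameter.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    # Repeatedly step against the gradient, i.e. downhill on the loss.
    w = start
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w

# Gradient of the illustrative loss L(w) = (w - 3)**2.
w_best = gradient_descent(lambda w: 2 * (w - 3), start=0.0)
print(round(w_best, 4))  # prints 3.0
```

Too large a learning rate can overshoot and diverge; too small a one converges slowly. That trade-off is why the learning rate is tuned rather than learned.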
H
- Hyperparameter Machine Learning
- A configuration set before training (e.g. learning rate, tree depth) that is tuned rather than learned from data. Covered in: AI/ML Model Training
I
- Inference MLOps
- Using a trained model to generate predictions on new inputs. Covered in: Deployment and Monitoring
L
- Large Language Model (LLM) NLP & LLMs
- A transformer trained on large text corpora to predict the next token, enabling generation, summarization, and Q&A. Covered in: Transformers, RAG Models, and LLMs
- List Python
- An ordered, mutable collection of values accessed by integer index. Covered in: Lists
- Loop Python
- A construct (for/while) that repeats a block of code, typically over a sequence or until a condition stops holding. Covered in: Loops
- Loss Function Deep Learning & Vision
- A function measuring how far predictions are from the truth; training minimizes it. Covered in: Introduction to Neural Networks
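A minimal sketch of two common loss functions, assuming plain Python lists of targets and predictions: mean squared error for regression, and binary cross-entropy for predicted probabilities. The clipping constant `eps` is an illustrative choice to avoid taking log(0).

```python
import math

def mean_squared_error(y_true, y_pred):
    # Average of squared differences; a common regression loss.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # Penalizes confident wrong probabilities heavily; a common binary classification loss.
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

print(mean_squared_error([3.0, 5.0], [2.5, 5.5]))       # prints 0.25
print(binary_cross_entropy([1, 0], [0.9, 0.1]))          # confident and correct: low loss
print(binary_cross_entropy([1, 0], [0.6, 0.4]))          # hesitant: higher loss
```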
M
- Model Monitoring MLOps
- Tracking a deployed model's inputs, predictions, and performance over time to catch degradation. Covered in: Deployment and Monitoring
- Module Program
- One of the nine capability areas the 12-week program is organized around, from Agile through Model Deployment & Monitoring. Covered in: 12-Week Roadmap
N
- Neural Network Deep Learning & Vision
- A model of connected layers of weighted units (neurons) that learns complex mappings from data. Covered in: Introduction to Neural Networks
O
- Object-Oriented Programming (OOP) Python
- A paradigm that organizes code into objects bundling data (attributes) with behavior (methods), built from classes. Covered in: Object-Oriented Programming
- OCR (Optical Character Recognition) Python
- Converting images of text (scans, photos) into machine-readable characters. Covered in: Text in Images (OCR)
- One-Hot Encoding Machine Learning
- Representing a categorical variable as a set of binary columns, one per category. Covered in: Feature Engineering
- Outlier Data & Statistics
- A data point that lies far from the rest of the distribution and can distort summaries or models. Covered in: Dataset Statistics and Visualization
- Overfitting Machine Learning
- When a model learns noise and specifics of the training data and fails to generalize to new data. Covered in: Model Evaluation
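A minimal sketch of one-hot encoding on a made-up categorical column, using the sorted category names as a fixed column order. The helper name is illustrative.

```python
def one_hot(values):
    # Hypothetical helper: fix a column order (sorted categories) so encodings are reproducible.
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

cats, encoded = one_hot(["red", "green", "blue", "green"])  # made-up column
print(cats)
for row in encoded:
    print(row)
```

For DataFrames, pandas provides `get_dummies`, and scikit-learn's `OneHotEncoder` can be told to ignore categories that first appear at prediction time (`handle_unknown="ignore"`).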
P
- Precision Machine Learning
- Of the items predicted positive, the fraction that are actually positive — TP / (TP + FP). Covered in: Model Evaluation
- Principal Component Analysis (PCA) Machine Learning
- A linear technique that projects data onto new orthogonal axes (principal components), ordered so the first captures the most variance and each later one the most remaining variance. Covered in: Dimensionality Reduction
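A minimal sketch computing precision, recall, and F1 from confusion-matrix counts, using made-up counts. Returning 0.0 when a denominator is zero is a convention assumed here.

```python
def precision_recall_f1(tp, fp, fn):
    # Precision = TP / (TP + FP); Recall = TP / (TP + FN); F1 = their harmonic mean.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up counts: 8 true positives, 2 false positives, 4 false negatives.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Here precision is 0.8 but recall only about 0.667, and F1 sits between them, closer to the lower value, which is the point of using a harmonic mean.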
R
- Recall Machine Learning
- Of the actual positives, the fraction the model correctly identifies — TP / (TP + FN). Covered in: Model Evaluation
- Regression Machine Learning
- A supervised task that predicts a continuous numeric value. Covered in: Algorithm Families
- Regular Expression (Regex) Python
- A pattern language for matching, searching, and extracting structured substrings from text. Covered in: Regular Expressions
- Retrieval-Augmented Generation (RAG) NLP & LLMs
- Grounding an LLM's answers by retrieving relevant documents and supplying them as context at generation time. Covered in: Transformers, RAG Models, and LLMs
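A minimal sketch using Python's built-in `re` module, with named groups to extract order IDs and dates from a made-up line of text.

```python
import re

text = "Order A-1042 shipped 2024-03-05; order B-77 shipped 2024-03-09."  # made-up text

# Named groups label each captured piece: an ID like "A-1042" and a YYYY-MM-DD date.
pattern = re.compile(r"(?P<order>[A-Z]-\d+) shipped (?P<date>\d{4}-\d{2}-\d{2})")

for match in pattern.finditer(text):
    print(match.group("order"), match.group("date"))
```

Compiling the pattern once and reusing it is idiomatic when the same expression is applied to many strings.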
S
- SQL Data & Statistics
- Structured Query Language: the standard language for querying and manipulating relational databases. Covered in: SQL for Data Science, SQLite Databases
- String Python
- An immutable sequence of characters used to represent text. Covered in: Strings and Text Editing
- Supervised Learning Machine Learning
- Training a model on labeled examples to predict a target for new inputs (classification or regression). Covered in: Algorithm Families
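A minimal sketch of a SQL query run through Python's built-in `sqlite3` module against an in-memory database; the `sales` table and its rows are invented for illustration. The `?` placeholder passes the value separately from the SQL text, which avoids injection bugs.

```python
import sqlite3

# In-memory database, so the example needs no files; table and data are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 60.0), ("west", 40.0)],
)

# Parameterized query: count and total sales above a threshold, per region.
rows = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM sales "
    "WHERE amount > ? GROUP BY region ORDER BY region",
    (50,),
).fetchall()
print(rows)
conn.close()
```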
T
- Tier 1 — Data Science Foundations Program
- Understands the concepts, interprets results, and runs existing notebooks. No coding required. Covered in: Capability Tiers
- Tier 2 — Data Science Practitioner Program
- Writes code to prepare data, build, and evaluate models with guidance. Covered in: Capability Tiers
- Tier 3 — Data Science Associate Program
- Works independently across the modeling workflow and makes sound methodological choices. Covered in: Capability Tiers
- Tier 4 — Data Science Professional Program
- Designs solutions, weighs trade-offs, and owns architecture and deployment decisions. Covered in: Capability Tiers
- Token NLP & LLMs
- The basic unit an LLM reads and generates, often a word or subword piece; a tokenizer splits text into tokens before processing. Covered in: Transformers, RAG Models, and LLMs
- Train/Test Split Machine Learning
- Holding out part of the data for evaluation so model performance is measured on examples it did not learn from. Covered in: Model Evaluation
- Transfer Learning Deep Learning & Vision
- Reusing a model pre-trained on a large dataset and adapting it to a related task with less data. Covered in: CNNs and Computer Vision
- Transformer NLP & LLMs
- A neural architecture built on self-attention that processes sequences in parallel; the basis of modern LLMs. Covered in: Transformers, RAG Models, and LLMs
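A minimal hand-rolled sketch of a reproducible train/test split; scikit-learn's `train_test_split` does this in real projects. The helper name, the 20% test fraction, and the seed are arbitrary assumptions.

```python
import random

def split_train_test(items, test_fraction=0.2, seed=42):
    # Hypothetical helper: shuffle a copy with a fixed seed so the split is reproducible.
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled items are held out; the model never trains on them.
    return shuffled[n_test:], shuffled[:n_test]

train, test = split_train_test(range(10))
print(len(train), len(test))
```

Any scaling or encoding parameters should then be fitted on `train` alone, so the test score reflects truly unseen data.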
U
- Underfitting Machine Learning
- When a model is too simple to capture the underlying pattern, performing poorly even on training data. Covered in: Model Evaluation
- Unit Testing Toolkit & Process
- Writing small automated checks that verify individual pieces of code behave as expected. Covered in: Testing and Unit Tests
- Unsupervised Learning Machine Learning
- Finding structure in unlabeled data, e.g. clustering or dimensionality reduction. Covered in: Algorithm Families
- Use Case Toolkit & Process
- A specific business problem framed to assess whether — and how — AI/ML can deliver value. Covered in: Framing AI Use Cases
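A minimal sketch of unit tests with Python's built-in `unittest`; `normalize_label` is a hypothetical function invented only to have something to test.

```python
import unittest

def normalize_label(label):
    # Hypothetical function under test: trims whitespace and lowercases a label.
    return label.strip().lower()

class TestNormalizeLabel(unittest.TestCase):
    def test_strips_and_lowercases(self):
        self.assertEqual(normalize_label("  Crack "), "crack")

    def test_already_clean_is_unchanged(self):
        self.assertEqual(normalize_label("none"), "none")
```

Saved in a file such as `test_labels.py` (a hypothetical name), the tests run with `python -m unittest test_labels`. Each test checks one behavior, so a failure points straight at what broke.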
V
- Variable Python
- A named reference that stores a value so it can be reused and changed later in a program. Covered in: Python Basics
- Version Control (Git) Toolkit & Process
- Tracking changes to code over time, enabling history, branching, and collaboration. Covered in: Git and Version Control
W
- Web Scraping Python
- Programmatically extracting data from web pages, typically by fetching HTML and parsing the elements of interest. Covered in: Web Scraping
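A minimal sketch of the parsing half of web scraping, using the standard library's `html.parser`. A static HTML string stands in for a fetched page, so the example needs no network access; fetching would typically use `urllib.request` or a library such as `requests`. The `LinkCollector` class is illustrative.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    # Hypothetical parser: collects (href, link text) pairs from <a> tags.
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...> tag.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

# Made-up HTML standing in for a downloaded page.
html = '<ul><li><a href="/docs">Docs</a></li><li><a href="/blog">Blog <b>new</b></a></li></ul>'
parser = LinkCollector()
parser.feed(html)
print(parser.links)
```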