Lab: Image transforms and augmentation (PyTorch / torchvision)
BWXT Data Science Workforce Training Pilot · Computer vision labs
Goals
- See how torchvision.transforms change pixel data before a model ever runs.
- Compare deterministic transforms (resize, crop, normalize) with random augmentations that change each epoch.
- Build a Compose pipeline typical for transfer learning (train vs eval).
Prerequisites
pip install torch torchvision matplotlib numpy
Dataset
We use a single CIFAR-10 image as a running example (color, $32\times32$). The same ideas apply to industrial inspection images at higher resolution.
import random
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
import torch
from torchvision import transforms
from torchvision.datasets import CIFAR10
from torchvision.transforms import InterpolationMode
from torchvision.utils import make_grid
plt.rcParams['figure.figsize'] = (10, 4)
torch.manual_seed(42)
random.seed(42)
# One CIFAR-10 RGB image (PIL)
root = Path('./data')
train_ds = CIFAR10(root=root, train=True, download=True)
pil_img, label_idx = train_ds[0]
class_names = train_ds.classes
print('Label:', class_names[label_idx])
display_np = np.asarray(pil_img)
plt.imshow(display_np)
plt.title('Original PIL image (H×W×C)')
plt.axis('off')
plt.show()
1. Deterministic geometry and tensor conversion
- Resize / CenterCrop change where the model looks.
- ToTensor() scales uint8 $[0,255]$ to float $[0,1]$ and rearranges to C×H×W (channel-first), which PyTorch layers expect.
- Normalize(mean, std) applies $(x - \mu) / \sigma$ per channel. For custom data you often compute dataset statistics; pretrained ImageNet weights use fixed mean/std.
geo = transforms.Compose(
    [
        transforms.Resize(48, interpolation=InterpolationMode.BILINEAR),
        transforms.CenterCrop(40),
        transforms.ToTensor(),
    ]
)
t = geo(pil_img)
print('Tensor shape (C,H,W):', tuple(t.shape))
print('dtype / min / max:', t.dtype, float(t.min()), float(t.max()))
# Visualize tensor: permute to H,W,C for matplotlib
plt.imshow(t.permute(1, 2, 0).numpy())
plt.title('Resize 48 → CenterCrop 40 → ToTensor')
plt.axis('off')
plt.show()
2. Random augmentations (training only)
Random transforms simulate lighting, viewpoint, and composition changes. Because they make each training sample harder, they can lower training accuracy (and sometimes in-distribution validation metrics) slightly, but they tend to improve generalization when the deployment camera or scene drifts.
Common choices: RandomHorizontalFlip, RandomRotation, ColorJitter, RandomResizedCrop.
augment = transforms.Compose(
    [
        transforms.RandomResizedCrop(32, scale=(0.7, 1.0)),
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.RandomRotation(15, interpolation=InterpolationMode.BILINEAR),
        transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.2, hue=0.05),
        transforms.ToTensor(),
    ]
)
fig, axes = plt.subplots(2, 4, figsize=(12, 6))
for ax in axes.flat:
    out = augment(pil_img)
    ax.imshow(out.permute(1, 2, 0).numpy())
    ax.axis('off')
plt.suptitle('Eight independent random augmentations of the same image')
plt.tight_layout()
plt.show()
3. Train vs eval pipeline (ImageNet-style normalization)
For evaluation, we avoid random resizing and use a fixed Resize + CenterCrop (or a fixed short side), then apply the same Normalize as in training.
Below we show two grids: eight train samples vs eight eval samples from the same PIL image. The eval pipeline is deterministic, so its eight tiles are identical; the train tiles vary.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)
train_tf = transforms.Compose(
    [
        transforms.RandomResizedCrop(32, scale=(0.8, 1.0)),
        transforms.RandomHorizontalFlip(),
        transforms.ColorJitter(0.15, 0.15, 0.15, 0.04),
        transforms.ToTensor(),
        transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
    ]
)
eval_tf = transforms.Compose(
    [
        transforms.Resize(40),
        transforms.CenterCrop(32),
        transforms.ToTensor(),
        transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
    ]
)
train_batch = torch.stack([train_tf(pil_img) for _ in range(8)])
eval_batch = torch.stack([eval_tf(pil_img) for _ in range(8)])
# Un-normalize for display only
def unnorm(t: torch.Tensor) -> torch.Tensor:
    mean = torch.tensor(IMAGENET_MEAN).view(3, 1, 1)
    std = torch.tensor(IMAGENET_STD).view(3, 1, 1)
    return (t * std + mean).clamp(0, 1)
plt.figure(figsize=(10, 4))
plt.imshow(make_grid(unnorm(train_batch), nrow=4).permute(1, 2, 0).numpy())
plt.title('Train-style augmentations + ImageNet normalize (unnormalized for display)')
plt.axis('off')
plt.show()
plt.figure(figsize=(10, 4))
plt.imshow(make_grid(unnorm(eval_batch), nrow=4).permute(1, 2, 0).numpy())
plt.title('Eval-style deterministic pipeline')
plt.axis('off')
plt.show()
4. Try this
- Swap RandomResizedCrop for RandomAffine (shear, translate). When might shear help or hurt weld or PCB images?
- For grayscale data, either use single-channel tensors with Normalize using one mean/std, or repeat the channel to RGB and reuse ImageNet normalization (common trick with pretrained backbones).
- Log the exact train_tf / eval_tf in experiment notes so results are reproducible.