Lab: Image transforms and augmentation (PyTorch / torchvision)

BWXT Data Science Workforce Training Pilot · Computer vision labs

Goals

  • See how torchvision.transforms change pixel data before a model ever runs.
  • Compare deterministic transforms (resize, crop, normalize) with random augmentations, which produce a different result every time a sample is loaded and therefore every epoch.
  • Build a Compose pipeline typical for transfer learning (train vs eval).

Prerequisites

pip install torch torchvision matplotlib numpy

Dataset

We use a single CIFAR-10 image as a running example (color, $32\times32$). The same ideas apply to industrial inspection images at higher resolution.

In [ ]:
import random
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import torch
from torchvision import transforms
from torchvision.datasets import CIFAR10
from torchvision.transforms import InterpolationMode
from torchvision.utils import make_grid

plt.rcParams['figure.figsize'] = (10, 4)

torch.manual_seed(42)
random.seed(42)

# One CIFAR-10 RGB image (PIL)
root = Path('./data')
train_ds = CIFAR10(root=root, train=True, download=True)
pil_img, label_idx = train_ds[0]
class_names = train_ds.classes
print('Label:', class_names[label_idx])
display_np = np.asarray(pil_img)
plt.imshow(display_np)
plt.title('Original PIL image (H×W×C)')
plt.axis('off')
plt.show()

1. Deterministic geometry and tensor conversion

  • Resize changes the resolution; CenterCrop selects which region of the image the model sees.
  • ToTensor() scales uint8 $[0,255]$ to float $[0,1]$ and rearranges to C×H×W (channel-first), which PyTorch layers expect.
  • Normalize(mean, std) applies $(x - \mu) / \sigma$ per channel. For custom data you often compute the dataset's own per-channel statistics; models pretrained on ImageNet expect the fixed ImageNet mean/std.

In [ ]:
geo = transforms.Compose(
    [
        transforms.Resize(48, interpolation=InterpolationMode.BILINEAR),
        transforms.CenterCrop(40),
        transforms.ToTensor(),
    ]
)
t = geo(pil_img)
print('Tensor shape (C,H,W):', tuple(t.shape))
print('dtype / min / max:', t.dtype, float(t.min()), float(t.max()))

# Visualize tensor: permute to H,W,C for matplotlib
plt.imshow(t.permute(1, 2, 0).numpy())
plt.title('Resize 48 → CenterCrop 40 → ToTensor')
plt.axis('off')
plt.show()

2. Random augmentations (training only)

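Random transforms simulate lighting, viewpoint, and composition changes. Because they make the training task harder, they often lower training accuracy slightly, but they tend to improve generalization when the deployment camera or scene drifts. They are applied to training data only; validation and test images go through a deterministic pipeline.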

Common choices: RandomHorizontalFlip, RandomRotation, ColorJitter, RandomResizedCrop.

In [ ]:
augment = transforms.Compose(
    [
        transforms.RandomResizedCrop(32, scale=(0.7, 1.0)),
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.RandomRotation(15, interpolation=InterpolationMode.BILINEAR),
        transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.2, hue=0.05),
        transforms.ToTensor(),
    ]
)

fig, axes = plt.subplots(2, 4, figsize=(12, 6))
for ax in axes.flat:
    out = augment(pil_img)
    ax.imshow(out.permute(1, 2, 0).numpy())
    ax.axis('off')
plt.suptitle('Eight independent random augmentations of the same image')
plt.tight_layout()
plt.show()

3. Train vs eval pipeline (ImageNet-style normalization)

For evaluation, we avoid random transforms altogether: use a fixed Resize + CenterCrop (or only a fixed short-side Resize), then apply the same Normalize as in training.

Below we show two grids: eight train samples vs eight eval samples from the same PIL image. The eight eval tiles are identical because the pipeline is deterministic; the train tiles vary.

In [ ]:
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

train_tf = transforms.Compose(
    [
        transforms.RandomResizedCrop(32, scale=(0.8, 1.0)),
        transforms.RandomHorizontalFlip(),
        transforms.ColorJitter(0.15, 0.15, 0.15, 0.04),
        transforms.ToTensor(),
        transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
    ]
)

eval_tf = transforms.Compose(
    [
        transforms.Resize(40),
        transforms.CenterCrop(32),
        transforms.ToTensor(),
        transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
    ]
)

train_batch = torch.stack([train_tf(pil_img) for _ in range(8)])
eval_batch = torch.stack([eval_tf(pil_img) for _ in range(8)])

# Un-normalize for display only
def unnorm(t: torch.Tensor) -> torch.Tensor:
    mean = torch.tensor(IMAGENET_MEAN).view(3, 1, 1)
    std = torch.tensor(IMAGENET_STD).view(3, 1, 1)
    return (t * std + mean).clamp(0, 1)

plt.figure(figsize=(10, 4))
plt.imshow(make_grid(unnorm(train_batch), nrow=4).permute(1, 2, 0).numpy())
plt.title('Train-style augmentations + ImageNet normalize (unnormalized for display)')
plt.axis('off')
plt.show()

plt.figure(figsize=(10, 4))
plt.imshow(make_grid(unnorm(eval_batch), nrow=4).permute(1, 2, 0).numpy())
plt.title('Eval-style deterministic pipeline')
plt.axis('off')
plt.show()

4. Try this

  1. Swap RandomResizedCrop for RandomAffine (shear, translate). When might shear help or hurt weld or PCB images?
  2. For grayscale data, either keep single-channel tensors and pass Normalize a one-element mean and std, or repeat the channel to RGB and reuse ImageNet normalization (a common trick with pretrained backbones).
  3. Log the exact train_tf / eval_tf in experiment notes so results are reproducible.