Computer Vision#
The Computer Vision toolkit provides a hands-on journey through modern image understanding techniques - from classic CNN classifiers to cutting-edge generative and tracking models - all running on AMD GPU acceleration.
CV01 - Image Classification with CNN#
Build a Convolutional Neural Network from scratch and train it on the CIFAR-100 dataset (100 object categories). You will implement convolutional blocks, batch normalization, dropout, and a fully connected classifier head using PyTorch, then monitor training progress through loss and accuracy curves.
CV02 - Deep Residual Networks (ResNet-50)#
Train a ResNet-50 classifier on CIFAR-100 and explore how residual (skip) connections solve the vanishing-gradient problem in very deep networks. The lab covers Top-1/Top-5 accuracy evaluation and qualitative inspection of sample predictions.
CV03 - Object Detection with YOLOv9#
Apply YOLOv9 (“You Only Look Once”), a state-of-the-art one-stage detector, to locate and classify multiple objects in a single forward pass. You will train the model for ~10 epochs, evaluate it on validation images, and visualize detection bounding boxes.
CV04 - Semantic Segmentation with SegNet#
Train a SegNet encoder–decoder on the CamVid autonomous-driving dataset to assign a class label (road, car, pedestrian, building, …) to every pixel in an image. The lab saves checkpoints and produces side-by-side comparisons of predictions vs. ground truth.
CV05 - Segment Anything (SAM)#
Run inference with Meta AI’s Segment Anything Model (SAM), a foundation model that generalises to any image without task-specific training. The lab covers both automatic (all-mask) and prompt-based (point/box) segmentation modes, producing coloured overlay maps.
CV06 - Multi-Object Tracking with YOLOv8 + ByteTrack#
Apply a pretrained YOLOv8 detector combined with the ByteTrack association algorithm to track multiple objects across video frames, assigning each a persistent identity. No training is required - the lab focuses on end-to-end inference on custom videos.
CV07 - Variational Autoencoder (VAE & cVAE)#
Implement a Variational Autoencoder on MNIST to learn a probabilistic latent space, then sample from it to generate new handwritten digits. The lab also covers the Conditional VAE (cVAE) variant, which lets you control which digit class is generated.