A selection of personal explorations at the intersection of deep learning, computer vision, and generative models.
Implementation of a real-time human pose estimation pipeline using Google's MediaPipe framework, detecting 33 body landmarks at high frame rates on standard webcam input.
Exploration of unpaired image-to-image translation using CycleGAN architecture. The model learns bidirectional mappings between two visual domains without paired training data.
Implementation and benchmarking of object detection models (YOLO, Faster R-CNN) combined with semantic segmentation approaches to understand scenes at pixel-level granularity.