Project 2 · Jun 2025 – Nov 2025
GPU VDI Benchmark Automation
Technical Support Engineer @ PuzzleSystems · Python, PyTorch, TensorFlow, VMware vSphere, PowerShell
The Problem
A client (SK E&S) needed to validate the performance of VMware's virtualized GPU (vGPU) environment before launching a subscription-based GPU rental service. The key questions: How does GPU performance degrade under concurrent VM workloads? What is the maximum number of VDI sessions that can run GPU-intensive tasks simultaneously? How do resource utilization patterns change over extended stress periods? Manual benchmarking was time-consuming and produced inconsistent results.
The Solution
I built an automated GPU benchmarking pipeline:
- Stress-test scripts using PyTorch and TensorFlow that simulate realistic GPU workloads (matrix operations, model training, inference) inside VMware vGPU environments
- Automated data collection capturing GPU utilization, memory, temperature, and throughput at configurable intervals
- Statistical analysis and reporting that processes accumulated data and generates incremental reports with trend analysis
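The collection and trend-analysis steps above can be illustrated with a minimal sketch. This is not the pipeline's actual code, which is confidential. It assumes metrics are read inside the guest VM with `nvidia-smi --query-gpu=... --format=csv,noheader,nounits`, and that a plain least-squares slope stands in for the trend analysis. The function names (`parse_sample`, `read_gpu_sample`, `linear_trend`) and the sample values are hypothetical. Fields that a guest reports as unavailable (for example `[N/A]`) are stored as `None` and skipped.

```python
import subprocess

# Standard nvidia-smi query fields (see `nvidia-smi --help-query-gpu`).
QUERY_FIELDS = "utilization.gpu,memory.used,temperature.gpu"


def parse_sample(csv_line: str) -> dict:
    """Parse one line of `--format=csv,noheader,nounits` output."""
    keys = ("util_pct", "mem_used_mib", "temp_c")
    values = [v.strip() for v in csv_line.split(",")]
    sample = {}
    for key, raw in zip(keys, values):
        try:
            sample[key] = float(raw)
        except ValueError:
            # Assumption: unavailable fields arrive as text such as "[N/A]".
            sample[key] = None
    return sample


def read_gpu_sample(gpu_index: int = 0) -> dict:
    """Take one live sample; needs nvidia-smi on PATH (not called below)."""
    out = subprocess.run(
        ["nvidia-smi", f"--id={gpu_index}",
         f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(out.strip().splitlines()[0])


def linear_trend(values: list) -> float:
    """Least-squares slope per sample index, ignoring missing values."""
    points = [(i, v) for i, v in enumerate(values) if v is not None]
    n = len(points)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Made-up sample lines standing in for three consecutive readings.
canned_lines = ["97, 10240, 71", "95, 10240, 73", "93, 10240, [N/A]"]
samples = [parse_sample(line) for line in canned_lines]
util_trend = linear_trend([s["util_pct"] for s in samples])
temp_trend = linear_trend([s["temp_c"] for s in samples])
print(f"utilization trend: {util_trend:+.1f} %/sample")
print(f"temperature trend: {temp_trend:+.1f} C/sample")
```

In an unattended run, `read_gpu_sample` would be called on a timer at the configured interval and each sample appended to a log, so the trend can be recomputed incrementally as data accumulates.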
Results
- Provided the quantitative data SK E&S used to launch their GPU rental service
- Turned a multi-day manual test cycle into an unattended overnight run
- Delivered reproducible results across different vGPU configurations
The source code remains confidential to my prior employer.