https://github.com/yonigozlan/OptimVision

Overall Goal

Optimize the inference speed of vision models within the Hugging Face Transformers library, with a focus on models compiled using PyTorch's torch.compile.

The first steps are:

  1. Profile the models with relevant profiling tools
  2. Identify and fix performance bottlenecks

Profiling and debugging: TensorBoard PyTorch Profiler

Most useful dashboards:

Overview:

Screenshot 2024-09-01 184058.png

Trace:

Screenshot 2024-09-01 183840.png

Screenshot 2024-09-01 183752.png
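
Traces like the ones viewed in these dashboards can be produced with torch.profiler. The snippet below is only a minimal sketch under stated assumptions: the small convolutional network is a hypothetical stand-in for a real Transformers vision model, and the input size, schedule values, and temporary log directory are illustrative choices, not the settings used in this project.

```python
import os
import tempfile

import torch
from torch import nn
from torch.profiler import (
    ProfilerActivity,
    profile,
    schedule,
    tensorboard_trace_handler,
)

# Hypothetical stand-in for a vision model (assumption, not from the project).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
).eval()
inputs = torch.randn(1, 3, 64, 64)

# Always profile CPU activity; add CUDA activity only when a GPU is present.
activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    model, inputs = model.cuda(), inputs.cuda()
    activities.append(ProfilerActivity.CUDA)

# Illustrative log directory; point TensorBoard at this path afterwards.
log_dir = tempfile.mkdtemp(prefix="profiler_logs_")

with torch.no_grad(), profile(
    activities=activities,
    # Skip 1 step, warm up for 1 step, then record 3 steps, once.
    schedule=schedule(wait=1, warmup=1, active=3, repeat=1),
    # Write the trace in a format the TensorBoard profiler plugin can read.
    on_trace_ready=tensorboard_trace_handler(log_dir),
    record_shapes=True,
) as prof:
    for _ in range(5):
        model(inputs)
        prof.step()  # tell the profiler that one iteration has finished

# The handler writes one JSON trace file per completed recording cycle.
trace_files = [f for f in os.listdir(log_dir) if f.endswith(".json")]
print(log_dir, trace_files)
```

The resulting directory can then be opened with `tensorboard --logdir <log_dir>`, assuming the PyTorch profiler plugin for TensorBoard (the torch-tb-profiler package) is installed. The wait and warmup steps are there so that one-time costs, such as lazy initialization or compilation, do not dominate the recorded steps.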


torch.compile

Different modes (from docs):