Triton Inference Server Cheat Sheet

I use Triton a lot with PyTorch, and I always find it annoying to write a new model configuration. The format (protobuf text) isn't super intuitive, and I often forget the exact names of some parameters (like max_sequence_idle_microseconds 😱).

This page has basic configs for the most common features, ready to copy and adapt!
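
For orientation, here's roughly what a minimal config.pbtxt looks like for a TorchScript model served with the PyTorch backend. The model name, tensor names, and dimensions below are placeholders, so swap in your own:

```protobuf
# Minimal sketch of a config.pbtxt for a TorchScript model.
# "my_model", the tensor names, and the dims are placeholders.
name: "my_model"
backend: "pytorch"
max_batch_size: 8

input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]

output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]

instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]
```

The sections below build on this skeleton for the more specific cases.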

I'm on X/Twitter if you have questions (or if something is broken).