AMD's MI300X achieves the highest per-GPU LLM inference throughput in MLPerf Inference v5.1, delivering 21,150 tokens/s per GPU on llama2-70b and outperforming NVIDIA's H100 (15,610 tok/s), B200 (13,015 tok/s), and H200 (10,917 tok/s). The industry-standard benchmark result underscores AMD's competitiveness in AI inference.
MI300X achieves 169,197 tokens/s total (8 GPUs) = 21,150 tokens/s per GPU
H100 achieves 124,879 tokens/s total (8 GPUs) = 15,610 tokens/s per GPU
B200 achieves 52,062 tokens/s total (4 GPUs) = 13,015 tokens/s per GPU
MLPerf v5.1 includes 1,448 benchmark entries from 27 submitters
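The per-GPU figures follow directly from dividing each submission's total llama2-70b throughput by its GPU count. A minimal sketch reproducing that arithmetic, using only the totals and GPU counts quoted above (the dictionary and variable names are illustrative, not from any MLPerf tooling):

```python
# Per-GPU throughput = total system throughput / number of GPUs in the submission.
# Totals and GPU counts are the llama2-70b figures quoted above; the quoted
# per-GPU numbers are these results rounded to the nearest token/s.
results = {
    "MI300X": (169_197, 8),
    "H100": (124_879, 8),
    "B200": (52_062, 4),
}

for gpu, (total_tok_s, num_gpus) in results.items():
    per_gpu = total_tok_s / num_gpus
    print(f"{gpu}: {total_tok_s:,} tok/s / {num_gpus} GPUs = {per_gpu:,.1f} tok/s per GPU")
```

Running this prints 21,149.6, 15,609.9, and 13,015.5 tok/s per GPU respectively, matching the headline figures once rounded.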