Blackwell GPU with 192 GB of HBM, 4.5K TFLOPS of FP16 compute, and 52K tok/s in MLPerf.
High-bandwidth memory capacity
FP16 compute performance
Neural network architecture type
Thermal design power consumption
LLM training time-to-convergence
Software stack maturity and framework support
LLM inference throughput per GPU
Raw compute throughput and memory bandwidth
Market supply and cloud instance availability
AMD's MI300X achieves the highest per-GPU LLM inference throughput in MLPerf Inference v5.1, delivering 21,150 tokens/s per GPU on llama2-70b and outperforming the NVIDIA H100 (15,610 tok/s), B200 (13,015 tok/s), and H200 (10,917 tok/s). The industry-standard benchmark validates AMD's competitiveness in AI inference.
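For context, the ratios implied by these cited per-GPU figures can be computed directly. The sketch below is illustrative Python arithmetic using only the numbers quoted above; it is not part of any MLPerf submission.

```python
# Rough arithmetic on the per-GPU llama2-70b figures cited above
# (MLPerf Inference v5.1, tokens/s per GPU); values copied from the text.
per_gpu_tps = {
    "AMD MI300X": 21150,
    "NVIDIA H100": 15610,
    "NVIDIA B200": 13015,
    "NVIDIA H200": 10917,
}

baseline = per_gpu_tps["AMD MI300X"]
for gpu, tps in per_gpu_tps.items():
    # Express each result as a fraction of the MI300X figure.
    print(f"{gpu}: {tps:,} tok/s ({tps / baseline:.2f}x of MI300X)")
```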
Installed AI compute from NVIDIA chips has more than doubled annually since 2020, with new flagship chips accounting for most compute within 3 years of release.
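A minimal sketch of the compound growth this implies, assuming a growth factor of exactly 2x per year from a 2020 baseline (a lower bound on "more than doubled annually"):

```python
# Illustrative lower bound implied by the claim above: if installed NVIDIA
# compute at least doubles every year from a 2020 baseline, the cumulative
# multiple after n years is at least 2**n.
BASE_YEAR = 2020
for year in range(2021, 2026):
    n = year - BASE_YEAR
    print(f"{year}: >= {2 ** n}x the {BASE_YEAR} installed compute")
```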
NVIDIA's B200 Blackwell GPUs are shipping to hyperscalers, promising 2.5x performance gains over H100 for AI training workloads.
SemiAnalysis launched InferenceMAX, an open-source nightly benchmark comparing GPU inference performance across NVIDIA (H100, H200, B200, GB200 NVL72) and AMD (MI300X, MI325X, MI355X). Endorsed by Jensen Huang, Lisa Su, OpenAI, and Microsoft. First multi-vendor benchmark with TCO and power efficiency metrics.
AMD MI355X delivers lower TCO per million tokens than NVIDIA B200 for GPT-OSS 120B FP4 summarization at interactivity below 225 tok/s/user. MI300X also beats H100 on GPT-OSS 120B MX4 across all interactivity levels. B200 leads on LLaMA 70B FP4 and high-interactivity workloads.
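InferenceMAX's exact TCO methodology is not reproduced here; the sketch below shows one common way a cost-per-million-tokens figure can be derived from an assumed all-in cost per GPU-hour and a sustained per-GPU throughput. The function name and the input numbers are placeholders for illustration, not InferenceMAX results.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """TCO-style metric (assumed formulation): all-in $/GPU-hour divided by
    the millions of tokens one GPU produces per hour at a sustained rate."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / (tokens_per_hour / 1e6)

# Placeholder inputs, NOT InferenceMAX data: $3.00/GPU-hour at 10,000 tok/s.
print(f"${cost_per_million_tokens(3.00, 10_000):.3f} per million tokens")
```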