CDNA3 GPU with a massive 192GB of HBM3. 1,307 TFLOPS (1.3 PFLOPS) peak FP16 for production-grade compute. 169K tok/s aggregate MLPerf llama2-70b throughput.
High-bandwidth memory capacity
FP16 compute performance
Neural network architecture type
Thermal design power consumption
LLM training time-to-convergence
Software stack maturity and framework support
LLM inference throughput per GPU
Raw compute throughput and memory bandwidth
Market supply and cloud instance availability
The AMD MI300X achieves the highest per-GPU LLM inference throughput in MLPerf Inference v5.1, delivering 21,150 tokens/s per GPU on llama2-70b and outperforming NVIDIA's H100 (15,610 tok/s), B200 (13,015 tok/s), and H200 (10,917 tok/s). The industry-standard benchmark validates AMD's competitiveness in AI inference.
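For context on how the per-GPU figure relates to the 169K tok/s headline above, here is a minimal sketch assuming that headline number is the aggregate of a standard 8-GPU MI300X submission; the 8-GPU count and the multiplication are assumptions, not stated in the source.

```python
# Sketch: relate the per-GPU MLPerf llama2-70b figure to the headline
# system throughput, assuming a standard 8-GPU MI300X platform
# (the 8-GPU count is an assumption, not stated in the source).

PER_GPU_TOK_S = 21_150   # MLPerf Inference v5.1 llama2-70b, tokens/s per GPU
GPUS_PER_NODE = 8        # assumed 8x MI300X system

system_tok_s = PER_GPU_TOK_S * GPUS_PER_NODE
print(f"{system_tok_s:,} tok/s aggregate")   # 169,200 tok/s, i.e. the ~169K headline
```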
AMD's MI300X is seeing increased adoption as enterprises seek alternatives to NVIDIA's supply-constrained GPUs, with its 192GB of memory enabling larger models to be deployed on fewer GPUs.
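As a rough illustration of the memory-capacity point, the sketch below estimates model weight footprints against the 192GB of HBM, assuming FP16 weights and ignoring KV cache and activation overhead; the parameter counts other than 70B are illustrative assumptions.

```python
# Sketch: back-of-envelope weight footprint vs. HBM capacity, showing why
# 192GB allows larger single-GPU deployments. FP16 weights assumed;
# KV cache and activations add further memory on top of this.

HBM_GB = 192
BYTES_PER_PARAM_FP16 = 2

for params_b in (8, 70, 180):   # illustrative model sizes in billions of parameters
    weights_gb = params_b * BYTES_PER_PARAM_FP16   # 1e9 params * 2 bytes = 2 GB per billion
    fits = "fits" if weights_gb < HBM_GB else "does not fit"
    print(f"{params_b}B params -> {weights_gb} GB of weights ({fits} in {HBM_GB} GB)")
```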