Hopper-architecture GPU with an extended 141 GB of HBM3e memory. Roughly 2,000 TFLOPS of FP16 compute for high-performance workloads. 87K tok/s in MLPerf inference. Available from 5 providers starting at $4.50/hr.
High-bandwidth memory capacity
FP16 compute performance
GPU architecture generation (e.g., Hopper)
Thermal design power consumption
Lowest available cloud rental price
Average price across providers
Number of cloud providers offering this GPU
Last pricing data refresh date
LLM training time-to-convergence
Software stack maturity and framework support
LLM inference throughput per GPU
Raw compute throughput and memory bandwidth
Market supply and cloud instance availability
AMD MI300X achieves highest per-GPU LLM inference throughput in MLPerf Inference v5.1, delivering 21,150 tokens/s per GPU on llama2-70b, outperforming NVIDIA H100 (15,610 tok/s), B200 (13,015 tok/s), and H200 (10,917 tok/s). Industry-standard benchmark validates AMD's competitiveness in AI inference.
Installed AI compute from NVIDIA chips has more than doubled annually since 2020, with new flagship chips accounting for most compute within 3 years of release.
AWS has launched P5e instances featuring NVIDIA H200 GPUs, now generally available in US East and West regions, with EU availability expected in Q1 2026.
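One quick way to confirm where H200-backed capacity is offered is to query the EC2 API directly. The sketch below is a minimal example, assuming boto3 is installed with credentials configured and that p5e.48xlarge is the relevant H200 instance type in your account's catalog; verify the exact instance type name in the AWS console for your region.

```python
# Sketch: list Availability Zones in a region that offer an H200-backed instance type.
# The instance type name "p5e.48xlarge" is an assumption -- confirm it for your account.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
resp = ec2.describe_instance_type_offerings(
    LocationType="availability-zone",
    Filters=[{"Name": "instance-type", "Values": ["p5e.48xlarge"]}],
)
for offering in resp["InstanceTypeOfferings"]:
    print(offering["InstanceType"], "offered in", offering["Location"])
```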
NVIDIA H200's 141GB HBM3e memory requires updated CUDA drivers and framework versions. Teams should verify compatibility before migration from H100.
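As a minimal pre-migration sanity check, something like the following PyTorch sketch confirms that the runtime actually sees the H200, its full memory, and a recent CUDA runtime. The minimum CUDA version used here is an illustrative assumption, not an official NVIDIA requirement; substitute whatever your framework vendor documents.

```python
# Minimal sketch: verify the driver/runtime stack sees an H200 before moving H100 workloads.
# Assumes a CUDA build of PyTorch; the version threshold below is a placeholder assumption.
import torch

assert torch.cuda.is_available(), "No CUDA-capable GPU visible to PyTorch"

props = torch.cuda.get_device_properties(0)
print("GPU:", props.name)                                       # expect an H200 device name
print("Memory (GB):", round(props.total_memory / 1024**3, 1))   # ~141 GB on H200
print("Compute capability:", f"{props.major}.{props.minor}")    # 9.0 for Hopper

if torch.version.cuda:
    runtime = tuple(int(p) for p in torch.version.cuda.split("."))
    # Placeholder threshold -- replace with the version your framework vendor requires.
    if runtime < (12, 2):
        print(f"Warning: CUDA runtime {torch.version.cuda} may predate H200 support")
    else:
        print("CUDA runtime:", torch.version.cuda)
```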
SemiAnalysis launched InferenceMAX, an open-source nightly benchmark comparing GPU inference performance across NVIDIA (H100, H200, B200, GB200 NVL72) and AMD (MI300X, MI325X, MI355X). Endorsed by Jensen Huang, Lisa Su, OpenAI, and Microsoft. First multi-vendor benchmark with TCO and power efficiency metrics.