Hopper-architecture GPU with 80 GB of HBM. Roughly 2,000 TFLOPS of FP16 compute for high-performance workloads. About 125K tok/s in MLPerf inference. Available from 12 cloud providers starting at $1.49/hr.
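As a quick sanity check on the headline figures, the hourly price and MLPerf throughput can be folded into a cost per million tokens. A minimal sketch, assuming sustained full utilization at the listed $1.49/hr and 125K tok/s (real deployments will see lower effective throughput):

```python
# Back-of-the-envelope cost per million generated tokens, assuming
# full utilization at the listed price and throughput (idealized).

def cost_per_million_tokens(price_per_hour: float, tokens_per_second: float) -> float:
    """Dollars to generate one million tokens at sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return price_per_hour / tokens_per_hour * 1_000_000

# Headline numbers from the listing above: $1.49/hr, 125K tok/s.
print(f"${cost_per_million_tokens(1.49, 125_000):.4f} per 1M tokens")  # ~$0.0033
```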
High-bandwidth memory capacity
FP16 compute performance
GPU architecture generation (e.g., Hopper)
Thermal design power consumption
Lowest available cloud rental price
Average price across providers
Number of cloud providers offering this GPU
Last pricing data refresh date (these fields are modeled as a record in the sketch below)
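Taken together, the fields above describe one pricing record per GPU. A minimal sketch of such a record as a Python dataclass; the class and field names are illustrative, not an actual API:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class GpuListing:
    """One GPU's spec and pricing snapshot (illustrative field names)."""
    memory_gb: int        # high-bandwidth memory capacity
    fp16_tflops: float    # FP16 compute performance
    architecture: str     # GPU architecture generation, e.g. "Hopper"
    tdp_watts: int        # thermal design power
    min_price_hr: float   # lowest available cloud rental price
    avg_price_hr: float   # average price across providers
    providers: int        # number of cloud providers offering it
    updated: date         # last pricing data refresh

# Values from the summary above; TDP, average price, and date are placeholders.
h100 = GpuListing(80, 2000.0, "Hopper", 700, 1.49, 1.99, 12, date(2025, 1, 1))
```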
LLM training time-to-convergence
Software stack maturity and framework support
LLM inference throughput per GPU
Raw compute throughput and memory bandwidth
Market supply and cloud instance availability (criteria combined in the scoring sketch below)
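The page does not say how, or whether, these criteria are weighted against each other. A minimal sketch of one way to fold them into a single comparison score, with purely hypothetical weights:

```python
# Hypothetical weighted scoring over the five criteria above.
# Per-criterion scores are normalized to [0, 1]; weights are illustrative only.

CRITERIA_WEIGHTS = {
    "training_speed": 0.25,        # LLM training time-to-convergence
    "software_maturity": 0.15,     # stack maturity and framework support
    "inference_throughput": 0.25,  # LLM inference throughput per GPU
    "raw_performance": 0.20,       # compute throughput and memory bandwidth
    "availability": 0.15,          # supply and cloud instance availability
}

def composite_score(scores: dict[str, float]) -> float:
    """Weighted sum of normalized per-criterion scores."""
    return sum(CRITERIA_WEIGHTS[k] * scores[k] for k in CRITERIA_WEIGHTS)

print(composite_score({
    "training_speed": 0.9, "software_maturity": 1.0,
    "inference_throughput": 0.8, "raw_performance": 0.85,
    "availability": 0.7,
}))  # -> 0.85
```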
AMD's MI300X achieves the highest per-GPU LLM inference throughput in MLPerf Inference v5.1, delivering 21,150 tokens/s per GPU on llama2-70b and outperforming NVIDIA's H100 (15,610 tok/s), B200 (13,015 tok/s), and H200 (10,917 tok/s). The industry-standard benchmark validates AMD's competitiveness in AI inference.
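To put the quoted figures on a common scale, each result can be expressed relative to the MI300X number:

```python
# Per-GPU MLPerf v5.1 llama2-70b throughput figures quoted above.
mlperf_tok_s = {"MI300X": 21150, "H100": 15610, "B200": 13015, "H200": 10917}
top = mlperf_tok_s["MI300X"]
for gpu, tps in mlperf_tok_s.items():
    print(f"{gpu}: {tps:,} tok/s ({tps / top:.0%} of MI300X)")
# H100 comes out at ~74% of MI300X, i.e. the MI300X result is ~35% higher.
```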
The best open models runnable on consumer GPUs lag frontier AI by only about a year on GPQA, MMLU, and LMArena, suggesting that capability is democratizing rapidly, with corresponding regulatory implications.
Installed AI compute from NVIDIA chips has more than doubled annually since 2020, with each new flagship generation accounting for the majority of installed compute within three years of release.
AWS has launched P5e instances featuring NVIDIA H200 GPUs, now generally available in the US East and US West regions, with EU availability expected in Q1 2026.
PyTorch has surpassed 100 million weekly downloads on PyPI, cementing its position as the dominant deep learning framework for research and production deployments.