AMD MI355X delivers lower TCO per million tokens than NVIDIA B200 for GPT-OSS 120B FP4 summarization at interactivity below 225 tok/s/user. MI300X also beats H100 on GPT-OSS 120B MX4 across all interactivity levels. B200 leads on LLaMA 70B FP4 and high-interactivity workloads.
MI355X beats B200 vLLM and B200 TRT-LLM on TCO for GPT-OSS 120B when interactivity < 225 tok/s/user
Oct 1, 2025
MI300X outperforms H100 on GPT-OSS 120B MX4 across entire interactivity range on TCO basis
Oct 1, 2025
B200 significantly outperforms MI355X on LLaMA 70B FP4 - AMD FP4 kernels need improvement
Oct 1, 2025
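As a rough illustration of the metric behind these comparisons, TCO per million tokens can be sketched as hourly cost divided by hourly token output. The function name, parameter names, and the example inputs below are hypothetical and are not taken from the benchmark; real TCO figures would also depend on how capital, power, and hosting costs are amortized.

```python
def cost_per_million_tokens(tco_usd_per_gpu_hour: float,
                            tokens_per_sec_per_gpu: float) -> float:
    """Return the USD cost to generate one million tokens on one GPU.

    Both inputs are assumptions supplied by the caller:
    - tco_usd_per_gpu_hour: all-in hourly cost of owning and running one GPU
    - tokens_per_sec_per_gpu: sustained throughput at a chosen interactivity level
    """
    if tokens_per_sec_per_gpu <= 0:
        raise ValueError("throughput must be positive")
    tokens_per_hour = tokens_per_sec_per_gpu * 3600
    return tco_usd_per_gpu_hour / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs for illustration only, not measured values.
    print(cost_per_million_tokens(tco_usd_per_gpu_hour=3.6,
                                  tokens_per_sec_per_gpu=1000.0))
```

Per-GPU throughput generally falls as the per-user interactivity target (tok/s/user) rises, so a ranking computed this way can differ from one interactivity level to another, which is consistent with the 225 tok/s/user threshold noted above.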