NVIDIA GB200 NVL72 with TRT-LLM Dynamo achieves 4x better TCO per million tokens than single-node H200 servers for DeepSeek R1 at 30 tok/s/user. Rack-scale inference with disaggregated prefill, wide expert parallelism, and multi-token prediction (MTP) delivers 2-3x throughput gains.
GB200 NVL72 delivers 4x better TCO per million tokens vs single-node H200 at 30 tok/s/user for DeepSeek R1 (Oct 1, 2025)
Multi-Token Prediction (MTP) provides 2-3x throughput improvement at 70-140 tok/s/user (Oct 1, 2025)
GB200 NVL72 shows ~8x improvement in tokens/s per MW vs single-node H200 FP8 (Oct 1, 2025)
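The two metrics quoted above, TCO per million tokens and tokens/s per MW, are simple ratios. The sketch below shows one plausible way to compute them. The function names, the per-GPU normalization, and all input values are assumptions made for illustration; none of the numbers are taken from the benchmark.

```python
def cost_per_million_tokens(cost_per_gpu_hour: float,
                            tokens_per_sec_per_gpu: float) -> float:
    """Dollars per one million generated tokens, normalized per GPU."""
    tokens_per_hour = tokens_per_sec_per_gpu * 3600.0
    return cost_per_gpu_hour / tokens_per_hour * 1_000_000


def tokens_per_sec_per_mw(tokens_per_sec_per_gpu: float,
                          watts_per_gpu: float) -> float:
    """Throughput per megawatt of all-in power draw."""
    return tokens_per_sec_per_gpu / watts_per_gpu * 1_000_000


def improvement(baseline: float, candidate: float) -> float:
    """How many times lower the candidate's cost is than the baseline's."""
    return baseline / candidate


if __name__ == "__main__":
    # Hypothetical placeholder inputs, NOT measured values.
    baseline = cost_per_million_tokens(cost_per_gpu_hour=2.0,
                                       tokens_per_sec_per_gpu=100.0)
    candidate = cost_per_million_tokens(cost_per_gpu_hour=4.0,
                                        tokens_per_sec_per_gpu=800.0)
    print(f"baseline:  ${baseline:.2f} per M tokens")
    print(f"candidate: ${candidate:.2f} per M tokens")
    print(f"improvement: {improvement(baseline, candidate):.1f}x")
```

A system with a higher hourly cost can still come out ahead on cost per million tokens if its per-GPU throughput at the same interactivity target (tok/s/user) rises faster than its cost. For that reason, comparisons like the ones above are only meaningful at a fixed tok/s/user operating point.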