DeepSeek V3 used roughly 10x less training compute than Llama 3, a gap attributed to multi-head latent attention (MLA), mixture-of-experts (MoE) innovations, and multi-token prediction, illustrating algorithmic efficiency gains of roughly 3x per year.
Training compute vs Llama 3: ~10x less
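
To relate the two figures above, the 10x compute gap can be converted into equivalent years of progress at a 3x yearly rate. The sketch below is a back-of-the-envelope calculation only. It assumes the entire gap comes from algorithmic improvements that compound at a constant yearly rate, which the summary does not establish, and the function names `years_of_progress` and `implied_yearly_rate` are hypothetical helpers introduced for illustration.

```python
import math


def years_of_progress(efficiency_gain: float, yearly_rate: float) -> float:
    """Years of compounding at `yearly_rate` needed to reach `efficiency_gain`.

    Assumes a constant multiplicative gain per year (an illustrative
    assumption, not a claim from the source).
    """
    if efficiency_gain <= 0:
        raise ValueError("efficiency_gain must be positive")
    if yearly_rate <= 1:
        raise ValueError("yearly_rate must be greater than 1")
    # Solve yearly_rate ** years == efficiency_gain for years.
    return math.log(efficiency_gain) / math.log(yearly_rate)


def implied_yearly_rate(efficiency_gain: float, years: float) -> float:
    """Constant yearly rate that yields `efficiency_gain` after `years`."""
    if efficiency_gain <= 0 or years <= 0:
        raise ValueError("efficiency_gain and years must be positive")
    # Inverse of the relation above: rate == gain ** (1 / years).
    return efficiency_gain ** (1.0 / years)


if __name__ == "__main__":
    # Figures taken from the summary: 10x less compute, 3x gains per year.
    years = years_of_progress(efficiency_gain=10.0, yearly_rate=3.0)
    print(f"10x at 3x/year corresponds to about {years:.2f} years of progress")
```

Under these assumptions a 10x gap corresponds to a little over two years of 3x yearly gains, since 3 squared is 9. Attributing part of the gap to hardware or data differences would lower the implied algorithmic share.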