Inference serving framework with RadixAttention for KV-cache reuse. Stable latency (4-21 ms); optimized for multi-turn chat and RAG workloads.
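To make the KV-cache-reuse idea concrete, the sketch below shows prefix-keyed caching: cached per-token KV entries are indexed by token prefixes in a trie, so a request that shares a prefix with earlier traffic (a repeated system prompt in multi-turn chat, a repeated document chunk in RAG) only computes KV for the tokens past its longest cached prefix. This is a simplified illustration of the concept, assuming a plain per-token trie with string stand-ins for KV tensors; it is not the framework's actual compressed radix tree or memory manager.

```python
from typing import Optional


class _Node:
    """One trie node per token; `kv` is a stand-in for that token's KV block."""

    def __init__(self) -> None:
        self.children: dict[int, "_Node"] = {}
        self.kv: Optional[str] = None


class PrefixKVCache:
    """Toy prefix cache: reuse KV for the longest cached token prefix."""

    def __init__(self) -> None:
        self.root = _Node()

    def longest_cached_prefix(self, tokens: list[int]) -> int:
        """Return how many leading tokens already have cached KV."""
        node, matched = self.root, 0
        for t in tokens:
            nxt = node.children.get(t)
            if nxt is None or nxt.kv is None:
                break
            node, matched = nxt, matched + 1
        return matched

    def insert(self, tokens: list[int], kv_blocks: list[str]) -> None:
        """Store one KV block per token along the token path."""
        node = self.root
        for t, kv in zip(tokens, kv_blocks):
            node = node.children.setdefault(t, _Node())
            node.kv = kv


# Usage: turn 2 of a chat repeats the turn-1 prefix, so only the new
# tokens need fresh KV computation.
cache = PrefixKVCache()
turn_1 = [101, 7, 7, 9]                      # tokens of the first turn
cache.insert(turn_1, [f"kv{t}" for t in turn_1])
turn_2 = turn_1 + [42, 43]                   # same prefix plus a new user turn
reused = cache.longest_cached_prefix(turn_2)
print(f"reuse {reused} tokens, compute {len(turn_2) - reused} new ones")
```

The same mechanism explains why latency stays stable under multi-turn and RAG traffic: the expensive prefill work for shared prefixes is paid once and amortized across requests.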
GitHub repository popularity
ThoughtWorks Technology Radar assessment
Primary framework purpose
Package manager install frequency
Community size, growth velocity, and industry usage
Integrations, plugins, and extension availability
Execution speed, latency, and resource efficiency
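Two of the dimensions above, repository popularity and package-manager install frequency, can be measured directly from public APIs. Below is a minimal sketch using the GitHub REST API and pypistats.org; it assumes the framework described at the top is SGLang (RadixAttention is SGLang's prefix-caching technique), so the repository slug `sgl-project/sglang` and the PyPI package name `sglang` are assumptions to swap for whatever project is being assessed.

```python
import requests


def github_stars(repo: str) -> int:
    """Repository popularity proxy: stargazers_count from the GitHub REST API."""
    resp = requests.get(f"https://api.github.com/repos/{repo}", timeout=10)
    resp.raise_for_status()
    return resp.json()["stargazers_count"]


def pypi_monthly_downloads(package: str) -> int:
    """Install-frequency proxy: last-month download count from pypistats.org."""
    resp = requests.get(
        f"https://pypistats.org/api/packages/{package}/recent", timeout=10
    )
    resp.raise_for_status()
    return resp.json()["data"]["last_month"]


if __name__ == "__main__":
    # Assumed identifiers; replace with the project under assessment.
    print("GitHub stars:", github_stars("sgl-project/sglang"))
    print("PyPI installs (last month):", pypi_monthly_downloads("sglang"))
```

The remaining dimensions (Technology Radar assessment, stated purpose, community growth, integrations, and measured performance) are qualitative or benchmark-driven and are not reducible to a single API call.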