100 Trillion Tokens: What LLM Usage Data Reveals

NeoSignal Team
January 18, 2026
8 min read

What do 100 trillion tokens of LLM usage reveal about how we actually interact with AI? OpenRouter, the multi-model inference platform serving millions of developers globally, released an empirical study that challenges assumptions about AI adoption. The findings paint a picture of an ecosystem far more diverse, creative, and agent-driven than most realize.

The Dataset: A Unique Window Into LLM Behavior

OpenRouter processes requests across 300+ models from 60+ providers, creating a uniquely representative dataset of real-world usage. The study analyzed anonymized metadata from billions of prompt-completion pairs spanning two years. Crucially, this is behavioral data at scale, not benchmark performance or marketing claims.

Three headline findings emerge:

  1. Open source models now serve ~30% of all tokens, up from negligible share in late 2024
  2. Roleplay and creative uses dominate open model usage, challenging productivity-first narratives
  3. Agentic inference has become the majority pattern, with reasoning models exceeding 50% of traffic

Open Source Reaches 30% Market Share

The balance between proprietary and open-weight models has shifted decisively. While OpenAI and Anthropic still lead in structured business tasks, open source models collectively serve roughly one-third of all tokens.

This growth correlates with major releases. DeepSeek R1 and subsequent versions drove substantial adoption spikes, as did Qwen family releases. Importantly, usage persists beyond initial release weeks, indicating genuine production integration rather than experimentation.

Chinese-developed models account for much of this growth. Starting from 1.2% weekly share in late 2024, Chinese open source models reached nearly 30% in some weeks by late 2025. The Qwen family alone now powers 40% of all new fine-tunes on Hugging Face, overtaking Meta's Llama.

The equilibrium has stabilized at roughly 30% open source. These models are not mutually exclusive with proprietary ones; they complement each other within a multi-model stack.

The competitive dynamics have shifted from near-monopoly to genuine pluralism. No single open model exceeds 25% of the open source segment, with five to seven models maintaining meaningful share simultaneously.
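The multi-model stack described above can be sketched as a simple preference-ordered router: each task category maps to a ranked list of models, with fallbacks when a preferred model is unavailable. The model identifiers and categories below are illustrative assumptions, not routing data from the report.

```python
# Illustrative sketch of a multi-model stack: route each request to a
# preferred model by task category, falling back down the list if a
# model is unavailable. Names and categories are examples only.

ROUTING_TABLE = {
    "programming": ["anthropic/claude-sonnet", "openai/gpt-5", "qwen/qwen-coder"],
    "roleplay":    ["deepseek/deepseek-r1", "mistralai/mistral-large"],
    "general":     ["google/gemini-flash", "meta-llama/llama-4"],
}

def pick_model(category: str, available: set) -> str:
    """Return the first available model for a category, else a general fallback."""
    for model in ROUTING_TABLE.get(category, ROUTING_TABLE["general"]):
        if model in available:
            return model
    # Last resort: fall back to the general-purpose list.
    for model in ROUTING_TABLE["general"]:
        if model in available:
            return model
    raise RuntimeError("no model available")
```

In practice this kind of preference table is what "complementary, not mutually exclusive" looks like in code: proprietary models lead for some categories, open models for others.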

The Surprise: Roleplay Dominates Creative Use

Perhaps the most unexpected finding: over half of all open source model usage falls under roleplay and creative dialogue. Programming ranks second, trailing by a substantial margin.

This counters the narrative that LLMs are primarily productivity tools. In practice, many users engage with these models for entertainment, storytelling, and character-driven experiences. Open models excel here because they offer more flexibility without commercial content restrictions.

The roleplay category breaks down into:

  • Gaming and Roleplaying Games: ~60% of roleplay tokens
  • Writers Resources: ~15.6%
  • Adult content: ~15.4%

This isn't casual chatting. Users treat LLMs as structured roleplaying engines, maintaining character consistency across extended interactions. The data shows a well-defined, replicable use case that differs markedly from productivity applications.

For Chinese open source models specifically, the pattern differs. Roleplay accounts for only 33%, while programming and technology combined represent 39%. This suggests models like DeepSeek and Qwen are increasingly competitive in technical domains, not just creative ones.

Programming: The Contested Battleground

While roleplay dominates open source, programming has become the most strategically important category overall. Programming queries grew from 11% of total tokens in early 2025 to over 50% in recent weeks.

Claude dominates this category, accounting for over 60% of programming-related spend through most of 2025. However, the landscape is shifting. During the week of November 17, Anthropic's share fell below 60% for the first time.

OpenAI expanded from roughly 2% to 8% share in recent weeks, while Google maintained approximately 15%. Open source providers including Mistral and Qwen are making steady inroads.

The programming segment exhibits distinct characteristics:

  • Average prompt length exceeds 20K tokens, far higher than other categories
  • Context includes codebases, documentation, and long conversations
  • Outputs tend to be concise, high-value insights rather than extended generation

Models without reliable code capabilities risk falling behind in enterprise adoption. The bar keeps rising as coding agents and IDE integrations become standard workflow components.
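The prompt-heavy, output-light profile above is easy to surface from request logs with a per-category aggregation. The record fields and sample values here are illustrative assumptions, not the report's schema.

```python
# Sketch: aggregate average prompt/completion tokens per category from
# request records. Field names and data shapes are assumed for illustration.
from collections import defaultdict

def token_profile(requests):
    """requests: iterable of dicts with 'category', 'prompt_tokens',
    'completion_tokens'. Returns {category: (avg_prompt, avg_completion)}."""
    sums = defaultdict(lambda: [0, 0, 0])  # prompt total, completion total, count
    for r in requests:
        s = sums[r["category"]]
        s[0] += r["prompt_tokens"]
        s[1] += r["completion_tokens"]
        s[2] += 1
    return {c: (p / n, comp / n) for c, (p, comp, n) in sums.items()}
```

Run against real logs, a programming category would show the pattern described above: very large prompts (codebases, docs) paired with comparatively short completions.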

Agentic Inference: The New Default

The most significant structural shift is from single-turn interactions to multi-step, tool-integrated workflows. Reasoning models now handle over 50% of all tokens, up from negligible share at the start of 2025.

This reflects both supply and demand changes. On the supply side, releases like GPT-5, Claude 4.5, and Gemini 3 expanded what users expect from stepwise reasoning. On demand, users increasingly prefer models that manage task state, follow multi-step logic, and support agent-style workflows.

Tool-calling behavior has grown consistently throughout 2025. While initially concentrated among OpenAI's gpt-4o-mini and Anthropic's Claude series, tool invocation has spread to a broader ecosystem including Grok variants and open models.

The anatomy of requests has evolved accordingly:

  • Average prompt tokens quadrupled from ~1.5K to over 6K
  • Average completion tokens nearly tripled from ~150 to ~400
  • Sequence length more than tripled from under 2K to over 5.4K tokens

The median LLM request is no longer a simple question. Instead, it's part of a structured, agent-like loop, invoking external tools, reasoning over state, and persisting across longer contexts.

For infrastructure operators, this raises the bar on latency, tool handling, context support, and robustness. Before long, agentic inference will account for the majority of inference workloads.
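The agent-like loop described above can be sketched as: call the model, execute any tool it requests, feed the result back, and repeat until a final answer emerges. The message format and tool interface here are illustrative assumptions, not any provider's actual API.

```python
# Minimal sketch of an agent-style inference loop. The "model" is a
# stand-in callable that either requests a tool call or returns a final
# answer; the message and reply formats are assumed for illustration.

def run_agent(model, tools, user_prompt, max_steps=5):
    """Loop: call model, execute any requested tool, feed result back."""
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        reply = model(messages)
        if reply.get("tool") is None:
            return reply["content"]           # final answer: stop the loop
        result = tools[reply["tool"]](**reply["args"])  # execute the tool
        messages.append({"role": "tool", "name": reply["tool"],
                         "content": str(result)})
    raise RuntimeError("agent did not converge")
```

Each pass through the loop grows the context, which is exactly why average prompt length and sequence length have climbed alongside agentic adoption.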

The Glass Slipper Effect: Why Early Cohorts Stay

The study introduces a compelling retention framework called the "Glass Slipper" phenomenon. In a rapidly evolving ecosystem, there exists a latent distribution of high-value workloads that remain unsolved across successive model generations. Each new frontier model is effectively "tried on" against these open problems.

When a newly released model happens to match a previously unmet constraint, it achieves precise fit. For developers whose workloads finally "fit," this alignment creates strong lock-in effects. Systems, data pipelines, and user experiences become anchored to the model that solved their problem first.

The data shows this clearly in retention curves:

  • Claude 4 Sonnet's May 2025 cohort retains approximately 40% at Month 5
  • Gemini 2.5 Pro's June 2025 cohort shows similarly elevated retention
  • Later cohorts for both models churn rapidly and cluster at the bottom

Conversely, models that never establish foundational fit struggle. Gemini 2.0 Flash and Llama 4 Maverick show no high-performing foundational cohort, with every cohort performing identically poorly.

DeepSeek models exhibit an unusual pattern: "resurrection jumps" where churned users return after trying alternatives. This "boomerang effect" suggests users confirm through competitive testing that DeepSeek provides optimal fit for their specific workload.

The implication: product-market fit equals workload-model fit. Being first to solve a real pain point drives deep, sticky adoption. In an increasingly fast-moving market, capturing foundational cohorts early determines who endures after the next capability leap.
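Retention curves like the ones above come from a standard cohort computation: group users by first-use month, then measure what fraction is still active N months later. The data shapes below are illustrative assumptions, not the report's dataset.

```python
# Sketch of cohort retention: fraction of each first-use cohort still
# active `month_offset` months later. Input shapes are assumed.
from collections import defaultdict

def cohort_retention(first_seen, active_by_month, month_offset):
    """first_seen: {user: cohort month index};
    active_by_month: {month index: set of active users}.
    Returns {cohort month: fraction retained at cohort + month_offset}."""
    cohorts = defaultdict(set)
    for user, month in first_seen.items():
        cohorts[month].add(user)
    retention = {}
    for month, users in cohorts.items():
        later = active_by_month.get(month + month_offset, set())
        retention[month] = len(users & later) / len(users)
    return retention
```

A "Glass Slipper" model would show an early cohort with a visibly higher retained fraction than every later cohort; a "boomerang" pattern would show a cohort's fraction dipping and then rising again in later months.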

Geography: Asia's Rise as Consumer and Producer

LLM usage is becoming genuinely global. North America, while still the largest region, now accounts for less than half of total spend. Asia's share grew from approximately 13% to 31% over the observed period.

Region          Share of Spend
North America   47.2%
Asia            28.6%
Europe          21.3%
Other            2.9%

English dominates at 82.9% of tokens, but Chinese accounts for nearly 5%. China has emerged not only as a consumer but as a major model producer and exporter. The success of DeepSeek, Qwen, and Moonshot AI demonstrates that competitive LLMs are now a global resource.

For model builders, cross-regional usability across languages, compliance regimes, and deployment settings is becoming table stakes.

Cost vs. Usage: Not Yet a Commodity

The LLM market does not behave like a commodity. Price alone explains little about usage. The correlation between cost and adoption is weak, with a nearly flat trendline suggesting demand is relatively price-inelastic.

Two distinct regimes appear:

  • Premium leaders like Claude 3.7 Sonnet command ~$2/M tokens with high usage
  • Efficient giants like Gemini 2.0 Flash pair strong performance with prices under $0.40/M tokens

Proprietary models retain pricing power for mission-critical applications, while open models capture high-volume, cost-sensitive workloads. This fragmentation suggests differentiation through latency, context length, and output quality remains strategically valuable.
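The weak price-usage relationship can be checked with a plain Pearson correlation between per-token price and token volume. The price/usage pairs below are made-up illustrations, not figures from the report.

```python
# Sketch of the price-vs-usage check: Pearson correlation between
# per-token price and token volume. The data points are hypothetical.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical (price in $/M tokens, weekly usage in billions of tokens):
prices = [0.10, 0.40, 0.90, 2.00, 3.00]
usage = [5.0, 12.0, 3.0, 10.0, 6.0]
r = pearson(prices, usage)  # near zero for this data: price explains little
```

A correlation near zero across the catalog is what "relatively price-inelastic" means operationally: models win usage on fit, latency, and quality rather than on price alone.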

Key Takeaways

1. Multi-model is the default. No single model dominates. Users increasingly maintain flexibility, choosing the best model for each task rather than betting on one provider.

2. Creative use cases are massive. Roleplay and entertainment drive volumes comparable to professional productivity. This represents a significant market opportunity for models optimized for engagement and character consistency.

3. Agentic inference is taking over. Multi-step, tool-integrated workflows now represent the majority pattern. Models and infrastructure must support orchestration, not just generation.

4. Early fit creates durable advantage. The Glass Slipper effect shows that being first to solve a workload creates sticky adoption. Retention, not growth, is the signal to watch.

5. The market is global. Asia now represents nearly 30% of demand. Cross-regional capability is essential for any serious model provider.

Source: OpenRouter State of AI Report, December 2025
