NeoSignal Scoring System: How We Rate AI Infrastructure Components

NeoSignal Team
January 15, 2026
7 min read

NeoSignal assigns every component a score from 0 to 100. Gemini 3 Pro leads models at 99. Claude Opus 4.5 scores 96. NVIDIA H100 rates 92. But what do these numbers actually mean? How are they calculated? And how should you interpret differences between a 96 and a 92?

NeoSignal component detail showing score breakdown with weighted dimensions

NeoSignal's scoring system combines multiple dimensions, each weighted by importance, into a single composite score. For models, we measure Code (20%), Math (20%), Reasoning (15%), Intelligence (30%), and Instruction Following (15%). The component detail page reveals the breakdown: Claude Opus 4.5 scores 96 on Code, 95 on Math, 98 on Reasoning, 96 on Intelligence, and 96 on Instruction Following. You see not just the number, but what drives it.

The benefit: understand exactly what each score means and how much weight to give it in your decisions. Trend indicators show whether scores are rising, stable, or declining. Data completeness tiers reveal how much information backs each rating.


Detailed Walkthrough

The Scoring Philosophy

AI infrastructure evaluation traditionally relies on:

  • Benchmark tables: Raw numbers without synthesis
  • Vendor marketing: Biased by commercial interests
  • Community opinion: Useful but inconsistent and unscalable

NeoSignal provides something different: a systematic framework that synthesizes multiple data sources into comparable scores. The goal isn't to declare winners—it's to provide a consistent basis for comparison that accounts for different priorities.


Score Components

Every NeoSignal score derives from weighted dimensions. The dimensions vary by category:

Model Dimensions:

Dimension               Weight   Description
Code                    20%      Code generation, understanding, debugging
Math                    20%      Mathematical reasoning and problem solving
Reasoning               15%      Multi-step logical reasoning
Intelligence            30%      Overall task completion and understanding
Instruction Following   15%      Adherence to user instructions

Accelerator Dimensions:

Dimension      Weight   Description
Compute        25%      Raw TFLOPS and throughput
Memory         25%      Capacity and bandwidth
Efficiency     20%      Performance per watt
Availability   15%      Cloud availability and supply
Ecosystem      15%      Software support and optimization

Cloud Provider Dimensions:

Dimension          Weight   Description
GPU Availability   25%      Range and depth of GPU options
Pricing            20%      Cost competitiveness
Performance        20%      Network, storage, compute quality
Regions            15%      Geographic coverage
Services           20%      Supporting services and integrations

Company Dimensions:

Dimension              Weight   Description
Team Quality           20%      Leadership and talent
Market Position        20%      Competitive standing
Funding Strength       20%      Capital and valuation
Growth Trajectory      20%      Revenue and user growth
Technical Leadership   20%      Product and research quality

The Score Calculation

Each dimension receives a 0-100 sub-score based on available data. The overall score is the weighted average:

Overall = Σ(dimension_score × dimension_weight) / Σ(dimension_weight)

Because each category's weights sum to 100%, the denominator equals 1, so the overall score is simply the sum of the weighted terms.

For Claude Opus 4.5:

  • Code: 96 × 0.20 = 19.2
  • Math: 95 × 0.20 = 19.0
  • Reasoning: 98 × 0.15 = 14.7
  • Intelligence: 96 × 0.30 = 28.8
  • Instruction Following: 96 × 0.15 = 14.4

Overall = 96.1 → 96
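
To make this concrete, here is the same arithmetic as a short Python sketch; the function and dictionary names are illustrative, not our internal code.

# Minimal sketch of the composite calculation; names are illustrative.
MODEL_WEIGHTS = {
    "code": 0.20,
    "math": 0.20,
    "reasoning": 0.15,
    "intelligence": 0.30,
    "instruction_following": 0.15,
}

def composite_score(sub_scores: dict[str, float],
                    weights: dict[str, float]) -> int:
    """Weighted average of 0-100 dimension sub-scores, rounded."""
    total = sum(sub_scores[dim] * w for dim, w in weights.items())
    return round(total / sum(weights.values()))

opus_sub_scores = {
    "code": 96, "math": 95, "reasoning": 98,
    "intelligence": 96, "instruction_following": 96,
}
print(composite_score(opus_sub_scores, MODEL_WEIGHTS))  # 96 (from 96.1)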

Benchmark Integration

Model dimension scores derive from benchmark performance where available:

Epoch Capabilities Index (ECI): Primary model intelligence measure. Claude Opus 4.5 shows an ECI score of 149.87, normalized to 96/100 on NeoSignal's scale.
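
We don't publish the normalization function itself, but a simple linear rescale against a reference ceiling reproduces this example. In the sketch below, ECI_CEILING is a hypothetical constant chosen only so that 149.87 maps to 96; it is not a documented parameter.

# Hypothetical linear rescale of ECI onto NeoSignal's 0-100 scale.
ECI_CEILING = 156.0  # assumed reference ceiling, not a published value

def normalize_eci(eci: float) -> int:
    """Clamp a rescaled ECI value to the 0-100 score range."""
    return round(min(max(eci / ECI_CEILING * 100.0, 0.0), 100.0))

print(normalize_eci(149.87))  # 96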

Benchmark Coverage:

Benchmark      Measures               Used For
ARC-AGI        General reasoning      Reasoning, Intelligence
FrontierMath   Advanced mathematics   Math
SWE-Bench      Software engineering   Code
MMLU           Knowledge breadth      Intelligence
HumanEval      Code generation        Code

When benchmark data is available, it anchors the dimension score. When unavailable, we rely on secondary indicators (release notes, community reports, comparative analysis).
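
As a sketch of that anchor-or-fallback rule: the benchmark-to-dimension mapping follows the Benchmark Coverage table above, and estimate_from_secondary_sources is a hypothetical stand-in for the release-notes and community-reports path.

# Sketch of the anchor-or-fallback rule; the mapping follows the
# Benchmark Coverage table above.
BENCHMARKS_FOR = {
    "reasoning": ["ARC-AGI"],
    "math": ["FrontierMath"],
    "code": ["SWE-Bench", "HumanEval"],
    "intelligence": ["ARC-AGI", "MMLU"],
}

def estimate_from_secondary_sources(dimension: str) -> float:
    # Hypothetical fallback: synthesize release notes, community
    # reports, and comparative analysis. Stubbed for illustration.
    return 75.0

def dimension_score(dimension: str, results: dict[str, float]) -> float:
    """Anchor on benchmark results when present, else fall back."""
    anchors = [results[b] for b in BENCHMARKS_FOR.get(dimension, [])
               if b in results]
    if anchors:
        return sum(anchors) / len(anchors)  # average the anchors
    return estimate_from_secondary_sources(dimension)

print(dimension_score("code", {"SWE-Bench": 97, "HumanEval": 95}))  # 96.0
print(dimension_score("math", {}))  # 75.0 (fallback)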

The Score Breakdown Visualization

Component detail pages show score breakdowns in two formats:

Progress Bars: Each dimension appears with a colored bar, percentage weight, and numeric score:

Code ████████████████████░░░░ 20%  96
Math ███████████████████░░░░░ 20%  95
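
A minimal sketch of that rendering, assuming the bar length scales with the 0-100 sub-score (the product's exact scaling may differ):

# Render a sub-score as a filled/empty bar with weight and value.
def render_bar(name: str, weight: float, score: int, width: int = 24) -> str:
    filled = round(score / 100 * width)
    bar = "█" * filled + "░" * (width - filled)
    return f"{name:<5}{bar} {weight:.0%}  {score}"

print(render_bar("Code", 0.20, 96))
print(render_bar("Math", 0.20, 95))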

Radar Charts: Multi-dimensional visualization comparing up to 4 components:

  • Pentagon shape for 5 dimensions
  • Scale zoomed to 90-100 for differentiation
  • Overlapping shapes reveal relative strengths

Trend Indicators

Every score includes a trend indicator:

  • ↑ Rising: Score improved in recent updates
  • → Stable: Score unchanged
  • ↓ Declining: Score dropped due to new data or competitive shifts

Trends reflect score movement over the past 30-90 days. A rising trend might indicate:

  • New benchmark results improving the component's standing
  • Ecosystem improvements (better availability, new integrations)
  • Competitive positioning gains

Trend reasons appear on hover: "Rising due to strong FrontierMath performance" or "Declining as newer models enter the market."
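
A sketch of the classification, assuming a small dead band so that one-point noise reads as stable; the threshold is an assumption for illustration, not a published parameter.

# Classify score movement over a 30-90 day window of (date, score) pairs.
def classify_trend(history: list[tuple[str, int]], dead_band: int = 1) -> str:
    delta = history[-1][1] - history[0][1]  # newest minus oldest
    if delta > dead_band:
        return "rising"
    if delta < -dead_band:
        return "declining"
    return "stable"

print(classify_trend([("2025-11-01", 93), ("2026-01-10", 96)]))  # rising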

Data Completeness Tiers

Not all components have equal data coverage. NeoSignal indicates confidence through tiers:

Full Tier: 4+ data sources, comprehensive benchmark coverage, recent updates. Highest confidence in score accuracy.

Standard Tier: 2-3 data sources, partial benchmark coverage. Good confidence, some gaps.

Benchmark Tier: Epoch AI data only. Reliable for benchmark dimensions, limited on ecosystem and availability factors.

The tier badge appears on component cards and detail pages. When comparing components, prefer Full Tier ratings for high-stakes decisions.
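
The tier logic reduces to a small lookup. The sketch below compresses "comprehensive benchmark coverage" into a boolean for illustration:

# Map data coverage to a completeness tier per the criteria above.
def completeness_tier(num_sources: int, full_benchmarks: bool) -> str:
    if num_sources >= 4 and full_benchmarks:
        return "Full"
    if num_sources >= 2:
        return "Standard"
    return "Benchmark"  # Epoch AI data only

print(completeness_tier(5, True))   # Full
print(completeness_tier(1, False))  # Benchmark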

Score Interpretation Guidelines

90-100 (Excellent): Top-tier components. Best-in-class performance across dimensions. Safe choices for critical workloads.

80-89 (Very Good): Strong performers with minor gaps. Excellent for most use cases; may fall short in one dimension.

70-79 (Good): Solid components with clear tradeoffs. Consider carefully for specific needs.

60-69 (Moderate): Usable but with significant limitations. Best for non-critical workloads or specific niches.

Below 60 (Limited): Significant concerns. Consider alternatives unless specific requirements mandate this choice.

Score Differences

How to interpret score gaps:

1-2 points: Essentially equivalent. Differences within measurement noise; choose based on other factors.

3-5 points: Meaningful difference. Higher-scored component likely better, but evaluate dimension breakdowns.

6-10 points: Significant gap. Lower-scored component has clear weaknesses; understand them before choosing.

More than 10 points: Major difference. Lower-scored component is substantially inferior; choose the higher-scored option unless specific constraints apply.
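
Both rubrics are simple threshold lookups. The sketch below treats a 10-point gap as the top of the "significant" band:

# Threshold lookups for the two rubrics above.
def interpret_score(score: int) -> str:
    if score >= 90:
        return "Excellent"
    if score >= 80:
        return "Very Good"
    if score >= 70:
        return "Good"
    if score >= 60:
        return "Moderate"
    return "Limited"

def interpret_gap(a: int, b: int) -> str:
    gap = abs(a - b)
    if gap <= 2:
        return "essentially equivalent"
    if gap <= 5:
        return "meaningful difference"
    if gap <= 10:
        return "significant gap"
    return "major difference"

print(interpret_score(96))    # Excellent
print(interpret_gap(96, 92))  # meaningful difference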

Confidence Scoring for Signals

Market signals use a different scoring system—confidence rather than capability:

Signal Confidence Breakdown:

Dimension          Description
Source Authority   How authoritative is the source? (Tier 1-4)
Data Quality       How clean and verifiable is the data?
Recency            How fresh is the information? (days old)
Corroboration      How many sources confirm this?
Specificity        How specific vs. general is the claim?

A signal with 85% confidence has strong sourcing, recent data, and corroboration. A 60% confidence signal might be based on a single source or older information.
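
A sketch of how such a percentage could be composed; the equal weighting and 0-1 sub-scores are assumptions, since the exact weighting isn't published.

# Sketch of a signal-confidence composite. Equal weights and 0-1
# sub-scores are assumptions, not a published weighting.
SIGNAL_DIMENSIONS = ("source_authority", "data_quality", "recency",
                     "corroboration", "specificity")

def signal_confidence(subs: dict[str, float]) -> int:
    """Average 0-1 sub-scores and express as a percentage."""
    mean = sum(subs[d] for d in SIGNAL_DIMENSIONS) / len(SIGNAL_DIMENSIONS)
    return round(mean * 100)

print(signal_confidence({
    "source_authority": 0.90, "data_quality": 0.85, "recency": 0.90,
    "corroboration": 0.80, "specificity": 0.80,
}))  # 85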

Methodology Transparency

NeoSignal publishes scoring methodology for accountability:

  • Dimension weights appear on every component breakdown
  • Data sources link from component detail pages
  • Last updated timestamps show data freshness
  • Tier indicators reveal data completeness

If you disagree with a score, you can trace it to underlying data and understand why.

Using Scores in Decisions

Shortlisting: Filter components by minimum score. "Show models above 85" eliminates clearly inferior options.

Comparison: With similar overall scores, compare dimension breakdowns. A 92 with strong Code but weak Math differs from a 92 with balanced dimensions.

Trend Analysis: Consider trajectory alongside current score. A rising 88 may soon surpass a stable 91.

Tier Verification: For critical decisions, prefer Full Tier components with maximum data confidence.
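
A sketch of that workflow over a hypothetical catalog; the Component record and example entries are illustrative, not the NeoSignal API.

# Shortlist-then-verify over illustrative data.
from typing import NamedTuple

class Component(NamedTuple):
    name: str
    score: int
    tier: str
    trend: str

catalog = [
    Component("Model A", 96, "Full", "stable"),
    Component("Model B", 92, "Standard", "rising"),
    Component("Model C", 84, "Benchmark", "stable"),
]

shortlist = [c for c in catalog if c.score > 85]          # "above 85"
high_stakes = [c for c in shortlist if c.tier == "Full"]  # prefer Full Tier
print([c.name for c in shortlist])    # ['Model A', 'Model B']
print([c.name for c in high_stakes])  # ['Model A']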

Score Updates

Scores update continuously as new data emerges:

  • Benchmark releases: New evaluation results shift dimension scores
  • Market changes: Pricing updates affect cloud provider scores
  • Ecosystem evolution: New integrations improve framework scores
  • Competitive dynamics: New entrants affect relative positioning

Major score changes appear in the Signals feed as "benchmark_update" or "trend_shift" signals.
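
A hypothetical shape for such a feed entry; only the two signal types come from the text above, and the field names are assumptions.

# Hypothetical score-change entry for the Signals feed; only the types
# "benchmark_update" and "trend_shift" come from the text above.
def score_change_signal(component: str, old: int, new: int,
                        kind: str, reason: str) -> dict:
    assert kind in ("benchmark_update", "trend_shift")
    return {"type": kind, "component": component,
            "old_score": old, "new_score": new, "reason": reason}

print(score_change_signal("Example Model", 94, 96,
                          "benchmark_update", "new FrontierMath results"))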

From Scores to Decisions

NeoSignal scores compress complex, multi-dimensional comparisons into actionable numbers. They're not perfect—no single metric can capture everything—but they provide a consistent framework for evaluation.

Use scores as starting points, not final answers. A 96 beats a 92 on average, but your specific use case might favor the 92's particular strengths. The dimension breakdown helps you understand why scores differ and whether those differences matter for your needs.

That's the NeoSignal approach to scoring: transparent methodology, multiple data sources, weighted dimensions, and honest acknowledgment of confidence levels. Understand the system, and you can use it effectively to navigate the AI infrastructure landscape.
