NeoSignal assigns every component a score from 0 to 100. Gemini 3 Pro leads models at 99. Claude Opus 4.5 scores 96. The NVIDIA H100 scores 92. But what do these numbers actually mean? How are they calculated? And how should you interpret the difference between a 96 and a 92?
NeoSignal component detail showing score breakdown with weighted dimensions
NeoSignal's scoring system combines multiple dimensions, each weighted by importance, into a single composite score. For models, we measure Code (20%), Math (20%), Reasoning (15%), Intelligence (30%), and Instruction Following (15%). The component detail page reveals the breakdown: Claude Opus 4.5 scores 96 on Code, 95 on Math, 98 on Reasoning, and 96 on Intelligence. You see not just the number, but what drives it.
The benefit: understand exactly what each score means and how much weight to give it in your decisions. Trend indicators show whether scores are rising, stable, or declining. Data completeness tiers reveal how much information backs each rating.
Detailed Walkthrough
The Scoring Philosophy
AI infrastructure evaluation traditionally relies on:
- Benchmark tables: Raw numbers without synthesis
- Vendor marketing: Biased by commercial interests
- Community opinion: Useful but inconsistent and unscalable
NeoSignal provides something different: a systematic framework that synthesizes multiple data sources into comparable scores. The goal isn't to declare winners—it's to provide a consistent basis for comparison that accounts for different priorities.
Score Components
Every NeoSignal score derives from weighted dimensions. The dimensions vary by category:
Model Dimensions:
| Dimension | Weight | Description |
|---|---|---|
| Code | 20% | Code generation, understanding, debugging |
| Math | 20% | Mathematical reasoning and problem solving |
| Reasoning | 15% | Multi-step logical reasoning |
| Intelligence | 30% | Overall task completion and understanding |
| Instruction Following | 15% | Adherence to user instructions |
Accelerator Dimensions:
| Dimension | Weight | Description |
|---|---|---|
| Compute | 25% | Raw TFLOPS and throughput |
| Memory | 25% | Capacity and bandwidth |
| Efficiency | 20% | Performance per watt |
| Availability | 15% | Cloud availability and supply |
| Ecosystem | 15% | Software support and optimization |
Cloud Provider Dimensions:
| Dimension | Weight | Description |
|---|---|---|
| GPU Availability | 25% | Range and depth of GPU options |
| Pricing | 20% | Cost competitiveness |
| Performance | 20% | Network, storage, compute quality |
| Regions | 15% | Geographic coverage |
| Services | 20% | Supporting services and integrations |
Company Dimensions:
| Dimension | Weight | Description |
|---|---|---|
| Team Quality | 20% | Leadership and talent |
| Market Position | 20% | Competitive standing |
| Funding Strength | 20% | Capital and valuation |
| Growth Trajectory | 20% | Revenue and user growth |
| Technical Leadership | 20% | Product and research quality |
The Score Calculation
Each dimension receives a 0-100 sub-score based on available data. The overall score is the weighted average:
Overall = Σ(dimension_score × dimension_weight) / Σ(dimension_weight)
For Claude Opus 4.5:
- Code: 96 × 0.20 = 19.2
- Math: 95 × 0.20 = 19.0
- Reasoning: 98 × 0.15 = 14.7
- Intelligence: 96 × 0.30 = 28.8
- Instruction Following: 96 × 0.15 = 14.4
Overall = 19.2 + 19.0 + 14.7 + 28.8 + 14.4 = 96.1, which rounds to 96.
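As a minimal sketch of that calculation (the dimension names and weights come from the model table above; the function and variable names are illustrative, not NeoSignal's actual code):

```python
# Illustrative sketch of the weighted-average score; not NeoSignal's
# actual implementation. Weights come from the model dimension table.
MODEL_WEIGHTS = {
    "Code": 0.20,
    "Math": 0.20,
    "Reasoning": 0.15,
    "Intelligence": 0.30,
    "Instruction Following": 0.15,
}

def overall_score(dimension_scores: dict[str, float],
                  weights: dict[str, float]) -> int:
    """Weighted average of 0-100 sub-scores, rounded to an integer."""
    total = sum(dimension_scores[d] * w for d, w in weights.items())
    return round(total / sum(weights.values()))

opus = {"Code": 96, "Math": 95, "Reasoning": 98,
        "Intelligence": 96, "Instruction Following": 96}
print(overall_score(opus, MODEL_WEIGHTS))  # 96 (from the raw 96.1)
```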
Benchmark Integration
Model dimension scores derive from benchmark performance where available:
Epoch Capabilities Index (ECI): Primary model intelligence measure. Claude Opus 4.5 shows ECI Score 149.87, normalized to 96/100 on NeoSignal's scale.
Benchmark Coverage:
| Benchmark | Measures | Used For |
|---|---|---|
| ARC-AGI | General reasoning | Reasoning, Intelligence |
| FrontierMath | Advanced mathematics | Math |
| SWE-Bench | Software engineering | Code |
| MMLU | Knowledge breadth | Intelligence |
| HumanEval | Code generation | Code |
When benchmark data is available, it anchors the dimension score. When unavailable, we rely on secondary indicators (release notes, community reports, comparative analysis).
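The normalization function itself isn't specified here; one plausible reading is min-max scaling against reference bounds. A sketch under that assumption, where the bounds are hypothetical values chosen so the stated ECI example works out, not published NeoSignal constants:

```python
def normalize(raw: float, lo: float, hi: float) -> float:
    """Min-max scale a raw benchmark value onto NeoSignal's 0-100 scale.
    `lo` and `hi` are hypothetical reference bounds (e.g. the lowest and
    highest ECI among tracked models); the real anchors may differ."""
    scaled = (raw - lo) / (hi - lo) * 100
    return max(0.0, min(100.0, scaled))  # clamp to the 0-100 range

# With illustrative bounds, an ECI of 149.87 lands near the stated 96/100:
print(round(normalize(149.87, lo=0.0, hi=156.1)))  # ~96
```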
The Score Breakdown Visualization
Component detail pages show score breakdowns in two formats:
Progress Bars: Each dimension appears with a colored bar, percentage weight, and numeric score:
```
Code  ████████████████████░░░░  20%  96
Math  ███████████████████░░░░░  20%  95
```
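For illustration, this bar format is easy to reproduce; a minimal sketch in Python, where the glyphs and 24-character width are inferred from the example above and the score-to-width scaling is a guess:

```python
def render_bar(name: str, weight: float, score: float, width: int = 24) -> str:
    """Render one dimension as 'Name  ████░░  20%  96'.
    Scaling the filled portion by score is an assumption."""
    filled = round(score / 100 * width)
    bar = "█" * filled + "░" * (width - filled)
    return f"{name:<6}{bar}  {weight:.0%}  {score:g}"

print(render_bar("Code", 0.20, 96))
print(render_bar("Math", 0.20, 95))
```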
Radar Charts: Multi-dimensional visualization comparing up to 4 components:
- Pentagon shape for 5 dimensions
- Scale zoomed to 90-100 for differentiation
- Overlapping shapes reveal relative strengths
Trend Indicators
Every score includes a trend indicator:
- ↑ Rising: Score improved in recent updates
- → Stable: Score unchanged
- ↓ Declining: Score dropped due to new data or competitive shifts
Trends reflect score movement over the past 30-90 days. A rising trend might indicate:
- New benchmark results improving the component's standing
- Ecosystem improvements (better availability, new integrations)
- Competitive positioning gains
Trend reasons appear on hover: "Rising due to strong FrontierMath performance" or "Declining as newer models enter the market."
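A hedged sketch of how such a flag might be derived from score history; the window and noise threshold here are assumptions, since NeoSignal doesn't publish the exact rule:

```python
from datetime import datetime, timedelta

def trend(history: list[tuple[datetime, float]],
          window_days: int = 90, threshold: float = 0.5) -> str:
    """Classify score movement over a trailing window as rising, stable,
    or declining. Both the 90-day window and the 0.5-point threshold
    separating movement from noise are illustrative assumptions."""
    cutoff = datetime.now() - timedelta(days=window_days)
    recent = [score for ts, score in sorted(history) if ts >= cutoff]
    if len(recent) < 2:
        return "→ Stable"
    delta = recent[-1] - recent[0]
    if delta > threshold:
        return "↑ Rising"
    if delta < -threshold:
        return "↓ Declining"
    return "→ Stable"
```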
Data Completeness Tiers
Not all components have equal data coverage. NeoSignal indicates confidence through tiers:
Full Tier: 4+ data sources, comprehensive benchmark coverage, recent updates. Highest confidence in score accuracy.
Standard Tier: 2-3 data sources, partial benchmark coverage. Good confidence, some gaps.
Benchmark Tier: Epoch AI data only. Reliable for benchmark dimensions, limited on ecosystem and availability factors.
The tier badge appears on component cards and detail pages. When comparing components, prefer Full Tier ratings for high-stakes decisions.
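In code, the tier rules described above might look like the following sketch; the cutoffs mirror the prose, while the signature and field names are made up for illustration:

```python
def completeness_tier(num_sources: int, epoch_only: bool) -> str:
    """Map data coverage to a completeness tier, per the cutoffs above.
    Signature and field names are illustrative, not NeoSignal's schema."""
    if epoch_only:
        return "Benchmark"   # Epoch AI data only
    if num_sources >= 4:
        return "Full"        # 4+ sources, comprehensive coverage
    if num_sources >= 2:
        return "Standard"    # 2-3 sources, partial coverage
    return "Benchmark"       # fallback for sparse data (not specified above)
```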
Score Interpretation Guidelines
90-100 (Excellent): Top-tier components. Best-in-class performance across dimensions. Safe choices for critical workloads.
80-89 (Very Good): Strong performers with minor gaps. Excellent for most use cases; may lag in one dimension.
70-79 (Good): Solid components with clear tradeoffs. Consider carefully for specific needs.
60-69 (Moderate): Usable but with significant limitations. Best for non-critical workloads or specific niches.
Below 60 (Limited): Significant concerns. Consider alternatives unless specific requirements mandate this choice.
Score Differences
How to interpret score gaps:
1-2 points: Essentially equivalent. Differences within measurement noise; choose based on other factors.
3-5 points: Meaningful difference. Higher-scored component likely better, but evaluate dimension breakdowns.
6-10 points: Significant gap. Lower-scored component has clear weaknesses; understand them before choosing.
10+ points: Major difference. Lower-scored component is substantially inferior; choose higher unless specific constraints apply.
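Both bands translate directly into lookup helpers; a small sketch using the thresholds above:

```python
def score_band(score: float) -> str:
    """Map an overall score to the interpretation bands above."""
    if score >= 90: return "Excellent"
    if score >= 80: return "Very Good"
    if score >= 70: return "Good"
    if score >= 60: return "Moderate"
    return "Limited"

def gap_reading(a: float, b: float) -> str:
    """Interpret the gap between two scores per the guidance above."""
    gap = abs(a - b)
    if gap <= 2:  return "essentially equivalent"
    if gap <= 5:  return "meaningful difference"
    if gap <= 10: return "significant gap"
    return "major difference"

print(gap_reading(96, 92))  # meaningful difference
```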
Confidence Scoring for Signals
Market signals use a different scoring system—confidence rather than capability:
Signal Confidence Breakdown:
| Dimension | Description |
|---|---|
| Source Authority | How authoritative is the source? (Tier 1-4) |
| Data Quality | How clean and verifiable is the data? |
| Recency | How fresh is the information? (days old) |
| Corroboration | How many sources confirm this? |
| Specificity | How specific vs. general is the claim? |
A signal with 85% confidence has strong sourcing, recent data, and corroboration. A 60% confidence signal might be based on a single source or older information.
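NeoSignal doesn't publish weights for these dimensions, so any aggregation formula is speculative. A sketch assuming a simple unweighted mean of 0-1 sub-scores:

```python
def signal_confidence(source_authority: float, data_quality: float,
                      recency: float, corroboration: float,
                      specificity: float) -> int:
    """Combine the five confidence dimensions (each scored 0-1) into a
    percentage. The unweighted mean is an assumption; NeoSignal's actual
    weighting is not published."""
    parts = [source_authority, data_quality, recency,
             corroboration, specificity]
    return round(sum(parts) / len(parts) * 100)

# e.g. a strongly sourced, fresh, corroborated signal lands around 85%:
print(signal_confidence(0.9, 0.85, 0.9, 0.8, 0.8))  # 85
```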
Methodology Transparency
NeoSignal publishes scoring methodology for accountability:
- Dimension weights appear on every component breakdown
- Data sources link from component detail pages
- Last updated timestamps show data freshness
- Tier indicators reveal data completeness
If you disagree with a score, you can trace it to underlying data and understand why.
Using Scores in Decisions
Shortlisting: Filter components by minimum score. "Show models above 85" eliminates clearly inferior options.
Comparison: With similar overall scores, compare dimension breakdowns. A 92 with strong Code but weak Math differs from a 92 with balanced dimensions.
Trend Analysis: Consider trajectory alongside current score. A rising 88 may soon surpass a stable 91.
Tier Verification: For critical decisions, prefer Full Tier components with maximum data confidence.
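As a sketch of the shortlisting workflow just described; the component record and its fields are hypothetical, not NeoSignal's data model:

```python
from dataclasses import dataclass

@dataclass
class Component:
    name: str
    score: int
    tier: str    # "Full", "Standard", or "Benchmark"
    trend: str   # "rising", "stable", or "declining"

def shortlist(components: list[Component], min_score: int = 85,
              require_full_tier: bool = False) -> list[Component]:
    """Filter to components above a score floor, optionally Full Tier
    only, sorted best-first. Structure and fields are illustrative."""
    picks = [c for c in components
             if c.score >= min_score
             and (not require_full_tier or c.tier == "Full")]
    return sorted(picks, key=lambda c: c.score, reverse=True)
```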
Score Updates
Scores update continuously as new data emerges:
- Benchmark releases: New evaluation results shift dimension scores
- Market changes: Pricing updates affect cloud provider scores
- Ecosystem evolution: New integrations improve framework scores
- Competitive dynamics: New entrants affect relative positioning
Major score changes appear in the Signals feed as "benchmark_update" or "trend_shift" signals.
From Scores to Decisions
NeoSignal scores compress complex, multi-dimensional comparisons into actionable numbers. They're not perfect—no single metric can capture everything—but they provide a consistent framework for evaluation.
Use scores as starting points, not final answers. A 96 beats a 92 on average, but your specific use case might favor the 92's particular strengths. The dimension breakdown helps you understand why scores differ and whether those differences matter for your needs.
That's the NeoSignal approach to scoring: transparent methodology, multiple data sources, weighted dimensions, and honest acknowledgment of confidence levels. Understand the system, and you can use it effectively to navigate the AI infrastructure landscape.