THE ACCOUNTABILITY NUMBERS

Every metric on this page is recomputed nightly from the full season of player-game data and published as-is. If the model degrades, this page says so before we do.

2025-26 season · 1,318 games · 28,626 player-games · computed 2026-07-30T13:10:09Z

0.68

Margin R²

Share of the variance in actual final margins explained by the model's per-player credit, summed by team.

84%

Winner agreement

Share of games where the team with more total Impact actually won.
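As a rough illustration of the two headline figures above, the sketch below computes both from per-game pairs of (summed credit difference, actual margin). It assumes Margin R² is the squared Pearson correlation between the two series, which equals the R² of a one-variable least-squares fit; the page does not spell out the exact formula. The function names and the toy numbers are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def margin_r2(credit_diff, actual_margin):
    """Assumed definition: squared correlation of credit diff vs. real margin."""
    return pearson(credit_diff, actual_margin) ** 2


def winner_agreement(credit_diff, actual_margin):
    """Share of games where the credit diff and the real margin share a sign.

    A credit diff of exactly zero is counted as a miss.
    """
    hits = sum(1 for d, m in zip(credit_diff, actual_margin) if d * m > 0)
    return hits / len(credit_diff)


# Toy data (hypothetical): home-minus-away credit and home-minus-away score.
credit_diff = [30.0, -12.0, 8.0, -40.0, 5.0]
actual_margin = [9.0, -4.0, -2.0, -11.0, 3.0]

print(margin_r2(credit_diff, actual_margin))
print(winner_agreement(credit_diff, actual_margin))  # 0.8 (4 of 5 games)
```

Squaring the correlation makes the figure insensitive to the scale of the credit units, which matters here because the credit differences are not expressed in points.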

0.98

Reliability (split-half)

Spearman-Brown adjusted correlation between each player's even- and odd-game Impact (421 players, 30+ games each). Measures whether Impact is signal or noise.
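A minimal sketch of the split-half procedure described above, assuming each player's per-game Impact values are stored in chronological order and that the standard two-half Spearman-Brown formula, 2r / (1 + r), is the adjustment used. The data layout, function names, and toy values are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-sample correlation up to full-sample reliability."""
    return 2.0 * r_half / (1.0 + r_half)


def split_half_reliability(impact_by_player, min_games=30):
    """Correlate each player's odd-game and even-game mean Impact."""
    odd_means, even_means = [], []
    for games in impact_by_player.values():
        if len(games) < min_games:
            continue  # mirror the page's games-played cutoff
        odd_means.append(mean(games[0::2]))   # 1st, 3rd, 5th, ... game
        even_means.append(mean(games[1::2]))  # 2nd, 4th, 6th, ... game
    return spearman_brown(pearson(odd_means, even_means))


# Toy data (hypothetical): three players, four games each.
players = {
    "A": [5.0, 6.0, 5.5, 6.5],
    "B": [1.0, 0.0, 1.5, 0.5],
    "C": [-3.0, -2.0, -2.5, -3.5],
}
print(split_half_reliability(players, min_games=4))
```

Under the same formula, an adjusted value of 0.98 corresponds to a raw half-to-half correlation of about 0.96, since r = R / (2 - R).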

0.29

RAPM agreement

Correlation between the box-only signal and Ridge-regularized APM across 431 players (20+ games). Computed on the box-only signal — the published blend already includes RAPM, so testing the blend would be circular. This number is modest by construction: one season of RAPM is noisy, which is exactly why the published blend weights the box signal more heavily.
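For readers unfamiliar with the term, here is a minimal sketch of ridge-regularized APM under the conventional setup: each row is a stint, each column is a player, entries are +1 for a home player on the floor, -1 for an away player, and 0 otherwise, and the target is the stint's home margin. The stint encoding, the penalty values, and every function name are assumptions for illustration; the page does not describe its RAPM fit beyond "Ridge-regularized". The point is only to show how the penalty shrinks noisy estimates toward zero.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_apm(design, target, lam):
    """Closed-form ridge: solve (X^T X + lam * I) beta = X^T y.

    lam must be positive here: with +1/-1 stint coding every row sums to
    zero, so X^T X alone is singular.
    """
    p = len(design[0])
    gram = [
        [sum(row[i] * row[j] for row in design) + (lam if i == j else 0.0)
         for j in range(p)]
        for i in range(p)
    ]
    rhs = [sum(row[i] * t for row, t in zip(design, target)) for i in range(p)]
    return solve(gram, rhs)


# Toy stints (hypothetical): three players, four stints, home margin per stint.
design = [[1, -1, 0], [1, 0, -1], [0, 1, -1], [1, -1, 0]]
target = [6.0, 9.0, 2.0, 4.0]

light = ridge_apm(design, target, lam=1.0)
heavy = ridge_apm(design, target, lam=10.0)
print(light)
print(heavy)  # same ordering of players, smaller magnitudes
```

A larger penalty trades variance for bias, which is one reason a single season of RAPM can correlate only modestly with an independent signal.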

CALIBRATION BY DECILE

Games bucketed by the model's predicted margin (sum of home Impact minus sum of away Impact). A calibrated model's actual margins should rise monotonically with its predicted ones. Slope: 0.299 · mean absolute error: 7.3 pts.

Decile   Model credit diff (avg)   Actual margin (avg)   Games
1        -77.3                     -22.4                 132
2        -44.2                     -13.3                 132
3        -26.9                     -7.8                  131
4        -14.4                     -3.9                  131
5        -3.3                      +1.5                  132
6        +9.0                      +4.0                  132
7        +20.9                     +8.1                  131
8        +34.2                     +11.9                 132
9        +50.5                     +16.7                 132
10       +81.6                     +23.3                 132
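The table above can be sanity-checked with a few lines of code. The sketch below takes the ten decile averages as published, confirms that actual margins rise monotonically, and fits a least-squares line through them. It is fitted to the decile averages rather than to all 1,318 games, so it approximates the published slope rather than reproducing it. The variable and function names are hypothetical.

```python
# (model credit diff avg, actual margin avg) per decile, copied from the table.
deciles = [
    (-77.3, -22.4), (-44.2, -13.3), (-26.9, -7.8), (-14.4, -3.9),
    (-3.3, 1.5), (9.0, 4.0), (20.9, 8.1), (34.2, 11.9),
    (50.5, 16.7), (81.6, 23.3),
]


def least_squares(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


predicted = [p for p, _ in deciles]
actual = [a for _, a in deciles]

# Calibration check: each decile's actual margin exceeds the previous one.
monotonic = all(lo < hi for lo, hi in zip(actual, actual[1:]))
slope, intercept = least_squares(predicted, actual)

print(monotonic)        # True
print(round(slope, 2))  # 0.3
```

A slope well below 1 indicates that credit differences are on a wider scale than points, so the calibration claim rests on the ordering and the stable ratio, not on the two columns matching in size.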

HOW TO READ THESE — AND HOW NOT TO

The margin fit is explanatory, not predictive. Per-game Impact is computed from the same game's events, so it must broadly agree with the scoreboard. The honest test here is whether credit assignment sums back to real margins with a stable slope and calibrated deciles — a check on attribution, not a forecast. Pregame prediction is a different (and harder) problem.
Reliability is the number to watch. A metric can "explain" games while being noise at the player level. Split-half reliability measures whether a player's Impact in half his games predicts his Impact in the other half.
RAPM agreement uses independent signals. We correlate the box-only component against RAPM precisely because the published number blends them — quoting a correlation between a blend and its own ingredient would be circular.
Garbage time is discounted, not ignored. Blowout-minute production carries reduced possession value inside the impact pipeline (a context-layer adjustment applied to 14% of player-games this season) — stat-padding in dead games doesn't buy leaderboard position, but it isn't zeroed out either.
Known blind spots (from the methodology page, unchanged): off-ball defense, screening value, and gravity are systematically under-measured by the box signal; elite players on elite teams are slightly under-rated by RAPM's team-demeaning.
