Pi Scoring Models
Scoring and Quality Engineering
Why should you use a scorer?
1
Scoring is the foundation of (online & offline) Quality Engineering.
2
Scoring increases the sophistication of your quality signals.
Level 1
Testing
Test against your labeled set
Level 2
Evals
Raters assess for open-ended subjective success
LLM-as-a-judge gets us here
Level 3
Scoring
Raters assess for open-ended subjective success
Foundation Scoring Models
High Precision
Scorers precisely assess many granular dimensions rather than imprecisely assessing coarse ones
Low Variance
Scoring the same thing twice, or with different wording, gives the same score
Smooth score distribution
Scores differentiate nuances (good vs. better) rather than falling into a bimodal split (good vs. bad)
Human Calibration
Scoring can be calibrated to ground truth from humans
High Explainability
A quick scan of the scoring output immediately shows where the loss originates
Low Latency
Score 20+ dimensions in 100ms
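The low-variance and smooth-distribution properties above can be checked empirically for any scorer. The following is a minimal sketch, not part of any Pi API: `repeat_variance`, `extreme_fraction`, `toy_scorer`, and the 0.2 / 0.8 cutoffs are all hypothetical names and values chosen for illustration, and scores are assumed to lie in [0, 1].

```python
import statistics
from typing import Callable, Sequence


def repeat_variance(scorer: Callable[[str], float], text: str, runs: int = 5) -> float:
    """Population variance of the scores from scoring the same text `runs` times."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    scores = [scorer(text) for _ in range(runs)]
    return statistics.pvariance(scores)


def extreme_fraction(scores: Sequence[float], low: float = 0.2, high: float = 0.8) -> float:
    """Fraction of scores at or beyond the cutoffs; values near 1.0 hint at a bimodal scorer."""
    if not scores:
        raise ValueError("scores must not be empty")
    extreme = sum(1 for s in scores if s <= low or s >= high)
    return extreme / len(scores)


# Hypothetical deterministic scorer, used only to exercise the helpers.
def toy_scorer(text: str) -> float:
    return min(len(text.split()) / 10, 1.0)


variance = repeat_variance(toy_scorer, "a short answer with six words")
bimodal_like = extreme_fraction([0.0, 1.0, 0.05, 0.95])
smooth_like = extreme_fraction([0.3, 0.5, 0.7, 0.9])
```

A deterministic scorer yields zero repeat variance here; swapping in a sampled LLM judge for `toy_scorer` would typically give a nonzero value. The extreme-fraction check is a crude proxy for bimodality, not a statistical test.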
3
Decoder models: Why do they struggle with scoring?
High Variance
  • Next token prediction introduces randomness per iteration.
  • Autoregressive nature means randomness can compound across iterations.
High Latency
  • Autoregressive nature adds cost without adding any new information.
  • Not trained for scoring, so accuracy requires reasoning, which is slow.
Low Accuracy
  • Trained for text generation, not for scoring.
  • Autoregressive nature (looking only at past tokens vs. full context) hurts accuracy.
Bimodal Score Distribution
  • Gives either good or bad scores, without showcasing any nuance.
  • Trained for helpfulness, so it tends to avoid “criticizing”.
No Composability
  • No architecture to string together multiple signals, including non-language signals like code and other models.
Low Trainability
  • Prompt-based, so it can only be “calibrated” with English instructions rather than with ground-truth data.
Low Explainability
  • Requires parsing English justification of the score.
  • Justification tends to be reverse-rationalized as an artifact of “next token prediction”.
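To make the composability and explainability points concrete, here is a rough sketch of what stringing together multiple signals could look like, assuming each signal is a plain function returning a value in [0, 1]. It is not Pi's architecture: `Dimension`, `score_response`, the example checks, and the weights are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Dimension:
    """One scoring signal; score_fn may wrap a model, a code check, or anything else."""
    name: str
    score_fn: Callable[[str], float]  # assumed to return a value in [0, 1]
    weight: float = 1.0


def score_response(text: str, dimensions: List[Dimension]) -> Dict[str, float]:
    """Return per-dimension scores plus their weighted average under the key 'overall'."""
    total_weight = sum(d.weight for d in dimensions)
    if total_weight <= 0:
        raise ValueError("dimension weights must sum to a positive value")
    # Clamp each signal to [0, 1] so one misbehaving check cannot dominate.
    per_dim = {d.name: min(max(d.score_fn(text), 0.0), 1.0) for d in dimensions}
    overall = sum(per_dim[d.name] * d.weight for d in dimensions) / total_weight
    return {"overall": overall, **per_dim}


# Hypothetical non-language signals written as ordinary code.
dimensions = [
    Dimension("within_length", lambda t: 1.0 if len(t.split()) <= 50 else 0.0, weight=1.0),
    Dimension("mentions_price", lambda t: 1.0 if "$" in t else 0.0, weight=3.0),
]

result = score_response("Thanks for asking", dimensions)
```

Because the result keeps every dimension alongside the overall value, the low score can be traced to `mentions_price` without parsing any free-text justification. In a sketch like this, the weights are the natural place to fit against human-labeled data.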
Next up
2: The Pi Architecture
Learn how Pi's scoring framework works
© 2025, Pi Labs Inc.