Pi Labs - Scoring that evolves with your AI

Not sure what to measure? Pi figures it out for you. Feed it your prompts, your PRDs, your user feedback, or any combination of them, or just sit down and chat with it, and it will help you find well-calibrated metrics for your application.

Quick, deterministic scores

Our foundation model, Pi Scorer, scores more accurately than DeepSeek and GPT-4.1, but runs at the size and speed of GPT Mini and Gemini Flash. You can score 20+ custom dimensions in under 100 ms; it’s that fast.

Framework agnostic

A single Pi Scorer can be used in every part of your AI stack and existing tools: offline evals, online observability, training data quality, model optimization, agent control flows, and more. Easily plug Pi into Google Sheets, Promptfoo, CrewAI, or any other tool you might be using.
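
As one illustration of the "agent control flows" use case, here is a minimal sketch of a score-gated retry loop. It is not Pi's API: `generate_with_gate`, `fake_generate`, and `fake_score` are hypothetical names, and the two stand-in functions exist only so the example runs offline. In practice the `score` callable would wrap whatever scorer you use.

```python
from typing import Callable, Tuple


def generate_with_gate(
    prompt: str,
    generate: Callable[[str, int], str],
    score: Callable[[str, str], float],
    min_score: float = 0.7,
    max_attempts: int = 3,
) -> Tuple[str, float, int]:
    """Retry generation until a score clears min_score.

    Returns (best output seen, its score, number of attempts made).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    best_output, best_score = "", float("-inf")
    for attempt in range(1, max_attempts + 1):
        output = generate(prompt, attempt)
        value = score(prompt, output)
        if value > best_score:
            best_output, best_score = output, value
        if value >= min_score:
            # Good enough: stop early and hand the output downstream.
            return best_output, best_score, attempt
    # No attempt cleared the bar; return the best one anyway.
    return best_output, best_score, max_attempts


# Hypothetical stand-ins for a real generator and a real scorer.
def fake_generate(prompt: str, attempt: int) -> str:
    return prompt + "!" * attempt


def fake_score(prompt: str, output: str) -> float:
    return min(1.0, 0.3 * output.count("!"))


result = generate_with_gate("Try Pi", fake_generate, fake_score)
print(result)
```

Because the gate only depends on a `score(prompt, output)` callable, the same loop can sit inside any framework that lets you run a Python function between steps.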

A foundation model designed for scoring

We train our models to understand principles, not mimic content. We continuously monitor performance to improve quality with each release.

Aligned with your users & experts

The best metrics align with human judgment. You can continuously improve your Pi scoring system by calibrating it on your own labels, preferences, and user data, adjusting to match your team's expertise and actual user behavior in a virtuous feedback loop.
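
To make "align with human judgment" concrete, here is a small sketch that checks how often a score agrees with human thumbs-up labels and picks the pass/fail threshold with the highest agreement. This is only an illustration of measuring alignment on your own labels; it is not a description of how Pi calibrates its scorers, and the function names and data are made up.

```python
from typing import List, Tuple


def agreement(scores: List[float], labels: List[bool], threshold: float) -> float:
    """Fraction of examples where (score >= threshold) matches the human label."""
    hits = sum((s >= threshold) == label for s, label in zip(scores, labels))
    return hits / len(labels)


def best_threshold(scores: List[float], labels: List[bool]) -> Tuple[float, float]:
    """Try each observed score as a threshold; return the best one and its agreement."""
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and the same length")
    candidates = sorted(set(scores))
    best = max(candidates, key=lambda t: agreement(scores, labels, t))
    return best, agreement(scores, labels, best)


# Made-up example: model scores alongside human thumbs-up (True) / thumbs-down (False).
scores = [0.2, 0.4, 0.55, 0.7, 0.9]
labels = [False, False, True, True, True]

threshold, acc = best_threshold(scores, labels)
print(threshold, acc)
```

Re-running a check like this as new labels arrive is one simple way to notice when a metric has drifted away from what your users and experts actually prefer.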

Fully captures correctness and taste

Pi’s scoring system combines soft measures like natural language quality, hard measures like code correctness, and trained measures like thumbs-up prediction. This comprehensiveness gives you the highest quality evals, reward models, ranking functions, and agent decision nodes.
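
As a rough mental model of combining soft, hard, and trained measures, the sketch below rolls three per-dimension scores into one weighted total. The `Measure` class, the weights, and the dimension names are all assumptions made for illustration; the page does not say how Pi aggregates its dimensions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Measure:
    """One scoring dimension. `kind` is 'soft', 'hard', or 'trained' (illustrative labels)."""
    name: str
    kind: str
    weight: float


def total_score(measures: List[Measure], scores: Dict[str, float]) -> float:
    """Weighted mean of per-dimension scores, each expected to lie in [0, 1]."""
    total_weight = sum(m.weight for m in measures)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(m.weight * scores[m.name] for m in measures) / total_weight


measures = [
    Measure("reads_naturally", "soft", 1.0),         # natural language quality
    Measure("code_passes_tests", "hard", 2.0),       # code correctness
    Measure("predicted_thumbs_up", "trained", 1.0),  # learned preference signal
]
example_scores = {
    "reads_naturally": 0.8,
    "code_passes_tests": 1.0,
    "predicted_thumbs_up": 0.6,
}

combined = total_score(measures, example_scores)
print(round(combined, 2))
```

A weighted mean is the simplest possible aggregator; the point is only that different kinds of measures can feed one number used for evals, ranking, or agent decisions.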

5x cheaper than LLM judges

Maintaining the performance of a large model at a smaller size means you can afford to measure everything that matters to you without running up a massive bill. You can reinvest your savings to measure even more dimensions, more frequently, across your workflows.

Start scoring for free today

Get started with just a few lines of code

Read the docs

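from withpi import PiClient

# Create a client (assumes your Pi API key is already configured in your environment).
pi = PiClient()

# Score one input/output pair against a single natural-language question.
scores = pi.scoring_system.score(
    llm_input="Pi Labs",
    llm_output="Score anything with Pi Labs today!",
    scoring_spec=[{"question": "Is there a strong call to action?"}],
)
print(scores.total_score)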

The Hidden Costs of LLM-as-a-Judge: Why Your Evals Are Failing

Monday, June 16, 2025

Beyond Intuition: Building Principled LLM Applications

Monday, June 16, 2025
