Free · scoping calculator

Eval harness scoping.

How big should your eval set be, and how long does it take to build? Realistic estimates from production engagements — labeling time, engineer time, total cost.

8

Each one needs its own eval coverage

40

50 cases is a typical floor for a real signal; use 100+ for production-critical paths

Scoring complexity: 50%

Judge cases are dramatically cheaper but need rubric calibration first

How this is calculated

Time-per-case multipliers are based on what we have measured across actual engagements. Labeling is costed at $150/hr (a senior analyst or domain expert) and engineer time at $250/hr.
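To make the arithmetic concrete, here is a minimal Python sketch of how an estimate of this shape could be computed. Only the two hourly rates come from the text above. Everything else is an assumption made for illustration: the reading of the three inputs as a feature count, cases per feature, and the share of cases scored by a judge; the function and field names; and above all the per-case minute values, which are placeholders and not the measured multipliers the calculator uses.

```python
from dataclasses import dataclass

# Hourly rates stated in the text.
LABELING_RATE_USD_PER_HR = 150.0
ENGINEER_RATE_USD_PER_HR = 250.0

# ASSUMPTION: placeholder per-case times in minutes. These are NOT the
# calculator's measured multipliers; swap in your own numbers.
HUMAN_LABEL_MIN_PER_CASE = 15.0   # hand-labeled case
JUDGE_LABEL_MIN_PER_CASE = 3.0    # judge-scored case, after rubric calibration
ENGINEER_MIN_PER_CASE = 6.0       # harness and plumbing work per case


@dataclass
class Estimate:
    total_cases: int
    labeling_hours: float
    engineer_hours: float
    total_cost_usd: float


def estimate_scope(features: int, cases_per_feature: int, judge_share: float) -> Estimate:
    """Rough eval-set scope estimate (hypothetical helper, not the real tool).

    judge_share is the fraction of cases scored by a judge, between 0 and 1.
    """
    if features < 0 or cases_per_feature < 0:
        raise ValueError("counts must be non-negative")
    if not 0.0 <= judge_share <= 1.0:
        raise ValueError("judge_share must be between 0 and 1")

    total_cases = features * cases_per_feature
    judge_cases = total_cases * judge_share
    human_cases = total_cases - judge_cases

    # Labeling time depends on how each case is scored.
    labeling_hours = (
        human_cases * HUMAN_LABEL_MIN_PER_CASE
        + judge_cases * JUDGE_LABEL_MIN_PER_CASE
    ) / 60.0
    # Engineer time is assumed to scale with the total number of cases.
    engineer_hours = total_cases * ENGINEER_MIN_PER_CASE / 60.0

    total_cost = (
        labeling_hours * LABELING_RATE_USD_PER_HR
        + engineer_hours * ENGINEER_RATE_USD_PER_HR
    )
    return Estimate(total_cases, labeling_hours, engineer_hours, total_cost)


if __name__ == "__main__":
    # Inputs mirror the default slider values shown on this page.
    print(estimate_scope(features=8, cases_per_feature=40, judge_share=0.5))
```

Because the per-case minutes are placeholders, the dollar figure this prints illustrates the shape of the calculation only. What carries over is the structure: total cases are split by scoring method, labeling hours and engineer hours are tallied separately, and each is priced at its own rate.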

Real engagements vary. The point of this calculator is not to produce the SOW — it is to anchor the conversation about scope before you commit to “we’ll add evals later” and ship without them.