Commit d9309fb
feat: add calibration report with stratified sampling and bias analysis
Enhances validate_on_contextbench.py with:
- Stratified sampling by language and gold context complexity
- Comprehensive calibration report (error profile, TPR/FPR)
- Go/no-go threshold (file recall >= 0.60)
- Systematic gap analysis (missed file categories)
- Per-language and per-complexity bias breakdowns
- Domain gap warning for polyrepo tasks
- Paper-ready statement generation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent c85ae1b commit d9309fb
1 file changed, +392 -30 lines changed
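
The diff itself is not shown here, so the following is only a minimal sketch of how the stratified sampling, the per-group bias breakdown, and the go/no-go gate described in the commit message might fit together. It is not the code in validate_on_contextbench.py. The task field names (`language`, `complexity`, `gold`, `predicted`), all function names, and the choice to average per-task file recall before comparing against the threshold are assumptions made for illustration; only the 0.60 cutoff comes from the commit message.

```python
import random
from collections import defaultdict

RECALL_THRESHOLD = 0.60  # go/no-go cutoff stated in the commit message


def stratified_sample(tasks, per_stratum, seed=0):
    """Draw up to per_stratum tasks from each (language, complexity) stratum.

    The dict keys "language" and "complexity" are hypothetical field names.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for task in tasks:
        strata[(task["language"], task["complexity"])].append(task)
    sample = []
    for key in sorted(strata):  # sorted for a deterministic draw order
        group = strata[key]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample


def file_recall(gold_files, predicted_files):
    """Fraction of gold files that appear among the predicted files."""
    gold = set(gold_files)
    if not gold:
        return 1.0  # assumption: nothing to find counts as full recall
    return len(gold & set(predicted_files)) / len(gold)


def recall_by(tasks, field):
    """Mean file recall per value of field, e.g. per language."""
    buckets = defaultdict(list)
    for task in tasks:
        buckets[task[field]].append(file_recall(task["gold"], task["predicted"]))
    return {value: sum(r) / len(r) for value, r in buckets.items()}


def go_no_go(tasks, threshold=RECALL_THRESHOLD):
    """Return (mean file recall, True if it meets the threshold).

    Averaging per-task recall is an assumption; the real script may
    aggregate differently (for example, micro-averaging over files).
    """
    if not tasks:
        raise ValueError("no tasks to evaluate")
    recalls = [file_recall(t["gold"], t["predicted"]) for t in tasks]
    mean_recall = sum(recalls) / len(recalls)
    return mean_recall, mean_recall >= threshold
```

Called as `go_no_go(stratified_sample(tasks, per_stratum=20))`, the gate would be evaluated on a sample that includes every language and complexity combination present in the data, and `recall_by(sample, "language")` would give the per-language breakdown.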