Commit b37834e
feat(compare): add normalized gain metric (#1101)
* feat(compare): add normalized gain metric to agentv compare
Add Hake's normalized gain (g) to compare output, measuring improvement
relative to remaining headroom rather than raw absolute delta.
Formula: g = (score_candidate − score_baseline) / (1 − score_baseline)
This separates genuine scaffolding from ceiling effects: a +5pp gain
from a 90% baseline closes half of the remaining headroom (g=0.5),
while +5pp from a 10% baseline closes under 6% of it (g=0.056).
Shown as "Norm. gain" in table output and "g" in matrix pairwise summary.
Available as mean_normalized_gain in JSON output. Returns null when
baseline is 1.0 (perfect score, no headroom).
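A minimal sketch of the formula and the no-headroom case, for illustration
only. The helper name normalized_gain is hypothetical and this is not the
code added in this commit; scores are assumed to be fractions in [0, 1].

```python
from typing import Optional


def normalized_gain(baseline: float, candidate: float) -> Optional[float]:
    """Hake's normalized gain: g = (candidate - baseline) / (1 - baseline).

    Hypothetical helper for illustration; scores are assumed to lie in [0, 1].
    """
    headroom = 1.0 - baseline
    if headroom <= 0.0:
        # Perfect baseline: no headroom left, so g is undefined.
        # This mirrors the null described for the JSON output.
        return None
    return (candidate - baseline) / headroom


# The two +5pp cases from the message above.
high = normalized_gain(0.90, 0.95)
low = normalized_gain(0.10, 0.15)
print(round(high, 3), round(low, 3))  # 0.5 0.056
print(normalized_gain(1.0, 1.0))      # None
```

A regression gives a negative g under the same formula, for example
normalized_gain(0.5, 0.25) is -0.5.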
Closes #1100
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(compare): use standard symbol 'g' for normalized gain
Use 'g' consistently in both table summary and matrix pairwise output,
matching the standard notation from Hake (1998) and the SkillsBench paper.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs(compare): document normalized gain metric
Add normalized gain (g) to compare docs: formula, interpretation table,
updated table/JSON output examples, and tips section.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 5855aad commit b37834e
3 files changed
Lines changed: 247 additions & 35 deletions
File tree
- apps
- cli
- src/commands/compare
- test/commands/compare
- web/src/content/docs/docs/tools