Commit 9f8ffe8
fix: prevent heuristic/verifier drift and surface partial steps in goal verification
Three issues addressed:
1. Heuristic/verifier step drift: The agent's keyword-based
_advance_plan_steps() heuristic and the DemoController's VLM verifier
operated on independent state, allowing them to disagree on which step
was current. Fix: add an _external_step_control flag to the agent,
which the DemoController sets at init, making _advance_plan_steps() a
no-op when the controller manages step progression via VLM verification.
2. partially_verified invisible to goal verification: When steps were
marked partially_verified, the final goal verification pass had no
visibility into which steps had partial completions. Fix: _verify_goal()
now builds a step verification summary and augments the goal text with
it when noteworthy statuses (partially_verified, failed) exist.
3. Missing integration tests: Added TestHeuristicVerifierSync (4 tests)
and TestGoalVerificationContext (5 tests) that verify the heuristic is
properly disabled under controller management, step advancement is
driven by VLM verification, and partial/failed step context reaches
goal verification. Also added 2 agent-level tests for
_external_step_control behavior.
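The gating in item 1 can be sketched as below. This is a minimal sketch under stated assumptions: only the names `_external_step_control` and `_advance_plan_steps()` come from the commit message. The classes `PlanAgent` and `DemoControllerSketch`, the `plan_steps`/`current_step` attributes, the `on_step_verified()` hook, and the keyword-matching rule are hypothetical stand-ins, not the project's actual code.

```python
from dataclasses import dataclass


@dataclass
class PlanAgent:
    """Hypothetical agent holding a plan and a keyword-based step heuristic."""

    plan_steps: list[str]
    current_step: int = 0
    # Set by the controller when it owns step progression (name from the commit).
    _external_step_control: bool = False

    def _advance_plan_steps(self, action_text: str) -> None:
        # No-op under controller management, so the heuristic cannot drift
        # away from the step index maintained by the VLM verifier.
        if self._external_step_control:
            return
        if self.current_step >= len(self.plan_steps):
            return
        # Hypothetical heuristic: advance if any word of the current step
        # appears in the action text.
        keywords = self.plan_steps[self.current_step].lower().split()
        text = action_text.lower()
        if any(word in text for word in keywords):
            self.current_step += 1


class DemoControllerSketch:
    """Hypothetical controller that advances steps only on VLM verification."""

    def __init__(self, agent: PlanAgent) -> None:
        self.agent = agent
        # Take ownership of step progression at init time.
        self.agent._external_step_control = True

    def on_step_verified(self, verified: bool) -> None:
        # The verifier's verdict is the single source of truth for the step index.
        if verified and self.agent.current_step < len(self.agent.plan_steps):
            self.agent.current_step += 1
```

In this sketch, once the controller sets the flag, only `on_step_verified(True)` moves `current_step`, so there is a single writer of the step index and the two mechanisms cannot disagree.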
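The goal-text augmentation in item 2 might look roughly like the following. This is a sketch, assuming step statuses are plain strings paired with step descriptions in plan order. The statuses `partially_verified` and `failed` come from the commit message. The `verified` status, the function names `build_step_summary` and `augment_goal`, and the summary wording are hypothetical.

```python
NOTEWORTHY_STATUSES = {"partially_verified", "failed"}


def build_step_summary(step_statuses: list[tuple[str, str]]) -> str:
    """Render one numbered line per step, e.g. '2. toggle wifi: failed'."""
    return "\n".join(
        f"{i}. {step}: {status}"
        for i, (step, status) in enumerate(step_statuses, start=1)
    )


def augment_goal(goal: str, step_statuses: list[tuple[str, str]]) -> str:
    """Append the step summary to the goal text only when it is noteworthy."""
    if not any(status in NOTEWORTHY_STATUSES for _, status in step_statuses):
        # Every step verified (or no steps at all): leave the goal unchanged.
        return goal
    summary = build_step_summary(step_statuses)
    return (
        f"{goal}\n\n"
        "Step verification summary (some steps were not fully verified; "
        "check whether the goal is still met):\n"
        f"{summary}"
    )
```

Returning the goal untouched when nothing is noteworthy keeps the common-case verification prompt identical in this sketch. Only runs with partial or failed steps pay for the extra context.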
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent d498bca
4 files changed
Lines changed: 405 additions & 2 deletions
File tree
- openadapt_evals
- agents
- tests
| Hunk | Original lines | New lines | Lines added | Lines removed |
|---|---|---|---|---|
| 1 | 315-320 | 315-326 | 6 | 0 |
| 2 | 727-735 | 733-748 | 7 | 0 |
| Hunk | Original lines | New lines | Lines added | Lines removed |
|---|---|---|---|---|
| 1 | 155-160 | 155-167 | 7 | 0 |
| 2 | 742-748 | 749-757 | 3 | 1 |
| 3 | 755-768 | 764-822 | 46 | 1 |
| Hunk | Original lines | New lines | Lines added | Lines removed |
|---|---|---|---|---|
| 1 | 1084-1086 | 1084-1120 | 34 | 0 |