Skip to content

Commit ded41c8

Browse files
abrichrclaude
andcommitted
fix: address PR #97 review comments with clarifying comments and test dep
- Add comment in reset() explaining why _external_step_control is not reset - Add comment on hasattr guard explaining MagicMock behavior is acceptable - Add docstring note in TestFalseNegativeRegressions about VLM response limitation - Add flask to test optional-dependencies for CI coverage Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 82ca648 commit ded41c8

4 files changed

Lines changed: 9 additions & 0 deletions

File tree

openadapt_evals/agents/claude_computer_use_agent.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -362,6 +362,8 @@ def _clamp_coord(self, x_norm: float, y_norm: float) -> tuple[float, float]:
362362

363363
def reset(self) -> None:
364364
"""Reset agent state between episodes."""
365+
# Note: _external_step_control is not reset here because the controller
366+
# that set it persists across resets
365367
self._messages = []
366368
self._step_count = 0
367369
self._last_tool_use_id = None

openadapt_evals/demo_controller.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -159,6 +159,8 @@ def __init__(
159159
# step progression is driven exclusively by VLM verification here
160160
# in the controller. This prevents drift between the agent's
161161
# keyword-based heuristic and the controller's verifier.
162+
# hasattr works correctly for real agents; MagicMock auto-creates attrs
163+
# but that's fine since we're setting the value anyway
162164
if hasattr(agent, "_external_step_control"):
163165
agent._external_step_control = True
164166

pyproject.toml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -94,6 +94,7 @@ all = [
9494
]
9595
test = [
9696
"anthropic>=0.76.0",
97+
"flask>=3.0.0",
9798
]
9899

99100
[project.scripts]

tests/test_plan_verify.py

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -622,6 +622,10 @@ class TestFalseNegativeRegressions:
622622
623623
These verify that the updated prompts and status model correctly handle
624624
cases where the old verifier would produce false negatives.
625+
626+
Note: These tests validate the parsing pipeline, not the VLM's actual
627+
response to the new prompts. Live eval is needed to validate prompt
628+
effectiveness.
625629
"""
626630

627631
# -- Scenario 1: Header typed correctly, cursor moved after entry ------

0 commit comments

Comments
 (0)