Align PolicyAgent prompt with training format (#31)
* Align PolicyAgent prompt with training format from convert_demos.py
- Import SYSTEM_PROMPT from convert_demos (canonical source)
- Add system message to SFT sample
- Change "Goal:" label to "Instruction:" (training format)
- Remove a11y tree, URL, window title injection (not in training data)
- Add <think> instruction matching training tail prompt
- Format history as " Step {i}: {action}" (0-indexed, indented)
- Track previous actions across steps (reset on reset())
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
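The history format and reset behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the code from agent.py: the names `format_history` and `HistoryTracker` are hypothetical, and the one-space indent is taken from the format string as it appears in this message.

```python
from typing import List


def format_history(actions: List[str]) -> str:
    """Render previous actions one per line as ' Step {i}: {action}' (0-indexed)."""
    return "\n".join(f" Step {i}: {action}" for i, action in enumerate(actions))


class HistoryTracker:
    """Hypothetical helper: accumulates action strings across steps."""

    def __init__(self) -> None:
        self._actions: List[str] = []

    def record(self, action_str: str) -> None:
        # Called once per step, after the action string has been produced.
        self._actions.append(action_str)

    def reset(self) -> None:
        # Mirrors "reset on reset()": a new episode starts with empty history.
        self._actions.clear()

    def render(self) -> str:
        return format_history(self._actions)
```

Keeping the formatting in a standalone function makes it easy to compare the inference-time string against the one produced during training data conversion.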
* Fix PolicyAgent to call predict_action_from_sample (not predict)
AgentPolicy has predict_action_from_sample() which returns a 4-tuple
(Action, thought, state, raw_text). The previous code called predict(),
which doesn't exist on AgentPolicy.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Fix _action_to_string to match training format from convert_demos
Replace UPPERCASE/normalized format (CLICK(0.500, 0.300)) with
training-aligned format (click(x=500, y=300)): lowercase function
names, [0,1000] coordinates, named parameters, press() for keys,
finished() for done.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
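The conversion described above, from normalized coordinates to the [0,1000] training format, might look roughly like the sketch below. It is an illustration under assumptions, not the actual _action_to_string: the function signature, the action kind names ("click", "key", "done"), and the exact argument form of press() are hypothetical; only click(x=500, y=300) and finished() are taken from this message.

```python
from typing import Optional


def _scale(value: float) -> int:
    """Map a normalized [0, 1] coordinate to an integer in [0, 1000]."""
    return max(0, min(1000, round(value * 1000)))


def action_to_string(
    kind: str,
    x: Optional[float] = None,
    y: Optional[float] = None,
    key: Optional[str] = None,
) -> str:
    """Render an action as a lowercase call with named parameters."""
    kind = kind.lower()
    if kind == "click":
        if x is None or y is None:
            raise ValueError("click requires x and y")
        return f"click(x={_scale(x)}, y={_scale(y)})"
    if kind == "key":
        if key is None:
            raise ValueError("key action requires a key name")
        # Assumption: the real press() argument format may differ.
        return f"press(key='{key}')"
    if kind == "done":
        return "finished()"
    raise ValueError(f"unsupported action kind: {kind}")
```

For example, a click at normalized (0.5, 0.3) renders as click(x=500, y=300) instead of the old CLICK(0.500, 0.300).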
* fix(modal): increase inference timeout from 300s to 600s
Vision model inference with large screenshots can take 3+ minutes on
A10G, especially on cold start. 300s was causing premature timeouts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: remove dead system prompt from PolicyAgent._build_sample()
QwenVLAdapter.generate() only extracts user role messages, dropping
the system prompt. Since training also ignores it, removing it at
inference keeps behaviour consistent and eliminates misleading code.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style: ruff format agent.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>