Skip to content

feat: improve browsecomp#1109

Merged
bxyu-nvidia merged 6 commits intomainfrom
yukih/improve-browsecomp
Apr 26, 2026
Merged

feat: improve browsecomp#1109
bxyu-nvidia merged 6 commits intomainfrom
yukih/improve-browsecomp

Conversation

@yuki-97
Copy link
Copy Markdown
Contributor

@yuki-97 yuki-97 commented Apr 22, 2026

Summary

Improves the browsecomp_agent with better observability and tighter control over long agentic trajectories.

  • max_reset_count: Adds a new config field to cap the number of context resets per trajectory. Previously, context resets could happen unboundedly whenever the prompt exceeded the reset threshold. This lets users trade off context freshness against reset overhead.
  • reset_count / num_tool_calls logging: Surfaces reset_count and num_tool_calls directly on the agent response object for downstream analysis. Also fixes num_tool_calls — it was previously counted inside the resources server and is now correctly tracked in the agent loop.
  • Trajectory recording (snap_dir): Adds an optional snap_dir config field. When set, the agent saves a JSONL snapshot of the full conversation at each context reset and at the end of the trajectory (keyed by task_index and attempt). Useful for debugging and offline analysis of long browsing runs.
  • Config: Exposes snap_dir and max_reset_count in benchmarks/browsecomp/config.yaml.

yuki-97 added 4 commits April 22, 2026 03:04
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Apr 22, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yuki-97 yuki-97 marked this pull request as ready for review April 23, 2026 14:42
yuki-97 and others added 2 commits April 23, 2026 07:50
Signed-off-by: Yuki Huang <yukih@nvidia.com>
@bxyu-nvidia bxyu-nvidia merged commit fe9845e into main Apr 26, 2026
6 checks passed
@bxyu-nvidia bxyu-nvidia deleted the yukih/improve-browsecomp branch April 26, 2026 03:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants