feat(example): add Terminal Bench training example #1224
garrett4wade merged 11 commits into inclusionAI:main
Code Review
This pull request implements a training pipeline for terminal agents using AReaL and Terminal Bench, featuring a CAMEL-based agent and specialized rollout workflows. Key feedback includes fixing variable substitution syntax in the YAML configurations, resolving code duplication in the tracing agent, and correcting markdown formatting in the prompts. Additionally, recommendations were made to improve security by removing insecure curl flags, enhance maintainability by avoiding hardcoded paths and environment modifications, and follow best practices regarding logging and dependency pinning.
    from .prompts import get_developer_agent_prompt


    DATASET_ROOT = Path(__file__).resolve().parents[3] / "dataset"
Hardcoding the dataset root path relative to the current file (Path(__file__).resolve().parents[3] / "dataset") is fragile. If the directory structure changes, this will break. It's better to make this path configurable, for instance by passing it through the agent's configuration or as an environment variable. This improves maintainability and makes the example more robust.
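A minimal sketch of the environment-variable option suggested above. The variable name `TERMINAL_BENCH_DATASET_ROOT` and the helper `resolve_dataset_root` are hypothetical and do not come from the PR; the caller supplies the existing file-relative path as the fallback, so behavior is unchanged when the variable is unset.

```python
import os
from pathlib import Path

# Hypothetical variable name, chosen for illustration only.
DATASET_ROOT_ENV = "TERMINAL_BENCH_DATASET_ROOT"


def resolve_dataset_root(default: Path, env=None) -> Path:
    """Return the dataset root from the environment, else `default`."""
    env = os.environ if env is None else env
    override = env.get(DATASET_ROOT_ENV)
    if override:
        # Normalize the override so later joins do not depend on the cwd.
        return Path(override).expanduser().resolve()
    return default


# In the real module the default would be the original file-relative path;
# Path.cwd() stands in here so the sketch runs on its own.
DATASET_ROOT = resolve_dataset_root(Path.cwd() / "dataset")
```

Accepting `env` as a parameter keeps the helper easy to test without mutating the process environment.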
        Path(__file__).parent.parent.parent
        / "dataset"
        / config.train_dataset.path
    )
Constructing the dataset path relative to the current file's location (Path(__file__).parent.parent.parent) makes the script fragile and dependent on a specific directory structure. A more robust approach would be to define a root directory in the configuration and construct paths relative to that, or expect absolute paths. This would make the example easier to adapt to different environments.
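One way to read this suggestion is sketched below. The `DatasetPaths` dataclass and its field names are assumptions made for illustration; the PR's actual config classes may be shaped differently. The helper accepts an absolute training path as-is and otherwise joins it to a configured root.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class DatasetPaths:
    """Hypothetical config fragment; the PR's config classes may differ."""

    root: str   # directory that holds the converted dataset
    train: str  # training file, absolute or relative to `root`


def train_dataset_path(cfg: DatasetPaths) -> Path:
    train = Path(cfg.train)
    # An absolute path is used as-is; a relative one is joined to the root.
    return train if train.is_absolute() else Path(cfg.root) / train
```

With this shape, moving the example to another directory only requires changing `root` in the config, not the source file.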
            timeout=self.task_timeouts._reset_env + 60.0,
        )
    except asyncio.TimeoutError:
        print(f"Timeout while building docker image for task {data.get('task_name')}")
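The review summary mentions logging best practices. Assuming that refers to `print` calls like the one above, the following sketch shows the same message routed through the standard-library `logging` module. The logger name and the helper `report_build_timeout` are hypothetical, and the project may provide its own logging utility that should be preferred.

```python
import logging

# Standard-library logger; the project may provide its own logging helper.
logger = logging.getLogger("terminal_bench_example")


def report_build_timeout(data: dict) -> None:
    """Log a docker image build timeout for the given task record."""
    # %-style arguments defer string formatting until the record is emitted.
    logger.warning(
        "Timeout while building docker image for task %s",
        data.get("task_name"),
    )
```

Unlike `print`, the message then carries a severity level and can be filtered or redirected by whatever handlers the runtime configures.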
    from terminal_bench.terminal.docker_compose_manager import DockerComposeManager


    DATASET_ROOT = Path(__file__).resolve().parents[3] / "dataset"
Hardcoding the dataset root path relative to the current file (Path(__file__).resolve().parents[3] / "dataset") is fragile. If the directory structure changes, this will break. It's better to make this path configurable, for instance by passing it as an argument to the function or reading it from a central configuration. This improves maintainability and reusability.
        input_path=task_path,
        output_path=Path("build_outputs"),
    )
    print(f"Task path: {task_path}")
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
garrett4wade left a comment
LGTM except for some cleanups
Description
This PR adds a new examples/terminal_bench example for training terminal agents with AReaL on Terminal Bench 1.0 tasks.
The example is an AReaL adaptation of the Terminal Bench training workflow from SETA, targeting an easy subset from the converted SETA dataset. It includes a full training entrypoint, rollout workflow, CAMEL-based terminal agent, example configs for SGLang and vLLM-on-NPU, example-scoped dependency metadata, a reward figure, and a README covering setup, runtime assumptions, dataset preparation, and training commands.
A few points are important for users of this example. The workflow is intended to run inside the AReaL runtime with host Docker mounted in, because Terminal Bench task environments are launched through docker compose from inside the rollout runtime. The example also depends on the converted dataset layout under AReaL/dataset, sourced from either SETA or terminal-bench-seta; the bundled parquet is only a convenience copy and is not sufficient by itself without the referenced task assets.
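Because task environments are launched through docker compose from inside the rollout runtime, a quick preflight check can catch a missing Docker mount before training starts. The sketch below is not part of the PR; the function name `docker_compose_available` is hypothetical, and it only confirms that the `docker compose` CLI responds, not that the host daemon is reachable.

```python
import shutil
import subprocess


def docker_compose_available(timeout: float = 10.0) -> bool:
    """Return True if `docker compose version` runs successfully."""
    if shutil.which("docker") is None:
        return False
    try:
        result = subprocess.run(
            ["docker", "compose", "version"],
            capture_output=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("docker compose available:", docker_compose_available())
```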
Related Issue
NA