feat(example): add Terminal Bench training example by ActuallyEdward · Pull Request #1224 · inclusionAI/AReaL

ActuallyEdward · 2026-04-21T20:27:15Z

Description

This PR adds a new examples/terminal_bench example for training terminal agents with AReaL on Terminal Bench 1.0 tasks.

The example is an AReaL adaptation of the Terminal Bench training workflow from SETA, targeting an easy subset from the converted SETA dataset. It includes a full training entrypoint, rollout workflow, CAMEL-based terminal agent, example configs for SGLang and vLLM-on-NPU, example-scoped dependency metadata, a reward figure, and a README covering setup, runtime assumptions, dataset preparation, and training commands.

A few points are important for users of this example. The workflow is intended to run inside the AReaL runtime with host Docker mounted in, because Terminal Bench task environments are launched through docker compose from inside the rollout runtime. The example also depends on the converted dataset layout under AReaL/dataset, sourced from either SETA or terminal-bench-seta; the bundled parquet is only a convenience copy and is not sufficient by itself without the referenced task assets.
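Both runtime assumptions above (host Docker visible inside the AReaL runtime, converted task assets on disk) can be checked before a run starts. The sketch below is a hypothetical pre-flight helper, not part of the example. It assumes the default Docker socket path `/var/run/docker.sock`. It only checks that the dataset root exists and is non-empty; it does not validate the parquet against its referenced task directories.

```python
import shutil
from pathlib import Path


def preflight_problems(
    dataset_root: Path,
    docker_socket: Path = Path("/var/run/docker.sock"),
) -> list[str]:
    """Return human-readable problems that would likely break rollouts.

    Hypothetical helper: the checks mirror the runtime assumptions stated
    in the PR description and are not shipped with the example.
    """
    problems: list[str] = []
    # Task environments are launched through `docker compose` from inside the
    # rollout runtime, so the host Docker socket must be mounted in.
    if not docker_socket.exists():
        problems.append(f"Docker socket not found at {docker_socket}")
    if shutil.which("docker") is None:
        problems.append("`docker` CLI not found on PATH")
    # The bundled parquet alone is not enough; the referenced task assets
    # must also be present under the dataset root.
    if not dataset_root.is_dir():
        problems.append(f"Dataset root {dataset_root} does not exist")
    elif not any(dataset_root.iterdir()):
        problems.append(f"Dataset root {dataset_root} is empty")
    return problems
```

A launcher could print the returned list and exit early when it is non-empty, rather than failing later inside a rollout.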

Related Issue

NA

Fixes #(issue)

Type of Change

Checklist

I have read the Contributing Guide
Pre-commit hooks pass (pre-commit run --all-files)
Relevant tests pass; new tests added for new functionality
Documentation updated (if applicable; built with ./docs/build_all.sh)
Branch is up to date with main
Self-reviewed via /review-pr command
This PR was created by a coding agent via /create-pr
This PR is a breaking change

Breaking Change Details (if applicable):

Additional Context

Original Workflow & Example source: https://github.com/camel-ai/seta


gemini-code-assist

Code Review

This pull request implements a training pipeline for terminal agents using AReaL and Terminal Bench, featuring a CAMEL-based agent and specialized rollout workflows. Key feedback includes fixing variable substitution syntax in the YAML configurations, resolving code duplication in the tracing agent, and correcting markdown formatting in the prompts. Additionally, recommendations were made to improve security by removing insecure curl flags, enhance maintainability by avoiding hardcoded paths and environment modifications, and follow best practices regarding logging and dependency pinning.

gemini-code-assist · 2026-04-21T20:31:43Z

+from .prompts import get_developer_agent_prompt
+
+
+DATASET_ROOT = Path(__file__).resolve().parents[3] / "dataset"


Hardcoding the dataset root path relative to the current file (Path(__file__).resolve().parents[3] / "dataset") is fragile. If the directory structure changes, this will break. It's better to make this path configurable, for instance by passing it through the agent's configuration or as an environment variable. This improves maintainability and makes the example more robust.
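One way to act on the environment-variable suggestion is sketched below. The variable name `TERMINAL_BENCH_DATASET_ROOT` and the helper are hypothetical; the example does not define them. The design choice is to fail loudly when no root is configured, instead of guessing from the file's position in the source tree.

```python
import os
from pathlib import Path
from typing import Optional

# Hypothetical variable name; not defined by the example itself.
DATASET_ROOT_ENV = "TERMINAL_BENCH_DATASET_ROOT"


def resolve_dataset_root(default: Optional[Path] = None) -> Path:
    """Resolve the dataset root from the environment, else an explicit default."""
    value = os.environ.get(DATASET_ROOT_ENV)
    if value:
        return Path(value).expanduser().resolve()
    if default is not None:
        return default.expanduser().resolve()
    # No silent fallback to a path derived from __file__.
    raise RuntimeError(
        f"Set {DATASET_ROOT_ENV} or pass a default dataset root explicitly."
    )
```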

gemini-code-assist · 2026-04-21T20:31:43Z

+                Path(__file__).parent.parent.parent
+                / "dataset"
+                / config.train_dataset.path
+            )


Constructing the dataset path relative to the current file's location (Path(__file__).parent.parent.parent) makes the script fragile and dependent on a specific directory structure. A more robust approach would be to define a root directory in the configuration and construct paths relative to that, or expect absolute paths. This would make the example easier to adapt to different environments.
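The "root in the config, relative or absolute paths" approach could look roughly like this. `DatasetConfig` and its `root` and `path` fields are hypothetical stand-ins, not AReaL's actual config classes.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class DatasetConfig:
    # Hypothetical fields; AReaL's real dataset config may be shaped differently.
    root: str
    path: str


def resolve_train_path(cfg: DatasetConfig) -> Path:
    """Use absolute paths as-is; join relative ones onto the configured root."""
    p = Path(cfg.path).expanduser()
    if p.is_absolute():
        return p
    return Path(cfg.root).expanduser() / p
```

For example, `DatasetConfig(root="/data", path="seta/train.parquet")` resolves to `/data/seta/train.parquet`, and an absolute `path` bypasses `root` entirely.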

gemini-code-assist · 2026-04-21T20:31:43Z

+                    timeout=self.task_timeouts._reset_env + 60.0,
+                )
+        except asyncio.TimeoutError:
+            print(f"Timeout while building docker image for task {data.get('task_name')}")


Using print for logging can make it difficult to manage log levels, format messages consistently, and redirect output. It's better to use the logging module. This allows for more flexible and powerful logging, which is especially important in a complex workflow like this.
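A sketch of the timeout branch above using `logging` instead of `print`. `build_with_timeout` and the coroutine it awaits are hypothetical placeholders for the workflow's real image-build call.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def build_with_timeout(build_coro, task_name: str, timeout: float) -> bool:
    """Await a (hypothetical) image-build coroutine and log timeouts."""
    try:
        await asyncio.wait_for(build_coro, timeout=timeout)
        return True
    except asyncio.TimeoutError:
        # %-style arguments defer formatting until the record is actually emitted.
        logger.warning("Timeout while building docker image for task %s", task_name)
        return False
```

Returning a boolean lets the caller decide whether to skip the task or retry, while the log level controls how noisy the failure is.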

gemini-code-assist · 2026-04-21T20:31:43Z

+from terminal_bench.terminal.docker_compose_manager import DockerComposeManager
+
+
+DATASET_ROOT = Path(__file__).resolve().parents[3] / "dataset"


Hardcoding the dataset root path relative to the current file (Path(__file__).resolve().parents[3] / "dataset") is fragile. If the directory structure changes, this will break. It's better to make this path configurable, for instance by passing it as an argument to the function or reading it from a central configuration. This improves maintainability and reusability.
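Passing the root in as an argument, as suggested here, might look like the sketch below. `task_path_for` and the `tasks/` subdirectory layout are hypothetical; adjust them to the converted SETA layout actually on disk.

```python
from pathlib import Path


def task_path_for(task_name: str, dataset_root: Path, subdir: str = "tasks") -> Path:
    """Build a task directory path from an explicitly passed dataset root.

    The `subdir` default and directory layout are assumptions for illustration.
    """
    path = dataset_root / subdir / task_name
    if not path.is_dir():
        # Fail early with a clear message instead of inside docker compose.
        raise FileNotFoundError(f"Task assets not found at {path}")
    return path
```

Because the root is a parameter, tests and alternative deployments can point the function at any directory without touching module-level constants.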

gemini-code-assist · 2026-04-21T20:31:43Z

+        input_path=task_path,
+        output_path=Path("build_outputs"),
+    )
+    print(f"Task path: {task_path}")


Using print for logging is generally discouraged in library or application code. It's better to use the logging module, which provides more control over verbosity, formatting, and output streams (e.g., stdout, stderr, files).
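To illustrate the control over verbosity and output streams mentioned above, here is a minimal logging setup that sends records to stderr and, optionally, to a file. The logger name `terminal_bench_example` is hypothetical.

```python
import logging
import sys
from pathlib import Path
from typing import Optional


def configure_logging(
    level: int = logging.INFO, log_file: Optional[Path] = None
) -> logging.Logger:
    """Route example logs to stderr (and optionally a file) at a chosen level."""
    logger = logging.getLogger("terminal_bench_example")  # hypothetical name
    logger.setLevel(level)
    # Remove and close old handlers so repeated calls do not duplicate output.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    stream = logging.StreamHandler(sys.stderr)
    stream.setFormatter(fmt)
    logger.addHandler(stream)
    if log_file is not None:
        file_handler = logging.FileHandler(log_file)
        file_handler.setFormatter(fmt)
        logger.addHandler(file_handler)
    return logger
```

The `print` call in the diff would then become `logger.info("Task path: %s", task_path)` or, if the message is only useful while debugging, `logger.debug(...)`.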


garrett4wade

LGTM except for some cleanups

garrett4wade

LGTM

Add Terminal Bench training example

2ee84a9

ActuallyEdward requested review from fishcrap and garrett4wade as code owners April 21, 2026 20:27

gemini-code-assist Bot reviewed Apr 21, 2026


Edward Wang and others added 6 commits April 21, 2026 13:41

Update terminal bench example configs

53e0724

Update examples/terminal_bench/command.sh

abeb7f3

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Fix terminal bench lint issues

df43c0d

style: apply pre-commit fixes

4ef0ff0

Pin terminal bench example dependencies

1eeb7ea

Merge branch 'main' into edward/terminal-bench-example

1ceb396

garrett4wade reviewed Apr 23, 2026


garrett4wade added the reviewed label Apr 23, 2026

ActuallyEdward and others added 4 commits April 23, 2026 10:10

Merge branch 'main' into edward/terminal-bench-example

c0371af

chore: remove terminal bench example artifacts

d08ac3b

chore: update terminal bench config dataset paths

4ca279d

chore: fix terminal bench npu dataset path

e4acc6d

garrett4wade approved these changes Apr 24, 2026


garrett4wade merged commit aeb237b into inclusionAI:main Apr 24, 2026
1 check failed
