diff --git a/examples/terminal_bench/README.md b/examples/terminal_bench/README.md new file mode 100644 index 0000000000..9ff7b7b9b0 --- /dev/null +++ b/examples/terminal_bench/README.md @@ -0,0 +1,318 @@ +# Terminal Agent Training with Terminal Bench 1.0 + +## Overview + +This example demonstrates how to train terminal agents with AReaL's PPO/GRPO-style +training pipeline on Terminal Bench tasks. + +It is an AReaL adaptation of the training workflow originally developed in +[SETA](https://github.com/camel-ai/seta), with the environment management and rollout +loop refactored into an AReaL example. In this example, we focus on an easy subset of +Terminal Bench 1.0 derived from the SETA conversion of Terminal Bench tasks. + +[Terminal Bench](https://github.com/harbor-framework/terminal-bench) is a benchmark for +evaluating AI agents in real terminal environments. It provides a task dataset plus an +execution harness, where each task includes a natural language instruction, a runnable +environment, and outcome-based verification. This example targets the Terminal Bench 1.0 +style workflow used in SETA and trains on the easy subset prepared for that pipeline. + +## Relation to SETA + +This directory is not a copy of SETA. It is a conversion of the Terminal Bench training +path in SETA into AReaL's workflow abstraction and launcher model. + +Compared with SETA: + +- updated to work with the current AReaL stack (`v1.0.2`) +- supports single-controller mode through AReaL's `PPOTrainer` +- rollout logic is implemented as an AReaL `RolloutWorkflow` +- the CAMEL-based terminal agent is packaged as an example-local agent module +- Terminal Bench task environments are still created and verified through + `terminal_bench` + +## Code Architecture + +- `train.py`: Entry point that loads config, builds the dataset, and launches AReaL + training. 
+- `workflow/camel_rlvr_workflow.py`: Rollout workflow that builds task images, runs + trajectories, collects rewards, and exports interactions. +- `workflow/pre_build_tasks_utils.py`: Helper for pre-building Terminal Bench task + images before rollout. +- `agent/camel_terminal_agent.py`: CAMEL-based terminal agent wrapper used for each + trajectory. +- `agent/chat_agent_trace.py`: Traced `ChatAgent` variant used by the agent. +- `agent/prompts.py`: Developer-agent prompt construction. +- `agent_rl_config.py`: Example-specific config extensions on top of AReaL `GRPOConfig`. + +## Included Configurations + +Two example configs are currently included: + +| Config | Backend | Cluster Target | Use Case | +| ------------------------- | ------- | --------------------- | ----------------------------- | +| `config_tb_sglang.yaml` | SGLang | single-node GPU setup | local or small-scale training | +| `config_tb_vllm_npu.yaml` | vLLM | Ascend NPU setup | NPU training | + +## Running the Example + +### Prerequisites + +Please make sure AReaL itself is already installed and working. + +You will need: + +- Python `>=3.10` +- a working AReaL environment +- Docker CLI available inside the AReaL runtime +- Docker Compose and Buildx available as Docker CLI plugins +- the `terminal_bench` Python package + +For NPU usage, you will also need: + +- Ascend drivers and runtime +- access to the required `/dev/davinci*` devices +- `sglang[srt_npu]`, since this workflow currently depends on SGLang tool parsing even + when using the vLLM-based config + +### Recommended Runtime Model + +This example is intended to run inside the AReaL runtime, with host Docker mounted into +that runtime container. + +That structure is important: Terminal Bench task environments are launched via +`docker compose`, and the `docker compose` invocation needs to happen from the same +AReaL runtime that is performing rollout and evaluation. 
+ +The recommended setup is: + +- run AReaL inside a runtime container +- mount the host Docker socket into that container +- mount the Docker CLI and Docker CLI plugins into that container +- run this example from inside that AReaL runtime container + +Minimum mounts: + +```bash +-v /var/run/docker.sock:/var/run/docker.sock +-v /usr/bin/docker:/usr/bin/docker:ro +-v /usr/libexec/docker/cli-plugins:/usr/libexec/docker/cli-plugins:ro +``` + +### Install Example Dependencies + +From the AReaL repo root: + +```bash +cd examples/terminal_bench +pip install -e . +``` + +This installs the example-scoped dependencies declared in +[`pyproject.toml`](./pyproject.toml): + +- `ipython` +- `ruamel.yaml` +- `streamlit` +- `sqlalchemy` +- `docker` +- `camel_ai` +- `terminal_bench` + +If you are using the NPU / vLLM path, also install the optional extra: + +```bash +pip install -e ".[npu]" +``` + +If `terminal_bench` fails to install because of an upstream Python-version constraint +mismatch, which can happen on some NPU runtime images, install it from source and relax +its Python requirement to `>=3.11`: + +```bash +git clone https://github.com/harbor-framework/terminal-bench.git +cd terminal-bench +``` + +Edit `pyproject.toml`: + +```toml +requires-python = ">=3.11" +``` + +Then install it manually: + +```bash +pip install --no-deps -e . +``` + +If you use this fallback path, you can install the rest of the example dependencies +separately: + +```bash +cd ../AReaL/examples/terminal_bench +pip install --no-deps -e . +pip install ipython ruamel.yaml streamlit sqlalchemy docker +``` + +### Manual Dependency Path + +If you already manage some dependencies separately, you can use the same manual setup +pattern used in SETA. + +Install CAMEL and Terminal Bench from a SETA checkout: + +```bash +git clone https://github.com/camel-ai/seta.git +cd seta +git submodule update --init --recursive + +cd external/camel +pip install --no-deps -e . 
+ +cd ../terminal-bench +pip install --no-deps -e . +``` + +Then install the remaining example dependencies: + +```bash +pip install ipython ruamel.yaml streamlit sqlalchemy docker +``` + +### Install SGLang for NPU + +One working installation path from the original setup is: + +```bash +git clone -b v0.5.6.post2 https://github.com/sgl-project/sglang.git +cd sglang +mv python/pyproject_other.toml python/pyproject.toml +pip install -e python[srt_npu] --no-deps +``` + +### Configure `tiktoken` + +This example assumes `o200k_base.tiktoken` is cached locally. + +```bash +export TIKTOKEN_CACHE_DIR=/tmp/tiktoken-cache +mkdir -p "$TIKTOKEN_CACHE_DIR" +curl -k -o "$TIKTOKEN_CACHE_DIR/o200k_base.tiktoken" \ + https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken +``` + +If you need the hashed cache filename used by `tiktoken`, compute it with: + +```bash +python3 - <<'PY' +import hashlib +url = "https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken" +print(hashlib.sha1(url.encode()).hexdigest()) +PY +``` + +### Prepare the Dataset + +This example does not work with the parquet file alone. The parquet rows point to task +assets that must also exist under `AReaL/dataset/`. + +You should prepare the converted Terminal Bench dataset from either of these sources: + +- SETA: https://github.com/camel-ai/seta +- terminal-bench-seta: https://github.com/ActuallyEdward/terminal-bench-seta + +For this example, those two sources should be treated as equivalent dataset sources. + +The configs in this directory expect the easy-subset parquet to be available at: + +```bash +AReaL/dataset/tbench-tasks_convert/tbench-selected-tasks-easy.parquet +``` + +and they also expect the referenced task files and directories from the same converted +dataset to be present under `AReaL/dataset/`. 
+ +One workable setup is: + +```bash +cd AReaL/dataset +git clone https://github.com/ActuallyEdward/terminal-bench-seta.git +``` + +The `train_filtered_easy.parquet` file is also provided in +[`terminal-bench-seta`](https://github.com/ActuallyEdward/terminal-bench-seta). + +Then place or link the easy-subset parquet from that checkout at the path expected by +the configs: + +```bash +mkdir -p AReaL/dataset/tbench-tasks_convert +cp AReaL/dataset/terminal-bench-seta/train_filtered_easy.parquet \ + AReaL/dataset/tbench-tasks_convert/tbench-selected-tasks-easy.parquet +``` + +If you source the data from SETA instead, use the same converted dataset layout and +place the parquet and referenced task assets under `AReaL/dataset/` in the same way. + +### Docker Compose / Buildx + +Docker Compose and Buildx should be available inside the AReaL runtime at: + +```bash +/usr/libexec/docker/cli-plugins/ +``` + +If needed: + +```bash +chmod +x /usr/libexec/docker/cli-plugins/docker-compose +chmod +x /usr/libexec/docker/cli-plugins/docker-buildx +``` + +### Training Commands + +The following commands are intended to be executed from the AReaL repo root. + +#### SGLang + +```bash +python3 examples/terminal_bench/train.py \ + --config examples/terminal_bench/config_tb_sglang.yaml +``` + +#### vLLM on NPU + +```bash +python3 examples/terminal_bench/train.py \ + --config examples/terminal_bench/config_tb_vllm_npu.yaml +``` + +## Results + +The following figure shows a representative training reward curve on the easy subset +derived from SETA: + +

+ +

+ +On this setup, we observe reward-curve behavior qualitatively similar to the GRPO +training trends reported in +[terminal-bench-rl](https://github.com/Danau5tin/terminal-bench-rl). This is a +directional comparison of training dynamics rather than a claim of identical setup, +identical scale, or identical leaderboard numbers. + +## Notes + +1. This example currently targets the easy subset used in the SETA conversion, not the + full Terminal Bench task distribution. +1. `pyproject.toml` in this directory is intentionally example-scoped. It does not + replace installing AReaL itself. +1. Docker, proxy, model-mount, and NPU device details are environment-specific and + should be adapted locally. + +## References + +- SETA: https://github.com/camel-ai/seta +- Terminal Bench: https://github.com/harbor-framework/terminal-bench +- Terminal-Bench-RL: https://github.com/Danau5tin/terminal-bench-rl diff --git a/examples/terminal_bench/__init__.py b/examples/terminal_bench/__init__.py new file mode 100644 index 0000000000..e69de29bb2 diff --git a/examples/terminal_bench/agent/__init__.py b/examples/terminal_bench/agent/__init__.py new file mode 100644 index 0000000000..fe6332c6d4 --- /dev/null +++ b/examples/terminal_bench/agent/__init__.py @@ -0,0 +1,9 @@ +from .camel_terminal_agent import CamelTerminalAgent +from .chat_agent_trace import ChatAgentTrace +from .prompts import get_developer_agent_prompt + +__all__ = [ + "CamelTerminalAgent", + "ChatAgentTrace", + "get_developer_agent_prompt", +] diff --git a/examples/terminal_bench/agent/camel_terminal_agent.py b/examples/terminal_bench/agent/camel_terminal_agent.py new file mode 100644 index 0000000000..5547aba768 --- /dev/null +++ b/examples/terminal_bench/agent/camel_terminal_agent.py @@ -0,0 +1,349 @@ +from __future__ import annotations + +import asyncio +import datetime +import json +import os +from concurrent.futures import ThreadPoolExecutor +from functools import partial +from pathlib import Path + +from 
agent_rl_config import TaskTimeouts +from camel.messages import BaseMessage +from camel.toolkits import FunctionTool, TerminalToolkit +from transformers import PreTrainedTokenizerFast + +from terminal_bench.handlers.trial_handler import TrialHandler +from terminal_bench.parsers.base_parser import UnitTestStatus +from terminal_bench.parsers.parser_factory import ParserFactory +from terminal_bench.terminal.docker_compose_manager import DockerComposeManager +from terminal_bench.terminal.terminal import Terminal + +from areal.experimental.camel.openai_model import AReaLOpenAICompatibleModel +from areal.utils.perf_tracer import ( + Category, + atrace_scope, + atrace_session_phase, + session_context, + trace_perf, + trace_scope, +) + +from .chat_agent_trace import ChatAgentTrace +from .prompts import get_developer_agent_prompt + +DATASET_ROOT = Path(__file__).resolve().parents[3] / "dataset" + + +class CamelTerminalAgent: + def __init__( + self, + tokenizer: PreTrainedTokenizerFast | None = None, + max_tokens_per_turn: int = 1024, + max_total_tokens: int = 40000, + output_path: str = "CamelTerminalAgent_Output", + max_iteration: int = 50, + executor: ThreadPoolExecutor | None = None, + task_timeouts: TaskTimeouts | None = None, + non_think_mode: bool = True, + encourage_completion_reward: bool = False, + ): + self.tokenizer = tokenizer + self.max_tokens_per_turn = max_tokens_per_turn + self.max_total_tokens = max_total_tokens + self.output_path = output_path + self.max_iteration = max_iteration + self.task_timeouts = task_timeouts or TaskTimeouts() + self.executor = executor + self.non_think_mode = non_think_mode + self.encourage_completion_reward = encourage_completion_reward + assert self.executor is not None, ( + "Executor must be provided to CamelTerminalAgent" + ) + + @session_context() + @trace_perf("CamelTerminalAgent.run_agent", category=Category.COMPUTE) + async def run_agent( + self, + data, + client, + uid: str | None = None, + traj_i: int = 0, + ) -> float | 
None: + """Execute a complete agent workflow: setup environment, run agent, cleanup.""" + task_name = data.get("task_name") + self.task_name = task_name + self.uid = uid + self.traj_i = traj_i + self.meta_info = {} + reward = None + + print(f"Running task {task_name}") + + try: + async with atrace_scope( + f"reset_env:{task_name}, traj:{traj_i}", + args={"uid": uid, "timeout": self.task_timeouts._reset_env}, + ): + prompt = await self.run_in_executor( + self._reset_env, + data, + uid, + timeout=self.task_timeouts._reset_env, + ) + print(f"env started: {task_name}") + + async with atrace_scope( + f"reset_agent:{task_name}, traj:{traj_i}", + args={"uid": uid, "timeout": self.task_timeouts._reset_agent}, + ): + await self.run_in_executor( + self._reset_agent, + client, + timeout=self.task_timeouts._reset_agent, + ) + + try: + async with atrace_scope( + f"agent_astep:{task_name}, traj:{traj_i}", + args={"uid": uid, "timeout": self.task_timeouts.agent_astep}, + ): + self.response = await self.agent.astep(prompt) + except TimeoutError as exc: + print(f"Agent step timeout for task {task_name}: {exc}") + print(f"Task {task_name}: agent responded") + + async with atrace_session_phase( + "reward", + start_payload={ + "task_name": task_name, + "traj_i": traj_i, + "uid": uid, + "timeout": self.task_timeouts._evaluate_completion_sync, + }, + ): + async with atrace_scope( + f"evaluate_completion_sync:{task_name}, traj:{traj_i}", + args={ + "uid": uid, + "timeout": self.task_timeouts._evaluate_completion_sync, + }, + ): + print("try to set rewards") + reward = await self.run_in_executor( + self._evaluate_completion_sync, + timeout=self.task_timeouts._evaluate_completion_sync, + ) + print(f"reward from run in executor is set as {reward}") + client.set_last_reward(reward) + + except TimeoutError as exc: + print(f"Timeout for task {task_name}: {exc}") + except Exception as exc: + print(f"Error in task {task_name}: {exc}") + import traceback + + traceback.print_exc() + finally: + 
try: + if hasattr(self, "terminal") and self.terminal is not None: + async with atrace_scope( + f"cleanup_env:{task_name}, traj:{traj_i}", + args={"uid": uid, "timeout": self.task_timeouts._cleanup}, + ): + await self.run_in_executor( + self._close_env, + timeout=self.task_timeouts._cleanup, + ) + print(f"Task {task_name}: cleaned up") + except Exception as exc: + print(f"Cleanup error for task {task_name}: {exc}") + finally: + return reward + + def _close_env(self): + if self.terminal: + self.terminal.stop(timeout=self.task_timeouts._cleanup) + + async def run_in_executor(self, fn, *args, timeout: float | None = None, **kwargs): + loop = asyncio.get_running_loop() + executor_task = loop.run_in_executor( + self.executor, + partial(fn, *args, **kwargs), + ) + if timeout is not None: + return await asyncio.wait_for(executor_task, timeout=timeout) + return await executor_task + + def _reset_env(self, task: dict, uid: str | None): + output_path = Path(self.output_path).resolve() + output_path.mkdir(parents=True, exist_ok=True) + + task_path = DATASET_ROOT / task.get("task_path") + print(f"Task path: {task_path}") + instruction = task.get("instruction") + task_id = task.get("task_name") + + self.trial_handler = TrialHandler( + trial_name=f"{task_id}.{uid}.areal-run", + input_path=task_path, + output_path=output_path, + ) + + task_config = self.trial_handler.task + self.parser = ParserFactory.get_parser(task_config.parser_name) + + self.client_container_name = f"{self.trial_handler.client_container_name}" + self.terminal = Terminal( + client_container_name=self.trial_handler.client_container_name, + client_image_name=self.trial_handler.client_image_name, + docker_compose_path=self.trial_handler.task_paths.docker_compose_path, + docker_image_name_prefix=self.trial_handler.docker_image_name_prefix, + sessions_logs_path=self.trial_handler.trial_paths.sessions_path, + agent_logs_path=self.trial_handler.trial_paths.agent_logging_dir, + no_rebuild=True, + cleanup=False, + ) + 
with trace_scope( + f"reset_env.start_terminal:{task_id}, traj:{self.traj_i}", + args={"uid": uid}, + ): + self.terminal.start(timeout=self.task_timeouts._reset_env) + + return f"Task name:{self.task_name}\nTask instruction: {instruction}" + + def _reset_agent(self, client): + session_logs_dir = ( + self.trial_handler.trial_paths.sessions_path + / "terminal_toolkit_session_logs" + ) + terminal_toolkit = TerminalToolkit( + timeout=20.0, + working_directory=None, + use_docker_backend=True, + docker_container_name=self.trial_handler.client_container_name, + session_logs_dir=session_logs_dir, + safe_mode=False, + ) + tools = [ + FunctionTool(terminal_toolkit.shell_exec), + FunctionTool(terminal_toolkit.shell_view), + FunctionTool(terminal_toolkit.shell_write_to_process), + FunctionTool(terminal_toolkit.shell_write_content_to_file), + ] + + system_message = get_developer_agent_prompt( + current_date=str(datetime.date.today()), + system="Linux (in Docker)", + machine="aarch64", + is_workforce=False, + non_think_mode=self.non_think_mode, + ) + print("starting chat agent") + os.environ["CAMEL_MODEL_LOG_ENABLED"] = "True" + os.environ["CAMEL_LOG_DIR"] = str( + self.trial_handler.trial_paths.sessions_path.parent / "CAMEL_LOG_DIR" + ) + model = AReaLOpenAICompatibleModel( + openai_client=client, + tokenizer=self.tokenizer, + model_type="areal", + model_config_dict={ + "max_completion_tokens": self.max_tokens_per_turn, + }, + ) + self.agent = ChatAgentTrace( + system_message=BaseMessage.make_assistant_message( + role_name="Developer Agent", + content=system_message, + ), + model=model, + tools=tools, + token_limit=self.max_total_tokens, + step_timeout=self.task_timeouts.agent_astep, + ) + self.agent.reset() + self.agent.max_iteration = self.max_iteration + print(f"{self.task_name}: agent started") + + def _evaluate_completion_sync(self) -> float: + assert self.trial_handler is not None and self.terminal is not None + + paths = [self.trial_handler.task_paths.run_tests_path] + 
if self.trial_handler.task_paths.test_dir.exists(): + paths.append(self.trial_handler.task_paths.test_dir) + with trace_scope( + f"evaluate_completion_sync.copy_tests:{self.task_name}, traj:{self.traj_i}" + ): + self.terminal.copy_to_container( + paths=paths, + container_dir=str(DockerComposeManager.CONTAINER_TEST_DIR), + ) + + print("running tests in a new shell") + with trace_scope( + f"evaluate_completion_sync.create_test_session:{self.task_name}, traj:{self.traj_i}" + ): + test_session = self.terminal.create_session( + "tests", + is_active_stream=False, + as_configured_user=False, + ) + + test_script_path = str(DockerComposeManager.CONTAINER_TEST_DIR / "run-tests.sh") + try: + with trace_scope( + f"evaluate_completion_sync.run_tests:{self.task_name}, traj:{self.traj_i}" + ): + test_session.send_keys( + [f"bash {test_script_path}", "Enter"], + block=True, + max_timeout_sec=min( + self.task_timeouts._evaluate_completion_sync, + 4 * self.trial_handler.task.max_test_timeout_sec, + ), + ) + test_output = test_session.capture_pane(capture_entire=True) + parser_results = self.parser.parse(test_output) + + all_passed = parser_results and all( + status == UnitTestStatus.PASSED for status in parser_results.values() + ) + pass_ratio = ( + sum( + 1 + for status in parser_results.values() + if status == UnitTestStatus.PASSED + ) + / len(parser_results) + if parser_results + else 0.0 + ) + results_path = str( + self.trial_handler.trial_paths.sessions_path.parent + / "test_results.json" + ) + result_dict = { + "test_results": { + k: (v == UnitTestStatus.PASSED) for k, v in parser_results.items() + }, + "all_passed": all_passed, + "pass_ratio": pass_ratio, + } + try: + result_dict["iteration"] = len(self.response.info["tool_calls"]) + result_dict.update(self.response.info["usage"]) + except Exception: + pass + with open(results_path, "w") as f: + json.dump(result_dict, f, indent=4) + + except Exception as exc: + print(exc) + all_passed = False + pass_ratio = 0.0 + + if 
self.encourage_completion_reward and pass_ratio == 1.0: + pass_ratio += 1.0 + + return pass_ratio diff --git a/examples/terminal_bench/agent/chat_agent_trace.py b/examples/terminal_bench/agent/chat_agent_trace.py new file mode 100644 index 0000000000..812ac3a377 --- /dev/null +++ b/examples/terminal_bench/agent/chat_agent_trace.py @@ -0,0 +1,381 @@ +import asyncio +import atexit +import json +import os +import re +import textwrap +import threading +import time +import uuid +from typing import ( + TYPE_CHECKING, +) + +from camel.agents import ChatAgent +from camel.agents._types import ToolCallRequest +from camel.logger import get_logger +from camel.messages import ( + BaseMessage, + FunctionCallingMessage, +) +from camel.prompts import TextPrompt +from camel.responses import ChatAgentResponse +from camel.types import ( + OpenAIBackendRole, +) +from camel.types.agents import ToolCallingRecord +from pydantic import BaseModel + +from areal.utils.perf_tracer import ( + Category, + atrace_scope, + atrace_session_phase, +) + +if TYPE_CHECKING: + pass + +logger = get_logger(__name__) + +# Cleanup temp files on exit +_temp_files: set[str] = set() +_temp_files_lock = threading.Lock() + + +def _cleanup_temp_files(): + with _temp_files_lock: + for path in _temp_files: + try: + os.unlink(path) + except Exception: + pass + + +atexit.register(_cleanup_temp_files) + +SIMPLE_FORMAT_PROMPT = TextPrompt( + textwrap.dedent( + """\ + Please format the following content: + + {content} + """ + ) +) + + +class ChatAgentTrace(ChatAgent): + """A ChatAgent with performance tracing capabilities.""" + + def __init__(self, *args, **kwargs): + """Initialize ChatAgentTrace with parse error tracking.""" + super().__init__(*args, **kwargs) + self.max_parse_errors = kwargs.get("max_parse_errors", 3) + self.parse_error_count = 0 + + async def adetect_tool_calls_parse_error(self, response): + r""" + Asynchronously detect tool calls in the response content using Qwen25Detector. 
+ if the model is Qwen 2.5 or Qwen 3. + if there's tool call tokens detected, but got json parse failure, format the information into a tool call record, + so that the agent can handle the error next step. + add a self.count_parse_error, so that we can limit the number of parse errors we handle in one step. if max reached, just + break the loop. + + Args: + response: The model response to check for parse errors + + Returns: + Optional[ToolCallingRecord]: A tool calling record with error information if parse error detected, None otherwise + """ + bot_token = "\n" + eot_token = "\n" + + # Check if we've reached max parse errors + if self.parse_error_count >= self.max_parse_errors: + logger.warning( + f"Max parse errors ({self.max_parse_errors}) reached, stopping error handling" + ) + return None + + # Extract content from response + if not response.output_messages: + return None + + content = response.output_messages[0].content + if not content or bot_token not in content: + return None + + # Find all potential tool call blocks + pattern = rf"{re.escape(bot_token)}(.*?){re.escape(eot_token)}" + matches = re.findall(pattern, content, re.DOTALL) + + if not matches: + return None + + # Check each match for JSON parse errors + for match_text in matches: + try: + # Try to parse the JSON + json.loads(match_text.strip()) + # If successful, no error for this match + continue + except json.JSONDecodeError as e: + # Found a parse error + self.parse_error_count += 1 + logger.warning( + f"Detected JSON parse error (count: {self.parse_error_count}/{self.max_parse_errors}): {str(e)}" + ) + logger.warning(f"Problematic content: {match_text[:200]}...") + + # Create an error tool calling record + error_message = ( + f"JSON Parse Error: {str(e)}\n" + f"The tool call format is incorrect. Please ensure:\n" + f"1. The JSON is valid and properly formatted\n" + f"2. All quotes are properly escaped\n" + f"3. 
The structure matches: {{'name': 'function_name', 'arguments': {{}}}}\n" + f"Problematic content (first 200 chars): {match_text[:200]}..." + ) + + # Generate a unique error tool call ID + error_tool_call_id = f"error_{uuid.uuid4().hex[:8]}" + + # Create the error record + error_record = ToolCallingRecord( + tool_name="json_parse_error", + args={"raw_content": match_text, "error": str(e)}, + result=error_message, + tool_call_id=error_tool_call_id, + ) + + # Record this in memory so the model can see the error + assist_msg = FunctionCallingMessage( + role_name=self.role_name, + role_type=self.role_type, + meta_dict=None, + content="", + func_name="json_parse_error", + args={"raw_content": match_text[:200], "error": str(e)}, + tool_call_id=error_tool_call_id, + ) + + func_msg = FunctionCallingMessage( + role_name=self.role_name, + role_type=self.role_type, + meta_dict=None, + content="", + func_name="json_parse_error", + result=error_message, + tool_call_id=error_tool_call_id, + ) + + # Use precise timestamps + current_time_ns = time.time_ns() + base_timestamp = current_time_ns / 1_000_000_000 + + self.update_memory( + assist_msg, OpenAIBackendRole.ASSISTANT, timestamp=base_timestamp + ) + self.update_memory( + func_msg, + OpenAIBackendRole.FUNCTION, + timestamp=base_timestamp + 1e-6, + ) + + return error_record + + return None + + async def _astep_non_streaming_task( + self, + input_message: BaseMessage | str, + response_format: type[BaseModel] | None = None, + ) -> ChatAgentResponse: + r"""Internal async method for non-streaming astep logic.""" + + # try to extract task name if exists in input_message + if isinstance(input_message, str): + task_name_match = re.search(r"Task name:(.*)\n", input_message) + if task_name_match: + task_name = task_name_match.group(1).strip() + else: + task_name = "default" + else: + task_name = "default" + + # Reset parse error counter at the start of each step + self.parse_error_count = 0 + + try: + from camel.utils.langfuse import 
set_current_agent_session_id + + set_current_agent_session_id(self.agent_id) + except ImportError: + pass # Langfuse not available + + # Check if this call is from a RegisteredAgentToolkit to prevent tool + # use + disable_tools = self._is_called_from_registered_toolkit() + + # Handle response format compatibility with non-strict tools + original_response_format = response_format + input_message, response_format, used_prompt_formatting = ( + self._handle_response_format_with_non_strict_tools( + input_message, response_format + ) + ) + + if isinstance(input_message, str): + input_message = BaseMessage.make_user_message( + role_name="User", content=input_message + ) + + self.update_memory(input_message, OpenAIBackendRole.USER) + + tool_call_records: list[ToolCallingRecord] = [] + external_tool_call_requests: list[ToolCallRequest] | None = None + accumulated_context_tokens = ( + 0 # This tracks cumulative context tokens, not API usage tokens + ) + + # Initialize token usage tracker + step_token_usage = self._create_token_usage_tracker() + iteration_count: int = 0 + prev_num_openai_messages: int = 0 + while True: + if self.pause_event is not None and not self.pause_event.is_set(): + if isinstance(self.pause_event, asyncio.Event): + await self.pause_event.wait() + elif isinstance(self.pause_event, threading.Event): + # For threading.Event in async context, run in executor + loop = asyncio.get_event_loop() + await loop.run_in_executor(None, self.pause_event.wait) + try: + openai_messages, num_tokens = self.memory.get_context() + accumulated_context_tokens += num_tokens + except RuntimeError as e: + return self._step_terminate( + e.args[1], tool_call_records, "max_tokens_exceeded" + ) + + async with atrace_scope( + f"agent_astep._aget_model_response:{task_name}", + category=Category.COMM, + args={"agent_id": self.agent_id, "iteration": iteration_count}, + ): + async with atrace_session_phase("generate"): + response = await self._aget_model_response( + openai_messages, + # 
num_tokens=num_tokens, + current_iteration=iteration_count, + response_format=response_format, + tool_schemas=[] + if disable_tools + else self._get_full_tool_schemas(), + prev_num_openai_messages=prev_num_openai_messages, + ) + + prev_num_openai_messages = len(openai_messages) + iteration_count += 1 + + # Accumulate API token usage + self._update_token_usage_tracker(step_token_usage, response.usage_dict) + + # Terminate Agent if stop_event is set + if self.stop_event and self.stop_event.is_set(): + # Use the _step_terminate to terminate the agent with reason + logger.info(f"Termination triggered at iteration {iteration_count}") + return self._step_terminate( + accumulated_context_tokens, + tool_call_records, + "termination_triggered", + ) + + if tool_call_requests := response.tool_call_requests: + # Process all tool calls + for tool_call_request in tool_call_requests: + if tool_call_request.tool_name in self._external_tool_schemas: + if external_tool_call_requests is None: + external_tool_call_requests = [] + external_tool_call_requests.append(tool_call_request) + else: + if ( + self.pause_event is not None + and not self.pause_event.is_set() + ): + if isinstance(self.pause_event, asyncio.Event): + await self.pause_event.wait() + elif isinstance(self.pause_event, threading.Event): + loop = asyncio.get_event_loop() + await loop.run_in_executor(None, self.pause_event.wait) + async with atrace_scope( + f"agent_astep._aexecute_tool:{task_name}", + category=Category.IO, + args={ + "agent_id": self.agent_id, + "iteration": iteration_count, + "tool_name": tool_call_request.tool_name, + }, + ): + async with atrace_session_phase("toolcall"): + tool_call_record = await self._aexecute_tool( + tool_call_request + ) + tool_call_records.append(tool_call_record) + + # If we found an external tool call, break the loop + if external_tool_call_requests: + break + + if ( + self.max_iteration is not None + and iteration_count >= self.max_iteration + ): + break + + # If we're still 
here, continue the loop + continue + + # Check for JSON parse errors in tool calls (Qwen 2.5/3 specific) + parse_error_record = await self.adetect_tool_calls_parse_error(response) + if parse_error_record: + print( + f"Task {task_name}: Detected tool call parse error, prompting model to correct." + ) + tool_call_records.append(parse_error_record) + + # Check if we've reached max parse errors + if self.parse_error_count >= self.max_parse_errors: + logger.error( + f"Max parse errors reached ({self.max_parse_errors}), " + "terminating step to prevent infinite loop" + ) + break + + # Continue to let the model try again with the error feedback + continue + + break + + await self._aformat_response_if_needed(response, response_format) + + # Apply manual parsing if we used prompt-based formatting + if used_prompt_formatting and original_response_format: + self._apply_prompt_based_parsing(response, original_response_format) + + self._record_final_output(response.output_messages) + + # Clean tool call messages from memory after response generation + if self.prune_tool_calls_from_memory and tool_call_records: + self.memory.clean_tool_calls() + + return self._convert_to_chatagent_response( + response, + tool_call_records, + accumulated_context_tokens, + external_tool_call_requests, + step_token_usage["prompt_tokens"], + step_token_usage["completion_tokens"], + step_token_usage["total_tokens"], + ) diff --git a/examples/terminal_bench/agent/prompts.py b/examples/terminal_bench/agent/prompts.py new file mode 100644 index 0000000000..59579a362f --- /dev/null +++ b/examples/terminal_bench/agent/prompts.py @@ -0,0 +1,161 @@ +def get_developer_agent_prompt( + current_date: str, + system: str, + machine: str, + is_workforce: bool, + non_think_mode: bool = True, +): + """ + Generate the prompt for the Lead Software Engineer agent. + Args: + current_date (str): The current date. + system (str): The operating system. (e.g., "Linux", "Darwin", "Windows", "Linux (in Docker)"...) 
+ machine (str): The machine type. (e.g., "x86_64", "arm64") + is_workforce (bool): Whether the agent is part of a workforce with other agents or standalone. + Returns: + str: The prompt for the Lead Software Engineer agent. + """ + LEAD_SDE_ROLE_PROMPT = """ + +You are a Lead Software Engineer, a master-level coding assistant with a +powerful and unrestricted terminal. Your primary role is to solve any +technical task by analyzing the problem, making plans, +writing and executing code, installing necessary libraries, +interacting with the operating system, and deploying applications. You are the +team's go-to expert for all technical implementation. + +""" + TEAM_STRUCTURE_PROMPT = "" + + OPERATING_ENVIRONMENT_PROMPT = ( + f""" + +- **System**: {system} ({machine}). +""" + + ( + """ +Note that the terminal commands and file system operations you perform will be +executed inside a Docker container. But note taking tools will operate on the host system. +""" + ) + if "Docker" in system + else "" + + f""" +- **Current Date**: {current_date}. + +""" + ) + + MANDATORY_INSTRUCTIONS_PROMPT = """ + +- You MUST use analyze, plan and review requirements and your work. +- When you complete your task, your final response must be a comprehensive +summary of your work and the outcome, presented in a clear, detailed, and +easy-to-read format. Avoid using markdown tables for presenting data; use +plain text formatting instead. +- You MUST use tools and follow tool schemas precisely for every response, +- You MUST be concise about your reasoning and planning, and limit within 600 tokens. +- You MUST try diverse tools available in toolkits. + +""" + CAPABILITIES_PROMPT = ( + """ + +Your capabilities are extensive and powerful: +- **Unrestricted Code Execution**: You can write and execute code in any +language to solve a task. 
+- For multi-line code, You MUST use tool (shell_write_content_to_file) to first save your code +to somewhere on the system (e.g.,`script.py`) and then run it from the terminal (e.g., +`python script.py`). Beware of the code that includes quotes\"\'; ensure proper +escaping when writing arguments for toolkit. Make sure it can be parsed by JSON. +- **Full Terminal Control**: You have root-level access to the terminal. You +can run any command-line tool, manage files, and interact with the OS. If +a tool is missing, you MUST install it with the appropriate package manager +(e.g., `pip3`, `uv`, or `apt-get`). Your capabilities include: + - **Text & Data Processing**: `awk`, `sed`, `grep`, `jq`. + - **File System & Execution**: `find`, `xargs`, `tar`, `zip`, `unzip`, + `chmod`. + - **Networking & Web**: `curl`, `wget` for web requests; `ssh` for + remote access. +- **IMPORTANT**: Always complete the full automation workflow—do not just +prepare or suggest actions. Execute them to completion. +- **Solution Verification**: You can immediately test and verify your +solutions by executing them in the terminal. +""" + + """ + +""" + ) + + PHILOSOPHY_PROMPT = """ + +- **Bias for Action**: Your purpose is to take action. Don't just suggest +solutions—implement them. Write code, run commands, and build things. +- **Complete the Full Task**: When automating GUI applications, always finish +what you start. If the task involves sending something, send it. If it +involves submitting data, submit it. Never stop at just preparing or +drafting—execute the complete workflow to achieve the desired outcome. +- **Embrace Challenges**: Never say "I can't." If you +encounter a limitation, find a way to overcome it. +- **Resourcefulness**: If a tool is missing, install it. If information is +lacking, find it. You have the full power of a terminal to acquire any +resource you need. +- **Think Like an Engineer**: Approach problems methodically. 
Analyze +requirements, execute it, and verify the results. Your +strength lies in your ability to engineer solutions. +- ** Use Absolute Paths**: You can access files from any place in the file +system. For all file system operations, you MUST use absolute paths to ensure +precision and avoid ambiguity. +- ** Check current directory**: Always check your current directory with `pwd` and list +files with `ls -la` before performing file operations. This helps you +understand your context and avoid mistakes. +- ** Search for Files**: If you need a file but cannot find it in the current directory, +use commands like `find / -name "filename"` or search in directories common for the System +to locate it anywhere in the file system. This ensures you can always access the resources you need. +- ** Adhere to the initial task instruction**: Always keep the original task instruction in mind, make sure to understand +all requirements and useful information. Make sure finish every subtask mentioned in the instruction. + +""" + + TERMINAL_TIPS_PROMPT = """ + +The terminal tools are session-based, identified by a unique `id`. Master +these tips to maximize your effectiveness: + +- **Command-Line Best Practices**: +- **Be Creative**: The terminal is your most powerful tool. Use it boldly. +- **Automate Confirmation**: Use `-y` or `-f` flags to avoid interactive +prompts. +- **Manage Output**: Redirect long outputs to a file (e.g., `> output.txt`). +- **Chain Commands**: Use `&&` to link several commands for sequential execution. +But also avoid chaining too many commands in one line +to avoid json parse errors due to complex escaping issues. +- **Piping**: Use `|` to pass output from one command to another. +- **Permissions**: Use `ls -F` to check file permissions. +- **Installation**: Use `pip3 install` or `apt-get install` for new +packages. +- **Time Management**: `shell_exec` commands come with block or non-block mode. 
The block mode + has a time limit, and only suitable for very quick commands. If you expect a command to take a long time, or + you have experienced a timeout for a command, you MUST use non-block mode by setting `block=False`. + The non-block mode allows commands to run in the background. You can check the status using `shell_view`, + send in further input using `shell_write_to_process`, and kill it using `shell_kill_process` if needed. + + +""" + COLLABORATION_AND_ASSISTANCE_PROMPT = """ + """ + + FINAL_INSTRUCTIONS_PROMPT = f""" + {LEAD_SDE_ROLE_PROMPT} + {TEAM_STRUCTURE_PROMPT} + {OPERATING_ENVIRONMENT_PROMPT} + {MANDATORY_INSTRUCTIONS_PROMPT} + {CAPABILITIES_PROMPT} + {PHILOSOPHY_PROMPT} + {TERMINAL_TIPS_PROMPT} + {COLLABORATION_AND_ASSISTANCE_PROMPT} + """ + if non_think_mode: + FINAL_INSTRUCTIONS_PROMPT = rf"{FINAL_INSTRUCTIONS_PROMPT} /no_think" + + return FINAL_INSTRUCTIONS_PROMPT diff --git a/examples/terminal_bench/agent_rl_config.py b/examples/terminal_bench/agent_rl_config.py new file mode 100644 index 0000000000..3fa2d44ed7 --- /dev/null +++ b/examples/terminal_bench/agent_rl_config.py @@ -0,0 +1,25 @@ +from dataclasses import dataclass, field + +from areal.api.cli_args import GRPOConfig + + +@dataclass +class TaskTimeouts: + _reset_env: float = 1800.0 + _reset_agent: float = 120.0 + agent_astep: float = 300.0 + _evaluate_completion_sync: float = 1200.0 + _cleanup: float | None = None + + +@dataclass +class AgentRLConfig(GRPOConfig): + n_trajs: int = field(default=1) + max_tokens_per_trajectory: int = field(default=32768) + max_iteration: int = field(default=3) + max_workers: int = field(default=25) + non_think_mode: bool = field(default=True) + async_training: bool = field(default=False) + task_timeouts: TaskTimeouts = field(default_factory=TaskTimeouts) + filter_uniform_reward: bool = field(default=False) + encourage_completion_reward: bool = field(default=False) diff --git a/examples/terminal_bench/config_tb_sglang.yaml 
b/examples/terminal_bench/config_tb_sglang.yaml new file mode 100644 index 0000000000..746ccb8c9b --- /dev/null +++ b/examples/terminal_bench/config_tb_sglang.yaml @@ -0,0 +1,187 @@ +experiment_name: terminal_bench_rl +trial_name: trial0 + +seed: 1 +total_train_epochs: 200 +tokenizer_path: ${actor.path} +async_training: true + +n_trajs: 4 +max_tokens_per_trajectory: 40000 + +dynamic_bs: false + +cluster: + n_nodes: 1 + n_gpus_per_node: 8 + fileroot: /tmp/areal/experiments + name_resolve: + type: nfs + nfs_record_root: /tmp/areal/name_resolve + +allocation_mode: sglang:d2p1t2+d2p1t1c2 + +scheduler: + type: local + +rollout: + experiment_name: ${experiment_name} + trial_name: ${trial_name} + max_concurrent_rollouts: 64 + queue_size: null + consumer_batch_size: ${train_dataset.batch_size} + max_head_offpolicyness: 2 + enable_rollout_tracing: false + scheduling_spec: ${actor.scheduling_spec} + fileroot: ${cluster.fileroot} + tokenizer_path: ${tokenizer_path} + dump_to_file: true + check_trajectory_format: true + +gconfig: + n_samples: 2 + min_new_tokens: 0 + max_new_tokens: 1024 + greedy: false + temperature: 1.0 + +actor: + experiment_name: ${experiment_name} + trial_name: ${trial_name} + path: Qwen/Qwen3-8B + init_from_scratch: false + disable_dropout: true + gradient_checkpointing: true + dtype: bfloat16 + mb_spec: + max_tokens_per_mb: 30000 + optimizer: + type: adam + lr: 1.70e-5 + weight_decay: 0.017 + beta1: 0.9 + beta2: 0.999 + eps: 1e-8 + lr_scheduler_type: constant + gradient_clipping: 1.0 + warmup_steps_proportion: 0.001 + eps_clip: 0.4 + temperature: ${gconfig.temperature} + reward_scaling: 10.0 + reward_bias: -0.5 + kl_ctl: 0.0 + ppo_n_minibatches: 1 + recompute_logprob: true + use_decoupled_loss: true + behave_imp_weight_cap: 5.0 + adv_norm: + mean_level: batch + std_level: batch + weight_update_mode: xccl + max_new_tokens: ${gconfig.max_new_tokens} + scheduling_spec: + - task_type: worker + port_count: 2 + gpu: 1 + cpu: 4 + mem: 16 + cmd: python3 -m 
areal.infra.rpc.rpc_server + env_vars: {} + +ref: + experiment_name: ${experiment_name} + trial_name: ${trial_name} + path: ${actor.path} + init_from_scratch: false + disable_dropout: true + dtype: ${actor.dtype} + mb_spec: + max_tokens_per_mb: 10240 + optimizer: + type: adam + lr: 1.70e-5 + weight_decay: 0.017 + beta1: 0.9 + beta2: 0.999 + eps: 1e-8 + lr_scheduler_type: constant + gradient_clipping: 1.0 + warmup_steps_proportion: 0.001 + scheduling_strategy: + type: colocation + target: actor + scheduling_spec: ${actor.scheduling_spec} + +# SGLang +sglang: + model_path: ${actor.path} + random_seed: ${seed} + skip_tokenizer_init: true + dtype: ${actor.dtype} + max_running_requests: null + context_length: 40000 + mem_fraction_static: 0.8 + disable_radix_cache: true + + +# datasets +train_dataset: + batch_size: 16 + shuffle: true + pin_memory: true + num_workers: 4 + path: path/to/tbench-selected-tasks-easy.parquet + type: rl + max_length: 1024 + +valid_dataset: + batch_size: 4 + shuffle: true + pin_memory: true + num_workers: 4 + path: path/to/val.parquet + type: rl + +# Utilities +saver: + experiment_name: ${experiment_name} + trial_name: ${trial_name} + fileroot: ${cluster.fileroot} + freq_epochs: 1 + freq_steps: null + freq_secs: null + +recover: + mode: disabled + experiment_name: ${experiment_name} + trial_name: ${trial_name} + fileroot: ${cluster.fileroot} + freq_epochs: 1 + freq_steps: null + freq_secs: 3600 + +evaluator: + experiment_name: ${experiment_name} + trial_name: ${trial_name} + fileroot: ${cluster.fileroot} + freq_epochs: null + freq_steps: null + freq_secs: null + +stats_logger: + experiment_name: ${experiment_name} + trial_name: ${trial_name} + fileroot: ${cluster.fileroot} + wandb: + mode: disabled + + +perf_tracer: + enabled: true + experiment_name: ${experiment_name} + trial_name: ${trial_name} + fileroot: ${cluster.fileroot} + save_interval: 1 + session_tracer: + enabled: true + flush_threshold: 100 diff --git 
a/examples/terminal_bench/config_tb_vllm_npu.yaml b/examples/terminal_bench/config_tb_vllm_npu.yaml new file mode 100644 index 0000000000..df6432e370 --- /dev/null +++ b/examples/terminal_bench/config_tb_vllm_npu.yaml @@ -0,0 +1,196 @@ +experiment_name: terminal_bench_npu_rl +trial_name: trial0 + +seed: 1 +total_train_epochs: 200 +tokenizer_path: ${actor.path} +async_training: true + +n_trajs: 4 +max_tokens_per_trajectory: 40000 + +dynamic_bs: false + +cluster: + n_nodes: 1 + n_gpus_per_node: 16 + fileroot: /tmp/areal/experiments + name_resolve: + type: nfs + nfs_record_root: /tmp/areal/name_resolve + +allocation_mode: vllm:d4p1t2+d4p1t1c2 + +scheduler: + type: local + +rollout: + experiment_name: ${experiment_name} + trial_name: ${trial_name} + max_concurrent_rollouts: 64 + queue_size: null + consumer_batch_size: ${train_dataset.batch_size} + max_head_offpolicyness: 2 + enable_rollout_tracing: false + scheduling_spec: ${actor.scheduling_spec} + fileroot: ${cluster.fileroot} + tokenizer_path: ${tokenizer_path} + dump_to_file: true + check_trajectory_format: true + +gconfig: + n_samples: 2 + min_new_tokens: 0 + max_new_tokens: 1024 + greedy: false + temperature: 1.0 + +actor: + experiment_name: ${experiment_name} + trial_name: ${trial_name} + path: Qwen/Qwen3-8B + init_from_scratch: false + disable_dropout: true + gradient_checkpointing: true + dtype: bfloat16 + mb_spec: + max_tokens_per_mb: 30000 + optimizer: + type: adam + lr: 1.70e-5 + weight_decay: 0.017 + beta1: 0.9 + beta2: 0.999 + eps: 1e-8 + lr_scheduler_type: constant + gradient_clipping: 1.0 + warmup_steps_proportion: 0.001 + eps_clip: 0.4 + temperature: ${gconfig.temperature} + reward_scaling: 10.0 + reward_bias: -0.5 + kl_ctl: 0.0 + ppo_n_minibatches: 1 + recompute_logprob: true + use_decoupled_loss: true + behave_imp_weight_cap: 5.0 + adv_norm: + mean_level: batch + std_level: batch + weight_update_mode: xccl + max_new_tokens: ${gconfig.max_new_tokens} + scheduling_spec: + - task_type: worker + 
port_count: 2 + gpu: 1 + cpu: 4 + mem: 16 + cmd: python3 -m areal.infra.rpc.rpc_server + env_vars: + VLLM_USE_V1: "1" + VLLM_ALLOW_LONG_MAX_MODEL_LEN: "1" + TASK_QUEUE_ENABLE: "2" + HCCL_EXEC_TIMEOUT: "14400" + HCCL_OP_EXPANSION_MODE: "HOST" + ACL_DEVICE_SYNC_TIMEOUT: "14400" + HCCL_EVENT_TIMEOUT: "14500" + HCCL_ASYNC_ERROR_HANDLING: "0" + ACL_STREAM_TIMEOUT: "14500000" + HCCL_CONNECT_TIMEOUT: "7200" + PYTORCH_NPU_ALLOC_CONF: "expandable_segments:True" + +ref: + experiment_name: ${experiment_name} + trial_name: ${trial_name} + path: ${actor.path} + init_from_scratch: false + disable_dropout: true + dtype: ${actor.dtype} + mb_spec: + max_tokens_per_mb: 10240 + optimizer: + type: adam + lr: 1.70e-5 + weight_decay: 0.017 + beta1: 0.9 + beta2: 0.999 + eps: 1e-8 + lr_scheduler_type: constant + gradient_clipping: 1.0 + warmup_steps_proportion: 0.001 + scheduling_strategy: + type: colocation + target: actor + scheduling_spec: ${actor.scheduling_spec} + +# adpated for NPU +vllm: + model: ${actor.path} + seed: ${seed} + skip_tokenizer_init: false + dtype: ${actor.dtype} + max_model_len: 40000 + gpu_memory_utilization: 0.8 + enforce_eager: true + + +# datasets +train_dataset: + batch_size: 16 + shuffle: true + pin_memory: true + num_workers: 4 + path: path/to/tbench-selected-tasks-easy.parquet + type: rl + max_length: 1024 + +valid_dataset: + batch_size: 4 + shuffle: true + pin_memory: true + num_workers: 4 + path: path/to/val.parquet + type: rl + +# Utilities +saver: + experiment_name: ${experiment_name} + trial_name: ${trial_name} + fileroot: ${cluster.fileroot} + freq_epochs: 1 + freq_steps: null + freq_secs: null + +recover: + mode: disabled + experiment_name: ${experiment_name} + trial_name: ${trial_name} + fileroot: ${cluster.fileroot} + freq_epochs: 1 + freq_steps: null + freq_secs: 3600 + +evaluator: + experiment_name: ${experiment_name} + trial_name: ${trial_name} + fileroot: ${cluster.fileroot} + freq_epochs: null + freq_steps: null + freq_secs: null + 
+stats_logger: + experiment_name: ${experiment_name} + trial_name: ${trial_name} + fileroot: ${cluster.fileroot} + wandb: + mode: disabled + +perf_tracer: + enabled: true + experiment_name: ${experiment_name} + trial_name: ${trial_name} + fileroot: ${cluster.fileroot} + save_interval: 1 + session_tracer: + enabled: true + flush_threshold: 100 diff --git a/examples/terminal_bench/pyproject.toml b/examples/terminal_bench/pyproject.toml new file mode 100644 index 0000000000..4fde56fa37 --- /dev/null +++ b/examples/terminal_bench/pyproject.toml @@ -0,0 +1,40 @@ +[build-system] +requires = ["setuptools>=61"] +build-backend = "setuptools.build_meta" + +[project] +name = "areal-terminal-bench-env" +version = "0.1.0" +description = "Minimal dependency spec for AReaL + Camel + Terminal Bench workflow with the premise of a working AReaL environment" +requires-python = ">=3.10" +dependencies = [ + "ipython", + "ruamel.yaml", + "streamlit", + "sqlalchemy", + "docker", + "camel-ai==0.2.85a0", + "terminal-bench==0.2.18", +] + +[project.optional-dependencies] +npu = [ + "sglang[srt_npu] @ git+https://github.com/sgl-project/sglang.git@v0.5.6.post2#subdirectory=python", +] + +[tool.notes] +tiktoken_cache_dir = "/tmp/tiktoken-cache" +tiktoken_cache_file = "o200k_base.tiktoken" + +[tool.install-notes] +no_deps = [ + "docker", + "camel-ai", + "terminal-bench", +] + +commands = [ + "export TIKTOKEN_CACHE_DIR=/tmp/tiktoken-cache", + "mkdir -p '$TIKTOKEN_CACHE_DIR'", + "curl -k -O https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken", +] diff --git a/examples/terminal_bench/reward.png b/examples/terminal_bench/reward.png new file mode 100644 index 0000000000..42a7ec3249 Binary files /dev/null and b/examples/terminal_bench/reward.png differ diff --git a/examples/terminal_bench/train.py b/examples/terminal_bench/train.py new file mode 100644 index 0000000000..043df0830f --- /dev/null +++ b/examples/terminal_bench/train.py @@ -0,0 +1,74 @@ +import os + 
import os

# Must be set before any tokenizer is created, or HF emits fork warnings.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
import sys
from pathlib import Path

from agent_rl_config import AgentRLConfig
from datasets import load_dataset

from areal import PPOTrainer
from areal.api.alloc_mode import AllocationMode
from areal.api.cli_args import load_expr_config
from areal.utils import seeding
from areal.utils.hf_utils import load_hf_tokenizer
from areal.utils.stats_logger import StatsLogger

# Dotted path the trainer resolves to the rollout workflow class.
WORKFLOW_PATH = "workflow.camel_rlvr_workflow.CamelRLVRWorkflow"


def main(args):
    """Load the experiment config, build the dataset, and launch AReaL training.

    Args:
        args: Raw CLI arguments (typically ``sys.argv[1:]``), parsed by
            ``load_expr_config`` into an :class:`AgentRLConfig`.
    """
    config, _ = load_expr_config(args, AgentRLConfig)

    rank = int(os.getenv("RANK", "0"))
    tokenizer = load_hf_tokenizer(config.tokenizer_path)

    # Seed per-rank so trainer replicas do not draw identical randomness.
    seeding.set_random_seed(config.seed, key=f"trainer{rank}")
    allocation_mode = AllocationMode.from_str(config.allocation_mode)
    assert allocation_mode.train is not None

    # Parquet files live under <repo_root>/dataset; the configured path is
    # interpreted relative to that directory.
    dataset_file = (
        Path(__file__).parent.parent.parent / "dataset" / config.train_dataset.path
    )
    dataset = load_dataset(
        path="parquet",
        split="train",
        data_files=[str(dataset_file)],
    )

    workflow_kwargs = dict(
        gconfig=config.gconfig,
        tokenizer=tokenizer,
        n_trajs=config.n_trajs,
        max_tokens=config.max_tokens_per_trajectory,
        dump_dir=os.path.join(
            StatsLogger.get_log_path(config.stats_logger), "generated"
        ),
        max_iteration=config.max_iteration,
        max_workers=config.max_workers,
        non_think_mode=config.non_think_mode,
        task_timeouts=config.task_timeouts,
        filter_uniform_reward=config.filter_uniform_reward,
        encourage_completion_reward=config.encourage_completion_reward,
    )
    # Evaluation currently uses the same rollout settings as training.
    eval_workflow_kwargs = workflow_kwargs.copy()

    # NOTE(review): validation reuses the training dataset here; the
    # `valid_dataset` config section is not consulted — confirm intended.
    with PPOTrainer(
        config,
        train_dataset=dataset,
        valid_dataset=dataset,
    ) as trainer:
        trainer.train(
            workflow=WORKFLOW_PATH,
            workflow_kwargs=workflow_kwargs,
            eval_workflow=WORKFLOW_PATH,
            eval_workflow_kwargs=eval_workflow_kwargs,
        )


if __name__ == "__main__":
    main(sys.argv[1:])
new file mode 100644 index 0000000000..a66a78c750 --- /dev/null +++ b/examples/terminal_bench/workflow/__init__.py @@ -0,0 +1,4 @@ +from .camel_rlvr_workflow import CamelRLVRWorkflow +from .pre_build_tasks_utils import build_docker_image + +__all__ = ["CamelRLVRWorkflow", "build_docker_image"] diff --git a/examples/terminal_bench/workflow/camel_rlvr_workflow.py b/examples/terminal_bench/workflow/camel_rlvr_workflow.py new file mode 100644 index 0000000000..4b63724dfd --- /dev/null +++ b/examples/terminal_bench/workflow/camel_rlvr_workflow.py @@ -0,0 +1,175 @@ +from __future__ import annotations + +import asyncio +import os +import uuid +from concurrent.futures import ThreadPoolExecutor +from functools import partial + +os.environ["TOKENIZERS_PARALLELISM"] = "false" + +from agent.camel_terminal_agent import CamelTerminalAgent +from agent_rl_config import TaskTimeouts +from transformers import PreTrainedTokenizerFast + +from areal.api.cli_args import GenerationHyperparameters +from areal.api.workflow_api import RolloutWorkflow +from areal.experimental.openai import ArealOpenAI +from areal.utils import stats_tracker +from areal.utils.perf_tracer import atrace_scope + +from .pre_build_tasks_utils import build_docker_image + + +class CamelRLVRWorkflow(RolloutWorkflow): + def __init__( + self, + gconfig: GenerationHyperparameters, + tokenizer: PreTrainedTokenizerFast, + dump_dir: str | None = None, + rollout_stat_scope: str = "rollout", + n_trajs: int = 1, + max_tokens: int = 32768, + max_iteration: int = 50, + max_workers: int = 25, + non_think_mode: bool = True, + task_timeouts: TaskTimeouts | None = None, + filter_uniform_reward: bool = False, + encourage_completion_reward: bool = False, + ): + self.gconfig = gconfig + self.gconfig.n_samples = 1 + self.tokenizer = tokenizer + self.dump_dir = dump_dir + self.max_tokens = max_tokens + self.max_iteration = max_iteration + self.rollout_stat_scope = rollout_stat_scope + if self.dump_dir is not None and not 
os.path.exists(self.dump_dir): + os.makedirs(self.dump_dir, exist_ok=True) + + self.n_trajs = n_trajs + self.non_think_mode = non_think_mode + self.task_timeouts = task_timeouts or TaskTimeouts() + self.filter_uniform_reward = filter_uniform_reward + self.encourage_completion_reward = encourage_completion_reward + self.executor = ThreadPoolExecutor(max_workers=max_workers) + + async def arun_episode(self, engine, data): + clients = [ + ArealOpenAI( + engine=engine, + tokenizer=self.tokenizer, + tool_call_parser="qwen25", + ) + for _ in range(self.n_trajs) + ] + uids = [uuid.uuid4().hex[:8] for _ in range(self.n_trajs)] + + loop = asyncio.get_running_loop() + try: + async with atrace_scope( + f"build_docker_image:{data.get('task_name')}", + args={"timeout": self.task_timeouts._reset_env}, + ): + await asyncio.wait_for( + loop.run_in_executor( + self.executor, + partial( + build_docker_image, + task=data, + timeout=self.task_timeouts._reset_env, + ), + ), + timeout=self.task_timeouts._reset_env + 60.0, + ) + except TimeoutError: + print( + f"Timeout while building docker image for task {data.get('task_name')}" + ) + return None + + print(f"\n{'=' * 70}") + print(f"[EPISODE START] Task {data.get('task_name')}") + print(f"{'=' * 70}\n") + + rewards = await asyncio.gather( + *[ + CamelTerminalAgent( + max_tokens_per_turn=self.gconfig.max_new_tokens, + max_total_tokens=self.max_tokens, + max_iteration=self.max_iteration, + output_path=f"{self.dump_dir}/CamelTerminalAgent_Output", + executor=self.executor, + non_think_mode=self.non_think_mode, + task_timeouts=self.task_timeouts, + encourage_completion_reward=self.encourage_completion_reward, + ).run_agent( + data=data, + client=clients[i], + uid=uids[i], + traj_i=i, + ) + for i in range(self.n_trajs) + ] + ) + + print(f"\n{'=' * 70}") + print(f"[EPISODE END] Task {data.get('task_name')}") + print(f"{'=' * 70}\n") + + completions_with_reward = {} + if self.filter_uniform_reward: + valid_rewards = [reward for reward in 
rewards if reward is not None] + if valid_rewards and all( + reward == valid_rewards[0] for reward in valid_rewards + ): + print( + f"Rank {os.getenv('RANK', '0')} - Task {data.get('task_name')} " + "has uniform reward across trajectories. Discarding all." + ) + return completions_with_reward + if not valid_rewards: + print( + f"Rank {os.getenv('RANK', '0')} - Task {data.get('task_name')} " + "all trajectories failed." + ) + return completions_with_reward + + for i, (reward, client) in enumerate(zip(rewards, clients)): + if reward is None: + print( + f"Rank {os.getenv('RANK', '0')} - Task {data.get('task_name')}, " + f"Trajectory {i} failed." + ) + os.makedirs(f"{self.dump_dir}/failed_tasks", exist_ok=True) + with open( + f"{self.dump_dir}/failed_tasks/{data.get('task_name')}_traj_{i}.txt", + "w", + ) as f: + f.write(f"Task {data.get('task_name')} trajectory {i} failed.\n") + continue + + print( + f"Rank {os.getenv('RANK', '0')} - Task {data.get('task_name')}, " + f"Trajectory {i} reward: {reward}" + ) + stats_tracker.get(self.rollout_stat_scope).scalar(reward=reward) + client.apply_reward_discount(turn_discount=0.9) + completions = client.export_interactions(style="individual") + completions_with_reward.update(completions) + + if len(completions_with_reward) == 0: + print(f"All trajectories failed for task {data.get('task_name')}.") + completions_with_reward = None + + stats_tracker.get(self.rollout_stat_scope).scalar( + num_full_passes=sum(1 for reward in rewards if reward == 1.0) + ) + stats_tracker.get(self.rollout_stat_scope).scalar( + num_trajectories_failed=sum(1 for reward in rewards if reward is None) + ) + + print( + f"Rank {os.getenv('RANK', '0')} - Task {data.get('task_name')} completed." 
+ ) + return completions_with_reward diff --git a/examples/terminal_bench/workflow/pre_build_tasks_utils.py b/examples/terminal_bench/workflow/pre_build_tasks_utils.py new file mode 100644 index 0000000000..92a57428a2 --- /dev/null +++ b/examples/terminal_bench/workflow/pre_build_tasks_utils.py @@ -0,0 +1,28 @@ +from pathlib import Path + +from terminal_bench.handlers.trial_handler import TrialHandler +from terminal_bench.terminal.docker_compose_manager import DockerComposeManager + +DATASET_ROOT = Path(__file__).resolve().parents[3] / "dataset" + + +def build_docker_image(task: dict, timeout=1200.0): + task_path = DATASET_ROOT / task.get("task_path") + trial_handler = TrialHandler( + trial_name="build_run", + input_path=task_path, + output_path=Path("build_outputs"), + ) + print(f"Task path: {task_path}") + + compose_manager = DockerComposeManager( + client_container_name=trial_handler.client_container_name, + client_image_name=trial_handler.client_image_name, + docker_image_name_prefix=trial_handler.docker_image_name_prefix, + docker_compose_path=trial_handler.task_paths.docker_compose_path, + no_rebuild=True, + cleanup=True, + sessions_logs_path=trial_handler.trial_paths.sessions_path, + agent_logs_path=trial_handler.trial_paths.agent_logging_dir, + ) + compose_manager.build(timeout=timeout)