Commit 1aaf72b

docs: update readme
build: update training base image
1 parent 28d8101 commit 1aaf72b

7 files changed

Lines changed: 177 additions & 18 deletions

.github/workflows/build-training-base-image.yml

Lines changed: 15 additions & 6 deletions
@@ -2,14 +2,23 @@ name: Training Base Image
 
 on:
   workflow_dispatch:
+    inputs:
+      tag:
+        description: Tag of the image to build
+        required: true
+        type: choice
+        options:
+          - 25.09-cu130-torch290-sglang053
+          - 25.06-cu129-torch280-sglang053
+          - 25.03-cu128-torch271-sglang048
 
 permissions:
   contents: read
   packages: write
 
 concurrency:
-  group: ${{ github.workflow }}-${{ github.ref }}
-  cancel-in-progress: false
+  group: ${{ github.workflow }}-${{ inputs.tag }}
+  cancel-in-progress: true
 
 jobs:
   build-and-push:
@@ -43,10 +52,10 @@ jobs:
         uses: docker/build-push-action@v6
         with:
           context: .
-          file: extra/docker/training-base.Dockerfile
+          file: extra/docker/training-base/${{ inputs.tag }}.Dockerfile
           push: true
-          tags: ${{ github.repository == 'THUDM/AgentRL' && format('{0}/agentrl-training-base:25.03-cu128-torch271-sglang048', secrets.DOCKER_USER) || format('ghcr.io/{0}/training-base:25.03-cu128-torch271-sglang048', steps.repo_slug.outputs.repo_lower) }}
+          tags: ${{ github.repository == 'THUDM/AgentRL' && format('{0}/agentrl-training-base:{1}', secrets.DOCKER_USER, inputs.tag) || format('ghcr.io/{0}/training-base:{1}', steps.repo_slug.outputs.repo_lower, inputs.tag) }}
           cache-from: |
-            type=registry,ref=${{ github.repository == 'THUDM/AgentRL' && format('{0}/agentrl-training-base:buildcache', secrets.DOCKER_USER) || format('ghcr.io/{0}/training-base:buildcache', steps.repo_slug.outputs.repo_lower) }}
+            type=registry,ref=${{ github.repository == 'THUDM/AgentRL' && format('{0}/agentrl-training-base:cache-{1}', secrets.DOCKER_USER, inputs.tag) || format('ghcr.io/{0}/training-base:cache-{1}', steps.repo_slug.outputs.repo_lower, inputs.tag) }}
           cache-to: |
-            type=registry,ref=${{ github.repository == 'THUDM/AgentRL' && format('{0}/agentrl-training-base:buildcache', secrets.DOCKER_USER) || format('ghcr.io/{0}/training-base:buildcache', steps.repo_slug.outputs.repo_lower) }},mode=max
+            type=registry,ref=${{ github.repository == 'THUDM/AgentRL' && format('{0}/agentrl-training-base:cache-{1}', secrets.DOCKER_USER, inputs.tag) || format('ghcr.io/{0}/training-base:cache-{1}', steps.repo_slug.outputs.repo_lower, inputs.tag) }},mode=max
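Reviewer note: the workflow now derives the Dockerfile path, the image reference, the cache reference, and the concurrency group from the single `tag` input. The Python sketch below mirrors those expressions so the resulting strings are easier to check by eye. It is an illustration only: `plan_build`, `BuildPlan`, and the `docker_user` argument are hypothetical names, and it assumes `steps.repo_slug.outputs.repo_lower` is simply the lower-cased `owner/repo` slug.

```python
from dataclasses import dataclass

# Tags offered by the workflow's `tag` choice input.
KNOWN_TAGS = (
    "25.09-cu130-torch290-sglang053",
    "25.06-cu129-torch280-sglang053",
    "25.03-cu128-torch271-sglang048",
)


@dataclass(frozen=True)
class BuildPlan:
    dockerfile: str
    image: str
    cache_ref: str
    concurrency_group: str


def plan_build(tag: str, repository: str, docker_user: str,
               workflow: str = "Training Base Image") -> BuildPlan:
    """Mirror the workflow expressions for one tag (hypothetical helper)."""
    if tag not in KNOWN_TAGS:
        raise ValueError(f"unknown tag: {tag}")
    if repository == "THUDM/AgentRL":
        # Upstream: image lives under the namespace from secrets.DOCKER_USER.
        base = f"{docker_user}/agentrl-training-base"
    else:
        # Forks: image lives on GHCR under the lower-cased slug (assumed).
        base = f"ghcr.io/{repository.lower()}/training-base"
    return BuildPlan(
        dockerfile=f"extra/docker/training-base/{tag}.Dockerfile",
        image=f"{base}:{tag}",
        # Each tag gets its own registry cache instead of a shared one.
        cache_ref=f"{base}:cache-{tag}",
        # Runs are grouped per tag, so a new run cancels only the same tag.
        concurrency_group=f"{workflow}-{tag}",
    )


if __name__ == "__main__":
    print(plan_build("25.06-cu129-torch280-sglang053", "Someone/AgentRL", "unused"))
```

Because the cache reference and the concurrency group both include the tag, builds for different tags neither share a cache nor cancel each other.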

README.md

Lines changed: 20 additions & 4 deletions
@@ -11,6 +11,8 @@ Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework
 - [Quickstart](#quickstart)
 - [Architectural Overview](#architectural-overview)
 - [Training Overview](#training-overview)
+- [Installation](#installation)
+- [Getting Started](#getting-started)
 - [Placement Group](#placement-group)
 - [Workers](#workers)
 - [Data](#data)
@@ -24,25 +26,39 @@ Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework
 
 ## Quickstart
 
-For a minimal example of how to use the environment framework,
+- For a minimal example of how to use the environment framework,
 refer to [`examples/simple-calculator`](examples/simple-calculator).
 
+- For the environment and training data used in our paper,
+see [AgentBench FC](https://github.com/THUDM/AgentBench).
+
+- For reproducing the training results in our paper,
+refer to [`examples/training/agentrl_trainer.py`](examples/training/agentrl_trainer.py).
+
 ## Architectural Overview
 
 ![architecture](docs/assets/deployment-framework.png)
 
 This project mainly consists of two parts: the training framework and the environment deployment framework.
 
-For the training framework, see [Training Overview](#training-overview).
+- For the training framework, see [Training Overview](#training-overview).
 The code is available in the [`trainer`](trainer) directory.
 
-For the environment deployment framework, see [Environment Overview](#environment-overview).
+- For the environment deployment framework, see [Environment Overview](#environment-overview).
 The code of the controller and the task worker is available in [`controller`](controller) and [`worker`](worker) respectively.
 
 ## Training Overview
 
 AgentRL training package provide basic workers and components to compose a training routine.
 
+### Installation
+
+```shell
+pip install -e ./trainer
+```
+
+### Getting Started
+
 We take [`async_trainer.py`](examples/training/async_trainer.py) as an example to demonstrate how to compose a fully asynchronous GRPO agentic training pipeline.
 
 `async_trainer` trains LLM agents by utilizing three specialised worker pools over a Ray cluster:
@@ -187,7 +203,7 @@ There's also sample configs for the example trainer available in [`examples/trai
 
 ## Environment Overview
 
-Building upon [AgentBench](https://github.com/THUDM/AgentBench),
+Building upon [AgentBench](https://github.com/THUDM/AgentBench/tree/v0.2),
 this part mainly consists of the following components:
 
 ### Controller

docs/tasks.md

Lines changed: 2 additions & 2 deletions
@@ -15,10 +15,10 @@ We provide first-party integration for the following tasks into the environment
 
 ### AgentBench FC
 
-We have refactored the original [AgentBench](https://github.com/THUDM/AgentBench),
+We have refactored the original [AgentBench](https://github.com/THUDM/AgentBench/tree/v0.2),
 supporting a function-calling style prompt and containerized deployment.
 
-Available in the [agentbench_fc](https://github.com/THUDM/AgentBench/tree/agentbench_fc) branch of the original repository.
+Available in the [AgentBench](https://github.com/THUDM/AgentBench) repository.
 
 ### MobileRL (Android)

extra/docker/training-base.Dockerfile renamed to extra/docker/training-base/25.03-cu128-torch271-sglang048.Dockerfile

Lines changed: 14 additions & 4 deletions
@@ -26,16 +26,19 @@ RUN apt-get update && \
 RUN curl -fsSL https://astral.sh/uv/install.sh | sh
 
 RUN --mount=type=cache,target=/root/.cache/uv \
-uv pip install --system --upgrade setuptools packaging pybind11
+uv pip install --system --upgrade setuptools packaging psutil ninja pybind11
 RUN --mount=type=cache,target=/root/.cache/uv \
 uv pip install --system \
---extra-index-url https://download.pytorch.org/whl/cu128 \
+--extra-index-url https://download.pytorch.org/whl/cu128 \
 torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1
 RUN --mount=type=cache,target=/root/.cache/uv \
 echo "flashinfer-python==0.2.11.post3" > /tmp/overrides.txt && \
 uv pip install --system --override /tmp/overrides.txt \
-sglang[all]==0.4.8.post1 megatron-core transformer-engine[pytorch] \
-flash-attn accelerate binpacking wandb ray[rllib] tensordict nvitop py-spy && \
+sglang[all]==0.4.8.post1 \
+megatron-core transformer-engine[pytorch] "flash-attn<=2.8.1" \
+accelerate aiohttp binpacking filelock numpy Pillow \
+PyYAML ray[rllib] requests tensordict transformers \
+wandb nvitop py-spy && \
 rm -f /tmp/overrides.txt
 
 ### 3. configure utils
@@ -45,8 +48,15 @@ RUN echo 'set -g default-terminal "tmux-256color"' > /root/.tmux.conf && \
 echo 'set-environment -g LC_ALL "C.UTF-8"' >> /root/.tmux.conf && \
 echo 'set-option -g history-limit 50000' >> /root/.tmux.conf && \
 echo 'set-option -g mouse on' >> /root/.tmux.conf && \
+echo 'alias pip="uv pip"' >> /root/.bashrc && \
 echo 'alias tt="tmux attach -t"' >> /root/.bashrc && \
 echo 'alias tn="tmux new -s"' >> /root/.bashrc && \
 echo 'alias dp="ls -A | parallel du -sh 2>/dev/null | sort -h"' >> /root/.bashrc && \
 echo 'alias ds="du -sh .[!.]* * 2>/dev/null | sort -h"' >> /root/.bashrc && \
 echo 'alias pd="py-spy dump --pid"' >> /root/.bashrc
+
+### 4. install current agentrl trainer
+COPY . /workspace/agentrl
+RUN --mount=type=cache,target=/root/.cache/uv \
+uv pip install --system --no-deps \
+-e /workspace/agentrl/trainer[megatron]
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+### Common dependencies for the training environment
+# May not be up to date, double-check before using
+
+FROM nvcr.io/nvidia/cuda-dl-base:25.06-cuda12.9-devel-ubuntu24.04
+
+ENV DEBIAN_FRONTEND=noninteractive
+ENV LANG=C.UTF-8
+ENV LC_ALL=C.UTF-8
+ENV PYTHONUNBUFFERED=1
+ENV UV_BREAK_SYSTEM_PACKAGES=1
+ENV UV_LINK_MODE=copy
+ENV UV_NO_BUILD_ISOLATION=1
+ENV PATH="/root/.local/bin:${PATH}"
+
+WORKDIR /workspace
+
+### 1. install python and base tooling
+RUN apt-get update && \
+apt-get install -y \
+python-is-python3 python3 python3-dev \
+curl ca-certificates git htop ncurses-term parallel tmux && \
+apt-get clean && \
+rm -rf /var/lib/apt/lists/*
+
+### 2. install uv and python dependencies
+RUN curl -fsSL https://astral.sh/uv/install.sh | sh
+
+RUN --mount=type=cache,target=/root/.cache/uv \
+uv pip install --system --upgrade setuptools packaging psutil ninja pybind11
+RUN --mount=type=cache,target=/root/.cache/uv \
+uv pip install --system \
+--extra-index-url https://download.pytorch.org/whl/cu129 \
+torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0
+RUN --mount=type=cache,target=/root/.cache/uv \
+echo "torch-memory-saver==0.0.9rc2" > /tmp/overrides.txt && \
+uv pip install --system --override /tmp/overrides.txt \
+sglang[all]==0.5.3.post2 \
+megatron-core transformer-engine[pytorch] flash-attn==2.7.3 \
+accelerate aiohttp binpacking filelock numpy Pillow \
+PyYAML ray[rllib] requests tensordict transformers \
+wandb nvitop py-spy && \
+rm -f /tmp/overrides.txt
+
+### 3. configure utils
+RUN echo 'set -g default-terminal "tmux-256color"' > /root/.tmux.conf && \
+echo "set -ga terminal-overrides ',*:Tc'" >> /root/.tmux.conf && \
+echo 'set-environment -g LANG "C.UTF-8"' >> /root/.tmux.conf && \
+echo 'set-environment -g LC_ALL "C.UTF-8"' >> /root/.tmux.conf && \
+echo 'set-option -g history-limit 50000' >> /root/.tmux.conf && \
+echo 'set-option -g mouse on' >> /root/.tmux.conf && \
+echo 'alias pip="uv pip"' >> /root/.bashrc && \
+echo 'alias tt="tmux attach -t"' >> /root/.bashrc && \
+echo 'alias tn="tmux new -s"' >> /root/.bashrc && \
+echo 'alias dp="ls -A | parallel du -sh 2>/dev/null | sort -h"' >> /root/.bashrc && \
+echo 'alias ds="du -sh .[!.]* * 2>/dev/null | sort -h"' >> /root/.bashrc && \
+echo 'alias pd="py-spy dump --pid"' >> /root/.bashrc
+
+### 4. install current agentrl trainer
+COPY . /workspace/agentrl
+RUN --mount=type=cache,target=/root/.cache/uv \
+uv pip install --system --no-deps \
+-e /workspace/agentrl/trainer[megatron]
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+### Common dependencies for the training environment
+# May not be up to date, double-check before using
+
+FROM nvcr.io/nvidia/cuda-dl-base:25.09-cuda13.0-devel-ubuntu24.04
+
+ENV DEBIAN_FRONTEND=noninteractive
+ENV LANG=C.UTF-8
+ENV LC_ALL=C.UTF-8
+ENV PYTHONUNBUFFERED=1
+ENV UV_BREAK_SYSTEM_PACKAGES=1
+ENV UV_LINK_MODE=copy
+ENV UV_NO_BUILD_ISOLATION=1
+ENV PATH="/root/.local/bin:${PATH}"
+
+WORKDIR /workspace
+
+### 1. install python and base tooling
+RUN apt-get update && \
+apt-get install -y \
+python-is-python3 python3 python3-dev \
+curl ca-certificates git htop ncurses-term parallel tmux && \
+apt-get clean && \
+rm -rf /var/lib/apt/lists/*
+
+### 2. install uv and python dependencies
+RUN curl -fsSL https://astral.sh/uv/install.sh | sh
+
+RUN --mount=type=cache,target=/root/.cache/uv \
+uv pip install --system --upgrade setuptools packaging psutil ninja pybind11
+RUN --mount=type=cache,target=/root/.cache/uv \
+uv pip install --system \
+--extra-index-url https://download.pytorch.org/whl/cu130 \
+torch==2.9.0 torchvision==0.24.0 torchaudio==2.9.0
+RUN --mount=type=cache,target=/root/.cache/uv \
+echo "torch-memory-saver==0.0.9rc2" > /tmp/overrides.txt && \
+uv pip install --system --override /tmp/overrides.txt \
+sglang[cu130_all]==0.5.3.post2 \
+accelerate aiohttp binpacking filelock numpy Pillow \
+PyYAML ray[rllib] requests tensordict transformers \
+wandb nvitop py-spy && \
+rm -f /tmp/overrides.txt
+
+### 3. configure utils
+RUN echo 'set -g default-terminal "tmux-256color"' > /root/.tmux.conf && \
+echo "set -ga terminal-overrides ',*:Tc'" >> /root/.tmux.conf && \
+echo 'set-environment -g LANG "C.UTF-8"' >> /root/.tmux.conf && \
+echo 'set-environment -g LC_ALL "C.UTF-8"' >> /root/.tmux.conf && \
+echo 'set-option -g history-limit 50000' >> /root/.tmux.conf && \
+echo 'set-option -g mouse on' >> /root/.tmux.conf && \
+echo 'alias pip="uv pip"' >> /root/.bashrc && \
+echo 'alias tt="tmux attach -t"' >> /root/.bashrc && \
+echo 'alias tn="tmux new -s"' >> /root/.bashrc && \
+echo 'alias dp="ls -A | parallel du -sh 2>/dev/null | sort -h"' >> /root/.bashrc && \
+echo 'alias ds="du -sh .[!.]* * 2>/dev/null | sort -h"' >> /root/.bashrc && \
+echo 'alias pd="py-spy dump --pid"' >> /root/.bashrc
+
+### 4. install current agentrl trainer
+COPY . /workspace/agentrl
+RUN --mount=type=cache,target=/root/.cache/uv \
+uv pip install --system --no-deps \
+-e /workspace/agentrl/trainer
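
Reviewer note: the tag names appear to encode the pins used in each Dockerfile (base image release, CUDA version, torch version, sglang version). The sketch below decodes a tag under that assumption. The digit-to-version mapping is inferred from the three tags in this commit and is not stated anywhere in the repository, and `parse_tag` is a hypothetical helper.

```python
import re

# Assumed shape: <release>-cu<3 digits>-torch<3 digits>-sglang<3 digits>
TAG_PATTERN = re.compile(
    r"^(?P<release>\d{2}\.\d{2})"
    r"-cu(?P<cuda>\d{3})"
    r"-torch(?P<torch>\d{3})"
    r"-sglang(?P<sglang>\d{3})$"
)


def parse_tag(tag: str) -> dict:
    """Decode an image tag into the versions it appears to encode."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"tag does not follow the assumed convention: {tag}")
    cuda = match["cuda"]
    return {
        "release": match["release"],
        # "130" -> "13.0": two digits of major version, one of minor.
        "cuda": f"{cuda[:2]}.{cuda[2]}",
        # "290" -> "2.9.0" and "053" -> "0.5.3": one digit per component.
        "torch": ".".join(match["torch"]),
        "sglang": ".".join(match["sglang"]),
        # Wheel index passed to --extra-index-url in the Dockerfiles.
        "torch_index": f"https://download.pytorch.org/whl/cu{cuda}",
    }


if __name__ == "__main__":
    print(parse_tag("25.09-cu130-torch290-sglang053"))
```

The decoded sglang value carries no post-release suffix, so it matches the Dockerfile pin `0.5.3.post2` only up to the base version.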

trainer/pyproject.toml

Lines changed: 3 additions & 2 deletions
@@ -23,7 +23,7 @@ dependencies = [
 "aiohttp",
 "binpacking",
 "filelock",
-"flash-attn",
+"flash-attn>=2.4.3",
 "numpy",
 "Pillow",
 "PyYAML",
@@ -39,7 +39,8 @@ dependencies = [
 [project.optional-dependencies]
 megatron = [
 "megatron-core",
-"transformer-engine[pytorch]"
+"transformer-engine[pytorch]",
+"flash-attn<=2.8.1"
 ]
 
 [project.readme]
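
Reviewer note: with this change the base dependency requires `flash-attn>=2.4.3`, and the `megatron` extra adds `flash-attn<=2.8.1`, so installing the extra narrows the allowed range to 2.4.3 through 2.8.1. The sketch below shows the combined effect. It is a simplification that handles only plain `X.Y.Z` versions rather than full PEP 440 semantics, and `flash_attn_allowed` is a hypothetical helper.

```python
def version_tuple(version: str) -> tuple:
    """Turn a plain dotted version such as '2.7.3' into (2, 7, 3)."""
    return tuple(int(part) for part in version.split("."))


def flash_attn_allowed(version: str, megatron: bool = False) -> bool:
    """Check a flash-attn version against the constraints in pyproject.toml."""
    parsed = version_tuple(version)
    # Base dependency: flash-attn>=2.4.3
    if parsed < (2, 4, 3):
        return False
    # The megatron extra additionally requires flash-attn<=2.8.1
    if megatron and parsed > (2, 8, 1):
        return False
    return True


if __name__ == "__main__":
    for candidate in ("2.4.2", "2.7.3", "2.8.1", "2.8.3"):
        print(candidate, flash_attn_allowed(candidate),
              flash_attn_allowed(candidate, megatron=True))
```

The `flash-attn==2.7.3` pin in the 25.06 Dockerfile falls inside both ranges. Note that the Dockerfiles install the trainer with `--no-deps`, so these constraints are not enforced by that install step.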
