Commit a367879

[recipe] feat: Revamp single-controller demo with agentic multi-turn rollout and add CI (Ascend#63)
## Summary

- **Rewrite `single_controller_demo.py`** to showcase a realistic agentic RLHF training loop: replaces the previous hardcoded tensor inputs and `AsyncvLLMServer` with a multi-turn `AgentLoop` that interleaves LLM generation with simulated tool calls, an OpenAI-style `MessageDataset` with interleaved image support, a proper `DataLoader`-based training loop, and an explicit `compute_reward` step that writes advantages back through TQ.
- **Adopt the new KV APIs** introduced in Ascend#57 throughout the recipe: uses `kv_batch_get_by_meta` and the return value of `kv_batch_put` (cumulative `KVBatchMeta`), and removes all manual `kv_meta.fields.append(...)` calls.
- **Add structured configuration** via `@dataclass` classes (`TrainerConfig`, `AgentLoopConfig`, `MessageDatasetConfig`) and an `argparse` CLI, replacing the flat `OmegaConf.create` dict. Trainer config and TQ config are now cleanly separated.
- **Add `recipe-check.yml` CI workflow** that runs the demo end-to-end on every push/PR with reduced parameters (`--num-samples 8 --global-batch-size 4 --rollout-agent-num-workers 1`) to keep CI fast.

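As a rough illustration of the structured-configuration bullet above, here is a minimal sketch of how typed `@dataclass` configs can be populated from an `argparse` CLI. The class names and the three flag names come from this commit message; every field name, default value, and the `parse_args` helper are assumptions made for illustration and may not match the actual `single_controller_demo.py`.

```python
import argparse
from dataclasses import dataclass, field


@dataclass
class AgentLoopConfig:
    # Hypothetical field: number of rollout agent workers.
    num_workers: int = 2


@dataclass
class MessageDatasetConfig:
    # Hypothetical field: number of synthetic samples in the dataset.
    num_samples: int = 64


@dataclass
class TrainerConfig:
    # Hypothetical layout: the trainer config nests the other two configs.
    global_batch_size: int = 8
    rollout: AgentLoopConfig = field(default_factory=AgentLoopConfig)
    data: MessageDatasetConfig = field(default_factory=MessageDatasetConfig)


def parse_args(argv=None) -> TrainerConfig:
    """Build a TrainerConfig from CLI flags (hypothetical helper; defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Single-controller demo (sketch)")
    parser.add_argument("--num-samples", type=int, default=64)
    parser.add_argument("--global-batch-size", type=int, default=8)
    parser.add_argument("--rollout-agent-num-workers", type=int, default=2)
    args = parser.parse_args(argv)
    return TrainerConfig(
        global_batch_size=args.global_batch_size,
        rollout=AgentLoopConfig(num_workers=args.rollout_agent_num_workers),
        data=MessageDatasetConfig(num_samples=args.num_samples),
    )


if __name__ == "__main__":
    # Same flags as the CI invocation described in this commit.
    cfg = parse_args(
        ["--num-samples", "8", "--global-batch-size", "4", "--rollout-agent-num-workers", "1"]
    )
    print(cfg)
```

Compared with a flat dict, the nested dataclasses give each component its own typed namespace, so a mistyped field name fails at construction time rather than silently creating a new key.
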
## Key changes

| Area | Before | After |
|------|--------|-------|
| Rollout | `AsyncvLLMServer` (Ray actor, single-turn) | `AgentLoop` (multi-turn with tool calls, per-sample async) |
| Data | Hardcoded `[[1,2],[3,4],...]` tensors | `MessageDataset` + `DataLoader` with random multi-modal messages |
| Reward | Simulated inline (`time.sleep`) | `compute_reward()` producing per-token advantages via TQ |
| KV API | `kv_batch_get(keys=..., fields=...)` + manual field tracking | `kv_batch_get_by_meta(meta=...)` + `kv_batch_put` return value |
| Config | Flat `OmegaConf` dict | Typed `@dataclass` hierarchy + `argparse` CLI |
| CI | None | `recipe-check.yml` workflow |

## Test plan

- [ ] Run the recipe locally: `python recipe/simple_use_case/single_controller_demo.py --num-samples 8 --global-batch-size 4`
- [ ] Verify the new `recipe-check.yml` workflow passes in CI
- [ ] Confirm existing tests (`pytest tests`) still pass

---------

Signed-off-by: Chi Zhang <czhangseu@gmail.com>

1 parent 48c15da commit a367879

2 files changed

Lines changed: 444 additions & 137 deletions

.github/workflows/recipe-check.yml

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+name: Recipe check
+
+on:
+  push:
+    branches:
+      - main
+      - v0.*
+  pull_request:
+    branches:
+      - main
+      - v0.*
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    timeout-minutes: 10
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ["3.11"]
+
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v3
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
+          pip install -e ".[yuanrong]"
+      - name: Run recipes
+        run: |
+          export RAY_DEDUP_LOGS=0
+          python3 recipe/simple_use_case/single_controller_demo.py --num-samples 8 --global-batch-size 4 --rollout-agent-num-workers 1
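
A quick back-of-the-envelope check of why the reduced CI parameters keep the run short. This sketch assumes the demo makes a single pass over the dataset and consumes one global batch per training step; neither detail is confirmed by this commit page, and `num_batches` is a hypothetical helper, not part of the recipe.

```python
import math


def num_batches(num_samples: int, global_batch_size: int, drop_last: bool = False) -> int:
    """Number of batches in one pass over the data (hypothetical helper)."""
    if drop_last:
        # An incomplete trailing batch is discarded.
        return num_samples // global_batch_size
    # An incomplete trailing batch still counts as one batch.
    return math.ceil(num_samples / global_batch_size)


# CI invocation: --num-samples 8 --global-batch-size 4
print(num_batches(8, 4))  # prints 2
```

Under those assumptions the CI job exercises only two training steps, which fits comfortably within the workflow's 10-minute timeout budget in principle.
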
