Commit a367879
authored
[recipe] feat: Revamp single-controller demo with agentic multi-turn rollout and add CI (Ascend#63)
## Summary
- **Rewrite `single_controller_demo.py`** to showcase a realistic
agentic RLHF training loop: replaces the
previous hardcoded tensor inputs and `AsyncvLLMServer` with a multi-turn
`AgentLoop` that interleaves
LLM generation with simulated tool calls, an OpenAI-style
`MessageDataset` with interleaved image
support, a proper `DataLoader`-based training loop, and an explicit
`compute_reward` step that writes
advantages back through TQ.
- **Adopt the new KV APIs** introduced in Ascend#57 throughout the recipe —
uses `kv_batch_get_by_meta`,
the return value of `kv_batch_put` (cumulative `KVBatchMeta`), and
removes all manual
`kv_meta.fields.append(...)` calls.
- **Add structured configuration** via `@dataclass` classes
(`TrainerConfig`, `AgentLoopConfig`,
`MessageDatasetConfig`) and `argparse` CLI, replacing the flat
`OmegaConf.create` dict. Trainer config
and TQ config are now cleanly separated.
- **Add `recipe-check.yml` CI workflow** that runs the demo end-to-end
on every push/PR with reduced
parameters (`--num-samples 8 --global-batch-size 4
--rollout-agent-num-workers 1`) to keep CI fast.
## Key changes
| Area | Before | After |
|------|--------|-------|
| Rollout | `AsyncvLLMServer` (Ray actor, single-turn) | `AgentLoop`
(multi-turn with tool calls, per-sample async) |
| Data | Hardcoded `[[1,2],[3,4],...]` tensors | `MessageDataset` +
`DataLoader` with random multi-modal messages |
| Reward | Simulated inline (`time.sleep`) | `compute_reward()`
producing per-token advantages via TQ |
| KV API | `kv_batch_get(keys=..., fields=...)` + manual field tracking
| `kv_batch_get_by_meta(meta=...)` + `kv_batch_put` return value |
| Config | Flat `OmegaConf` dict | Typed `@dataclass` hierarchy +
`argparse` CLI |
| CI | None | `recipe-check.yml` workflow |
## Test plan
- [ ] Run the recipe locally: `python
recipe/simple_use_case/single_controller_demo.py --num-samples 8
--global-batch-size 4`
- [ ] Verify the new `recipe-check.yml` workflow passes in CI
- [ ] Confirm existing tests (`pytest tests`) still pass
---------
Signed-off-by: Chi Zhang <czhangseu@gmail.com>1 parent 48c15da commit a367879
2 files changed
Lines changed: 444 additions & 137 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
0 commit comments