Skip to content

Commit 334e80c

Browse files
YanhuiDuaHAOCHENYE
authored andcommitted
[CI] Add RL-specific Claude review guidance
1 parent 3c6e61f commit 334e80c

3 files changed

Lines changed: 86 additions & 4 deletions

File tree

.claude/CLAUDE.md

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -93,8 +93,11 @@
9393
5. **Performance**: Unnecessary CPU-GPU synchronization (`item()`, `.cpu()` in hot paths), redundant data copies, inefficient collective communication patterns.
9494
6. **API Contracts**: Do public interface changes maintain backward compatibility? Are deprecation warnings added for breaking changes?
9595
7. **Resource Cleanup**: Are file handles, NCCL communicators, and CUDA streams properly cleaned up?
96-
8. **Metric Changes in `ProduceBatchResult`**: Check whether any metrics in `ProduceBatchResult` will be changed significantly. Call out any obvious increases or decreases, and highlight anything that looks abnormal or unexpected.
97-
9. **Routed Experts Memory Leak Risk**: Review whether this PR could unintentionally retain additional references to routed experts or make their lifecycle/state harder to track, which could eventually result in memory leaks.
96+
97+
### Area-Specific Review Rules
98+
99+
If a PR changes code under `xtuner/v1/rl`, read `.claude/rules/rl_review.md` before commenting or
100+
posting the final review summary.
98101

99102
### Review Output Format
100103

@@ -130,4 +133,4 @@ APPROVE / REQUEST_CHANGES / COMMENT
130133

131134
## Rules
132135

133-
Please refer to the docs in `./dev-rules` for the development guidelines.
136+
Please refer to the docs in `.claude/rules` for the development guidelines.

.claude/rules/rl_review.md

Lines changed: 80 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,80 @@
1+
# RL Review Guidelines
2+
3+
Use these rules whenever a PR changes code under `xtuner/v1/rl`. They extend the general review
4+
standards in `.claude/CLAUDE.md`; they do not replace them.
5+
6+
## Required Review Output
7+
8+
Every review for an RL change must state the ProduceBatchResult impact near the top-level summary:
9+
10+
```text
11+
ProduceBatchResult impact: <specific impact or "not affected">
12+
```
13+
14+
Also include routed-experts impact when the PR touches routed-experts logic, rollout response
15+
handling, `RolloutState.extra_fields`, object references, or memory ownership around rollout outputs:
16+
17+
```text
18+
RoutedExperts impact: <specific impact or "not affected">
19+
```
20+
21+
Do not leave these impacts implicit in the finding text. If a finding is about rollout status,
22+
pause/abort/timeout behavior, response handling, producer aggregation, or routed-experts ownership,
23+
repeat the relevant impact line inside that finding so the downstream effect is visible at the point
24+
of review.
25+
26+
## ProduceBatchResult Checklist
27+
28+
Review whether the PR can change trainer-visible batch accounting, timing, or reward semantics in
29+
`ProduceBatchResult`.
30+
31+
Check this area when the PR changes any of these paths:
32+
33+
- `RolloutState.status`, `finish_reason`, or status conversion.
34+
- Abort, filter, expire, retry, timeout, cancellation, or failure handling.
35+
- Producer, agent loop, judger, replay buffer, rollout worker, rollout controller, or backend pause
36+
cleanup logic.
37+
- Writers, readers, tests, or fake implementations that construct or consume `ProduceBatchResult`,
38+
including trainer-facing code outside `xtuner/v1/rl`.
39+
40+
Name the concrete field-level impact when any of these fields can change:
41+
42+
- Batch status: `status`.
43+
- Returned groups: `rollout_states`.
44+
- Generation timing: `group_gen_count`, `group_gen_mean_s`, `group_gen_p50_s`, `group_gen_p99_s`,
45+
`group_gen_p99_p50_ratio`, `group_gen_pause_time_s`.
46+
- Replay-buffer leftovers: `leftover_init`, `leftover_completed`, `leftover_aborted`,
47+
`leftover_expired`, `leftover_failed`, `leftover_filtered`.
48+
- Reward accounting: `raw_rewards_sum`, `raw_rewards_count`.
49+
- Produced work counters: `produced_samples`, `produced_tokens`, `produce_time_s`.
50+
- Multi-task aggregation: `task_batch_sizes`, `task_results`.
51+
52+
Common impacts to call out explicitly:
53+
54+
- A sample moving between `ABORTED`, `FAILED`, `FILTERED`, `EXPIRED`, and `COMPLETED` can change the
55+
corresponding `leftover_*` counts.
56+
- Pause, abort, timeout, and cancellation changes can inflate or deflate `group_gen_*` timing,
57+
especially `group_gen_pause_time_s`.
58+
- Reward or filter-path changes can change `raw_rewards_sum`, `raw_rewards_count`,
59+
`produced_samples`, and `produced_tokens`.
60+
61+
## RoutedExperts Checklist
62+
63+
Review whether routed-experts ownership and cleanup remain correct.
64+
65+
Check this area when the PR changes any of these paths:
66+
67+
- LMDeploy rollout response handling.
68+
- `return_routed_experts`, `routed_experts`, `RolloutState.extra_fields`, or object-ref plumbing.
69+
- Abort, cancellation, timeout, filter, retry, or failure paths before response handling completes.
70+
- Background tasks, replay-buffer storage, trainer batches, metrics, tests, or fake rollout
71+
responses that can retain routed-experts object refs.
72+
73+
For LMDeploy rollout, `rollout_worker` obtains routed-experts object refs from the LMDeploy shared
74+
store. At that point, ownership moves from LMDeploy to XTuner. A review finding should call out both
75+
sides of the ownership boundary when relevant:
76+
77+
- Leak before transfer: requests whose routed experts remain in LMDeploy because XTuner never
78+
obtains the object refs.
79+
- Leak after transfer: object refs that XTuner keeps alive too long through `RolloutState`, replay
80+
buffer, trainer batches, metrics, fake tests, or background tasks.

.github/workflows/claude-general.yml

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -223,4 +223,3 @@ jobs:
223223
}
224224
allowed_non_write_users: "*"
225225
track_progress: false
226-

0 commit comments

Comments
 (0)