configs/debug/training_modes/README.md (4 lines added, 0 deleted)
@@ -10,6 +10,7 @@ Minimal end-to-end configs for the three training modes (`rl` / `opd` / `sft`) a
|`sft.toml`|`sft`| local vLLM (`Qwen3-0.6B-Reverse-Text-RL`) ||
|`sft_lora.toml`|`sft`| local vLLM (`Qwen3-0.6B-Reverse-Text-RL`) | trains a LoRA adapter (rank 8) |
|`sft_external.toml`|`sft`| PI inference (`openai/gpt-5-mini`) | external OAI endpoint; no local teacher |
+|`sft_replay.toml`|`sft`| none | replays saved message traces through `sft-replay`|

The student inference server is auto-launched on GPU 0 at `http://localhost:8000/v1` with `gpu_memory_utilization=0.5`. The local teacher (used by everything except `rl.toml`, `sft_external.toml`, and `sft_replay.toml`) is **not** auto-launched; start it manually on GPU 1.

@@ -42,6 +43,9 @@ uv run rl @ configs/debug/training_modes/sft_lora.toml
# SFT hard distill from openai/gpt-5-mini via PI inference
# (requires PRIME_API_KEY + PRIME_TEAM_ID in env; no local teacher needed)
uv run rl @ configs/debug/training_modes/sft_external.toml
+
+# SFT from replayed dataset traces (no teacher)
+uv run rl @ configs/debug/training_modes/sft_replay.toml
```
See [docs/training.md](../../../docs/training.md#training-modes-rl--opd--sft-via-orchestrator) for what each mode does.
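The `sft_external.toml` run needs `PRIME_API_KEY` and `PRIME_TEAM_ID` in the environment. Below is a minimal preflight sketch, assuming a POSIX `sh` with `printenv` available (GNU coreutils and BSD both provide it). `check_prime_env` is a hypothetical helper, not part of the repo. It only checks that both variables are exported and non-empty; it does not validate the credentials against PI inference.

```shell
# Hypothetical preflight helper (not part of the repo) for
# `uv run rl @ configs/debug/training_modes/sft_external.toml`.
# Returns 0 when PRIME_API_KEY and PRIME_TEAM_ID are both exported and
# non-empty; otherwise lists the missing names on stderr and returns 1.
check_prime_env() {
  missing=""
  for var in PRIME_API_KEY PRIME_TEAM_ID; do
    # printenv prints nothing when the variable is absent from the
    # process environment; an empty value is treated as missing too.
    if [ -z "$(printenv "$var")" ]; then
      missing="$missing $var"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing env vars:$missing" >&2
    return 1
  fi
  return 0
}

# Usage: only start the external-teacher run when both variables are set.
# check_prime_env && uv run rl @ configs/debug/training_modes/sft_external.toml
```

Because `printenv` reads the process environment, a variable that is set in the shell but not exported is reported as missing, which matches what the `uv run` child process would actually see.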