Skip to content

[codex] Add replay-backed SFT path#2720

Draft
tim0120 wants to merge 3 commits into
mainfrom
feat/replay-sft
Draft

[codex] Add replay-backed SFT path#2720
tim0120 wants to merge 3 commits into
mainfrom
feat/replay-sft

Conversation

@tim0120
Copy link
Copy Markdown
Contributor

@tim0120 tim0120 commented Jun 4, 2026

Summary

  • consume Verifiers sft-replay instead of keeping a local rollout replay env
  • allow teacherless SFT only for explicit sft-replay taskset datasets
  • force SFT to MITO/no renderer and drop the default zero-advantage post-batch filter unless explicitly configured
  • backfill token usage for replayed message-only trajectories
  • add a debug sft_replay.toml config, focused replay/config tests, and docs/skill guidance for replay-backed SFT

Verifiers dependency

deps/verifiers points at 14be5ee, the current head of PrimeIntellect-ai/verifiers#1536.

Validation

  • uv run --no-project --with ruff ruff check ...
  • inline config validation for teacherless SFT guardrails and configs/debug/training_modes/sft_replay.toml parsing
  • inline sft-replay rollout -> token backfill -> interleave smoke
  • git diff --check HEAD~2..HEAD for the docs/skill and submodule-pin sync commits
  • Verifiers dependency: uv run pytest tests/test_v1_replay_harness.py tests/test_v1_config_extension.py -q

@tim0120 tim0120 force-pushed the feat/replay-sft branch from 8522efd to 750ea0a Compare June 4, 2026 23:27
@tim0120 tim0120 changed the title [codex] Add replay-backed SFT path [codex] Add hosted OPD and replay SFT paths Jun 4, 2026
@tim0120 tim0120 changed the base branch from feat/hosted-opd-teacher-logprobs to main June 4, 2026 23:28
@tim0120 tim0120 force-pushed the feat/replay-sft branch from 750ea0a to 8208ee0 Compare June 4, 2026 23:42
@tim0120 tim0120 changed the title [codex] Add hosted OPD and replay SFT paths [codex] Add replay-backed SFT path Jun 4, 2026
@tim0120 tim0120 force-pushed the feat/replay-sft branch from 8208ee0 to 9e5ef35 Compare June 4, 2026 23:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant