Commit 6a38956
test: add 10 TRL parity tests for deprecation readiness (#241)
Adds tests/test_trl_parity.py with 25 test cases covering the 10 areas
identified in docs/STANDALONE_VS_TRL_COMPARISON.md as needed before the
standalone GRPO trainer can be deprecated:
1. Constrained decoding — Outlines generator build + ACTION_REGEX
2. Constrained decoding ImportError — returns None, not silent success
3. Prompt format identity — TRL imports SYSTEM_PROMPT from standalone
4. DSL round-trip parsing — CLICK, TYPE, WAIT, DONE via parse_action_json
5. Thought-prefix parsing — "Thought: ...\nAction: DSL" format
6. Unsloth loading — FastVisionModel.from_pretrained + get_peft_model
7. LoRA checkpoint resume — lora_checkpoint passed through config
8. HookBridge on_step_complete — callback fires with correct args
9. HookBridge unused hooks — on_before_collect/on_rollout_complete stored
10. _AgentOutput schema — Pydantic validation, JSON schema, roundtrip
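
As an illustration of area 2, the sketch below shows one way a test can force the ImportError path without uninstalling anything. The function name `build_constrained_generator` and its return value are hypothetical stand-ins, not the trainer's actual API. Only the `sys.modules` patching technique is standard Python.

```python
import importlib
import sys
from unittest import mock


def build_constrained_generator(model, pattern):
    """Hypothetical stand-in for the trainer's generator builder.

    Returns None when the optional `outlines` dependency is missing,
    so callers can detect the fallback instead of assuming success.
    """
    try:
        outlines = importlib.import_module("outlines")
    except ImportError:
        return None
    # Placeholder: the real construction call depends on the Outlines version.
    return (outlines, model, pattern)


def test_returns_none_without_outlines():
    # A None entry in sys.modules makes any import of that name raise ImportError.
    with mock.patch.dict(sys.modules, {"outlines": None}):
        result = build_constrained_generator(model=object(), pattern=r"DONE")
    assert result is None
```

`mock.patch.dict` restores `sys.modules` on exit, so the forced failure does not leak into other tests.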
All tests are light (no torch/transformers/trl imports), use unittest.mock,
and pass with [dev] deps only.
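
A minimal sketch of what a light, mock-only test for area 8 might look like. The `HookBridge` class here is a hypothetical reduction written for this example, and the `(step, metrics)` callback signature is an assumption. The real class and its arguments live in the repository.

```python
from unittest import mock


class HookBridge:
    """Hypothetical bridge that forwards trainer events to user callbacks."""

    def __init__(self, on_step_complete=None, on_before_collect=None):
        self.on_step_complete = on_step_complete
        # Stored but never invoked in this sketch (compare area 9).
        self.on_before_collect = on_before_collect

    def step_end(self, step, metrics):
        # Forward the event only if a callback was registered.
        if self.on_step_complete is not None:
            self.on_step_complete(step, metrics)


def test_on_step_complete_fires_with_args():
    callback = mock.Mock()
    bridge = HookBridge(on_step_complete=callback)
    bridge.step_end(step=3, metrics={"loss": 0.5})
    # The assumed contract: exactly one call, with positional (step, metrics).
    callback.assert_called_once_with(3, {"loss": 0.5})
```

Because the callback is a `mock.Mock`, the test needs no torch, transformers, or trl import to check the wiring.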
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 114ad0e
1 file changed
Lines changed: 573 additions & 0 deletions