test: add 10 TRL parity tests for deprecation readiness#241

Merged
abrichr merged 1 commit into main from test/trl-parity-tests on Mar 29, 2026

Conversation

abrichr (Member) commented Mar 29, 2026

Summary

  • Adds tests/test_trl_parity.py with 25 test cases covering the 10 parity areas identified in docs/STANDALONE_VS_TRL_COMPARISON.md section "Tests Needed for TRL Deprecation Readiness"
  • All tests are "light" (no torch/transformers/trl top-level imports), use unittest.mock, and pass with [dev] deps only
  • Covers: constrained decoding (Outlines + regex, plus the ImportError fallback), prompt format identity, DSL round-trip parsing, Thought-prefix parsing, Unsloth loading, LoRA checkpoint resume, HookBridge callbacks, unused hooks storage, and _AgentOutput Pydantic schema validation
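
For readers unfamiliar with the "light" test style, here is a minimal sketch of how an optional heavy dependency can be simulated as missing or present with unittest.mock alone. It is not the code in tests/test_trl_parity.py: `build_constrained_generator` is a hypothetical stand-in written for this example, and only the behavior "return None on ImportError" is taken from the PR description.

```python
import sys
from unittest import mock


def build_constrained_generator(model, pattern):
    """Hypothetical stand-in for a builder that needs the optional `outlines` package.

    Returns None when the import fails, so callers can detect that
    constrained decoding is unavailable instead of assuming it is active.
    """
    try:
        import outlines  # optional dependency, never imported at module top level
    except ImportError:
        return None
    # Illustrative return value only; a real builder would construct a generator here.
    return (outlines, model, pattern)


def test_returns_none_when_outlines_missing():
    # A None entry in sys.modules makes `import outlines` raise ImportError.
    with mock.patch.dict(sys.modules, {"outlines": None}):
        assert build_constrained_generator(object(), r"DONE\(\)") is None


def test_uses_outlines_when_present():
    # A MagicMock entry in sys.modules is what `import outlines` resolves to.
    fake_outlines = mock.MagicMock(name="outlines")
    with mock.patch.dict(sys.modules, {"outlines": fake_outlines}):
        result = build_constrained_generator("model", "pattern")
    assert result is not None
    assert result[0] is fake_outlines
```

Because `mock.patch.dict` restores `sys.modules` on exit, neither test needs the real package installed and neither leaks state into other tests.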

Test plan

  • uv run --no-sources pytest tests/test_trl_parity.py -v — all 25 pass
  • uv run --no-sources pytest tests/ -v — no new failures (7 pre-existing failures in test_synthetic_demos.py and test_demo_persistence.py due to missing demo fixture files)

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds tests/test_trl_parity.py with 25 test cases covering the 10 areas
identified in docs/STANDALONE_VS_TRL_COMPARISON.md as needed before the
standalone GRPO trainer can be deprecated:

1. Constrained decoding — Outlines generator build + ACTION_REGEX
2. Constrained decoding ImportError — returns None, not silent success
3. Prompt format identity — TRL imports SYSTEM_PROMPT from standalone
4. DSL round-trip parsing — CLICK, TYPE, WAIT, DONE via parse_action_json
5. Thought-prefix parsing — "Thought: ...\nAction: DSL" format
6. Unsloth loading — FastVisionModel.from_pretrained + get_peft_model
7. LoRA checkpoint resume — lora_checkpoint passed through config
8. HookBridge on_step_complete — callback fires with correct args
9. HookBridge unused hooks — on_before_collect/on_rollout_complete stored
10. _AgentOutput schema — Pydantic validation, JSON schema, roundtrip
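
Items 4 and 5 above can be illustrated with a small round-trip sketch. The DSL grammar here (`CLICK(x=..., y=...)`, `TYPE(text="...")`, `WAIT()`, `DONE()`) and the function names `parse_action` and `format_action` are assumptions made for this example; the project's actual `parse_action_json` and `ACTION_REGEX` may differ.

```python
import re

# Hypothetical grammar for illustration; not the project's ACTION_REGEX.
_CALL_RE = re.compile(r"(CLICK|TYPE|WAIT|DONE)\((.*)\)", re.DOTALL)
_NUM = r"(\d*\.?\d+)"
_CLICK_ARGS_RE = re.compile(rf"x={_NUM},\s*y={_NUM}")
_TYPE_ARGS_RE = re.compile(r'text="(.*)"', re.DOTALL)


def parse_action(text):
    """Parse a DSL action, with or without a 'Thought: ...\\nAction: ' prefix.

    Returns a dict describing the action, or None if the text is malformed.
    """
    # Keep only what follows the last "Action:" marker, if one is present.
    body = text.rpartition("Action:")[2].strip()
    call = _CALL_RE.fullmatch(body)
    if call is None:
        return None
    name, args = call.group(1), call.group(2).strip()
    if name == "CLICK":
        m = _CLICK_ARGS_RE.fullmatch(args)
        if m is None:
            return None
        return {"type": "click", "x": float(m.group(1)), "y": float(m.group(2))}
    if name == "TYPE":
        m = _TYPE_ARGS_RE.fullmatch(args)
        if m is None:
            return None
        return {"type": "type", "text": m.group(1)}
    # WAIT and DONE take no arguments in this sketch.
    return {"type": name.lower()} if not args else None


def format_action(action):
    """Inverse of parse_action for the four action types above."""
    kind = action["type"]
    if kind == "click":
        return f"CLICK(x={action['x']}, y={action['y']})"
    if kind == "type":
        return f'TYPE(text="{action["text"]}")'
    return f"{kind.upper()}()"
```

A round-trip check then asserts that `parse_action(format_action(a)) == a` for one action of each type, and that a Thought-prefixed string parses to the same dict as the bare action.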

All tests are light (no torch/transformers/trl imports), use unittest.mock,
and pass with [dev] deps only.
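
As a sketch of how items 8 and 9 can be checked with unittest.mock alone, the example below uses a hypothetical `HookBridge` written for illustration. The hook names come from the list above; the constructor signature, the `step_end` method, and the callback arguments are assumptions, not the project's API.

```python
from unittest import mock


class HookBridge:
    """Hypothetical stand-in that forwards trainer events to user callbacks."""

    def __init__(self, on_step_complete=None, on_before_collect=None,
                 on_rollout_complete=None):
        self.on_step_complete = on_step_complete
        # Stored but never invoked in this sketch.
        self.on_before_collect = on_before_collect
        self.on_rollout_complete = on_rollout_complete

    def step_end(self, step, metrics):
        # Assumed entry point called by the trainer after each step.
        if self.on_step_complete is not None:
            self.on_step_complete(step, metrics)


def test_on_step_complete_fires_with_args():
    callback = mock.Mock()
    HookBridge(on_step_complete=callback).step_end(3, {"loss": 0.1})
    callback.assert_called_once_with(3, {"loss": 0.1})


def test_unused_hooks_are_stored_not_called():
    before, after = mock.Mock(), mock.Mock()
    bridge = HookBridge(on_before_collect=before, on_rollout_complete=after)
    bridge.step_end(0, {})
    assert bridge.on_before_collect is before
    assert bridge.on_rollout_complete is after
    before.assert_not_called()
    after.assert_not_called()
```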

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@abrichr abrichr merged commit 6a38956 into main Mar 29, 2026
1 check passed