Adding single-host integration test for GRPO and GSPO.#3409
copybara-service[bot] merged 1 commit into main
Conversation
49adda4 to 7d0bd50
xuefgu left a comment:
Many thanks Nico! Just a couple of quick clarifications.
    base_config: "base.yml"
    attention: "vllm_rpa"
    model_call_mode: "inference"
Where is this new config used below?
This is for MoE! It's unrelated, but I wanted to squeeze it into this PR since we should always be setting this in the vLLM code path.
Is the entry point for this through train_rl?
No, vllm.yml is only used in the MaxText on vLLM adapter class here: https://github.com/AI-Hypercomputer/maxtext/blob/main/src/maxtext/integration/vllm/maxtext_vllm_adapter/adapter.py#L73
        "load_parameters_path=gs://maxtext-model-checkpoints/qwen3-0.6b/2025-10-27/scanned/0/items",
    ]

    def _run_rl_workflow_end_to_end(self, extra_argv):
Why do we need this vs. directly calling rl_train() for various configs?
The biggest benefit is to be able to access some of the internals of rl_train(). For instance, this allows us to assert that the actor model params are being updated.
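For illustration, a check that parameters were updated can be sketched as a before/after comparison of the parameter tree. This is a minimal sketch, not the code in this PR: `flatten_params` and `changed_param_paths` are hypothetical helpers, and the toy parameters are nested dicts of plain float lists rather than real model arrays.

```python
import copy


def flatten_params(tree, prefix=""):
  """Flattens a nested dict of parameters into {path: leaf} pairs."""
  flat = {}
  for key, value in tree.items():
    path = f"{prefix}/{key}" if prefix else key
    if isinstance(value, dict):
      flat.update(flatten_params(value, path))
    else:
      flat[path] = value
  return flat


def changed_param_paths(before, after, tol=0.0):
  """Returns the sorted paths whose values differ by more than `tol`."""
  flat_before = flatten_params(before)
  flat_after = flatten_params(after)
  assert flat_before.keys() == flat_after.keys(), "parameter trees differ in structure"
  changed = []
  for path, old in flat_before.items():
    new = flat_after[path]
    if any(abs(a - b) > tol for a, b in zip(old, new)):
      changed.append(path)
  return sorted(changed)


# Toy stand-in for actor params (assumption): leaves are flat lists of floats.
params_before = {"decoder": {"wq": [0.1, 0.2], "wk": [0.3, 0.4]}, "embed": [1.0, 2.0]}

# Snapshot first, then simulate a single optimizer update on one leaf.
params_after = copy.deepcopy(params_before)
params_after["decoder"]["wq"][0] -= 0.01

changed = changed_param_paths(params_before, params_after)
```

The snapshot has to be taken before training starts; otherwise both names may refer to the same mutated state and the comparison would trivially report no change.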
The concern is the gradual/eventual divergence of logic in rl_train() and the tests. Any way we can mitigate that?
It's a good point. I could add a couple of test cases that just call rl_train() directly to test that API as well.
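One possible way to limit the divergence, sketched here under assumptions and not taken from this PR, is to have the entry point and the test helper share a single setup function, so the test only differs in what it inspects. Every name below (`ToyTrainer`, `parse_config`, `setup_trainer`, `toy_rl_train`, `run_workflow_for_test`, the `steps` and `algo` keys) is hypothetical and stands in for the real MaxText code.

```python
class ToyTrainer:
  """Hypothetical stand-in for the RL trainer; it only counts update steps."""

  def __init__(self, config):
    self.config = config
    self.steps_run = 0

  def train(self):
    for _ in range(self.config["steps"]):
      self.steps_run += 1


def parse_config(argv):
  """Turns ['key=value', ...] overrides into a dict (toy parser)."""
  config = {"steps": 1, "algo": "grpo"}
  for arg in argv:
    key, value = arg.split("=", 1)
    config[key] = int(value) if value.isdigit() else value
  return config


def setup_trainer(argv):
  """Shared setup used by both the entry point and the tests."""
  return ToyTrainer(parse_config(argv))


def toy_rl_train(argv):
  """Toy entry point (not MaxText's rl_train): shared setup, then training."""
  trainer = setup_trainer(argv)
  trainer.train()
  return trainer


def run_workflow_for_test(argv):
  """Test helper: same setup, but keeps the trainer visible for assertions."""
  trainer = setup_trainer(argv)
  steps_before = trainer.steps_run
  trainer.train()
  return trainer, steps_before


trainer, steps_before = run_workflow_for_test(["steps=3", "algo=gspo"])
direct = toy_rl_train(["steps=3", "algo=gspo"])
```

With this shape, a change to the setup logic is picked up by both paths, and a small test that calls the entry point directly guards the public API.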
xuefgu left a comment:
Deferring to you on how to handle the divergence between tests and application logic. Thanks again for the PR!
7d0bd50 to c6284a2
c6284a2 to c45b59c
Description
Adds an end-to-end integration test for both GRPO and GSPO workflows on qwen3-0.6b (which should work e2e on a single v6e-4 TPU VM).
FIXES: b/490410666, b/490411100, b/490409949, b/490410743
Tests
Tests are skipped for now, pending debugging on v6e-4.
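As a rough illustration of what a temporarily skipped test looks like, here is a minimal sketch using the standard library's `unittest.skip`. The class and method names are hypothetical, and the actual tests in this PR may use a different skip mechanism.

```python
import unittest


class RLIntegrationTest(unittest.TestCase):  # hypothetical class name

  @unittest.skip("Pending debugging on v6e-4")
  def test_grpo_end_to_end(self):
    # Not reached while the skip marker is in place.
    self.fail("should have been skipped")


# Run the suite programmatically so the skip shows up in the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RLIntegrationTest)
result = unittest.TestResult()
suite.run(result)
```

Keeping the reason string next to the marker makes it easy to find and remove once the v6e-4 issue is resolved.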
Checklist
Before submitting this PR, please make sure (put X in square brackets):
gemini-review label.