[chore] remove mentions of flashrl from the repo and point to vllm quantization support instead #1855
Merged
4 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -11,6 +11,7 @@ | |
| "search", | ||
| "geometry3k", | ||
| "visgym", | ||
| "quantized_rollouts", | ||
| "mini_swe_agent", | ||
| "openenv" | ||
| ] | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,80 @@ | ||
| --- | ||
| title: "Training with Quantized Rollouts" | ||
| --- | ||
|
|
||
| <Callout type="info"> | ||
| Quantized rollouts are supported with the `fsdp` and `megatron` backends. | ||
| </Callout> | ||
|
|
||
| In this example, we walk through how to train a model with quantized (FP8) rollouts using SkyRL. Quantizing the rollout weights speeds up generation for large models while keeping the trainer in full precision. | ||
|
|
||
| ## Overview | ||
|
|
||
| During RL training, rollouts (generation) are typically the throughput bottleneck. Running the inference engine with quantized weights (e.g. FP8) can significantly speed up generation for larger models. SkyRL supports this directly through vLLM, so there is no separate patched engine or package extra to install. | ||
|
|
||
| There are two pieces to make quantized rollouts work well: | ||
|
|
||
| 1. **Quantized generation**: we ask vLLM to load and serve the policy weights in a quantized format. This is a pass-through option to the vLLM engine, so no SkyRL-specific integration is required. | ||
| 2. **Off-policy correction**: quantizing the rollout weights widens the gap between the rollout (inference) distribution and the training distribution. We correct for this mismatch with [Truncated Importance Sampling (TIS)](../algorithms/off_policy_correction), which applies an importance-sampling correction to the policy loss. | ||
|
|
||
| ### How does it work? | ||
|
|
||
| We sample generations from the inference engine with quantized weights. We then compute advantages and the policy loss, applying the TIS correction factor to account for the difference between the rollout and training probability distributions. On each weight update, weights are synced to the inference engine layer by layer in half precision (bfloat16); vLLM then quantizes them to the target format before serving. | ||
|
|
||
| ## Enabling quantized generation | ||
|
|
||
| Quantization is enabled by passing it through to the vLLM engine via `engine_init_kwargs`: | ||
|
|
||
| ```bash | ||
| generator.inference_engine.engine_init_kwargs.quantization=fp8 | ||
| ``` | ||
|
|
||
| This uses vLLM's [online dynamic FP8 quantization](https://docs.vllm.ai/en/latest/features/quantization/fp8.html), so no calibration data or pre-quantized checkpoint is required. | ||
|
|
||
| ## Enabling off-policy correction (TIS) | ||
|
|
||
| To apply TIS, we need the inference engine to return the rollout logprobs for the generated tokens, and we configure the correction on the policy loss: | ||
|
|
||
| ```bash | ||
| # logprobs=0 returns rollout logprobs for the generated tokens (required for TIS); | ||
| # the two tis_* overrides apply sequence-level TIS with an importance-ratio clip. | ||
| # Comments are kept above the overrides so they do not break the line continuations. | ||
| generator.sampling_params.logprobs=0 \ | ||
| trainer.algorithm.off_policy_correction.tis_ratio_type=sequence \ | ||
| trainer.algorithm.off_policy_correction.sequence_tis_ratio_clip_high=4.0 | ||
| ``` | ||
|
|
||
| TIS can be applied at the `token` or `sequence` level. See the [Off-Policy Correction guide](../algorithms/off_policy_correction) for a full discussion of the available corrections and recommended settings. | ||
|
|
||
| ## Example | ||
|
|
||
| A complete example is provided at `examples/train/text_to_sql/run_skyrl_sql_fp8.sh`. It trains `Qwen2.5-Coder-7B-Instruct` on the SkyRL-SQL dataset with FP8 rollouts and a full-precision trainer. | ||
|
|
||
| The key parameters are: | ||
|
|
||
| ```bash title="examples/train/text_to_sql/run_skyrl_sql_fp8.sh" | ||
| # TIS parameters | ||
| TIS_IMP_RATIO_CAP=4.0 | ||
| TIS_TYPE=sequence | ||
| # returns rollout logprobs for the generated tokens; required for TIS | ||
| LOGPROBS=0 | ||
|
|
||
| uv run --isolated --extra fsdp -m skyrl.train.entrypoints.main_base \ | ||
| trainer.algorithm.off_policy_correction.tis_ratio_type=$TIS_TYPE \ | ||
| trainer.algorithm.off_policy_correction.sequence_tis_ratio_clip_high=$TIS_IMP_RATIO_CAP \ | ||
| generator.sampling_params.logprobs=$LOGPROBS \ | ||
| generator.inference_engine.backend=vllm \ | ||
| generator.inference_engine.engine_init_kwargs.quantization=fp8 \ | ||
| ... | ||
| ``` | ||
|
|
||
| To run it (from the SkyRL root directory): | ||
|
|
||
| ```bash | ||
| hf download NovaSky-AI/SkyRL-SQL-653-data-newfmt --local-dir $HOME/data/sql --repo-type dataset | ||
| export WANDB_API_KEY=<your_key_here> | ||
| bash examples/train/text_to_sql/run_skyrl_sql_fp8.sh | ||
| ``` | ||
|
|
||
| <Callout type="warn"> | ||
| Quantized rollouts are most beneficial for larger models, where generation dominates step time. For smaller models the overhead of quantizing weights during each weight sync can outweigh the generation speedup. We recommend benchmarking on your own model and hardware. | ||
| </Callout> | ||
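
To make the off-policy correction described in the added guide more concrete, here is a minimal sketch of how truncated importance weights could be computed at the `token` and `sequence` level. It assumes the weight is the ratio of the trainer's probability to the rollout engine's probability, truncated from above at a clip value (analogous to `sequence_tis_ratio_clip_high=4.0`). The function names, the product-of-token-ratios form for the sequence level, and the pure-Python types are illustrative assumptions, not SkyRL's actual implementation.

```python
import math
from typing import List


def _check(train_logprobs: List[float], rollout_logprobs: List[float], clip_high: float) -> None:
    # Both engines must score the same generated tokens, and the clip must be positive.
    if len(train_logprobs) != len(rollout_logprobs):
        raise ValueError("logprob lists must have the same length")
    if clip_high <= 0:
        raise ValueError("clip_high must be positive")


def token_tis_weights(
    train_logprobs: List[float], rollout_logprobs: List[float], clip_high: float
) -> List[float]:
    """Hypothetical token-level TIS: one truncated ratio per generated token."""
    _check(train_logprobs, rollout_logprobs, clip_high)
    log_clip = math.log(clip_high)
    # exp(min(x, log c)) equals min(exp(x), c) but cannot overflow for large x.
    return [
        math.exp(min(t - r, log_clip))
        for t, r in zip(train_logprobs, rollout_logprobs)
    ]


def sequence_tis_weight(
    train_logprobs: List[float], rollout_logprobs: List[float], clip_high: float
) -> float:
    """Hypothetical sequence-level TIS: one truncated ratio for the whole response."""
    _check(train_logprobs, rollout_logprobs, clip_high)
    # Summing per-token log ratios gives the log of the sequence probability ratio.
    log_ratio = sum(t - r for t, r in zip(train_logprobs, rollout_logprobs))
    return math.exp(min(log_ratio, math.log(clip_high)))
```

In this sketch the resulting weight would multiply the per-token (or per-sequence) policy loss term. When the quantized rollout engine and the full-precision trainer agree exactly, every weight is 1 and the loss is unchanged; the clip only bounds the influence of samples that the trainer considers much more likely than the rollout engine did.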