Commit e90f18c

[model] feat: support ByteDance Seed-OSS 36B model (verl-project#3347)
### What does this PR do?

Support the ByteDance Seed-OSS 36B model:

1. Add RL and SFT examples.
2. Support MFU metrics.

Requirement: `pip install "transformers>=4.56.0"`

Note: vLLM v0.10.0 does not support Seed-OSS natively, but it can fall back to the Transformers implementation to run the model.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

(TaskRunner pid=373084) step:2

- global_seqlen/min:6260
- global_seqlen/max:11318
- global_seqlen/minmax_diff:5058
- global_seqlen/balanced_min:8466
- global_seqlen/balanced_max:8468
- global_seqlen/mean:8467.375
- actor/entropy:0.47251570224761963
- actor/kl_loss:0.03297248564194888
- actor/kl_coef:0.001
- actor/pg_loss:-0.0494408356025815
- actor/pg_clipfrac:0.019900403218343854
- actor/ppo_kl:0.020935473148711026
- actor/pg_clipfrac_lower:9.349289757665247e-05
- actor/grad_norm:0.47875913605093956
- perf/mfu/actor:0.2823303751694612
- perf/max_memory_allocated_gb:134.74115753173828
- perf/max_memory_reserved_gb:141.615234375
- perf/cpu_memory_used_gb:150.75712203979492
- actor/lr:1e-06
- training/global_step:2
- training/epoch:0
- critic/score/mean:0.3515625
- critic/score/max:1.0
- critic/score/min:0.0
- critic/rewards/mean:0.3515625
- critic/rewards/max:1.0
- critic/rewards/min:0.0
- critic/advantages/mean:-0.023741308599710464
- critic/advantages/max:0.7071057558059692
- critic/advantages/min:-0.7071057558059692
- critic/returns/mean:-0.023741308599710464
- critic/returns/max:0.7071057558059692
- critic/returns/min:-0.7071057558059692
- response_length/mean:444.4296875
- response_length/max:1024.0
- response_length/min:50.0
- response_length/clip_ratio:0.140625
- response_length_non_aborted/mean:444.4296875
- response_length_non_aborted/max:1024.0
- response_length_non_aborted/min:50.0
- response_length_non_aborted/clip_ratio:0.140625
- response/aborted_ratio:0.0
- prompt_length/mean:84.78125
- prompt_length/max:141.0
- prompt_length/min:54.0
- prompt_length/clip_ratio:0.0
- timing_s/start_profile:6.250300793908536e-05
- timing_s/generate_sequences:21.979598999023438
- timing_s/generation_timing/max:22.295286178588867
- timing_s/generation_timing/min:21.753456115722656
- timing_s/generation_timing/topk_ratio:0.125
- timing_s/gen:39.58543623800506
- timing_s/reward:0.031087818002561107
- timing_s/old_log_prob:17.46088112698635
- timing_s/ref:5.804751824995037
- timing_s/adv:0.003937039989978075
- timing_s/update_actor:57.383965655986685
- timing_s/step:120.27422251200187
- timing_s/stop_profile:6.923600449226797e-05
- timing_per_token_ms/gen:0.6958608511260053
- timing_per_token_ms/ref:0.08569290696637147
- timing_per_token_ms/adv:5.8120727940744256e-05
- timing_per_token_ms/update_actor:0.8471333449857052
- perf/total_num_tokens:67739
- perf/time_per_step:120.27422251200187
- perf/throughput:70.40057980133741

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s) if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
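
The step-2 metrics in the Test section are internally consistent, which can be checked with a few lines of arithmetic. The sketch below assumes the run used the GRPO example script from this commit (8 GPUs, `data.train_batch_size=64`, `rollout.n=2`); that pairing is an assumption, since the log does not state it. Under that assumption, `perf/throughput` reads as tokens per second per GPU, and the `timing_per_token_ms/*` values are stage time divided by the relevant token count.

```python
# Values copied verbatim from the step-2 log line in the Test section.
metrics = {
    "perf/total_num_tokens": 67739,
    "perf/time_per_step": 120.27422251200187,
    "perf/throughput": 70.40057980133741,
    "timing_s/gen": 39.58543623800506,
    "timing_s/update_actor": 57.383965655986685,
    "timing_per_token_ms/gen": 0.6958608511260053,
    "timing_per_token_ms/update_actor": 0.8471333449857052,
    "response_length/mean": 444.4296875,
    "prompt_length/mean": 84.78125,
}

# Assumptions taken from the GRPO example script, not from the log itself.
N_GPUS = 8               # trainer.n_gpus_per_node=8, trainer.nnodes=1
NUM_SEQUENCES = 64 * 2   # data.train_batch_size=64, rollout.n=2

total_tokens = metrics["perf/total_num_tokens"]
response_tokens = metrics["response_length/mean"] * NUM_SEQUENCES
prompt_tokens = metrics["prompt_length/mean"] * NUM_SEQUENCES

# Tokens processed per second, per GPU.
throughput = total_tokens / metrics["perf/time_per_step"] / N_GPUS

# Generation time is spread over response tokens only;
# the actor update is spread over prompt + response tokens.
gen_ms_per_token = metrics["timing_s/gen"] / response_tokens * 1000
update_ms_per_token = metrics["timing_s/update_actor"] / total_tokens * 1000

print(f"prompt + response tokens: {prompt_tokens + response_tokens:.0f}")
print(f"throughput (tokens/s/GPU): {throughput:.4f}")
print(f"gen ms/token: {gen_ms_per_token:.6f}")
print(f"update_actor ms/token: {update_ms_per_token:.6f}")
```

The recomputed values agree with the logged ones to within rounding, which suggests the logged run matches the batch size, rollout count, and GPU count of the example script.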
1 parent 72e88ec commit e90f18c

3 files changed

Lines changed: 81 additions & 0 deletions

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+set -x
+
+python3 -m verl.trainer.main_ppo \
+    algorithm.adv_estimator=grpo \
+    data.train_files=$HOME/data/gsm8k/train.parquet \
+    data.val_files=$HOME/data/gsm8k/test.parquet \
+    data.train_batch_size=64 \
+    data.max_prompt_length=512 \
+    data.max_response_length=1024 \
+    data.filter_overlong_prompts=True \
+    data.truncation='error' \
+    actor_rollout_ref.model.path=ByteDance-Seed/Seed-OSS-36B-Base \
+    actor_rollout_ref.actor.optim.lr=1e-6 \
+    actor_rollout_ref.model.use_remove_padding=True \
+    actor_rollout_ref.model.enable_gradient_checkpointing=True \
+    actor_rollout_ref.model.use_fused_kernels=True \
+    actor_rollout_ref.actor.ppo_mini_batch_size=8 \
+    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=2 \
+    actor_rollout_ref.actor.use_kl_loss=True \
+    actor_rollout_ref.actor.kl_loss_coef=0.001 \
+    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
+    actor_rollout_ref.actor.entropy_coeff=0 \
+    actor_rollout_ref.actor.use_dynamic_bsz=True \
+    actor_rollout_ref.actor.strategy=fsdp2 \
+    actor_rollout_ref.rollout.log_prob_use_dynamic_bsz=True \
+    actor_rollout_ref.actor.fsdp_config.param_offload=True \
+    actor_rollout_ref.actor.fsdp_config.param_offload=True \
+    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=2 \
+    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
+    actor_rollout_ref.rollout.name=vllm \
+    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
+    actor_rollout_ref.rollout.n=2 \
+    actor_rollout_ref.rollout.free_cache_engine=True \
+    actor_rollout_ref.ref.log_prob_use_dynamic_bsz=True \
+    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=2 \
+    actor_rollout_ref.ref.fsdp_config.param_offload=True \
+    actor_rollout_ref.ref.strategy=fsdp2 \
+    algorithm.use_kl_in_reward=False \
+    trainer.critic_warmup=0 \
+    trainer.logger='["console"]' \
+    trainer.project_name='verl_grpo_seed_oss_36b' \
+    trainer.experiment_name='seed_oss_36b' \
+    trainer.val_before_train=False \
+    trainer.n_gpus_per_node=8 \
+    trainer.nnodes=1 \
+    trainer.save_freq=20 \
+    trainer.test_freq=5 \
+    trainer.total_epochs=15 $@
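
The script above selects `algorithm.adv_estimator=grpo` with `actor_rollout_ref.rollout.n=2`, so each prompt's two responses are normalized against each other. The following is a minimal sketch of group-relative advantage normalization, not verl's actual implementation: the function name `grpo_advantages`, the use of the sample standard deviation, and `eps=1e-6` are assumptions. They were chosen because they reproduce the `critic/advantages/max` value of about 0.70710576 logged in the PR description for a group with one correct (1.0) and one incorrect (0.0) response.

```python
from statistics import mean, stdev


def grpo_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's response group.

    Hypothetical helper: subtracts the group mean and divides by the
    sample standard deviation (n - 1 in the denominator) plus eps.
    """
    if len(rewards) < 2:
        # A single response has no peers to compare against.
        return [0.0 for _ in rewards]
    mu = mean(rewards)
    sigma = stdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One correct and one incorrect response for the same prompt (rollout.n=2).
mixed = grpo_advantages([1.0, 0.0])
# Both responses correct: the group carries no learning signal.
uniform = grpo_advantages([1.0, 1.0])

print(mixed)
print(uniform)
```

With only two samples per prompt, every advantage is either roughly +0.7071, roughly -0.7071, or zero, so prompts whose two responses receive the same score contribute nothing to the policy gradient.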
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+set -x
+
+if [ "$#" -lt 2 ]; then
+    echo "Usage: run_seed_oss_36b_sft.sh <nproc_per_node> <save_path> [other_configs...]"
+    exit 1
+fi
+
+nproc_per_node=$1
+save_path=$2
+
+# Shift the arguments so $@ refers to the rest
+shift 2
+
+torchrun --standalone --nnodes=1 --nproc_per_node=$nproc_per_node \
+    -m verl.trainer.fsdp_sft_trainer \
+    data.train_files=$HOME/data/gsm8k/train.parquet \
+    data.val_files=$HOME/data/gsm8k/test.parquet \
+    data.prompt_key=extra_info \
+    data.response_key=extra_info \
+    optim.lr=1e-4 \
+    data.prompt_dict_keys=['question'] \
+    +data.response_dict_keys=['answer'] \
+    data.micro_batch_size=4 \
+    model.partial_pretrain=ByteDance-Seed/Seed-OSS-36B-Base \
+    trainer.default_local_dir=$save_path \
+    trainer.project_name=gsm8k-sft \
+    trainer.experiment_name=gsm8k-sft-seed-oss-36b \
+    trainer.logger=console \
+    trainer.total_training_steps=1 \
+    ulysses_sequence_parallel_size=2 \
+    use_remove_padding=true $@
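
The SFT script points both `data.prompt_key` and `data.response_key` at the `extra_info` column and then narrows each with `prompt_dict_keys` and `response_dict_keys`. A plausible reading is that the column holds a nested record and the dict keys are applied in order to reach the text field. The sketch below illustrates that lookup on a plain dictionary; `extract_field` and the sample row are hypothetical and are not taken from verl's dataset code.

```python
def extract_field(row, column, dict_keys):
    """Walk into row[column] by applying each key in dict_keys in order.

    Hypothetical helper illustrating how a nested column might be resolved.
    """
    value = row[column]
    for key in dict_keys:
        value = value[key]
    return value


# Made-up row shaped like a preprocessed parquet record (illustrative only).
row = {
    "extra_info": {
        "question": "What is 2 + 3?",
        "answer": "2 + 3 = 5 #### 5",
    }
}

prompt = extract_field(row, "extra_info", ["question"])
response = extract_field(row, "extra_info", ["answer"])

print(prompt)
print(response)
```

Under this reading, the same column can serve as the source for both prompt and response, which is why the script sets both keys to `extra_info`.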

verl/utils/flops_counter.py

Lines changed: 2 additions & 0 deletions
@@ -30,6 +30,7 @@
     "minicpmo",
     "mistral",
     "gemma3_text",
+    "seed_oss",
 }
 
 
@@ -130,6 +131,7 @@ def __init__(self, config: PretrainedConfig):
             "minicpmo": self._estimate_qwen2_flops,
             "mistral": self._estimate_qwen2_flops,
             "gemma3_text": self._estimate_gemma3_flops,
+            "seed_oss": self._estimate_qwen2_flops,
         }
         self.config = config
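
The diff routes `seed_oss` to the existing `_estimate_qwen2_flops` estimator, which implies the model is treated as a standard dense decoder with grouped-query attention and a gated MLP. The sketch below shows the common way such an estimate is formed: 6 FLOPs per parameter per token for the forward and backward pass of the linear layers, plus a term quadratic in sequence length for attention. It is an approximation written for illustration and is not a copy of verl's function; `DenseConfig`, `estimate_dense_tflops`, `mfu`, and the toy configuration values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DenseConfig:
    """Hypothetical subset of a decoder config needed for the estimate."""
    vocab_size: int
    hidden_size: int
    intermediate_size: int
    num_hidden_layers: int
    num_attention_heads: int
    num_key_value_heads: int
    head_dim: int


def estimate_dense_tflops(cfg, seqlens, step_seconds):
    """Approximate achieved TFLOP/s for one training step.

    seqlens lists the length of every sequence processed in the step.
    """
    tokens = sum(seqlens)
    q_size = cfg.num_attention_heads * cfg.head_dim
    kv_size = cfg.num_key_value_heads * cfg.head_dim

    # Gated MLP: gate, up, and down projections.
    mlp_params = 3 * cfg.hidden_size * cfg.intermediate_size
    # Attention projections: q, k, v, and output.
    attn_params = cfg.hidden_size * (q_size + 2 * kv_size + q_size)
    # Input embedding plus LM head.
    embed_params = 2 * cfg.vocab_size * cfg.hidden_size

    dense_params = (mlp_params + attn_params) * cfg.num_hidden_layers + embed_params
    # 2 FLOPs per parameter forward, 4 backward.
    dense_flops = 6 * dense_params * tokens

    # QK^T and attention-times-V, forward and backward, per sequence.
    seqlen_sq = sum(s * s for s in seqlens)
    attn_flops = (
        12 * seqlen_sq * cfg.head_dim * cfg.num_attention_heads * cfg.num_hidden_layers
    )

    return (dense_flops + attn_flops) / step_seconds / 1e12


def mfu(achieved_tflops, peak_tflops):
    """Model FLOPs utilization: achieved throughput over hardware peak."""
    return achieved_tflops / peak_tflops


# Toy configuration, small enough to verify by hand (not Seed-OSS values).
toy = DenseConfig(
    vocab_size=10,
    hidden_size=4,
    intermediate_size=8,
    num_hidden_layers=1,
    num_attention_heads=2,
    num_key_value_heads=1,
    head_dim=2,
)
achieved = estimate_dense_tflops(toy, seqlens=[2, 3], step_seconds=1.0)
print(achieved * 1e12)  # total FLOPs for the toy step
```

Reusing one estimator across architectures is reasonable when they share this layer layout, because the formula depends only on the config dimensions and the sequence lengths.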
