Commit 83062e2

svcnvidia-nemo-ci, adil-a, claude, thomasdhc, akoumpa authored
cp: test: add vLLM deployment tests into r0.4.0 (#1745)
test: add vLLM deployment tests for checkpoint robustness (#1656)

* test: add vLLM deployment tests for checkpoint robustness

  vLLM deployment verification tests that load consolidated checkpoints and compare greedy output token-for-token against HuggingFace. Supports both full comparison and smoke test mode.

  Depends on checkpoint robustness PR #1606.

* Create deploy-test dependency group
* Revert deploy test group
* Move configs to recipes and create vllm_launcher
* Setup deploy environment
* Remove duplicate keys
* Add scope to vllm deploy test
* Drop needs dependency
* Use finetune test name for ckpt dir
* Make ckpt checking more robust
* Pass arguments correctly
* Update arguments
* Remove unused file

---------

Signed-off-by: adil-a <adil.asif2000@hotmail.com>
Signed-off-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Adil <47084919+adil-a@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Dong Hyuk Chang <9426164+thomasdhc@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
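The commit message describes the core check: greedy generations from vLLM and from HuggingFace are compared token-for-token, with a lighter smoke-test mode as an alternative. Below is a minimal sketch of that comparison logic, assuming both backends have already produced lists of token IDs (for example via `transformers` `generate(..., do_sample=False)` and vLLM with `SamplingParams(temperature=0)`). The helpers `first_mismatch` and `check_deployment` are hypothetical and are not the test code added in this PR; in particular, treating smoke mode as "vLLM produced any tokens at all" is an assumption.

```python
from typing import Optional, Sequence


def first_mismatch(hf_tokens: Sequence[int], vllm_tokens: Sequence[int]) -> Optional[int]:
    """Return the index where the two greedy outputs first diverge, or None if identical."""
    for i, (hf_tok, vllm_tok) in enumerate(zip(hf_tokens, vllm_tokens)):
        if hf_tok != vllm_tok:
            return i
    if len(hf_tokens) != len(vllm_tokens):
        # One output is a strict prefix of the other; they diverge where the shorter one ends.
        return min(len(hf_tokens), len(vllm_tokens))
    return None


def check_deployment(
    hf_tokens: Sequence[int], vllm_tokens: Sequence[int], smoke_test: bool = False
) -> bool:
    """Hypothetical pass/fail rule for a single prompt."""
    if smoke_test:
        # Assumed smoke mode: only require that vLLM loaded the checkpoint and generated something.
        return len(vllm_tokens) > 0
    # Full mode: the outputs must match token-for-token.
    return first_mismatch(hf_tokens, vllm_tokens) is None


if __name__ == "__main__":
    hf = [101, 7592, 2088, 102]
    print(check_deployment(hf, [101, 7592, 2088, 102]))  # True
    print(first_mismatch(hf, [101, 7592, 999, 102]))  # 2
    print(check_deployment(hf, [5], smoke_test=True))  # True
```

Reporting the first divergent index, rather than a bare boolean, makes a failing run easier to debug: under greedy decoding, every token after the first mismatch is expected to differ as well.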
1 parent d81f478 commit 83062e2

28 files changed

Lines changed: 371 additions & 16 deletions

examples/llm_finetune/baichuan/baichuan_2_7b_squad.yaml

Lines changed: 2 additions & 1 deletion
@@ -100,8 +100,9 @@ lr_scheduler:
   min_lr: 1.0e-6
 
 ci:
-  recipe_owner: adil-a
   time: "00:45:00"
+  vllm_deploy: true
+  recipe_owner: adil-a
   checkpoint_robustness:
     hf_kl_threshold: 5e-3
     distributed.tp_size: 2

examples/llm_finetune/baichuan/baichuan_2_7b_squad_peft.yaml

Lines changed: 2 additions & 1 deletion
@@ -117,8 +117,9 @@ lr_scheduler:
   min_lr: 1.0e-6
 
 ci:
-  recipe_owner: adil-a
   time: "00:45:00"
+  vllm_deploy: true
+  recipe_owner: adil-a
   checkpoint_robustness:
     hf_kl_threshold: 5e-3
     trust_remote_code: true

examples/llm_finetune/gemma/gemma_3_270m_squad.yaml

Lines changed: 1 addition & 0 deletions
@@ -92,6 +92,7 @@ optimizer:
   # min_lr: 1.0e-5
 
 ci:
+  vllm_deploy: true
   recipe_owner: HuiyingLi
   time: "00:20:00"
   checkpoint_robustness:

examples/llm_finetune/gemma/gemma_3_270m_squad_peft.yaml

Lines changed: 1 addition & 0 deletions
@@ -99,6 +99,7 @@ optimizer:
   # min_lr: 1.0e-5
 
 ci:
+  vllm_deploy: true
   recipe_owner: HuiyingLi
   time: "00:20:00"
   checkpoint_robustness:

examples/llm_finetune/gpt_oss/gpt_oss_20b.yaml

Lines changed: 2 additions & 0 deletions
@@ -120,6 +120,8 @@ ci:
   recipe_owner: hemildesai
   time: "00:15:00"
   node_multiplier: true
+  vllm_deploy: true
+  vllm_smoke_test: true
   checkpoint_robustness:
     hf_kl_threshold: 5e-2
     tokenizer_name: openai/gpt-oss-20b
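In this commit, `ci.vllm_deploy: true` opts a recipe into the new deployment test, and the gpt-oss recipes additionally set `vllm_smoke_test: true`, presumably selecting the smoke-test mode mentioned in the commit message. The sketch below shows how a CI launcher might map these flags to a mode, assuming the recipe YAML has already been parsed into a dict. `vllm_test_mode` and the mode names `"skip"`, `"smoke"`, and `"full"` are hypothetical; this is not the PR's `vllm_launcher`.

```python
from typing import Any, Mapping


def vllm_test_mode(recipe: Mapping[str, Any]) -> str:
    """Hypothetical mapping from a recipe's `ci` section to 'skip', 'smoke', or 'full'."""
    ci = recipe.get("ci") or {}
    if not ci.get("vllm_deploy", False):
        return "skip"  # recipe has not opted in to the vLLM deployment test
    if ci.get("vllm_smoke_test", False):
        return "smoke"  # assumed: load and generate only, no token-level comparison
    return "full"  # greedy token-for-token comparison against HuggingFace


if __name__ == "__main__":
    # Dicts mirroring the ci sections of two recipes in this commit, plus one without a ci block.
    gpt_oss_20b = {"ci": {"recipe_owner": "hemildesai", "vllm_deploy": True, "vllm_smoke_test": True}}
    gemma_3_270m = {"ci": {"recipe_owner": "HuiyingLi", "vllm_deploy": True}}
    no_ci = {"optimizer": {"weight_decay": 0}}
    for name, recipe in [("gpt_oss_20b", gpt_oss_20b), ("gemma_3_270m", gemma_3_270m), ("no_ci", no_ci)]:
        print(name, vllm_test_mode(recipe))
```

Defaulting missing keys to `False` keeps any recipe that omits the new flags out of the deployment job, so opting in stays explicit per recipe.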

examples/llm_finetune/gpt_oss/gpt_oss_20b_peft.yaml

Lines changed: 2 additions & 0 deletions
@@ -115,6 +115,8 @@ optimizer:
   # min_lr: 1.0e-5
 
 ci:
+  vllm_deploy: true
+  vllm_smoke_test: true
   recipe_owner: akoumpa
   time: "00:15:00"
   checkpoint_robustness:

examples/llm_finetune/llama3_2/llama_3_2_3b_instruct_squad.yaml

Lines changed: 4 additions & 3 deletions
@@ -94,12 +94,13 @@ optimizer:
   weight_decay: 0
   # min_lr: 1.0e-5
 
+ci:
+  vllm_deploy: true
+  recipe_owner: akoumpa
+
 # Uncomment and configure for W&B logging
 # wandb:
 # project: <your_wandb_project>
 # entity: <your_wandb_entity>
 # name: <your_wandb_exp_name>
 # save_dir: <your_wandb_save_dir>
-
-ci:
-  recipe_owner: akoumpa

examples/llm_finetune/llama3_2/llama_3_2_3b_instruct_squad_peft.yaml

Lines changed: 4 additions & 3 deletions
@@ -100,12 +100,13 @@ optimizer:
   weight_decay: 0
   # min_lr: 1.0e-5
 
+ci:
+  recipe_owner: akoumpa
+  vllm_deploy: true
+
 # Uncomment and configure for W&B logging
 # wandb:
 # project: <your_wandb_project>
 # entity: <your_wandb_entity>
 # name: <your_wandb_exp_name>
 # save_dir: <your_wandb_save_dir>
-
-ci:
-  recipe_owner: akoumpa

examples/llm_finetune/nemotron/llama3_3_nemotron_super_49B_squad.yaml

Lines changed: 1 addition & 0 deletions
@@ -119,6 +119,7 @@ ci:
   recipe_owner: akoumpa
   nodes: 2
   time: "00:45:00"
+  vllm_deploy: true
   checkpoint_robustness:
     hf_kl_threshold: 5e-3
     distributed.tp_size: 8

examples/llm_finetune/nemotron/llama3_3_nemotron_super_49B_squad_peft.yaml

Lines changed: 2 additions & 1 deletion
@@ -108,8 +108,9 @@ lr_scheduler:
   min_lr: 1.0e-6
 
 ci:
-  recipe_owner: HuiyingLi
   time: "00:45:00"
+  vllm_deploy: true
+  recipe_owner: HuiyingLi
   checkpoint_robustness:
     hf_kl_threshold: 5e-3
     trust_remote_code: true
