feat: add vLLM-Omni EC2 and SageMaker DLC images #5868
Merged
Changes from 62 commits
65 commits
bdb2184
feat: add vLLM-Omni EC2 and SageMaker DLC images
9ab46fc
fix: use AL2023-compatible packages for omni system deps
b8de9c1
fix: only install ffmpeg static binary for omni deps
4567aa2
fix: use SPAL repo for espeak-ng, sox, ffmpeg on AL2023
5e7b23e
fix: use --region instead of --aws-region for pytest
ab2ac24
fix: add sagemaker SDK dep and match existing test pattern
0de9f97
fix: increase stage init timeout for omni model tests
ce54d97
fix: use download-model action for model downloads
6075e81
fix: patch CVE-2026-28414 gradio path traversal in omni image
26db368
fix: use .tar.gz model tarballs for download-model action compatibility
a85d641
fix: use /v1/audio/speech API for TTS smoke test
58309b7
fix: use HuggingFace model IDs directly instead of S3 tarballs
325d917
fix: validate diffusion response without printing full base64 image
aa40386
fix: use ml.g4dn.xlarge for TTS endpoint test (cheaper, 1.7B fits in …
da26690
fix: remove redundant --enforce-eager (vllm-omni enforces it internally)
9c18b3a
fix: use customer-type from config to select smoke test script
7322dce
fix lmiv22 yml and add lmiv23 (#5869)
smouaa c848b67
fix telemetry ingress rules (#5871)
sirutBuasai 871877f
Migrate Xgboost Container Tests to DLC repo (#5860)
Jyothirmaikottu b655007
fix: use download-model action and /models/ path for omni smoke tests
99ceaac
Merge branch 'main' into omni
Yadan-Wei 99628cb
ci: trigger pipeline
2723e31
Merge branch 'main' into omni
Yadan-Wei 02d7291
ci: re-trigger after flux2 model tarball fix
a80c193
fix: SM endpoint test validates deployment only (TTS uses /v1/audio/s…
fd63eba
Revert "fix: SM endpoint test validates deployment only (TTS uses /v1…
8d55aa3
ci: Disable all non-omni PR workflows
3dcc0e9
feat: add SageMaker serve proxy to route /invocations to correct vllm…
2f891a8
feat: SageMaker routing middleware, real entrypoint smoke tests, unit…
6f6421f
feat: pre-built runtime base to skip vLLM compile in PR builds
eb8e6b7
feat: per-model test config with route/request/validate, g5 for endpo…
0fc2d3b
fix: increase SageMaker invoke timeout to 300s for TTS cold start
b48b7a7
fix: retry invoke on timeout instead of unsupported InvocationTimeout…
85772d6
fix: add --port 8080 to EC2 container start (vllm defaults to 8000)
793b823
ci: re-trigger after pre-commit fix
aa2e4fb
Merge branch 'main' into omni
Yadan-Wei 7d8e128
fix format
2cd3eb4
fix: add 30s sleep between retries for torch.compile warmup
7fd7e01
feat: move unit test to test/vllm-omni/sagemaker/, add async endpoint…
4f0e254
fix: run unit test from sagemaker dir to avoid test/__init__.py import
1e459cd
fix: use default-runner for unit test (has test_utils and starlette)
a02f2ca
fix: install test deps and set PYTHONPATH for unit test (matches sani…
9589e12
fix: add starlette to unit test deps (not in test/requirements.txt)
38252ef
feat: add 4 new models (CosyVoice3, Qwen2.5-Omni, BAGEL, Wan2.1), HF …
68e8c6e
fix: revert to S3-cached models only, new HF models need validation f…
6bf7f8e
Merge branch 'main' into omni
Yadan-Wei 86beb26
Merge branch 'main' into omni
Yadan-Wei 1162afd
feat: add CosyVoice3-0.5B and Qwen2.5-Omni-3B smoke tests (S3 cached)
ee5c415
fix: bump new models to g6exl (more RAM), add container log dump on f…
862688d
fix: revert to Qwen3-TTS and FLUX.2 only
4f45282
feat: add CosyVoice3, Wan2.1, BAGEL, Qwen2.5-Omni smoke tests
9605ed9
change instance type
e36fc3a
fix: use absolute path for cosyvoice3 stage config in DLC container
f3a716b
fix path
c7a8a1c
feat: add 4 new models, form data support, endpoint cleanup, more logs
1b03a34
fix: remove CosyVoice3 - transformers doesn't recognize cosyvoice3 mo…
5109c99
fix: use bash array for curl form data to preserve header quoting
b154175
fix: Wan2.1 use /v1/videos (async), /v1/videos/sync not in v0.18.0
cd46502
fix: Wan2.1 validate json_field:id (async API returns JSON, not binary)
b1d1eac
enable all models
0a4e745
Merge branch 'main' into omni
Yadan-Wei 70f032f
Merge branch 'main' into omni
junpuf 98f5f93
Revert "ci: Disable all non-omni PR workflows"
e0e54da
fix: remove CVE-2026-33055 allowlist entry (fixed in uv tar crate 0.4…
00bb406
fix: patch aiohttp CVEs in sglang and vllm Dockerfiles
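
Several commits above adjust how the endpoint test copes with slow cold starts: "retry invoke on timeout" and "add 30s sleep between retries for torch.compile warmup". The test code itself is not visible on this page, so the following is only a minimal sketch of that retry pattern. The function name `invoke_with_retry`, its parameters, and the use of the built-in `TimeoutError` as a stand-in for whatever timeout exception the real client raises are all assumptions.

```python
import time


def invoke_with_retry(invoke, attempts=3, sleep_seconds=30.0, sleep=time.sleep):
    """Call invoke() until it returns, retrying only on TimeoutError.

    Hypothetical helper: `invoke` is any zero-argument callable that
    sends one request to the endpoint and returns the response.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return invoke()
        except TimeoutError:
            # Give up after the last attempt; otherwise wait for warmup.
            if attempt == attempts:
                raise
            sleep(sleep_seconds)
```

Passing `sleep` as a parameter keeps the helper testable without actually waiting 30 seconds between attempts.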
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,26 @@ | ||
| # vLLM-Omni EC2 AL2023 Image Configuration | ||
| | ||
| image: | ||
| name: "vllm-omni-ec2-amzn2023" | ||
| description: "vLLM-Omni for EC2 instances (AL2023, omni-modality serving)" | ||
| | ||
| common: | ||
| framework: "vllm-omni" | ||
| framework_version: "0.18.0" | ||
| job_type: "general" | ||
| python_version: "py312" | ||
| cuda_version: "cu129" | ||
| os_version: "amzn2023" | ||
| customer_type: "ec2" | ||
| arch_type: "x86" | ||
| prod_image: "vllm-omni:0.18-gpu-py312-ec2" | ||
| device_type: "gpu" | ||
| contributor: "None" | ||
| | ||
| release: | ||
| release: false | ||
| force_release: false | ||
| public_registry: false | ||
| private_registry: true | ||
| enable_soci: true | ||
| environment: production | ||
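
The `prod_image` value in this file appears to be assembled from the other `common` fields: framework name, major.minor framework version, device type, Python version, and customer type. The sketch below checks that pattern. It is inferred only from the two image configs in this PR, and `expected_prod_image` is a hypothetical helper, not part of the repository.

```python
def expected_prod_image(common: dict) -> str:
    """Build the tag pattern that the configs in this PR appear to follow.

    Assumed pattern (not confirmed by the source):
    <framework>:<major.minor>-<device_type>-<python_version>-<customer_type>
    """
    major_minor = ".".join(common["framework_version"].split(".")[:2])
    return (
        f'{common["framework"]}:{major_minor}-{common["device_type"]}'
        f'-{common["python_version"]}-{common["customer_type"]}'
    )


# Values copied from the EC2 config above.
common = {
    "framework": "vllm-omni",
    "framework_version": "0.18.0",
    "device_type": "gpu",
    "python_version": "py312",
    "customer_type": "ec2",
    "prod_image": "vllm-omni:0.18-gpu-py312-ec2",
}

assert expected_prod_image(common) == common["prod_image"]
```

A check like this could catch a config where `customer_type` and `prod_image` drift apart, though the review comments indicate the production image name is still provisional.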
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,57 @@ | ||
| # vLLM-Omni Model Test Configuration | ||
| # Tests for omni-modality models (TTS, image generation, video, omni-chat) | ||
| # | ||
| # Each model defines its test_request (sent to /invocations via middleware) | ||
| # and the route for the SageMaker routing middleware. | ||
| # | ||
| # Models use s3_model (pre-cached in S3) downloaded by the download-model action. | ||
| | ||
| s3_prefix: "s3://dlc-cicd-models/omni-models" | ||
| | ||
| smoke-test: | ||
| codebuild-fleet: | ||
| # --- TTS models (route: /v1/audio/speech) --- | ||
| - name: "qwen3-tts-1.7b-customvoice" | ||
| s3_model: "qwen3-tts-1.7b-customvoice.tar.gz" | ||
| fleet: "x86-g6xl-runner" | ||
| extra_args: "" | ||
| route: "/v1/audio/speech" | ||
| test_request: '{"input": "Hello, how are you?", "voice": "vivian", "language": "English"}' | ||
| validate: "binary_size_gt:1000" | ||
| | ||
| # --- Image generation models (route: /v1/images/generations) --- | ||
| - name: "flux2-klein-4b" | ||
| s3_model: "flux2-klein-4b.tar.gz" | ||
| fleet: "x86-g6xl-runner" | ||
| extra_args: "" | ||
| route: "/v1/images/generations" | ||
| test_request: '{"prompt": "a red apple on a white table", "size": "512x512", "n": 1}' | ||
| validate: "json_field:data[0].b64_json" | ||
| | ||
| # --- Video generation models (route: /v1/videos) --- | ||
| - name: "wan2.1-t2v-1.3b" | ||
| s3_model: "wan2.1-t2v-1.3b.tar.gz" | ||
| fleet: "x86-g6exl-runner" | ||
| extra_args: "" | ||
| route: "/v1/videos" | ||
| content_type: "multipart/form-data" | ||
| test_request: 'prompt=a dog running on a beach&num_frames=17&num_inference_steps=4&size=480x320&seed=42' | ||
| validate: "json_field:id" | ||
| | ||
| # --- Omni chat models (route: /v1/chat/completions, fallthrough) --- | ||
| # model is big, won't run for now | ||
| # - name: "bagel-7b-mot" | ||
| # s3_model: "bagel-7b-mot.tar.gz" | ||
| # fleet: "x86-g6e4xl-runner" | ||
| # extra_args: "" | ||
| # route: "/v1/chat/completions" | ||
| # test_request: '{"messages": [{"role": "user", "content": [{"type": "text", "text": "<|im_start|>A cute cat<|im_end|>"}]}], "modalities": ["image"], "height": 512, "width": 512, "num_inference_steps": 4, "seed": 42}' | ||
| # validate: "json_field:choices[0].message.content" | ||
| | ||
| - name: "qwen2.5-omni-3b" | ||
| s3_model: "qwen2.5-omni-3b.tar.gz" | ||
| fleet: "x86-g6e12xl-runner" | ||
| extra_args: "" | ||
| route: "/v1/chat/completions" | ||
| test_request: '{"messages": [{"role": "user", "content": "Say hello in one sentence."}], "max_tokens": 64}' | ||
| validate: "json_field:choices[0].message.content" |
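
Each model entry above carries a `validate` rule such as `binary_size_gt:1000` or `json_field:data[0].b64_json`. The script that interprets these rules is not shown on this page, so the following is a minimal sketch of one way to evaluate them. The function names and the exact semantics (a JSON field passes when it exists and is non-empty) are assumptions.

```python
import json
import re


def get_json_field(doc, path):
    """Walk a dotted path with optional [index] parts, e.g. 'data[0].b64_json'."""
    current = doc
    for part in path.split("."):
        match = re.fullmatch(r"([^\[\]]+)((?:\[\d+\])*)", part)
        if match is None:
            raise ValueError(f"unsupported path segment: {part!r}")
        current = current[match.group(1)]
        for index in re.findall(r"\[(\d+)\]", match.group(2)):
            current = current[int(index)]
    return current


def check_response(validate: str, body: bytes) -> bool:
    """Return True when the raw response body satisfies the validate rule."""
    kind, _, arg = validate.partition(":")
    if kind == "binary_size_gt":
        # Binary payloads (e.g. audio) only need to exceed a size floor.
        return len(body) > int(arg)
    if kind == "json_field":
        try:
            value = get_json_field(json.loads(body), arg)
        except (ValueError, KeyError, IndexError, TypeError):
            # Not JSON, or the field is missing.
            return False
        return value not in (None, "", [], {})
    raise ValueError(f"unknown validate rule: {validate!r}")
```

Checking only for presence keeps large payloads such as base64 images out of the test logs, in line with the commit "validate diffusion response without printing full base64 image".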
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,26 @@ | ||
| # vLLM-Omni SageMaker AL2023 Image Configuration | ||
| | ||
| image: | ||
| name: "vllm-omni-sagemaker-amzn2023" | ||
| description: "vLLM-Omni for SageMaker (AL2023, omni-modality serving)" | ||
| | ||
| common: | ||
| framework: "vllm-omni" | ||
| framework_version: "0.18.0" | ||
| job_type: "general" | ||
| python_version: "py312" | ||
| cuda_version: "cu129" | ||
| os_version: "amzn2023" | ||
| customer_type: "sagemaker" | ||
| arch_type: "x86" | ||
| prod_image: "vllm-omni:0.18-gpu-py312-sagemaker" | ||
| device_type: "gpu" | ||
| contributor: "None" | ||
| | ||
| release: | ||
| release: false | ||
| force_release: false | ||
| public_registry: false | ||
| private_registry: true | ||
| enable_soci: true | ||
| environment: production |
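
SageMaker sends every inference request to `/invocations`, while the model test config in this PR lists a different route per model (for example `/v1/audio/speech`). The commit list mentions a routing middleware that bridges the two, but its code is not visible here. The sketch below shows the general idea as a dependency-free ASGI middleware. The class name `InvocationsRouter` and the choice to pass the target route as a constructor argument are assumptions; the real middleware may select the route differently.

```python
import asyncio


class InvocationsRouter:
    """ASGI middleware that rewrites /invocations to a configured route."""

    def __init__(self, app, target_route: str):
        self.app = app
        self.target_route = target_route

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http" and scope.get("path") == "/invocations":
            # Copy the scope so the caller's dict is left untouched.
            scope = dict(
                scope,
                path=self.target_route,
                raw_path=self.target_route.encode("ascii"),
            )
        await self.app(scope, receive, send)


async def echo_path_app(scope, receive, send):
    """Stand-in for the model server: replies with the path it was called on."""
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": scope["path"].encode()})


def call(app, path):
    """Drive one POST request through an ASGI app and return the body."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "POST", "path": path, "headers": []}
    asyncio.run(app(scope, receive, send))
    return b"".join(m.get("body", b"") for m in sent)
```

Wrapping the app as `InvocationsRouter(echo_path_app, "/v1/audio/speech")` and calling `call(app, "/invocations")` sends the request to the speech route, while other paths such as `/ping` pass through unchanged.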
we'll use the same repo name "vllm" instead of creating a new repo
Will update this section when we have a real prod image.