Commit e781a45
feat: add vLLM-Omni EC2 and SageMaker DLC images (#5868)
* feat: add vLLM-Omni EC2 and SageMaker DLC images
  - Add omni-deps, builder-oss-omni, omni-base, ec2, and sagemaker stages to Dockerfile.amzn2023
  - Install vllm-omni as a pure-Python layer on top of the vLLM runtime
  - Add omni entrypoints (vllm serve --omni) for EC2 and SageMaker
  - Add PR workflows for both EC2 and SageMaker omni images
  - Add reusable model smoke tests (Qwen3-TTS, FLUX.2-klein-4B)
  - Add a SageMaker endpoint integration test with Qwen3-TTS
  - System deps: espeak-ng, ffmpeg, sox, libsox-fmt-all for audio/TTS
  - OSS compliance runs against the omni venv separately

* fix: use AL2023-compatible packages for omni system deps
  - espeak (not espeak-ng) is available in the AL2023 repos
  - sox is available in the AL2023 repos
  - ffmpeg is installed from a static build (not in the AL2023 repos)
  - Removed libsox-fmt-all (not available on AL2023)

* fix: only install the ffmpeg static binary for omni deps
  - espeak/sox are not available in the AL2023 minimal CUDA runtime image
  - the sox binary is only needed by the Qwen3-TTS 25Hz tokenizer (not 12Hz)
  - ffmpeg is needed by pydub/imageio-ffmpeg for audio/video I/O
  - Removed the dnf install of the unavailable packages

* fix: use the SPAL repo for espeak-ng, sox, and ffmpeg on AL2023
  - Upgrade system-release to latest to enable SPAL (requires 2023.9+)
  - Install espeak-ng, sox, and ffmpeg-free from SPAL (Supplementary Packages for Amazon Linux)
  - Replaces the static-binary approach with the official AL2023 package repo

* fix: use --region instead of --aws-region for pytest

* fix: add the sagemaker SDK dep and match the existing test pattern
  - Add test/vllm-omni/sagemaker/requirements.txt with sagemaker>=2,<3
  - Install test deps via uv pip, matching the reusable-vllm-sagemaker-tests pattern
  - Run pytest from the test/ directory with a relative path

* fix: increase the stage-init timeout for omni model tests
  - Add --stage-init-timeout 600 to server start (TTS models need multi-stage init)
  - Add stage_init_timeout=600 to offline Omni() calls
  - Increase the server wait loop from 120s to 300s

* fix: use the download-model action for model downloads

Signed-off-by: Yadan Wei <yadanwei@amazon.com>
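Based on the final fix above, the SPAL-based system-deps install could look roughly like this in Dockerfile.amzn2023. This is a sketch: only the package names (espeak-ng, sox, ffmpeg-free) and the system-release upgrade come from the commit message; the stage name and layer ordering are assumptions.

```dockerfile
# Hypothetical omni-deps stage; layout is illustrative.
# Upgrading system-release enables SPAL (Supplementary Packages for
# Amazon Linux), which requires AL2023 release 2023.9 or later.
RUN dnf upgrade -y system-release \
    && dnf install -y espeak-ng sox ffmpeg-free \
    && dnf clean all
```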
1 parent 9afe7fb commit e781a45

20 files changed

Lines changed: 1640 additions & 9 deletions

.github/actions/build-image/action.yml

Lines changed: 5 additions & 0 deletions
@@ -69,6 +69,10 @@ inputs:
     description: 'Transformers library version (e.g., 4.28.1)'
     required: false
     default: ''
+  runtime-base:
+    description: 'Pre-built runtime base image URI. When set, skips compile stages.'
+    required: false
+    default: ''

 outputs:
   image-uri:
@@ -120,3 +124,4 @@ runs:
       INFERENCE_TOOLKIT_VERSION: ${{ inputs.inference-toolkit-version }}
       TORCHSERVE_VERSION: ${{ inputs.torchserve-version }}
       TRANSFORMERS_VERSION: ${{ inputs.transformers-version }}
+      RUNTIME_BASE: ${{ inputs.runtime-base }}
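A workflow consuming the new input might pass it like this. Only the `runtime-base` input name comes from the diff; the step layout and the image URI are illustrative.

```yaml
# Hypothetical workflow step; only 'runtime-base' is from the diff.
- name: Build omni image on top of a pre-built runtime
  uses: ./.github/actions/build-image
  with:
    runtime-base: "123456789012.dkr.ecr.us-west-2.amazonaws.com/vllm-runtime:example"  # illustrative URI
```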
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+# vLLM-Omni EC2 AL2023 Image Configuration
+
+image:
+  name: "vllm-omni-ec2-amzn2023"
+  description: "vLLM-Omni for EC2 instances (AL2023, omni-modality serving)"
+
+common:
+  framework: "vllm-omni"
+  framework_version: "0.18.0"
+  job_type: "general"
+  python_version: "py312"
+  cuda_version: "cu129"
+  os_version: "amzn2023"
+  customer_type: "ec2"
+  arch_type: "x86"
+  prod_image: "vllm-omni:0.18-gpu-py312-ec2"
+  device_type: "gpu"
+  contributor: "None"
+
+release:
+  release: false
+  force_release: false
+  public_registry: false
+  private_registry: true
+  enable_soci: true
+  environment: production
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+# vLLM-Omni Model Test Configuration
+# Tests for omni-modality models (TTS, image generation, video, omni-chat)
+#
+# Each model defines its test_request (sent to /invocations via middleware)
+# and the route for the SageMaker routing middleware.
+#
+# Models use s3_model (pre-cached in S3) downloaded by the download-model action.
+
+s3_prefix: "s3://dlc-cicd-models/omni-models"
+
+smoke-test:
+  codebuild-fleet:
+    # --- TTS models (route: /v1/audio/speech) ---
+    - name: "qwen3-tts-1.7b-customvoice"
+      s3_model: "qwen3-tts-1.7b-customvoice.tar.gz"
+      fleet: "x86-g6xl-runner"
+      extra_args: ""
+      route: "/v1/audio/speech"
+      test_request: '{"input": "Hello, how are you?", "voice": "vivian", "language": "English"}'
+      validate: "binary_size_gt:1000"
+
+    # --- Image generation models (route: /v1/images/generations) ---
+    - name: "flux2-klein-4b"
+      s3_model: "flux2-klein-4b.tar.gz"
+      fleet: "x86-g6xl-runner"
+      extra_args: ""
+      route: "/v1/images/generations"
+      test_request: '{"prompt": "a red apple on a white table", "size": "512x512", "n": 1}'
+      validate: "json_field:data[0].b64_json"
+
+    # --- Video generation models (route: /v1/videos) ---
+    - name: "wan2.1-t2v-1.3b"
+      s3_model: "wan2.1-t2v-1.3b.tar.gz"
+      fleet: "x86-g6exl-runner"
+      extra_args: ""
+      route: "/v1/videos"
+      content_type: "multipart/form-data"
+      test_request: 'prompt=a dog running on a beach&num_frames=17&num_inference_steps=4&size=480x320&seed=42'
+      validate: "json_field:id"
+
+    # --- Omni chat models (route: /v1/chat/completions, fallthrough) ---
+    # model is big, won't run for now
+    # - name: "bagel-7b-mot"
+    #   s3_model: "bagel-7b-mot.tar.gz"
+    #   fleet: "x86-g6e4xl-runner"
+    #   extra_args: ""
+    #   route: "/v1/chat/completions"
+    #   test_request: '{"messages": [{"role": "user", "content": [{"type": "text", "text": "<|im_start|>A cute cat<|im_end|>"}]}], "modalities": ["image"], "height": 512, "width": 512, "num_inference_steps": 4, "seed": 42}'
+    #   validate: "json_field:choices[0].message.content"
+
+    - name: "qwen2.5-omni-3b"
+      s3_model: "qwen2.5-omni-3b.tar.gz"
+      fleet: "x86-g6e12xl-runner"
+      extra_args: ""
+      route: "/v1/chat/completions"
+      test_request: '{"messages": [{"role": "user", "content": "Say hello in one sentence."}], "max_tokens": 64}'
+      validate: "json_field:choices[0].message.content"
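The `validate` rules in the config above come in two shapes: `binary_size_gt:N` (the raw response body must exceed N bytes, used for audio output) and `json_field:PATH` (a dotted/indexed field must exist in the JSON response). A small helper interpreting those rules might look like this; it is a plausible sketch, not the repository's actual test code:

```python
import json
import re

def validate_response(rule: str, body: bytes) -> bool:
    """Apply a smoke-test 'validate' rule to a raw response body (sketch)."""
    if rule.startswith("binary_size_gt:"):
        # e.g. "binary_size_gt:1000" -> body must be larger than 1000 bytes
        return len(body) > int(rule.split(":", 1)[1])
    if rule.startswith("json_field:"):
        # e.g. "json_field:data[0].b64_json" -> walk the path; each segment is
        # either a dict key or a list index, and the leaf must be non-null
        obj = json.loads(body)
        path = rule.split(":", 1)[1]
        for part in re.findall(r"[^.\[\]]+", path):
            obj = obj[int(part)] if part.isdigit() else obj[part]
        return obj is not None
    raise ValueError(f"unknown validate rule: {rule}")
```

For example, `validate_response("json_field:choices[0].message.content", body)` checks the omni-chat response, while the TTS entry only asserts that more than 1000 bytes of audio came back.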
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+# vLLM-Omni SageMaker AL2023 Image Configuration
+
+image:
+  name: "vllm-omni-sagemaker-amzn2023"
+  description: "vLLM-Omni for SageMaker (AL2023, omni-modality serving)"
+
+common:
+  framework: "vllm-omni"
+  framework_version: "0.18.0"
+  job_type: "general"
+  python_version: "py312"
+  cuda_version: "cu129"
+  os_version: "amzn2023"
+  customer_type: "sagemaker"
+  arch_type: "x86"
+  prod_image: "vllm-omni:0.18-gpu-py312-sagemaker"
+  device_type: "gpu"
+  contributor: "None"
+
+release:
+  release: false
+  force_release: false
+  public_registry: false
+  private_registry: true
+  enable_soci: true
+  environment: production

.github/scripts/build_image.sh

Lines changed: 8 additions & 0 deletions
@@ -26,6 +26,7 @@ CUSTOMER_TYPE="${CUSTOMER_TYPE:-}"
 INFERENCE_TOOLKIT_VERSION="${INFERENCE_TOOLKIT_VERSION:-}"
 TORCHSERVE_VERSION="${TORCHSERVE_VERSION:-}"
 TRANSFORMERS_VERSION="${TRANSFORMERS_VERSION:-}"
+RUNTIME_BASE="${RUNTIME_BASE:-}"

 # Resolve image URI
 CI_IMAGE_URI="${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/ci:${TAG_PR}"
@@ -67,6 +68,13 @@ BUILD_CMD="docker buildx build --progress plain \
   --build-arg FRAMEWORK=\"${FRAMEWORK}\" \
   --build-arg FRAMEWORK_VERSION=\"${FRAMEWORK_VERSION}\""

+# Use pre-built runtime base if available (skips compile stages)
+if [[ -n "${RUNTIME_BASE}" ]]; then
+  echo "Using pre-built runtime base: ${RUNTIME_BASE}"
+  BUILD_CMD="${BUILD_CMD} \
+    --build-arg RUNTIME_BASE=\"${RUNTIME_BASE}\""
+fi
+
 # Add SageMaker labels if customer-type is 'sagemaker'
 if [[ "${CUSTOMER_TYPE}" == "sagemaker" ]]; then
   BUILD_CMD="${BUILD_CMD} \
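The shell logic in this diff appends `--build-arg RUNTIME_BASE=...` only when the variable is non-empty, so unset means "build from scratch" and set means "reuse a pre-built runtime and skip the compile stages". The same conditional can be sketched as a small argument builder; this is a minimal illustration over a simplified argument list, not the script's full command string:

```python
def buildx_args(framework: str, framework_version: str, runtime_base: str = "") -> list[str]:
    """Compose docker buildx build-args; RUNTIME_BASE is only added when set."""
    args = [
        "docker", "buildx", "build", "--progress", "plain",
        "--build-arg", f"FRAMEWORK={framework}",
        "--build-arg", f"FRAMEWORK_VERSION={framework_version}",
    ]
    if runtime_base:  # mirrors: if [[ -n "${RUNTIME_BASE}" ]]
        # A pre-built runtime base lets the Dockerfile skip its compile stages
        args += ["--build-arg", f"RUNTIME_BASE={runtime_base}"]
    return args
```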
