Merged
Changes from 62 commits
65 commits
bdb2184
feat: add vLLM-Omni EC2 and SageMaker DLC images
Apr 2, 2026
9ab46fc
fix: use AL2023-compatible packages for omni system deps
Apr 2, 2026
b8de9c1
fix: only install ffmpeg static binary for omni deps
Apr 2, 2026
4567aa2
fix: use SPAL repo for espeak-ng, sox, ffmpeg on AL2023
Apr 2, 2026
5e7b23e
fix: use --region instead of --aws-region for pytest
Apr 2, 2026
ab2ac24
fix: add sagemaker SDK dep and match existing test pattern
Apr 2, 2026
0de9f97
fix: increase stage init timeout for omni model tests
Apr 2, 2026
ce54d97
fix: use download-model action for model downloads
Apr 2, 2026
6075e81
fix: patch CVE-2026-28414 gradio path traversal in omni image
Apr 2, 2026
26db368
fix: use .tar.gz model tarballs for download-model action compatibility
Apr 2, 2026
a85d641
fix: use /v1/audio/speech API for TTS smoke test
Apr 2, 2026
58309b7
fix: use HuggingFace model IDs directly instead of S3 tarballs
Apr 2, 2026
325d917
fix: validate diffusion response without printing full base64 image
Apr 2, 2026
aa40386
fix: use ml.g4dn.xlarge for TTS endpoint test (cheaper, 1.7B fits in …
Apr 2, 2026
da26690
fix: remove redundant --enforce-eager (vllm-omni enforces it internally)
Apr 2, 2026
9c18b3a
fix: use customer-type from config to select smoke test script
Apr 2, 2026
7322dce
fix lmiv22 yml and add lmiv23 (#5869)
smouaa Apr 2, 2026
c848b67
fix telemetry ingress rules (#5871)
sirutBuasai Apr 2, 2026
871877f
Migrate Xgboost Container Tests to DLC repo (#5860)
Jyothirmaikottu Apr 2, 2026
b655007
fix: use download-model action and /models/ path for omni smoke tests
Apr 3, 2026
99ceaac
Merge branch 'main' into omni
Yadan-Wei Apr 3, 2026
99628cb
ci: trigger pipeline
Apr 3, 2026
2723e31
Merge branch 'main' into omni
Yadan-Wei Apr 3, 2026
02d7291
ci: re-trigger after flux2 model tarball fix
Apr 3, 2026
a80c193
fix: SM endpoint test validates deployment only (TTS uses /v1/audio/s…
Apr 3, 2026
fd63eba
Revert "fix: SM endpoint test validates deployment only (TTS uses /v1…
Apr 3, 2026
8d55aa3
ci: Disable all non-omni PR workflows
Apr 3, 2026
3dcc0e9
feat: add SageMaker serve proxy to route /invocations to correct vllm…
Apr 3, 2026
2f891a8
feat: SageMaker routing middleware, real entrypoint smoke tests, unit…
Apr 3, 2026
6f6421f
feat: pre-built runtime base to skip vLLM compile in PR builds
Apr 3, 2026
eb8e6b7
feat: per-model test config with route/request/validate, g5 for endpo…
Apr 3, 2026
0fc2d3b
fix: increase SageMaker invoke timeout to 300s for TTS cold start
Apr 4, 2026
b48b7a7
fix: retry invoke on timeout instead of unsupported InvocationTimeout…
Apr 4, 2026
85772d6
fix: add --port 8080 to EC2 container start (vllm defaults to 8000)
Apr 4, 2026
793b823
ci: re-trigger after pre-commit fix
Apr 4, 2026
aa2e4fb
Merge branch 'main' into omni
Yadan-Wei Apr 4, 2026
7d8e128
fix format
Apr 4, 2026
2cd3eb4
fix: add 30s sleep between retries for torch.compile warmup
Apr 4, 2026
7fd7e01
feat: move unit test to test/vllm-omni/sagemaker/, add async endpoint…
Apr 6, 2026
4f0e254
fix: run unit test from sagemaker dir to avoid test/__init__.py import
Apr 6, 2026
1e459cd
fix: use default-runner for unit test (has test_utils and starlette)
Apr 6, 2026
a02f2ca
fix: install test deps and set PYTHONPATH for unit test (matches sani…
Apr 6, 2026
9589e12
fix: add starlette to unit test deps (not in test/requirements.txt)
Apr 6, 2026
38252ef
feat: add 4 new models (CosyVoice3, Qwen2.5-Omni, BAGEL, Wan2.1), HF …
Apr 6, 2026
68e8c6e
fix: revert to S3-cached models only, new HF models need validation f…
Apr 6, 2026
6bf7f8e
Merge branch 'main' into omni
Yadan-Wei Apr 6, 2026
86beb26
Merge branch 'main' into omni
Yadan-Wei Apr 6, 2026
1162afd
feat: add CosyVoice3-0.5B and Qwen2.5-Omni-3B smoke tests (S3 cached)
Apr 6, 2026
ee5c415
fix: bump new models to g6exl (more RAM), add container log dump on f…
Apr 6, 2026
862688d
fix: revert to Qwen3-TTS and FLUX.2 only
Apr 6, 2026
4f45282
feat: add CosyVoice3, Wan2.1, BAGEL, Qwen2.5-Omni smoke tests
Apr 6, 2026
9605ed9
change instance type
Apr 6, 2026
e36fc3a
fix: use absolute path for cosyvoice3 stage config in DLC container
Apr 6, 2026
f3a716b
fix path
Apr 6, 2026
c7a8a1c
feat: add 4 new models, form data support, endpoint cleanup, more logs
Apr 6, 2026
1b03a34
fix: remove CosyVoice3 - transformers doesn't recognize cosyvoice3 mo…
Apr 6, 2026
5109c99
fix: use bash array for curl form data to preserve header quoting
Apr 6, 2026
b154175
fix: Wan2.1 use /v1/videos (async), /v1/videos/sync not in v0.18.0
Apr 6, 2026
cd46502
fix: Wan2.1 validate json_field:id (async API returns JSON, not binary)
Apr 7, 2026
b1d1eac
enable all models
Apr 7, 2026
0a4e745
Merge branch 'main' into omni
Yadan-Wei Apr 7, 2026
70f032f
Merge branch 'main' into omni
junpuf Apr 7, 2026
98f5f93
Revert "ci: Disable all non-omni PR workflows"
Apr 7, 2026
e0e54da
fix: remove CVE-2026-33055 allowlist entry (fixed in uv tar crate 0.4…
Apr 7, 2026
00bb406
fix: patch aiohttp CVEs in sglang and vllm Dockerfiles
Apr 7, 2026
5 changes: 5 additions & 0 deletions .github/actions/build-image/action.yml
Original file line number Diff line number Diff line change
@@ -69,6 +69,10 @@ inputs:
description: 'Transformers library version (e.g., 4.28.1)'
required: false
default: ''
runtime-base:
description: 'Pre-built runtime base image URI. When set, skips compile stages.'
required: false
default: ''

outputs:
image-uri:
@@ -120,3 +124,4 @@ runs:
INFERENCE_TOOLKIT_VERSION: ${{ inputs.inference-toolkit-version }}
TORCHSERVE_VERSION: ${{ inputs.torchserve-version }}
TRANSFORMERS_VERSION: ${{ inputs.transformers-version }}
RUNTIME_BASE: ${{ inputs.runtime-base }}
26 changes: 26 additions & 0 deletions .github/config/vllm-omni-ec2-amzn2023.yml
@@ -0,0 +1,26 @@
# vLLM-Omni EC2 AL2023 Image Configuration

image:
name: "vllm-omni-ec2-amzn2023"
description: "vLLM-Omni for EC2 instances (AL2023, omni-modality serving)"

common:
framework: "vllm-omni"
framework_version: "0.18.0"
job_type: "general"
python_version: "py312"
cuda_version: "cu129"
os_version: "amzn2023"
customer_type: "ec2"
arch_type: "x86"
prod_image: "vllm-omni:0.18-gpu-py312-ec2"
Review comment (Contributor): we'll use the same repo name "vllm" instead of creating a new repo.
Reply (Contributor Author): Will update this section when we have a real prod image.

device_type: "gpu"
contributor: "None"

release:
release: false
force_release: false
public_registry: false
private_registry: true
enable_soci: true
environment: production
57 changes: 57 additions & 0 deletions .github/config/vllm-omni-model-tests.yml
@@ -0,0 +1,57 @@
# vLLM-Omni Model Test Configuration
# Tests for omni-modality models (TTS, image generation, video, omni-chat)
#
# Each model defines its test_request (sent to /invocations via middleware)
# and the route for the SageMaker routing middleware.
#
# Models use s3_model (pre-cached in S3) downloaded by the download-model action.

s3_prefix: "s3://dlc-cicd-models/omni-models"

smoke-test:
codebuild-fleet:
# --- TTS models (route: /v1/audio/speech) ---
- name: "qwen3-tts-1.7b-customvoice"
s3_model: "qwen3-tts-1.7b-customvoice.tar.gz"
fleet: "x86-g6xl-runner"
extra_args: ""
route: "/v1/audio/speech"
test_request: '{"input": "Hello, how are you?", "voice": "vivian", "language": "English"}'
validate: "binary_size_gt:1000"

# --- Image generation models (route: /v1/images/generations) ---
- name: "flux2-klein-4b"
s3_model: "flux2-klein-4b.tar.gz"
fleet: "x86-g6xl-runner"
extra_args: ""
route: "/v1/images/generations"
test_request: '{"prompt": "a red apple on a white table", "size": "512x512", "n": 1}'
validate: "json_field:data[0].b64_json"

# --- Video generation models (route: /v1/videos) ---
- name: "wan2.1-t2v-1.3b"
s3_model: "wan2.1-t2v-1.3b.tar.gz"
fleet: "x86-g6exl-runner"
extra_args: ""
route: "/v1/videos"
content_type: "multipart/form-data"
test_request: 'prompt=a dog running on a beach&num_frames=17&num_inference_steps=4&size=480x320&seed=42'
validate: "json_field:id"

# --- Omni chat models (route: /v1/chat/completions, fallthrough) ---
# model is big, won't run for now
# - name: "bagel-7b-mot"
# s3_model: "bagel-7b-mot.tar.gz"
# fleet: "x86-g6e4xl-runner"
# extra_args: ""
# route: "/v1/chat/completions"
# test_request: '{"messages": [{"role": "user", "content": [{"type": "text", "text": "<|im_start|>A cute cat<|im_end|>"}]}], "modalities": ["image"], "height": 512, "width": 512, "num_inference_steps": 4, "seed": 42}'
# validate: "json_field:choices[0].message.content"

- name: "qwen2.5-omni-3b"
s3_model: "qwen2.5-omni-3b.tar.gz"
fleet: "x86-g6e12xl-runner"
extra_args: ""
route: "/v1/chat/completions"
test_request: '{"messages": [{"role": "user", "content": "Say hello in one sentence."}], "max_tokens": 64}'
validate: "json_field:choices[0].message.content"
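The `validate` values in this config follow a `kind:argument` shape (`binary_size_gt:1000`, `json_field:data[0].b64_json`, `json_field:id`). The checker that interprets them is not part of the visible diff, so what follows is only a minimal Python sketch of how such specs could be evaluated. The names `resolve_json_path` and `check_response` are hypothetical, and treating `json_field` as "the field exists and is non-empty" is an assumption.

```python
import json
import re


def resolve_json_path(doc, path):
    """Walk a dotted path with optional [index] parts, e.g. 'data[0].b64_json'."""
    current = doc
    for part in path.split("."):
        match = re.fullmatch(r"([^\[\]]+)((?:\[\d+\])*)", part)
        if match is None:
            raise ValueError(f"unsupported path segment: {part!r}")
        current = current[match.group(1)]
        for index in re.findall(r"\[(\d+)\]", match.group(2)):
            current = current[int(index)]
    return current


def check_response(spec, body):
    """Return True if the raw response bytes satisfy a 'kind:argument' spec.

    Assumed semantics (not confirmed by the diff):
      binary_size_gt:N  -> body is longer than N bytes
      json_field:PATH   -> body is JSON and PATH resolves to a non-empty value
    """
    kind, _, argument = spec.partition(":")
    if kind == "binary_size_gt":
        return len(body) > int(argument)
    if kind == "json_field":
        try:
            value = resolve_json_path(json.loads(body), argument)
        except (ValueError, KeyError, IndexError, TypeError):
            # Not JSON, or the path does not exist in the document.
            return False
        return value not in (None, "", [], {})
    raise ValueError(f"unknown validate spec: {spec!r}")
```

For example, `check_response("json_field:id", b'{"id": "video_123"}')` passes for the async `/v1/videos` response, which returns JSON rather than binary data, while a TTS response would be checked with `binary_size_gt:1000` against the raw audio bytes.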
26 changes: 26 additions & 0 deletions .github/config/vllm-omni-sagemaker-amzn2023.yml
@@ -0,0 +1,26 @@
# vLLM-Omni SageMaker AL2023 Image Configuration

image:
name: "vllm-omni-sagemaker-amzn2023"
description: "vLLM-Omni for SageMaker (AL2023, omni-modality serving)"

common:
framework: "vllm-omni"
framework_version: "0.18.0"
job_type: "general"
python_version: "py312"
cuda_version: "cu129"
os_version: "amzn2023"
customer_type: "sagemaker"
arch_type: "x86"
prod_image: "vllm-omni:0.18-gpu-py312-sagemaker"
device_type: "gpu"
contributor: "None"

release:
release: false
force_release: false
public_registry: false
private_registry: true
enable_soci: true
environment: production
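In both the EC2 and SageMaker configs, `prod_image` looks like it is composed from the `common` fields: framework, major.minor version, device type, Python version, and customer type. The following Python sketch checks that consistency. It is an inference from the two config files only, the helper name `expected_prod_image` is hypothetical, and the `repository` parameter reflects the review comment that the repo name may become "vllm".

```python
def expected_prod_image(common, repository=None):
    """Compose '<repo>:<major.minor>-<device>-<python>-<customer>' from config fields.

    The naming pattern is an assumption inferred from the two configs in this PR.
    """
    major_minor = ".".join(common["framework_version"].split(".")[:2])
    tag = "-".join(
        [
            major_minor,
            common["device_type"],
            common["python_version"],
            common["customer_type"],
        ]
    )
    repo = repository or common["framework"]
    return f"{repo}:{tag}"


# Values copied from the SageMaker config above.
sagemaker_common = {
    "framework": "vllm-omni",
    "framework_version": "0.18.0",
    "python_version": "py312",
    "device_type": "gpu",
    "customer_type": "sagemaker",
    "prod_image": "vllm-omni:0.18-gpu-py312-sagemaker",
}

# True when the hand-written prod_image matches the assumed pattern.
is_consistent = expected_prod_image(sagemaker_common) == sagemaker_common["prod_image"]
```

Passing `repository="vllm"` yields the name the reviewer suggested without touching the other fields.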
8 changes: 8 additions & 0 deletions .github/scripts/build_image.sh
@@ -26,6 +26,7 @@ CUSTOMER_TYPE="${CUSTOMER_TYPE:-}"
INFERENCE_TOOLKIT_VERSION="${INFERENCE_TOOLKIT_VERSION:-}"
TORCHSERVE_VERSION="${TORCHSERVE_VERSION:-}"
TRANSFORMERS_VERSION="${TRANSFORMERS_VERSION:-}"
RUNTIME_BASE="${RUNTIME_BASE:-}"

# Resolve image URI
CI_IMAGE_URI="${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/ci:${TAG_PR}"
@@ -67,6 +68,13 @@ BUILD_CMD="docker buildx build --progress plain \
--build-arg FRAMEWORK=\"${FRAMEWORK}\" \
--build-arg FRAMEWORK_VERSION=\"${FRAMEWORK_VERSION}\""

# Use pre-built runtime base if available (skips compile stages)
if [[ -n "${RUNTIME_BASE}" ]]; then
echo "Using pre-built runtime base: ${RUNTIME_BASE}"
BUILD_CMD="${BUILD_CMD} \
--build-arg RUNTIME_BASE=\"${RUNTIME_BASE}\""
fi

# Add SageMaker labels if customer-type is 'sagemaker'
if [[ "${CUSTOMER_TYPE}" == "sagemaker" ]]; then
BUILD_CMD="${BUILD_CMD} \
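The hunk above appends `--build-arg RUNTIME_BASE=...` only when `RUNTIME_BASE` is non-empty. As an illustration of that optional-argument pattern, here is a small Python sketch that builds the same command as an argument list, which avoids the escaped quoting needed for a command string. The function name `build_command` is hypothetical, and only the flags visible in the hunk are included; the real script passes more.

```python
import shlex


def build_command(framework, framework_version, runtime_base=""):
    """Return a docker buildx argv list; RUNTIME_BASE is added only when set."""
    cmd = [
        "docker", "buildx", "build",
        "--progress", "plain",
        "--build-arg", f"FRAMEWORK={framework}",
        "--build-arg", f"FRAMEWORK_VERSION={framework_version}",
    ]
    if runtime_base:
        # Mirrors the `[[ -n "${RUNTIME_BASE}" ]]` guard in build_image.sh.
        cmd += ["--build-arg", f"RUNTIME_BASE={runtime_base}"]
    return cmd


# The image URI below is a placeholder, not a real registry path.
with_base = build_command("vllm-omni", "0.18.0", "example.com/runtime-base:latest")
without_base = build_command("vllm-omni", "0.18.0")

# shlex.join renders the list as a shell-safe string for logging.
printable = shlex.join(with_base)
```

An empty `runtime_base` leaves the command unchanged, matching the action input's default of `''`.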
13 changes: 2 additions & 11 deletions .github/workflows/pr-base-v1.yml
@@ -1,17 +1,8 @@
name: PR - Base v1

# Disabled: focusing on omni workflows only
on:
pull_request:
branches: [main]
types: [opened, reopened, synchronize]
paths:
- "docker/base/**"
- "scripts/common/**"
- "test/cuda/**"
- "test/security/data/ecr_scan_allowlist/base/**"
- ".github/config/base-v1.yml"
- ".github/workflows/pr-base-v1.yml"
- "!docs/**"
workflow_dispatch: {}

permissions:
contents: read
13 changes: 2 additions & 11 deletions .github/workflows/pr-base-v2.yml
@@ -1,17 +1,8 @@
name: PR - Base v2

# Disabled: focusing on omni workflows only
on:
pull_request:
branches: [main]
types: [opened, reopened, synchronize]
paths:
- "docker/base/**"
- "scripts/common/**"
- "test/cuda/**"
- "test/security/data/ecr_scan_allowlist/base/**"
- ".github/config/base-v2.yml"
- ".github/workflows/pr-base-v2.yml"
- "!docs/**"
workflow_dispatch: {}

permissions:
contents: read
7 changes: 2 additions & 5 deletions .github/workflows/pr-docs.yml
@@ -1,11 +1,8 @@
name: PR - Documentations

# Disabled: focusing on omni workflows only
on:
pull_request:
branches: [main]
types: [opened, reopened, synchronize]
paths:
- "**docs**"
workflow_dispatch: {}

permissions:
contents: read
14 changes: 2 additions & 12 deletions .github/workflows/pr-lambda.yml
@@ -1,18 +1,8 @@
name: PR - Lambda

# Disabled: focusing on omni workflows only
on:
pull_request:
branches: [main]
types: [opened, reopened, synchronize]
paths:
- "docker/lambda/**"
- "scripts/lambda/**"
- "scripts/common/**"
- "scripts/telemetry/**"
- "test/lambda/**"
- "test/security/data/ecr_scan_allowlist/lambda/**"
- ".github/workflows/pr-lambda.yml"
- "!docs/**"
workflow_dispatch: {}

permissions:
contents: read
11 changes: 2 additions & 9 deletions .github/workflows/pr-pytorch-ec2.yml
@@ -1,15 +1,8 @@
name: PR - PyTorch EC2

# Disabled: focusing on omni workflows only
on:
pull_request:
branches: [main]
types: [opened, reopened, synchronize]
paths:
- "docker/pytorch/**"
- "scripts/pytorch/**"
- "test/pytorch/**"
- ".github/workflows/pr-pytorch-ec2.yml"
- "!docs/**"
workflow_dispatch: {}

permissions:
contents: read
8 changes: 2 additions & 6 deletions .github/workflows/pr-ray-ec2-cpu.yml
@@ -1,12 +1,8 @@
name: PR - Ray EC2 CPU

# Disabled: focusing on omni workflows only
on:
pull_request:
branches: [main]
types: [opened, reopened, synchronize]
paths:
- "**ray**"
- "!docs/**"
workflow_dispatch: {}

permissions:
contents: read
8 changes: 2 additions & 6 deletions .github/workflows/pr-ray-ec2-gpu.yml
@@ -1,12 +1,8 @@
name: PR - Ray EC2 GPU

# Disabled: focusing on omni workflows only
on:
pull_request:
branches: [main]
types: [opened, reopened, synchronize]
paths:
- "**ray**"
- "!docs/**"
workflow_dispatch: {}

permissions:
contents: read
8 changes: 2 additions & 6 deletions .github/workflows/pr-ray-sagemaker-cpu.yml
@@ -1,12 +1,8 @@
name: PR - Ray SageMaker CPU

# Disabled: focusing on omni workflows only
on:
pull_request:
branches: [main]
types: [opened, reopened, synchronize]
paths:
- "**ray**"
- "!docs/**"
workflow_dispatch: {}

permissions:
contents: read
8 changes: 2 additions & 6 deletions .github/workflows/pr-ray-sagemaker-gpu.yml
@@ -1,12 +1,8 @@
name: PR - Ray SageMaker GPU

# Disabled: focusing on omni workflows only
on:
pull_request:
branches: [main]
types: [opened, reopened, synchronize]
paths:
- "**ray**"
- "!docs/**"
workflow_dispatch: {}

permissions:
contents: read
10 changes: 2 additions & 8 deletions .github/workflows/pr-sagemaker-xgboost.yml
@@ -1,14 +1,8 @@
name: PR - SageMaker XGBoost

# Disabled: focusing on omni workflows only
on:
pull_request:
branches: [main]
types: [opened, reopened, synchronize]
paths:
- "docker/xgboost/**"
- ".github/config/sagemaker-xgboost.yml"
- ".github/workflows/pr-sagemaker-xgboost.yml"
- "!docs/**"
workflow_dispatch: {}

permissions:
contents: read
18 changes: 2 additions & 16 deletions .github/workflows/pr-sglang-ec2-amzn2023.yml
@@ -1,22 +1,8 @@
name: PR - SGLang EC2 AMZN2023

# Disabled: focusing on omni workflows only
on:
pull_request:
branches: [main]
types: [opened, reopened, synchronize]
paths:
- "docker/sglang/Dockerfile.amzn2023"
- "scripts/sglang/dockerd_entrypoint.sh"
- "scripts/sglang/sagemaker_entrypoint.sh"
- "scripts/common/**"
- "scripts/telemetry/**"
- ".github/config/sglang-ec2-amzn2023.yml"
- ".github/config/sglang-model-tests.yml"
- ".github/workflows/pr-sglang-ec2-amzn2023.yml"
- ".github/workflows/reusable-sglang-model-tests.yml"
- "test/sanity/**"
- "test/telemetry/**"
- "test/sglang/scripts/**"
workflow_dispatch: {}

permissions:
contents: read
9 changes: 2 additions & 7 deletions .github/workflows/pr-sglang-ec2.yml
@@ -1,13 +1,8 @@
name: PR - SGLang EC2

# Disabled: focusing on omni workflows only
on:
pull_request:
branches: [main]
types: [opened, reopened, synchronize]
paths:
- "**sglang**"
- "!docs/**"
- "!**amzn2023**"
workflow_dispatch: {}

permissions:
contents: read
20 changes: 2 additions & 18 deletions .github/workflows/pr-sglang-sagemaker-amzn2023.yml
@@ -1,24 +1,8 @@
name: PR - SGLang SageMaker AMZN2023

# Disabled: focusing on omni workflows only
on:
pull_request:
branches: [main]
types: [opened, reopened, synchronize]
paths:
- "docker/sglang/Dockerfile.amzn2023"
- "scripts/sglang/dockerd_entrypoint.sh"
- "scripts/sglang/sagemaker_entrypoint.sh"
- "scripts/common/**"
- "scripts/telemetry/**"
- ".github/config/sglang-sagemaker-amzn2023.yml"
- ".github/workflows/pr-sglang-sagemaker-amzn2023.yml"
- ".github/workflows/reusable-sglang-sagemaker-tests.yml"
- ".github/workflows/reusable-sglang-model-tests.yml"
- ".github/config/sglang-model-tests.yml"
- "test/sanity/**"
- "test/telemetry/**"
- "test/sglang/sagemaker/**"
- "test/sglang/scripts/**"
workflow_dispatch: {}

permissions:
contents: read