Commit 2dcde5d

Merge pull request #139 from Priya-Quad/main
Adding G4 Wan Recipe
2 parents 415897e + 88d60c7 commit 2dcde5d

1 file changed: 134 additions & 0 deletions
# Single-host inference benchmark of Wan2.2 with SGLang on G4

This recipe shows how to serve and benchmark the Wan-AI/Wan2.2-T2V-A14B and Wan-AI/Wan2.2-I2V-A14B models using [SGLang](https://github.com/sgl-project/sglang/tree/main) on a single GCP G4 VM with RTX PRO 6000 GPUs. For more information on G4 machine types, see the [GCP documentation](https://cloud.google.com/compute/docs/accelerator-optimized-machines#g4-machine-types).
## Before you begin

### 1. Create a GCP VM with G4 GPUs

First, create a Google Cloud Platform (GCP) virtual machine (VM) with the necessary GPU resources.

Make sure you have the following prerequisites:

* The [Google Cloud SDK](https://cloud.google.com/sdk/docs/install) is installed and initialized.
* You have a project with GPU quota. See [Request a quota increase](https://cloud.google.com/docs/quota/view-request#requesting_higher_quota).
* The required APIs are enabled. See [Enable required APIs](https://console.cloud.google.com/flows/enableapi?apiid=compute.googleapis.com).
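The API prerequisite can also be handled from the command line. A minimal sketch, assuming `gcloud init` has already been run (the project ID below is a placeholder):

```shell
# Point gcloud at your project (placeholder ID) and enable the Compute Engine API.
gcloud config set project your-project-id
gcloud services enable compute.googleapis.com
```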
The following commands set up environment variables and create a GCE instance. `MACHINE_TYPE` is set to `g4-standard-384`, a multi-GPU VM with 8 GPUs. The boot disk is sized at 200 GB to accommodate the model weights and dependencies.
```bash
export VM_NAME="${USER}-g4-sglang-wan2.2"
export PROJECT_ID="your-project-id"
export ZONE="your-zone"
export MACHINE_TYPE="g4-standard-384"
export IMAGE_PROJECT="ubuntu-os-accelerator-images"
export IMAGE_FAMILY="ubuntu-accelerator-2404-amd64-with-nvidia-570"

gcloud compute instances create ${VM_NAME} \
  --machine-type=${MACHINE_TYPE} \
  --project=${PROJECT_ID} \
  --zone=${ZONE} \
  --image-project=${IMAGE_PROJECT} \
  --image-family=${IMAGE_FAMILY} \
  --maintenance-policy=TERMINATE \
  --boot-disk-size=200GB
```

### 2. Connect to the VM

Use `gcloud compute ssh` to connect to the newly created instance.
```bash
gcloud compute ssh ${VM_NAME?} --project=${PROJECT_ID?} --zone=${ZONE?}
```
```bash
# Run nvidia-smi to verify the driver installation and list the available GPUs.
nvidia-smi
```
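For a machine-readable sanity check, standard `nvidia-smi` query flags can list the GPU inventory; on `g4-standard-384` the count should be 8:

```shell
# List each GPU's index, name, and total memory, then count the GPUs.
nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader
nvidia-smi --query-gpu=index --format=csv,noheader | wc -l
```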
## Serve a model

### 1. Install Docker
Before you can serve the model, Docker must be installed on the VM. Follow the official documentation to install Docker on Ubuntu: [Install Docker Engine on Ubuntu](https://docs.docker.com/engine/install/ubuntu/)

After installing Docker, make sure the Docker daemon is running.
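A quick way to confirm the daemon is up, using standard Docker and systemd commands (a sketch; the `hello-world` image is pulled from Docker Hub on first run):

```shell
# Check that the Docker service is active, then run a throwaway test container.
sudo systemctl is-active docker
sudo docker run --rm hello-world
```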
### 2. Install NVIDIA Container Toolkit

To give Docker containers access to the GPUs, install the NVIDIA Container Toolkit. Follow the official NVIDIA documentation: [NVIDIA Container Toolkit Install Guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
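After the package install, the toolkit's own `nvidia-ctk` helper registers the NVIDIA runtime with Docker; the final command is the usual GPU smoke test (a sketch following the toolkit's documented flow):

```shell
# Register the NVIDIA runtime with Docker and restart the daemon.
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify that a plain container can see the GPUs.
sudo docker run --rm --gpus all ubuntu nvidia-smi
```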
### 3. Set up SGLang

This step prepares the host for model storage and starts the SGLang Docker container. We mount the `/scratch` directory so that model weights persist on the host disk, and pass the `--gpus all` flag so the container can use the G4 GPUs.
```bash
# Create a local directory to store model weights and cache
mkdir -p /scratch/cache

# Define the SGLang Docker image
export IMAGE_URL="lmsysorg/sglang:latest"

# Start the container with GPU support and persistent volume mounts
docker run -it \
  --gpus all \
  -v /scratch:/scratch \
  -v /scratch/cache:/root/.cache \
  --ipc=host \
  $IMAGE_URL \
  /bin/bash
```
### 4. Download the Model Weights

Inside the container, use the Hugging Face CLI to download the Wan2.2 model files. They are saved under the `/scratch` mount so they are not lost when the container is deleted.

```bash
# Install the Hugging Face CLI (distributed via pip as part of huggingface_hub)
pip install -U "huggingface_hub[cli]"

# Download the base models from Hugging Face into separate directories
huggingface-cli download Wan-AI/Wan2.2-T2V-A14B --local-dir /scratch/models/Wan2.2-T2V-A14B
huggingface-cli download Wan-AI/Wan2.2-I2V-A14B --local-dir /scratch/models/Wan2.2-I2V-A14B
```
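Each A14B checkpoint is large, so it is worth confirming the downloads completed before benchmarking. A minimal check (exact sizes depend on the repository contents):

```shell
# Show the on-disk size of each downloaded model directory.
du -sh /scratch/models/*
```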
## Run Benchmarks

Use the following commands to test video generation. The examples show how to run each model on a single GPU or across multiple GPUs with tensor parallelism (`--tp-size`). For the image-to-video benchmarks, first download a test image to use as input.
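The image-to-video commands below expect an input image at `assets/logo.png`. Any image will do; a sketch using `curl` (the URL is a placeholder; substitute any image you can reach):

```shell
# Create the expected directory and fetch a test image (placeholder URL).
mkdir -p assets
curl -L -o assets/logo.png "https://example.com/test-image.png"
```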
*Benchmark: Text-to-Video on 1 GPU*
```bash
sglang generate --model-path Wan-AI/Wan2.2-T2V-A14B-Diffusers \
  --dit-layerwise-offload false --text-encoder-cpu-offload false \
  --vae-cpu-offload false --pin-cpu-memory --dit-cpu-offload false \
  --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside." \
  --save-output --num-gpus 1 --num-frames 81
```

*Benchmark: Text-to-Video on 4 GPUs*
```bash
sglang generate --model-path Wan-AI/Wan2.2-T2V-A14B-Diffusers \
  --dit-layerwise-offload false --text-encoder-cpu-offload false \
  --vae-cpu-offload false --pin-cpu-memory --dit-cpu-offload false \
  --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside." \
  --save-output --num-gpus 4 --tp-size 4 --num-frames 93
```

*Benchmark: Image-to-Video on 1 GPU*
```bash
sglang generate --model-path Wan-AI/Wan2.2-I2V-A14B-Diffusers --image-path assets/logo.png \
  --dit-layerwise-offload false --text-encoder-cpu-offload false \
  --vae-cpu-offload false --pin-cpu-memory --dit-cpu-offload false \
  --prompt "A curious raccoon" --save-output --num-gpus 1 --num-frames 81
```

*Benchmark: Image-to-Video on 4 GPUs*
```bash
sglang generate --model-path Wan-AI/Wan2.2-I2V-A14B-Diffusers --image-path assets/logo.png \
  --dit-layerwise-offload false --text-encoder-cpu-offload false \
  --vae-cpu-offload false --pin-cpu-memory --dit-cpu-offload false \
  --prompt "A curious raccoon" --save-output --num-gpus 4 --tp-size 4 --num-frames 93
```
## Clean up

### 1. Exit the container

```bash
exit
```
### 2. Delete the VM

This command deletes the GCE instance and all of its disks.

```bash
gcloud compute instances delete ${VM_NAME?} --zone=${ZONE?} --project=${PROJECT_ID?} --quiet --delete-disks=all
```
