Commit 16ad7f7

Merge pull request #243 from wenqinI/a4x-inference-version
add version for a4x inference framework
2 parents d2250d8 + d1c9d4c commit 16ad7f7

1 file changed

Lines changed: 11 additions & 11 deletions

README.md

@@ -92,17 +92,17 @@ Models | GPU Machine Type
 
 | Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
 | ---------------- | ---------------- | --------- | ------------------- | ------------ | ------------------ |
-| **DeepSeek R1 671B** | [A4X (NVIDIA GB200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4x-vms) | vLLM | Inference | GKE | [Link](./inference/a4x/single-host-serving/vllm/README.md)
-| **Wan2.2 T2V A14B Diffusers** | [A4X (NVIDIA GB200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4x-vms) | SGLang | Inference | GKE | [Link](./inference/a4x/single-host-serving/sglang/README.md)
-| **Wan2.2 I2V A14B Diffusers** | [A4X (NVIDIA GB200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4x-vms) | SGLang | Inference | GKE | [Link](./inference/a4x/single-host-serving/sglang/README.md)
-| **DeepSeek R1 671B** | [A4X (NVIDIA GB200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4x-vms) | TensorRT-LLM | Inference | GKE | [Link](./inference/a4x/single-host-serving/tensorrt-llm/README.md) <br> <br> [Link for Using Google Cloud Storage (GCS) as Storage Option]((./inference/a4x/single-host-serving/tensorrt-llm-gcs/README.md)) <br> <br> [Link for Using Lustre as Storage Option]((./inference/a4x/single-host-serving/tensorrt-llm-lustre/README.md))
-| **Llama 3.1 405B** | [A4X (NVIDIA GB200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4x-vms) | TensorRT-LLM | Inference | GKE | [Link](./inference/a4x/single-host-serving/tensorrt-llm/README.md)
-| **Llama 3.1 70B** | [A4X (NVIDIA GB200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4x-vms) | TensorRT-LLM | Inference | GKE | [Link](./inference/a4x/single-host-serving/tensorrt-llm/README.md)
-| **Llama 3.1 8B** | [A4X (NVIDIA GB200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4x-vms) | TensorRT-LLM | Inference | GKE | [Link](./inference/a4x/single-host-serving/tensorrt-llm/README.md)
-| **Qwen 2.5 VL 7B** | [A4X (NVIDIA GB200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4x-vms) | TensorRT-LLM | Inference | GKE | [Link](./inference/a4x/single-host-serving/tensorrt-llm/README.md)
-| **Qwen 3 235B A22B** | [A4X (NVIDIA GB200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4x-vms) | TensorRT-LLM | Inference | GKE | [Link](./inference/a4x/single-host-serving/tensorrt-llm/README.md)
-| **Qwen 3 32B** | [A4X (NVIDIA GB200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4x-vms) | TensorRT-LLM | Inference | GKE | [Link](./inference/a4x/single-host-serving/tensorrt-llm/README.md)
-| **Qwen 3 4B** | [A4X (NVIDIA GB200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4x-vms) | TensorRT-LLM | Inference | GKE | [Link](./inference/a4x/single-host-serving/tensorrt-llm/README.md)
+| **DeepSeek R1 671B** | [A4X (NVIDIA GB200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4x-vms) | vLLM (v0.14.0rc1) | Inference | GKE | [Link](./inference/a4x/single-host-serving/vllm/README.md)
+| **Wan2.2 T2V A14B Diffusers** | [A4X (NVIDIA GB200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4x-vms) | SGLang (latest) | Inference | GKE | [Link](./inference/a4x/single-host-serving/sglang/README.md)
+| **Wan2.2 I2V A14B Diffusers** | [A4X (NVIDIA GB200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4x-vms) | SGLang (latest) | Inference | GKE | [Link](./inference/a4x/single-host-serving/sglang/README.md)
+| **DeepSeek R1 671B** | [A4X (NVIDIA GB200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4x-vms) | TensorRT-LLM (1.3.0rc5) | Inference | GKE | [Link](./inference/a4x/single-host-serving/tensorrt-llm/README.md) <br> <br> [Link for Using Google Cloud Storage (GCS) as Storage Option]((./inference/a4x/single-host-serving/tensorrt-llm-gcs/README.md)) <br> <br> [Link for Using Lustre as Storage Option]((./inference/a4x/single-host-serving/tensorrt-llm-lustre/README.md))
+| **Llama 3.1 405B** | [A4X (NVIDIA GB200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4x-vms) | TensorRT-LLM (1.3.0rc5) | Inference | GKE | [Link](./inference/a4x/single-host-serving/tensorrt-llm/README.md)
+| **Llama 3.1 70B** | [A4X (NVIDIA GB200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4x-vms) | TensorRT-LLM (1.3.0rc5) | Inference | GKE | [Link](./inference/a4x/single-host-serving/tensorrt-llm/README.md)
+| **Llama 3.1 8B** | [A4X (NVIDIA GB200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4x-vms) | TensorRT-LLM (1.3.0rc5) | Inference | GKE | [Link](./inference/a4x/single-host-serving/tensorrt-llm/README.md)
+| **Qwen 2.5 VL 7B** | [A4X (NVIDIA GB200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4x-vms) | TensorRT-LLM (1.3.0rc5) | Inference | GKE | [Link](./inference/a4x/single-host-serving/tensorrt-llm/README.md)
+| **Qwen 3 235B A22B** | [A4X (NVIDIA GB200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4x-vms) | TensorRT-LLM (1.3.0rc5) | Inference | GKE | [Link](./inference/a4x/single-host-serving/tensorrt-llm/README.md)
+| **Qwen 3 32B** | [A4X (NVIDIA GB200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4x-vms) | TensorRT-LLM (1.3.0rc5) | Inference | GKE | [Link](./inference/a4x/single-host-serving/tensorrt-llm/README.md)
+| **Qwen 3 4B** | [A4X (NVIDIA GB200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4x-vms) | TensorRT-LLM (1.3.0rc5) | Inference | GKE | [Link](./inference/a4x/single-host-serving/tensorrt-llm/README.md)
 
 
 ### Inference benchmarks G4