Open
26 commits
7ed3733
consolidating qwen runtimes and models
YouNeedCryDear Mar 4, 2026
9b018d9
increase model size support to 122B
YouNeedCryDear Mar 4, 2026
a0e975a
add fp8 tp8 runtime for 397B model
YouNeedCryDear Mar 4, 2026
b330586
modify the GPU and CPU resource request for qwen runtimes
YouNeedCryDear Mar 4, 2026
d39ab1a
improve qwen runtime to cover more models
YouNeedCryDear Mar 5, 2026
09877b1
add qwen3 VL in supported model format
YouNeedCryDear Mar 6, 2026
1acc557
add sample isvc for supported qwen models
YouNeedCryDear Mar 6, 2026
fbfb609
remove old model specific runtimes
YouNeedCryDear Mar 6, 2026
0af048f
use qwen.<MODEL NAME LOWER CASE> as display name for consistency
YouNeedCryDear Mar 6, 2026
87ff941
remove old isvc samples
YouNeedCryDear Mar 9, 2026
fde6a8f
use smg container with grpc
YouNeedCryDear Mar 11, 2026
ecc2d25
fine-tune the engine args
YouNeedCryDear Mar 12, 2026
bbb3602
add worker timeout to fp8 runtimes and use http mode for mm runtimes
YouNeedCryDear Mar 17, 2026
d4958f0
adjust qwen runtimes to vllm
YouNeedCryDear Mar 24, 2026
df743b1
combine runtimes for qwen
YouNeedCryDear Mar 24, 2026
19fdb61
add generation config and optimization in engine arg
YouNeedCryDear Mar 31, 2026
8d22067
add more qwen models
YouNeedCryDear Mar 31, 2026
329e135
add 512 as max concurrent request for safeguard
YouNeedCryDear Mar 31, 2026
2cd01cf
include vllm qwen runtimes in kustomize
YouNeedCryDear Mar 31, 2026
25fca34
use -1 for max model len and update smg to 1.4.0
YouNeedCryDear Apr 3, 2026
f2e987a
separate router config and use grpc mode for connection with engine
YouNeedCryDear Apr 27, 2026
7760dcf
add router config to all qwen sample isvc
YouNeedCryDear Apr 27, 2026
e24fb8a
add qwen 3.6 model files
YouNeedCryDear Apr 27, 2026
fc1583c
fix qwen runtimes
YouNeedCryDear Apr 27, 2026
73473f5
fix format
YouNeedCryDear Apr 27, 2026
87f01d7
update image to vllm 0.19.0
YouNeedCryDear Apr 28, 2026
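Commit 0af048f standardizes displayName as qwen.<MODEL NAME LOWER CASE>, and the manifests in this diff also appear to use a hyphenated form of the model name for metadata.name. The following is a minimal sketch of both conventions as inferred from the file contents; the helper names display_name and resource_name are hypothetical and are not part of OME.

```python
import re


def display_name(model_name: str, vendor_prefix: str = "qwen") -> str:
    """Hypothetical helper: lower-case the model name and prefix it with the vendor namespace."""
    return f"{vendor_prefix}.{model_name.lower()}"


def resource_name(model_name: str) -> str:
    """Hypothetical helper: lower-case the name and turn '.' and '_' into '-'.

    This mirrors the metadata.name values seen in the diff (assumed pattern).
    """
    return re.sub(r"[._]", "-", model_name.lower())


if __name__ == "__main__":
    for name in ("Qwen1.5-110B-Chat", "Qwen-1_8B-Chat"):
        print(resource_name(name), display_name(name))
```

Keeping the dots in displayName while replacing them in metadata.name would be consistent with Kubernetes naming habits, but that motivation is an assumption; the PR itself only says "for consistency".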
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen-14B-Chat.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen-14b-chat
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen-14b-chat
+  modelArchitecture: QWenLMHeadModel
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.32.0"
+  modelParameterSize: 14B
+  storage:
+    storageUri: hf://Qwen/Qwen-14B-Chat
+    path: /raid/models/Qwen/Qwen-14B-Chat
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen-1_8B-Chat.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen-1-8b-chat
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen-1_8b-chat
+  modelArchitecture: QWenLMHeadModel
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.32.0"
+  modelParameterSize: 1.8B
+  storage:
+    storageUri: hf://Qwen/Qwen-1_8B-Chat
+    path: /raid/models/Qwen/Qwen-1_8B-Chat
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen-72B-Chat.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen-72b-chat
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen-72b-chat
+  modelArchitecture: QWenLMHeadModel
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.32.0"
+  modelParameterSize: 72B
+  storage:
+    storageUri: hf://Qwen/Qwen-72B-Chat
+    path: /raid/models/Qwen/Qwen-72B-Chat
2 changes: 2 additions & 0 deletions config/models/Qwen/Qwen-Image-Edit-Plus.yaml
@@ -3,6 +3,8 @@ kind: ClusterBaseModel
 metadata:
   name: qwen-image-edit-plus
 spec:
+  modelCapabilities:
+    - IMAGE_TEXT_TO_IMAGE
   vendor: Qwen
   displayName: qwen.qwen-image-edit-plus
   disabled: false
2 changes: 2 additions & 0 deletions config/models/Qwen/Qwen-Image-Edit.yaml
@@ -3,6 +3,8 @@ kind: ClusterBaseModel
 metadata:
   name: qwen-image-edit
 spec:
+  modelCapabilities:
+    - IMAGE_TEXT_TO_IMAGE
   vendor: Qwen
   displayName: qwen.qwen-image-edit
   disabled: false
2 changes: 2 additions & 0 deletions config/models/Qwen/Qwen-Image.yaml
@@ -3,6 +3,8 @@ kind: ClusterBaseModel
 metadata:
   name: qwen-image
 spec:
+  modelCapabilities:
+    - TEXT_TO_IMAGE
   vendor: Qwen
   displayName: qwen.qwen-image
   disabled: false
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen1.5-0.5B-Chat.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen1-5-0-5b-chat
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen1.5-0.5b-chat
+  modelArchitecture: Qwen2ForCausalLM
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.37.0"
+  modelParameterSize: 0.5B
+  storage:
+    storageUri: hf://Qwen/Qwen1.5-0.5B-Chat
+    path: /raid/models/Qwen/Qwen1.5-0.5B-Chat
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen1.5-1.8B-Chat.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen1-5-1-8b-chat
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen1.5-1.8b-chat
+  modelArchitecture: Qwen2ForCausalLM
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.37.0"
+  modelParameterSize: 1.8B
+  storage:
+    storageUri: hf://Qwen/Qwen1.5-1.8B-Chat
+    path: /raid/models/Qwen/Qwen1.5-1.8B-Chat
2 changes: 1 addition & 1 deletion config/models/Qwen/Qwen1.5-110B-Chat.yaml
@@ -6,7 +6,7 @@ spec:
   modelCapabilities:
     - TEXT_TO_TEXT
   vendor: Qwen
-  displayName: qwen.qwen1-5-110b-chat
+  displayName: qwen.qwen1.5-110b-chat
   disabled: false
   version: "1.0.0"
   modelArchitecture: Qwen2ForCausalLM
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen1.5-14B-Chat.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen1-5-14b-chat
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen1.5-14b-chat
+  modelArchitecture: Qwen2ForCausalLM
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.37.0"
+  modelParameterSize: 14B
+  storage:
+    storageUri: hf://Qwen/Qwen1.5-14B-Chat
+    path: /raid/models/Qwen/Qwen1.5-14B-Chat
2 changes: 1 addition & 1 deletion config/models/Qwen/Qwen1.5-32B-Chat.yaml
@@ -6,7 +6,7 @@ spec:
   modelCapabilities:
     - TEXT_TO_TEXT
   vendor: Qwen
-  displayName: qwen.qwen1-5-32b-chat
+  displayName: qwen.qwen1.5-32b-chat
   disabled: false
   version: "1.0.0"
   modelArchitecture: Qwen2ForCausalLM
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen1.5-4B-Chat.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen1-5-4b-chat
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen1.5-4b-chat
+  modelArchitecture: Qwen2ForCausalLM
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.37.0"
+  modelParameterSize: 4B
+  storage:
+    storageUri: hf://Qwen/Qwen1.5-4B-Chat
+    path: /raid/models/Qwen/Qwen1.5-4B-Chat
2 changes: 1 addition & 1 deletion config/models/Qwen/Qwen1.5-72B-Chat.yaml
@@ -6,7 +6,7 @@ spec:
   modelCapabilities:
     - TEXT_TO_TEXT
   vendor: Qwen
-  displayName: qwen.qwen1-5-72b-chat
+  displayName: qwen.qwen1.5-72b-chat
   disabled: false
   version: "1.0.0"
   modelArchitecture: Qwen2ForCausalLM
2 changes: 1 addition & 1 deletion config/models/Qwen/Qwen1.5-7B-Chat.yaml
@@ -6,7 +6,7 @@ spec:
   modelCapabilities:
     - TEXT_TO_TEXT
   vendor: Qwen
-  displayName: qwen.qwen1-5-7b-chat
+  displayName: qwen.qwen1.5-7b-chat
   disabled: false
   version: "1.0.0"
   modelArchitecture: Qwen2ForCausalLM
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen1.5-MoE-A2.7B-Chat.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen1-5-moe-a2-7b-chat
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen1.5-moe-a2.7b-chat
+  modelArchitecture: Qwen2MoeForCausalLM
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.39.0.dev0"
+  modelParameterSize: 14.3B
+  storage:
+    storageUri: hf://Qwen/Qwen1.5-MoE-A2.7B-Chat
+    path: /raid/models/Qwen/Qwen1.5-MoE-A2.7B-Chat
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen2-0.5B-Instruct.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen2-0-5b-instruct
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen2-0.5b-instruct
+  modelArchitecture: Qwen2ForCausalLM
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.40.1"
+  modelParameterSize: 0.5B
+  storage:
+    storageUri: hf://Qwen/Qwen2-0.5B-Instruct
+    path: /raid/models/Qwen/Qwen2-0.5B-Instruct
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen2-1.5B-Instruct.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen2-1-5b-instruct
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen2-1.5b-instruct
+  modelArchitecture: Qwen2ForCausalLM
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.40.1"
+  modelParameterSize: 1.5B
+  storage:
+    storageUri: hf://Qwen/Qwen2-1.5B-Instruct
+    path: /raid/models/Qwen/Qwen2-1.5B-Instruct
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen2-57B-A14B-Instruct.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen2-57b-a14b-instruct
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen2-57b-a14b-instruct
+  modelArchitecture: Qwen2MoeForCausalLM
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.40.1"
+  modelParameterSize: 57B
+  storage:
+    storageUri: hf://Qwen/Qwen2-57B-A14B-Instruct
+    path: /raid/models/Qwen/Qwen2-57B-A14B-Instruct
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen2-Math-1.5B-Instruct.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen2-math-1-5b-instruct
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen2-math-1.5b-instruct
+  modelArchitecture: Qwen2ForCausalLM
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.43.1"
+  modelParameterSize: 1.5B
+  storage:
+    storageUri: hf://Qwen/Qwen2-Math-1.5B-Instruct
+    path: /raid/models/Qwen/Qwen2-Math-1.5B-Instruct
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen2-Math-72B-Instruct.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen2-math-72b-instruct
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen2-math-72b-instruct
+  modelArchitecture: Qwen2ForCausalLM
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.43.1"
+  modelParameterSize: 72B
+  storage:
+    storageUri: hf://Qwen/Qwen2-Math-72B-Instruct
+    path: /raid/models/Qwen/Qwen2-Math-72B-Instruct
22 changes: 22 additions & 0 deletions config/models/Qwen/Qwen2-Math-7B-Instruct.yaml
@@ -0,0 +1,22 @@
+apiVersion: ome.io/v1beta1
+kind: ClusterBaseModel
+metadata:
+  name: qwen2-math-7b-instruct
+spec:
+  modelCapabilities:
+    - TEXT_TO_TEXT
+  vendor: Qwen
+  displayName: qwen.qwen2-math-7b-instruct
+  modelArchitecture: Qwen2ForCausalLM
+  disabled: false
+  version: "1.0.0"
+  modelFormat:
+    name: safetensors
+    version: "1.0.0"
+  modelFramework:
+    name: transformers
+    version: "4.43.1"
+  modelParameterSize: 7B
+  storage:
+    storageUri: hf://Qwen/Qwen2-Math-7B-Instruct
+    path: /raid/models/Qwen/Qwen2-Math-7B-Instruct
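
Every new manifest in this diff pairs a storageUri of the form hf://<repo id> with a path under /raid/models/<repo id>. A small sketch of that pairing follows, assuming the pattern holds for other models as well; storage_fields is a hypothetical helper for illustration, not an OME API.

```python
def storage_fields(repo_id: str, root: str = "/raid/models") -> dict:
    """Hypothetical helper: build the spec.storage mapping for a Hugging Face repo id.

    The hf:// scheme and the /raid/models root are copied from the manifests
    in this diff; whether other roots are used elsewhere is not known here.
    """
    return {
        "storageUri": f"hf://{repo_id}",
        "path": f"{root}/{repo_id}",
    }


if __name__ == "__main__":
    print(storage_fields("Qwen/Qwen2-Math-7B-Instruct"))
```

A check like this could be used to spot a manifest whose path and storageUri have drifted apart after a copy-paste.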