Commit c068d86 (parent: 268732f)

feat(docker): update Dockerfile and add new vLLM configurations

- Added new provider configurations for vLLM Qwen 3.6 in both thinking and non-thinking modes.
- Updated the Dockerfile to include the new configuration files for vLLM Qwen 3.6 and ensure proper setup for deployment.

3 files changed: 370 additions & 1 deletion

File tree:
Dockerfile
examples/configs/vllm-qwen3.6-27b-fp8-no-think.provider.yml
examples/configs/vllm-qwen3.6-27b-fp8.provider.yml

Dockerfile

Lines changed: 3 additions & 1 deletion
@@ -171,8 +171,10 @@ COPY examples/configs/ollama-qwen332b-fp16-tc.provider.yml /opt/pentagi/conf/
 COPY examples/configs/ollama-qwq32b-fp16-tc.provider.yml /opt/pentagi/conf/
 COPY examples/configs/openrouter.provider.yml /opt/pentagi/conf/
 COPY examples/configs/novita.provider.yml /opt/pentagi/conf/
-COPY examples/configs/vllm-qwen3.5-27b-fp8.provider.yml /opt/pentagi/conf/
 COPY examples/configs/vllm-qwen3.5-27b-fp8-no-think.provider.yml /opt/pentagi/conf/
+COPY examples/configs/vllm-qwen3.5-27b-fp8.provider.yml /opt/pentagi/conf/
+COPY examples/configs/vllm-qwen3.6-27b-fp8-no-think.provider.yml /opt/pentagi/conf/
+COPY examples/configs/vllm-qwen3.6-27b-fp8.provider.yml /opt/pentagi/conf/
 COPY examples/configs/vllm-qwen332b-fp16.provider.yml /opt/pentagi/conf/

 COPY LICENSE /opt/pentagi/LICENSE
examples/configs/vllm-qwen3.6-27b-fp8-no-think.provider.yml (new file)
Lines changed: 193 additions & 0 deletions
# Qwen3.6-27B FP8 Provider Configuration - NON-THINKING MODE
# Based on official Qwen recommendations for vLLM inference
# Architecture: Hybrid 75% DeltaNet + 25% Full Attention (48+16 layers)
# Context: 262K native, expandable to 1M with YaRN
# Vision: VLM with Vision Encoder (uses VRAM even for text-only tasks)
#
# Thinking is disabled via the extra_body parameter
# Recommended sampling parameters:
# - General tasks: temp=0.7, top_p=0.8, top_k=20, min_p=0.0, pp=1.5, rp=1.0
# - Reasoning tasks: temp=1.0, top_p=0.95, top_k=20, min_p=0.0, pp=1.5, rp=1.0

simple:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 0.7
  top_k: 20
  top_p: 0.8
  min_p: 0.0
  presence_penalty: 1.5
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768
  extra_body:
    chat_template_kwargs:
      enable_thinking: false

simple_json:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 0.7
  top_k: 20
  top_p: 0.8
  min_p: 0.0
  presence_penalty: 1.5
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768
  json: true
  extra_body:
    chat_template_kwargs:
      enable_thinking: false

primary_agent:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 1.0
  top_k: 20
  top_p: 0.95
  min_p: 0.0
  presence_penalty: 1.5
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768
  extra_body:
    chat_template_kwargs:
      enable_thinking: false

assistant:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 1.0
  top_k: 20
  top_p: 0.95
  min_p: 0.0
  presence_penalty: 1.5
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768
  extra_body:
    chat_template_kwargs:
      enable_thinking: false

generator:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 1.0
  top_k: 20
  top_p: 0.95
  min_p: 0.0
  presence_penalty: 1.5
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768
  extra_body:
    chat_template_kwargs:
      enable_thinking: false

refiner:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 1.0
  top_k: 20
  top_p: 0.95
  min_p: 0.0
  presence_penalty: 1.5
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768
  extra_body:
    chat_template_kwargs:
      enable_thinking: false

adviser:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 1.0
  top_k: 20
  top_p: 0.95
  min_p: 0.0
  presence_penalty: 1.5
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768
  extra_body:
    chat_template_kwargs:
      enable_thinking: false

reflector:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 1.0
  top_k: 20
  top_p: 0.95
  min_p: 0.0
  presence_penalty: 1.5
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768
  extra_body:
    chat_template_kwargs:
      enable_thinking: false

searcher:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 0.7
  top_k: 20
  top_p: 0.8
  min_p: 0.0
  presence_penalty: 1.5
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768
  extra_body:
    chat_template_kwargs:
      enable_thinking: false

enricher:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 0.7
  top_k: 20
  top_p: 0.8
  min_p: 0.0
  presence_penalty: 1.5
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768
  extra_body:
    chat_template_kwargs:
      enable_thinking: false

coder:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 1.0
  top_k: 20
  top_p: 0.95
  min_p: 0.0
  presence_penalty: 1.5
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768
  extra_body:
    chat_template_kwargs:
      enable_thinking: false

installer:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 1.0
  top_k: 20
  top_p: 0.95
  min_p: 0.0
  presence_penalty: 1.5
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768
  extra_body:
    chat_template_kwargs:
      enable_thinking: false

pentester:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 1.0
  top_k: 20
  top_p: 0.95
  min_p: 0.0
  presence_penalty: 1.5
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768
  extra_body:
    chat_template_kwargs:
      enable_thinking: false
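Each section above maps onto an OpenAI-compatible chat-completions request against a vLLM server: the keys under `extra_body` are merged into the top level of the JSON request body, which is how `chat_template_kwargs.enable_thinking` reaches the model's chat template. A minimal sketch of that merge, assuming a plain dict payload (the `build_payload` helper is illustrative, not part of the project):

```python
# Sketch: turn the "simple" section above into a vLLM chat-completions body.
# The parameter values are taken from the config; the merge logic mirrors
# what an OpenAI-compatible client does with extra_body before POSTing
# to /v1/chat/completions.

simple_cfg = {
    "model": "Qwen/Qwen3.6-27B-FP8",
    "temperature": 0.7,
    "top_k": 20,
    "top_p": 0.8,
    "min_p": 0.0,
    "presence_penalty": 1.5,
    "repetition_penalty": 1.0,
    "n": 1,
    "max_tokens": 32768,
    "extra_body": {"chat_template_kwargs": {"enable_thinking": False}},
}

def build_payload(cfg: dict, messages: list) -> dict:
    """Flatten extra_body into the top-level request body."""
    body = {k: v for k, v in cfg.items() if k != "extra_body"}
    body.update(cfg.get("extra_body", {}))
    body["messages"] = messages
    return body

payload = build_payload(simple_cfg, [{"role": "user", "content": "hi"}])
```

After the merge, `chat_template_kwargs` sits at the top level of the body alongside `temperature` and friends, and `extra_body` itself is gone.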
examples/configs/vllm-qwen3.6-27b-fp8.provider.yml (new file)
Lines changed: 174 additions & 0 deletions
# Qwen3.6-27B FP8 Provider Configuration - THINKING MODE (default)
# Based on official Qwen recommendations for vLLM inference
# Architecture: Hybrid 75% DeltaNet + 25% Full Attention (48+16 layers)
# Context: 262K native, expandable to 1M with YaRN
# Vision: VLM with Vision Encoder (uses VRAM even for text-only tasks)
#
# Thinking mode is enabled by default (no extra_body needed)
# Recommended sampling parameters:
# - General tasks: temp=1.0, top_p=0.95, top_k=20, min_p=0.0, pp=1.5, rp=1.0
# - Precise coding: temp=0.6, top_p=0.95, top_k=20, min_p=0.0, pp=0.0, rp=1.0
#
# Some agents below disable thinking via the extra_body parameter
# Recommended sampling parameters for those:
# - General tasks: temp=0.7, top_p=0.8, top_k=20, min_p=0.0, pp=1.5, rp=1.0
# - Reasoning tasks: temp=1.0, top_p=0.95, top_k=20, min_p=0.0, pp=1.5, rp=1.0

simple:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 0.7
  top_k: 20
  top_p: 0.8
  min_p: 0.0
  presence_penalty: 1.5
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768
  extra_body:
    chat_template_kwargs:
      enable_thinking: false

simple_json:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 0.7
  top_k: 20
  top_p: 0.8
  min_p: 0.0
  presence_penalty: 1.5
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768
  json: true
  extra_body:
    chat_template_kwargs:
      enable_thinking: false

primary_agent:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 1.0
  top_k: 20
  top_p: 0.95
  min_p: 0.0
  presence_penalty: 1.5
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768

assistant:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 1.0
  top_k: 20
  top_p: 0.95
  min_p: 0.0
  presence_penalty: 1.5
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768

generator:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 1.0
  top_k: 20
  top_p: 0.95
  min_p: 0.0
  presence_penalty: 1.5
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768

refiner:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 1.0
  top_k: 20
  top_p: 0.95
  min_p: 0.0
  presence_penalty: 1.5
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768

adviser:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 1.0
  top_k: 20
  top_p: 0.95
  min_p: 0.0
  presence_penalty: 1.5
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768

reflector:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 1.0
  top_k: 20
  top_p: 0.95
  min_p: 0.0
  presence_penalty: 1.5
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768
  extra_body:
    chat_template_kwargs:
      enable_thinking: false

searcher:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 0.7
  top_k: 20
  top_p: 0.8
  min_p: 0.0
  presence_penalty: 1.5
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768
  extra_body:
    chat_template_kwargs:
      enable_thinking: false

enricher:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 0.7
  top_k: 20
  top_p: 0.8
  min_p: 0.0
  presence_penalty: 1.5
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768
  extra_body:
    chat_template_kwargs:
      enable_thinking: false

coder:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 0.6
  top_k: 20
  top_p: 0.95
  min_p: 0.0
  presence_penalty: 0.0
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768

installer:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 0.6
  top_k: 20
  top_p: 0.95
  min_p: 0.0
  presence_penalty: 0.0
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768

pentester:
  model: "Qwen/Qwen3.6-27B-FP8"
  temperature: 0.6
  top_k: 20
  top_p: 0.95
  min_p: 0.0
  presence_penalty: 0.0
  repetition_penalty: 1.0
  n: 1
  max_tokens: 32768
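The thinking-mode file's header recommends two sampling presets, and the agent sections apply them consistently: coder, installer, and pentester get the precise-coding preset, while the other thinking agents get the general preset. That grouping can be captured in a few lines; this is an illustrative sketch (the `preset_for` helper and role sets are not part of the project):

```python
# Sketch: the two recommended thinking-mode sampling presets from the
# header comments above, and which agent roles the config assigns to each.

GENERAL = {
    "temperature": 1.0, "top_p": 0.95, "top_k": 20,
    "min_p": 0.0, "presence_penalty": 1.5, "repetition_penalty": 1.0,
}
PRECISE_CODING = {
    "temperature": 0.6, "top_p": 0.95, "top_k": 20,
    "min_p": 0.0, "presence_penalty": 0.0, "repetition_penalty": 1.0,
}

# Roles that the config above samples with the precise-coding preset.
CODING_ROLES = {"coder", "installer", "pentester"}

def preset_for(role: str) -> dict:
    """Return the recommended thinking-mode sampling preset for a role."""
    return dict(PRECISE_CODING if role in CODING_ROLES else GENERAL)
```

The only differences between the presets are temperature (1.0 vs 0.6) and presence penalty (1.5 vs 0.0); top_p, top_k, min_p, and repetition penalty are shared.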
