Commit 9e10c7d

Add Qwen3.5 4b, Disable thinking, fix RGB bug (#2319)
* 4b
* alias fix
* style
* style
* update names and drop reasoning button
* take out extra alias handling
* rgb processing
* rgb processing
* fix v1 version

---------

Co-authored-by: Paweł Pęczek <146137186+PawelPeczek-Roboflow@users.noreply.github.com>
1 parent ee4199b commit 9e10c7d

12 files changed

Lines changed: 311 additions & 11 deletions

Lines changed: 9 additions & 8 deletions
@@ -1,38 +1,39 @@
 # Qwen 3.5

-<a href="https://github.com/QwenLM/Qwen3.5" target="_blank">Qwen 3.5-VL</a> is a vision-language model developed by Alibaba.
+<a href="https://github.com/QwenLM/Qwen3.5" target="_blank">Qwen 3.5</a> is a vision-language model developed by Alibaba.

-You can use Qwen 3.5-VL for a range of multimodal tasks, including image understanding, visual question answering, and document analysis. It also supports a "thinking" mode that lets the model generate reasoning tokens before answering.
+You can use Qwen 3.5 for a range of multimodal tasks, including image understanding, visual question answering, and document analysis. It also supports a "thinking" mode that lets the model generate reasoning tokens before answering.

-You can deploy Qwen 3.5-VL with Inference.
+You can deploy Qwen 3.5 with Inference.

 ### Model Variants

-Qwen 3.5-VL is available in two sizes:
+Qwen 3.5 is available in three sizes:

 | Model ID | Parameters |
 |:---------|:-----------|
 | `qwen3_5-0.8b` | 0.8B |
 | `qwen3_5-2b` | 2B |
+| `qwen3_5-4b` | 4B |

 ### Execution Modes

-Qwen 3.5-VL supports both local and remote execution modes when used in workflows:
+Qwen 3.5 supports both local and remote execution modes when used in workflows:

 - **Local execution**: The model runs directly on your inference server (GPU recommended)
 - **Remote execution**: The model can be invoked via HTTP API on a remote inference server

 ### Installation

-To install inference with the extra dependencies necessary to run Qwen 3.5-VL, run
+To install inference with the extra dependencies necessary to run Qwen 3.5, run

 ```pip install "inference[transformers]"```

 or

 ```pip install "inference-gpu[transformers]"```

-### How to Use Qwen 3.5-VL
+### How to Use Qwen 3.5

 Create a new Python file called `app.py` and add the following code:

@@ -61,7 +62,7 @@ Above, replace:
 1. `prompt` with the prompt for the model.
 2. The image URL with the path to the image that you want to run inference on.

-To use Qwen 3.5-VL with Inference, you will need a Roboflow API key. If you don't already have a Roboflow account, <a href="https://app.roboflow.com" target="_blank">sign up for a free Roboflow account</a>.
+To use Qwen 3.5 with Inference, you will need a Roboflow API key. If you don't already have a Roboflow account, <a href="https://app.roboflow.com" target="_blank">sign up for a free Roboflow account</a>.

 Then, run the Python script you have created:

inference/core/entities/requests/inference.py

Lines changed: 1 addition & 1 deletion
@@ -330,7 +330,7 @@ class LMMInferenceRequest(CVInferenceRequest):
     )
     enable_thinking: bool = Field(
         default=False,
-        description="If true, enables thinking/reasoning mode for models that support it (e.g. Qwen3.5-VL). The model's reasoning will be included in the response.",
+        description="If true, enables thinking/reasoning mode for models that support it (e.g. Qwen3.5). The model's reasoning will be included in the response.",
     )
     max_new_tokens: Optional[int] = Field(
         default=None,

inference/core/registries/roboflow.py

Lines changed: 1 addition & 0 deletions
@@ -70,6 +70,7 @@
     "perception_encoder": ("embed", "perception_encoder"),
     "qwen3_5-0.8b": ("lmm", "qwen3_5-0.8b"),
     "qwen3_5-2b": ("lmm", "qwen3_5-2b"),
+    "qwen3_5-4b": ("lmm", "qwen3_5-4b"),
 }

 STUB_VERSION_ID = "0"

inference/core/workflows/core_steps/loader.py

Lines changed: 4 additions & 0 deletions
@@ -287,6 +287,9 @@
 from inference.core.workflows.core_steps.models.foundation.qwen3_5vl.v1 import (
     Qwen35VLBlockV1,
 )
+from inference.core.workflows.core_steps.models.foundation.qwen3_5vl.v2 import (
+    Qwen35VLBlockV2,
+)
 from inference.core.workflows.core_steps.models.foundation.qwen3_6_openrouter.v1 import (
     Qwen36OpenRouterBlockV1,
 )
@@ -932,6 +935,7 @@ def load_blocks() -> List[Type[WorkflowBlock]]:
         Qwen25VLBlockV1,
         Qwen3VLBlockV1,
         Qwen35VLBlockV1,
+        Qwen35VLBlockV2,
         Qwen35OpenRouterBlockV1,
         Qwen36OpenRouterBlockV1,
         OpenAICompatibleBlockV1,

inference/core/workflows/core_steps/models/foundation/qwen3_5vl/v1.py

Lines changed: 1 addition & 0 deletions
@@ -57,6 +57,7 @@ class BlockManifest(WorkflowBlockManifest):
                 "Alibaba",
             ],
             "is_vlm_block": True,
+            "deprecated": True,
             "ui_manifest": {
                 "section": "model",
                 "icon": "fal fa-atom",
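
With v1 flagged as deprecated, a workflow definition would presumably reference the v2 block instead. The sketch below builds such a step as a plain Python dict. The `type` string, field names, default variant, and example prompt are taken from the v2 manifest added in this commit; the step `name`, the `$inputs.image` selector, and the `build_qwen35_step` helper are illustrative assumptions, not part of the change.

```python
from typing import Any, Dict, List

# Variant names copied from the v2 manifest's get_supported_model_variants().
SUPPORTED_VARIANTS: List[str] = ["qwen3_5-0.8b", "qwen3_5-2b", "qwen3_5-4b"]


def build_qwen35_step(model_version: str = "qwen3_5-0.8b") -> Dict[str, Any]:
    """Hypothetical helper: build a workflow step dict targeting the v2 block."""
    return {
        "type": "roboflow_core/qwen3_5vl@v2",  # literal from the v2 manifest
        "name": "qwen35",                      # assumed step name
        "images": "$inputs.image",             # assumed input selector
        "prompt": "What is in this image?",    # example prompt from the manifest
        "model_version": model_version,
    }


step = build_qwen35_step("qwen3_5-4b")
print(step["type"], step["model_version"])
```

Note that the manifest also accepts a plain string or a model ID selector for `model_version`, so the list above covers only the pre-trained variants.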
Lines changed: 243 additions & 0 deletions
@@ -0,0 +1,243 @@
+from typing import List, Literal, Optional, Type, Union
+
+from pydantic import ConfigDict, Field
+
+from inference.core.entities.requests.inference import LMMInferenceRequest
+from inference.core.env import (
+    HOSTED_CORE_MODEL_URL,
+    LOCAL_INFERENCE_API_URL,
+    WORKFLOWS_REMOTE_API_TARGET,
+)
+from inference.core.managers.base import ModelManager
+from inference.core.workflows.core_steps.common.entities import StepExecutionMode
+from inference.core.workflows.execution_engine.entities.base import (
+    Batch,
+    OutputDefinition,
+    WorkflowImageData,
+)
+from inference.core.workflows.execution_engine.entities.types import (
+    DICTIONARY_KIND,
+    IMAGE_KIND,
+    ROBOFLOW_MODEL_ID_KIND,
+    ImageInputField,
+    Selector,
+)
+from inference.core.workflows.prototypes.block import (
+    BlockResult,
+    WorkflowBlock,
+    WorkflowBlockManifest,
+)
+from inference_sdk import InferenceHTTPClient
+
+
+##########################################################################
+# Qwen3.5 Workflow Block Manifest
+##########################################################################
+class BlockManifest(WorkflowBlockManifest):
+    model_config = ConfigDict(
+        json_schema_extra={
+            "name": "Qwen3.5",
+            "version": "v2",
+            "short_description": "Run Qwen3.5 on an image.",
+            "long_description": (
+                "This workflow block runs Qwen3.5—a vision language model that accepts an image "
+                "and an optional text prompt—and returns a text answer based on a conversation template."
+            ),
+            "license": "Apache-2.0",
+            "block_type": "model",
+            "search_keywords": [
+                "Qwen3.5",
+                "qwen3.5",
+                "vision language model",
+                "VLM",
+                "Alibaba",
+            ],
+            "is_vlm_block": True,
+            "ui_manifest": {
+                "section": "model",
+                "icon": "fal fa-atom",
+                "blockPriority": 5.7,
+            },
+        },
+        protected_namespaces=(),
+    )
+    type: Literal["roboflow_core/qwen3_5vl@v2"]
+
+    images: Selector(kind=[IMAGE_KIND]) = ImageInputField
+    prompt: Optional[str] = Field(
+        default=None,
+        description="Optional text prompt to provide additional context to Qwen3.5. Otherwise it will just be a default one, which may affect the desired model behavior.",
+        examples=["What is in this image?"],
+    )
+    model_version: Union[
+        Literal["qwen3_5-0.8b", "qwen3_5-2b", "qwen3_5-4b"],
+        Selector(kind=[ROBOFLOW_MODEL_ID_KIND]),
+        str,
+    ] = Field(
+        default="qwen3_5-0.8b",
+        description="The Qwen3.5 model to be used for inference.",
+        examples=["qwen3_5-0.8b", "qwen3_5-2b", "qwen3_5-4b"],
+    )
+
+    system_prompt: Optional[str] = Field(
+        default=None,
+        description="Optional system prompt to provide additional context to Qwen3.5.",
+        examples=["You are a helpful assistant."],
+    )
+
+    max_new_tokens: Optional[int] = Field(
+        default=None,
+        description="Maximum number of tokens to generate. If not set, the model's default will be used.",
+    )
+
+    @classmethod
+    def describe_outputs(cls) -> List[OutputDefinition]:
+        return [
+            OutputDefinition(
+                name="parsed_output",
+                kind=[DICTIONARY_KIND],
+                description="A parsed version of the output, provided as a dictionary containing the text.",
+            ),
+        ]
+
+    @classmethod
+    def get_parameters_accepting_batches(cls) -> List[str]:
+        return ["images"]
+
+    @classmethod
+    def get_execution_engine_compatibility(cls) -> Optional[str]:
+        return ">=1.3.0,<2.0.0"
+
+    @classmethod
+    def get_supported_model_variants(cls) -> Optional[List[str]]:
+        return ["qwen3_5-0.8b", "qwen3_5-2b", "qwen3_5-4b"]
+
+
+##########################################################################
+# Qwen3.5 Workflow Block
+##########################################################################
+class Qwen35VLBlockV2(WorkflowBlock):
+    def __init__(
+        self,
+        model_manager: ModelManager,
+        api_key: Optional[str],
+        step_execution_mode: StepExecutionMode,
+    ):
+        self._model_manager = model_manager
+        self._api_key = api_key
+        self._step_execution_mode = step_execution_mode
+
+    @classmethod
+    def get_init_parameters(cls) -> List[str]:
+        return ["model_manager", "api_key", "step_execution_mode"]
+
+    @classmethod
+    def get_manifest(cls) -> Type[WorkflowBlockManifest]:
+        return BlockManifest
+
+    def run(
+        self,
+        images: Batch[WorkflowImageData],
+        model_version: str,
+        prompt: Optional[str],
+        system_prompt: Optional[str],
+        max_new_tokens: Optional[int] = None,
+    ) -> BlockResult:
+        if self._step_execution_mode == StepExecutionMode.LOCAL:
+            return self.run_locally(
+                images=images,
+                model_version=model_version,
+                prompt=prompt,
+                system_prompt=system_prompt,
+                max_new_tokens=max_new_tokens,
+            )
+        elif self._step_execution_mode == StepExecutionMode.REMOTE:
+            return self.run_remotely(
+                images=images,
+                model_version=model_version,
+                prompt=prompt,
+                system_prompt=system_prompt,
+                max_new_tokens=max_new_tokens,
+            )
+        else:
+            raise ValueError(
+                f"Unknown step execution mode: {self._step_execution_mode}"
+            )
+
+    def run_remotely(
+        self,
+        images: Batch[WorkflowImageData],
+        model_version: str,
+        prompt: Optional[str],
+        system_prompt: Optional[str],
+        max_new_tokens: Optional[int] = None,
+    ) -> BlockResult:
+        api_url = (
+            LOCAL_INFERENCE_API_URL
+            if WORKFLOWS_REMOTE_API_TARGET != "hosted"
+            else HOSTED_CORE_MODEL_URL
+        )
+        client = InferenceHTTPClient(
+            api_url=api_url,
+            api_key=self._api_key,
+        )
+        if WORKFLOWS_REMOTE_API_TARGET == "hosted":
+            client.select_api_v0()
+
+        prompt = prompt or "Describe what's in this image."
+        system_prompt = (
+            system_prompt
+            or "You are a Qwen3.5 model that can answer questions about any image."
+        )
+        combined_prompt = prompt + "<system_prompt>" + system_prompt
+
+        predictions = []
+        for image in images:
+            result = client.infer_lmm(
+                inference_input=image.base64_image,
+                model_id=model_version,
+                prompt=combined_prompt,
+                model_id_in_path=True,
+                enable_thinking=False,
+                max_new_tokens=max_new_tokens,
+            )
+            response_text = result.get("response", result)
+            predictions.append({"parsed_output": response_text})
+
+        return predictions
+
+    def run_locally(
+        self,
+        images: Batch[WorkflowImageData],
+        model_version: str,
+        prompt: Optional[str],
+        system_prompt: Optional[str],
+        max_new_tokens: Optional[int] = None,
+    ) -> BlockResult:
+        inference_images = [
+            i.to_inference_format(numpy_preferred=False) for i in images
+        ]
+        prompt = prompt or "Describe what's in this image."
+        system_prompt = system_prompt or "You are a helpful assistant."
+        prompts = [prompt + "<system_prompt>" + system_prompt] * len(inference_images)
+        self._model_manager.add_model(model_id=model_version, api_key=self._api_key)
+
+        predictions = []
+        for image, single_prompt in zip(inference_images, prompts):
+            request_kwargs = dict(
+                api_key=self._api_key,
+                model_id=model_version,
+                image=image,
+                source="workflow-execution",
+                prompt=single_prompt,
+                enable_thinking=False,
+            )
+            if max_new_tokens is not None:
+                request_kwargs["max_new_tokens"] = max_new_tokens
+            request = LMMInferenceRequest(**request_kwargs)
+            prediction = self._model_manager.infer_from_request_sync(
+                model_id=model_version, request=request
+            )
+            response_text = prediction.response
+            predictions.append({"parsed_output": response_text})
+        return predictions
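
Both `run_locally` and `run_remotely` in the new block pack the user prompt and the system prompt into a single string joined by the literal `<system_prompt>` marker. The sketch below illustrates that convention in isolation. `combine_prompts` mirrors the local-mode defaults from the diff (remote mode uses a different default system prompt), while `split_prompts` is a hypothetical inverse written only for illustration; the model-side parsing is not shown in this commit and may differ.

```python
from typing import Optional, Tuple

SEPARATOR = "<system_prompt>"  # marker string used in the diff


def combine_prompts(prompt: Optional[str], system_prompt: Optional[str]) -> str:
    """Join prompt and system prompt the way run_locally does (local defaults)."""
    prompt = prompt or "Describe what's in this image."
    system_prompt = system_prompt or "You are a helpful assistant."
    return prompt + SEPARATOR + system_prompt


def split_prompts(combined: str) -> Tuple[str, Optional[str]]:
    """Hypothetical inverse: split on the first marker, if one is present."""
    user, sep, system = combined.partition(SEPARATOR)
    return user, (system if sep else None)


combined = combine_prompts("What is in this image?", None)
print(combined)
print(split_prompts(combined))
```

One consequence of this design is that a user prompt containing the marker text itself would be split in the wrong place, so callers may want to avoid that substring in prompts.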

inference/models/utils.py

Lines changed: 1 addition & 0 deletions
@@ -1001,6 +1001,7 @@ def get_roboflow_model(*args, **kwargs):
     for variant in [
         "qwen3_5-0.8b",
         "qwen3_5-2b",
+        "qwen3_5-4b",
         "qwen3_5-0.8b-peft",
         "qwen3_5-2b-peft",
     ]:

inference_models/docs/models/qwen35.md

Lines changed: 1 addition & 0 deletions
@@ -27,6 +27,7 @@ Qwen3.5 pre-trained models are available and do **not** require a Roboflow API k
 |----------|-------------|
 | `qwen3_5-0.8b` | 0.8B parameter model - compact and efficient |
 | `qwen3_5-2b` | 2B parameter model - better accuracy |
+| `qwen3_5-4b` | 4B parameter model - highest accuracy |

 You can also use fine-tuned models from Roboflow by specifying `project/version` as the model ID (requires API key).

inference_models/inference_models/models/qwen3_5/qwen3_5_hf.py

Lines changed: 2 additions & 0 deletions
@@ -181,6 +181,8 @@ def pre_process_generation(
         enable_thinking: bool = False,
         **kwargs,
     ) -> dict:
+        if isinstance(images, np.ndarray):
+            images = images[:, :, ::-1].copy()
         # Handle prompt and system prompt parsing logic from original implementation
         if prompt is None:
             prompt = "Describe what's in this image."

inference_models/inference_models/models/qwen3vl/qwen3vl_hf.py

Lines changed: 2 additions & 0 deletions
@@ -204,6 +204,8 @@ def pre_process_generation(
         image_size: Optional[Tuple[int, int]] = None,
         **kwargs,
     ) -> dict:
+        if isinstance(images, np.ndarray):
+            images = images[:, :, ::-1].copy()
         # Handle prompt and system prompt parsing logic from original implementation
         if prompt is None:
             prompt = "Describe what's in this image."
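
Both `pre_process_generation` hunks add the same two lines, which reverse the channel axis of a NumPy image. A minimal sketch of the effect follows, assuming the incoming array is height × width × 3 in BGR order (the OpenCV convention); the commit only describes this as an RGB fix, so the BGR assumption and the sample pixel values are illustrative.

```python
import numpy as np

# Hypothetical 1x2 image whose channels are assumed to be stored as B, G, R.
bgr = np.array([[[255, 0, 0], [0, 128, 64]]], dtype=np.uint8)

# Reversing the last axis swaps the first and third channels (BGR -> RGB).
rgb = bgr[:, :, ::-1].copy()

# The bare slice is a negative-stride view; .copy() makes a contiguous array.
view = bgr[:, :, ::-1]

print(rgb[0, 0].tolist(), rgb[0, 1].tolist())
print(view.flags["C_CONTIGUOUS"], rgb.flags["C_CONTIGUOUS"])
```

The `.copy()` matters because some downstream libraries reject or mishandle arrays with negative strides, which is a plausible reason the commit includes it.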
