apps/site/docs/en/model-common-config.mdx
Lines changed: 1 addition & 1 deletion (+1 -1)
@@ -212,7 +212,7 @@ MIDSCENE_MODEL_FAMILY="gpt-5"
 :::

 :::note Model-native thinking
-Midscene disables model-native thinking by default for the best execution speed and stability. To turn it on for any of the models above, set `MIDSCENE_MODEL_REASONING_ENABLED="true"`. Some families accept an extra knob — for example `MIDSCENE_MODEL_REASONING_BUDGET` (Qwen) or `MIDSCENE_MODEL_REASONING_EFFORT` (Doubao). See [Model-Native Thinking Mode](./model-strategy#model-native-thinking-mode).
+Midscene disables model-native thinking by default for the best execution speed and stability. To turn it on for any of the models above, set `MIDSCENE_MODEL_REASONING_ENABLED="true"`. Some families accept extra controls such as `MIDSCENE_MODEL_REASONING_BUDGET` and `MIDSCENE_MODEL_REASONING_EFFORT`. See [Model-Native Thinking Mode](./model-strategy#model-native-thinking-mode).
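
As an illustration of the note above, here is a minimal sketch that assembles the three reasoning variables into `.env`-style lines. Only the variable names and the `"true"` value come from the docs; the helper names `buildReasoningEnv` and `toDotenv`, the `"false"` value, and the example budget are assumptions made for illustration and are not part of Midscene.

```typescript
// Hypothetical option shape; not a Midscene type.
interface ReasoningOptions {
  enabled: boolean;
  budget?: number; // assumed to be a numeric budget
  effort?: string; // provider-defined effort label
}

// Build the environment entries named in the docs.
function buildReasoningEnv(options: ReasoningOptions): Record<string, string> {
  const env: Record<string, string> = {
    // "true" is documented; "false" is an assumed counterpart.
    MIDSCENE_MODEL_REASONING_ENABLED: options.enabled ? "true" : "false",
  };
  if (options.budget !== undefined) {
    env["MIDSCENE_MODEL_REASONING_BUDGET"] = String(options.budget);
  }
  if (options.effort !== undefined) {
    env["MIDSCENE_MODEL_REASONING_EFFORT"] = options.effort;
  }
  return env;
}

// Render entries as KEY="value" lines, the style used in the docs.
function toDotenv(env: Record<string, string>): string {
  return Object.entries(env)
    .map(([key, value]) => `${key}="${value}"`)
    .join("\n");
}

const exampleEnv = toDotenv(buildReasoningEnv({ enabled: true, budget: 1024 }));
```

Whether a given family honors the budget or effort variable depends on the model; see the linked Model-Native Thinking Mode section.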
apps/site/docs/en/model-strategy.mdx
Lines changed: 8 additions & 3 deletions (+8 -3)
@@ -139,7 +139,7 @@ Note: The implementation behind `deepThink` may change in the future as Midscene

 Midscene disables model-native thinking by default to get better execution speed and stability. If the model itself does not support disabling native thinking, Midscene reduces model-native thinking as much as possible by controlling thinking granularity or limiting the thinking budget.

-You can control model-native thinking with the following reasoning settings.
+You can control model-native thinking with the following reasoning settings. These are unified Midscene-level abstractions, and the parameters actually sent to the model are mapped by model family.

 - `MIDSCENE_MODEL_REASONING_ENABLED`: Explicitly controls whether model-native thinking is enabled.
@@ -150,16 +150,21 @@ You can control model-native thinking with the following reasoning settings.
   - Doubao: corresponds to `thinking.type` in the Doubao docs.
   - Zhipu GLM: corresponds to `thinking.type` in the Zhipu docs.
   - GPT-5: corresponds to `reasoning_effort` in the OpenAI docs. Midscene uses `medium` when enabled and `none` when disabled.
+  - Gemini: when no Gemini-specific knob is configured, Midscene uses OpenAI-compatible `reasoning_effort`, with `medium` when enabled and `minimal` when disabled. When Gemini-specific knobs are configured, Midscene uses them first.
   - Kimi: corresponds to `thinking.type` in the Kimi docs.
   - Xiaomi Mimo: corresponds to `thinking.type` in the Xiaomi Mimo docs.
 - `MIDSCENE_MODEL_REASONING_BUDGET`: Controls the model's thinking budget. The following models currently support this setting:
   - Qwen: corresponds to `thinking_budget` in the Qwen docs.
+  - Gemini 2.5 series: corresponds to `thinking_config.thinking_budget` in the Gemini docs.
 - `MIDSCENE_MODEL_REASONING_EFFORT`: Controls the model's thinking effort. The following models currently support this setting:
   - Doubao: corresponds to `reasoning_effort` in the Doubao docs.
-  - Gemini: corresponds to `thinking_config.thinking_level` in the Gemini docs.
+  - Gemini 3.x series: corresponds to `thinking_config.thinking_level` in the Gemini docs.
   - GPT-5: corresponds to `reasoning_effort` in the OpenAI docs.

-Note: Different model providers use different parameters to control model-native thinking. For specific values and supported model versions, see the official docs from each model provider. If an explicit reasoning setting is not supported by the current model, Midscene ignores that setting instead of guessing provider-specific private parameters.
+Note:
+
+- Different model providers use different parameters to control model-native thinking. For specific values and supported model versions, see the official docs from each model provider. If an explicit reasoning setting is not supported by the current model, Midscene ignores that setting instead of guessing provider-specific private parameters.
+- For Gemini, the OpenAI-compatible `reasoning_effort` and the Gemini-specific `thinking_config` cannot be used together. Gemini only generates thinking summaries through `include_thoughts` inside `thinking_config`, so Midscene sends `include_thoughts` for Gemini only when `MIDSCENE_MODEL_REASONING_BUDGET` or `MIDSCENE_MODEL_REASONING_EFFORT` is explicitly configured.

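
The family mapping described above could be sketched roughly as follows, limited to GPT-5 and Gemini because those are the families whose concrete values appear in the text. This is not Midscene's implementation: the `Family` identifiers, the `mapReasoning` function, the rule that an explicit effort overrides the enabled/disabled default for GPT-5, and the `include_thoughts: true` value are all assumptions made for illustration.

```typescript
// Hypothetical family identifiers; real MIDSCENE_MODEL_FAMILY values may differ.
type Family = "gpt-5" | "gemini-2.5" | "gemini-3";

// Unified settings mirroring the three MIDSCENE_MODEL_REASONING_* variables.
interface ReasoningSettings {
  enabled?: boolean; // MIDSCENE_MODEL_REASONING_ENABLED
  budget?: number; // MIDSCENE_MODEL_REASONING_BUDGET
  effort?: string; // MIDSCENE_MODEL_REASONING_EFFORT
}

type ProviderParams = Record<string, unknown>;

function mapReasoning(family: Family, settings: ReasoningSettings): ProviderParams {
  // Thinking stays off unless explicitly enabled, matching the documented default.
  const enabled = settings.enabled === true;

  if (family === "gpt-5") {
    // Assumption: an explicit effort wins over the enabled/disabled default.
    // The budget setting is not listed for GPT-5, so it is ignored here.
    return { reasoning_effort: settings.effort ?? (enabled ? "medium" : "none") };
  }

  // Gemini: collect only the knob that the docs list for this series.
  const thinkingConfig: ProviderParams = {};
  if (family === "gemini-2.5" && settings.budget !== undefined) {
    thinkingConfig["thinking_budget"] = settings.budget;
  }
  if (family === "gemini-3" && settings.effort !== undefined) {
    thinkingConfig["thinking_level"] = settings.effort;
  }

  if (Object.keys(thinkingConfig).length > 0) {
    // thinking_config and reasoning_effort cannot be combined, so only
    // thinking_config is sent. The value `true` is an assumption.
    return { thinking_config: { ...thinkingConfig, include_thoughts: true } };
  }

  // No Gemini-specific knob applies: use OpenAI-compatible reasoning_effort.
  return { reasoning_effort: enabled ? "medium" : "minimal" };
}
```

In this sketch, a setting that the family does not list (for example an effort value on the Gemini 2.5 series) is dropped rather than translated, which follows the "ignore instead of guessing" rule in the note.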
### "MIDSCENE_MODEL_FAMILY is not set to a multimodal model" error
@@ -107,7 +110,7 @@ When DATA_DEMAND is a JSON object, the keys in your response must exactly match

 Return in the following XML format:
-<thought>the thinking process of the extraction, less than 300 words. Use ${preferredLanguage} in this field.</thought>
+<observation>brief evidence observed for the extraction, less than 300 words. Use ${preferredLanguage} in this field.</observation>
 <data-json>the extracted data as JSON. Make sure both the value and scheme meet the DATA_DEMAND. If you want to write some description in this field, use the same language as the DATA_DEMAND.</data-json>
 <errors>optional error messages as JSON array, e.g., ["error1", "error2"]</errors>

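
To make the response format above concrete, here is a minimal sketch of how a consumer might read the three sections after the rename from `<thought>` to `<observation>`. It is not Midscene's parser: the names `ExtractionResult`, `extractTag`, and `parseExtractionResponse` are hypothetical, and the regex approach assumes the tags are not nested or repeated.

```typescript
// Hypothetical result shape for the XML-style response described above.
interface ExtractionResult {
  observation?: string;
  data: unknown;
  errors: string[];
}

// Return the trimmed text between <tag> and </tag>, or undefined if absent.
function extractTag(text: string, tag: string): string | undefined {
  const match = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`).exec(text);
  return match?.[1]?.trim();
}

function parseExtractionResponse(text: string): ExtractionResult {
  const dataJson = extractTag(text, "data-json");
  if (dataJson === undefined) {
    throw new Error("missing <data-json> section");
  }
  // <errors> is optional in the format, so default to an empty list.
  const errorsJson = extractTag(text, "errors");
  return {
    observation: extractTag(text, "observation"),
    data: JSON.parse(dataJson),
    errors: errorsJson ? (JSON.parse(errorsJson) as string[]) : [],
  };
}

const sampleResponse = [
  "<observation>According to the screenshot, i can see ...</observation>",
  "<data-json>",
  '"todo list"',
  "</data-json>",
].join("\n");

const parsedSample = parseExtractionResponse(sampleResponse);
```

A caller that still expects `<thought>` would find `observation` populated and no `thought` field, which is the behavioral change this hunk introduces in the prompt.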
@@ -124,7 +127,7 @@ For example, if the DATA_DEMAND is:

 By viewing the screenshot and page contents, you can extract the following data:

-<thought>According to the screenshot, i can see ...</thought>
+<observation>According to the screenshot, i can see ...</observation>
 <data-json>
 {
 "name": "John",
@@ -142,7 +145,7 @@ the todo items list, string[]

 By viewing the screenshot and page contents, you can extract the following data:

-<thought>According to the screenshot, i can see ...</thought>
+<observation>According to the screenshot, i can see ...</observation>
 <data-json>
 ["todo 1", "todo 2", "todo 3"]
 </data-json>
@@ -156,7 +159,7 @@ the page title, string

 By viewing the screenshot and page contents, you can extract the following data:

-<thought>According to the screenshot, i can see ...</thought>
+<observation>According to the screenshot, i can see ...</observation>
 <data-json>
 "todo list"
 </data-json>
@@ -172,7 +175,7 @@ If the DATA_DEMAND is:

 By viewing the screenshot and page contents, you can extract the following data:

-<thought>According to the screenshot, i can see ...</thought>
+<observation>According to the screenshot, i can see ...</observation>