Added GPT-5.4 mini along with the correct reasoning level and fast mode.
Fast is translated to priority in the upstream payload. We also now handle
clients that send standard OpenAI service tiers (flex/priority), with full
endpoint coverage tests around this behaviour.
@@ -134,6 +136,12 @@ GPT-5 has a configurable amount of "effort" it can put into thinking, which may
 -`--reasoning-summary` (choice of auto,concise,detailed,none)<br>
 Models like GPT-5 do not return raw thinking content, but instead return thinking summaries. These can also be customised by you.
 
+### Fast mode / Service tier
+
+-`--service-tier` (choice of fast)<br>
+ChatMock can forward a default `service_tier` to the upstream ChatGPT/Codex backend. This mirrors Codex Fast mode, where `fast` requests the faster tier. You can also override the default per request by sending `"service_tier": "fast"` in either the OpenAI-compatible or Ollama-compatible request body.<br>
+This is also configurable through `CHATGPT_LOCAL_SERVICE_TIER`. ChatMock translates `fast` to the upstream tier name internally, but only forwards it for `gpt-5.4`. `gpt-5.4-mini` and Codex-family models fall back to normal mode. For client compatibility, request values like `"auto"`, `"default"`, and `"flex"` are also treated as normal mode and are not forwarded upstream.
+
 ### OpenAI Tools
 
 -`--enable-web-search`<br>
@@ -160,7 +168,7 @@ You can enable it by starting the server with this parameter, which will allow O
 If your preferred app doesn’t support selecting reasoning effort, or you just want a simpler approach, this parameter exposes each reasoning level as a separate, queryable model. Each reasoning level also appears individually under /v1/models, so model pickers in your favorite chat apps will list all reasoning options as distinct models you can switch between.
 
 ## Notes
-If you wish to have the fastest responses, I'd recommend setting `--reasoning-effort` to low, and `--reasoning-summary` to none. <br>
+If you wish to have the fastest responses, I'd recommend setting `--reasoning-effort` to low, `--reasoning-summary` to none, and enabling `--service-tier fast` on supported upstream combinations. <br>
 All parameters and choices can be seen by sending `python chatmock.py serve --h`<br>
 The context size of this route is also larger than what you get access to in the regular ChatGPT app.<br>
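The per-request override from the new README section can be exercised by adding `"service_tier": "fast"` to an otherwise ordinary OpenAI-compatible request body. A minimal sketch follows; the host, port, and message content are assumptions for illustration, and the `/v1/chat/completions` path is the standard OpenAI-compatible route the README describes.

```python
# Sketch of a per-request fast-mode override against a local ChatMock
# server. localhost:8000 is an assumed address, not a documented default.
import json

payload = {
    "model": "gpt-5.4",  # the only model ChatMock forwards the tier for
    "messages": [{"role": "user", "content": "Hello"}],
    # Overrides the server-wide default for this single request.
    "service_tier": "fast",
}
body = json.dumps(payload).encode("utf-8")

# POST `body` to http://localhost:8000/v1/chat/completions with a
# Content-Type of application/json, e.g. via urllib.request or any
# OpenAI-compatible client library.
```

The same `"service_tier": "fast"` field works in the Ollama-compatible request body; sending `"auto"`, `"default"`, or `"flex"` instead is accepted but treated as normal mode.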