Commit 68e5fe4

Author: Ra's al Ghul

feat: Add GPT-5.4 mini and fast support.

Added GPT-5.4 mini along with the correct reasoning levels, plus Fast mode. Fast is translated to `priority` in the upstream payload. We also now handle clients that send standard OpenAI service tiers (flex/priority), with endpoint coverage tests around it.

1 parent d8ba913 · commit 68e5fe4

13 files changed: 408 additions & 10 deletions

DOCKER.md (1 addition, 0 deletions)

```diff
@@ -24,6 +24,7 @@ Set options in `.env` or pass environment variables:
 - `CHATGPT_LOCAL_REASONING_EFFORT`: minimal|low|medium|high|xhigh
 - `CHATGPT_LOCAL_REASONING_SUMMARY`: auto|concise|detailed|none
 - `CHATGPT_LOCAL_REASONING_COMPAT`: legacy|o3|think-tags|current
+- `CHATGPT_LOCAL_SERVICE_TIER`: `fast` to set the default upstream service tier / Fast mode
 - `CHATGPT_LOCAL_DEBUG_MODEL`: force model override (e.g., `gpt-5.4`)
 - `CHATGPT_LOCAL_CLIENT_ID`: OAuth client id override (rarely needed)
 - `CHATGPT_LOCAL_EXPOSE_REASONING_MODELS`: `true|false` to add reasoning model variants to `/v1/models`
```

README.md (9 additions, 1 deletion)

```diff
@@ -101,6 +101,7 @@ curl http://127.0.0.1:8000/v1/chat/completions \
 - Vision/Image understanding
 - Thinking summaries (through thinking tags)
 - Thinking effort
+- Fast mode / service tier
 
 ## Notes & Limits
 
@@ -110,6 +111,7 @@ curl http://127.0.0.1:8000/v1/chat/completions \
 
 # Supported models
 - `gpt-5.4`
+- `gpt-5.4-mini`
 - `gpt-5.2`
 - `gpt-5.1`
 - `gpt-5`
@@ -134,6 +136,12 @@ GPT-5 has a configurable amount of "effort" it can put into thinking, which may
 - `--reasoning-summary` (choice of auto,concise,detailed,none)<br>
 Models like GPT-5 do not return raw thinking content, but instead return thinking summaries. These can also be customised by you.
 
+### Fast mode / Service tier
+
+- `--service-tier` (choice of fast)<br>
+ChatMock can forward a default `service_tier` to the upstream ChatGPT/Codex backend. This mirrors Codex Fast mode, where `fast` requests the faster tier. You can also override the default per request by sending `"service_tier": "fast"` in either the OpenAI-compatible or Ollama-compatible request body.<br>
+This is also configurable through `CHATGPT_LOCAL_SERVICE_TIER`. ChatMock translates `fast` to the upstream tier name internally. For client compatibility, request values like `"auto"`, `"default"`, and `"flex"` are treated as normal mode and are not forwarded upstream.
+
 ### OpenAI Tools
 
 - `--enable-web-search`<br>
@@ -160,7 +168,7 @@ You can enable it by starting the server with this parameter, which will allow O
 If your preferred app doesn’t support selecting reasoning effort, or you just want a simpler approach, this parameter exposes each reasoning level as a separate, queryable model. Each reasoning level also appears individually under ⁠/v1/models, so model pickers in your favorite chat apps will list all reasoning options as distinct models you can switch between.
 
 ## Notes
-If you wish to have the fastest responses, I'd recommend setting `--reasoning-effort` to low, and `--reasoning-summary` to none. <br>
+If you wish to have the fastest responses, I'd recommend setting `--reasoning-effort` to low, `--reasoning-summary` to none, and enabling `--service-tier fast` on supported upstream combinations. <br>
 All parameters and choices can be seen by sending `python chatmock.py serve --h`<br>
 The context size of this route is also larger than what you get access to in the regular ChatGPT app.<br>
```
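Per the README text above, a per-request override is just one extra field in the request body. A minimal sketch of an OpenAI-compatible payload (the model name and message content are placeholders, not taken from the commit):

```python
import json

body = {
    "model": "gpt-5.4-mini",
    "messages": [{"role": "user", "content": "hello"}],
    # Overrides the server's --service-tier default for this one request.
    "service_tier": "fast",
}
print(json.dumps(body))
```

Sending `"auto"`, `"default"`, or `"flex"` here would instead be treated as normal mode and dropped before the upstream call.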

chatmock/app.py (3 additions, 0 deletions)

```diff
@@ -6,6 +6,7 @@
 from .http import build_cors_headers
 from .routes_openai import openai_bp
 from .routes_ollama import ollama_bp
+from .service_tier import normalize_service_tier
 
 
 def create_app(
@@ -14,6 +15,7 @@ def create_app(
     reasoning_effort: str = "medium",
     reasoning_summary: str = "auto",
     reasoning_compat: str = "think-tags",
+    service_tier: str | None = None,
     debug_model: str | None = None,
     expose_reasoning_models: bool = False,
     default_web_search: bool = False,
@@ -26,6 +28,7 @@ def create_app(
         REASONING_EFFORT=reasoning_effort,
         REASONING_SUMMARY=reasoning_summary,
         REASONING_COMPAT=reasoning_compat,
+        SERVICE_TIER=normalize_service_tier(service_tier),
         DEBUG_MODEL=debug_model,
         BASE_INSTRUCTIONS=BASE_INSTRUCTIONS,
         GPT5_CODEX_INSTRUCTIONS=GPT5_CODEX_INSTRUCTIONS,
```

chatmock/cli.py (13 additions, 0 deletions)

```diff
@@ -12,6 +12,7 @@
 from .config import CLIENT_ID_DEFAULT
 from .limits import RateLimitWindow, compute_reset_at, load_rate_limit_snapshot
 from .oauth import OAuthHTTPServer, OAuthHandler, REQUIRED_PORT, URL_BASE
+from .service_tier import normalize_service_tier
 from .utils import eprint, get_home_dir, load_chatgpt_tokens, parse_jwt_claims, read_auth_file
 
 
@@ -267,6 +268,7 @@ def cmd_serve(
     reasoning_effort: str,
     reasoning_summary: str,
     reasoning_compat: str,
+    service_tier: str | None,
     debug_model: str | None,
     expose_reasoning_models: bool,
     default_web_search: bool,
@@ -277,6 +279,7 @@ def cmd_serve(
         reasoning_effort=reasoning_effort,
         reasoning_summary=reasoning_summary,
         reasoning_compat=reasoning_compat,
+        service_tier=service_tier,
         debug_model=debug_model,
         expose_reasoning_models=expose_reasoning_models,
         default_web_search=default_web_search,
@@ -330,6 +333,15 @@ def main() -> None:
             "'current' is accepted as an alias for 'legacy'"
         ),
     )
+    p_serve.add_argument(
+        "--service-tier",
+        choices=["fast"],
+        default=normalize_service_tier(os.getenv("CHATGPT_LOCAL_SERVICE_TIER")),
+        help=(
+            "Default service tier for upstream ChatGPT requests. "
+            "Set to 'fast' for Codex-style Fast mode."
+        ),
+    )
     p_serve.add_argument(
         "--expose-reasoning-models",
         action="store_true",
@@ -366,6 +378,7 @@ def main() -> None:
         reasoning_effort=args.reasoning_effort,
         reasoning_summary=args.reasoning_summary,
         reasoning_compat=args.reasoning_compat,
+        service_tier=args.service_tier,
         debug_model=args.debug_model,
         expose_reasoning_models=args.expose_reasoning_models,
         default_web_search=args.enable_web_search,
```

chatmock/model_registry.py (7 additions, 0 deletions)

```diff
@@ -47,6 +47,13 @@ class ModelSpec:
         allowed_efforts=frozenset(("none", "low", "medium", "high", "xhigh")),
         variant_efforts=("xhigh", "high", "medium", "low", "none"),
     ),
+    ModelSpec(
+        public_id="gpt-5.4-mini",
+        upstream_id="gpt-5.4-mini",
+        aliases=("gpt5.4-mini", "gpt-5.4-mini-latest"),
+        allowed_efforts=frozenset(("none", "low", "medium", "high", "xhigh")),
+        variant_efforts=("xhigh", "high", "medium", "low", "none"),
+    ),
     ModelSpec(
         public_id="gpt-5.3-codex",
         upstream_id="gpt-5.3-codex",
```

chatmock/routes_ollama.py (14 additions, 1 deletion)

```diff
@@ -16,6 +16,7 @@
     build_reasoning_param,
     extract_reasoning_from_model_name,
 )
+from .service_tier import resolve_service_tier, service_tier_error_message
 from .transform import convert_ollama_messages, normalize_ollama_tools
 from .upstream import normalize_model_name, start_upstream_request
 from .utils import convert_chat_messages_to_responses_input, convert_tools_chat_to_responses
@@ -187,6 +188,16 @@ def ollama_chat() -> Response:
             _log_json("OUT POST /api/chat", err)
         return jsonify(err), 400
 
+    service_tier, invalid_service_tier = resolve_service_tier(
+        payload.get("service_tier"),
+        current_app.config.get("SERVICE_TIER"),
+    )
+    if invalid_service_tier:
+        err = {"error": service_tier_error_message()}
+        if verbose:
+            _log_json("OUT POST /api/chat", err)
+        return jsonify(err), 400
+
     model = payload.get("model")
     raw_messages = payload.get("messages")
     messages = convert_ollama_messages(
@@ -267,6 +278,7 @@ def ollama_chat() -> Response:
             model_reasoning,
             allowed_efforts=allowed_efforts_for_model(model),
         ),
+        service_tier=service_tier,
     )
     if error_resp is not None:
         if verbose:
@@ -307,6 +319,7 @@ def ollama_chat() -> Response:
                 model_reasoning,
                 allowed_efforts=allowed_efforts_for_model(model),
             ),
+            service_tier=service_tier,
         )
         record_rate_limits_from_response(upstream2)
         if err2 is None and upstream2 is not None and upstream2.status_code < 400:
@@ -558,7 +571,7 @@ def _gen():
             full_text = f"<think>{rtxt}</think>" + (full_text or "")
 
         out_json = {
-            "model": normalize_model_name(model),
+            "model": model_out,
             "created_at": created_at,
             "message": {"role": "assistant", "content": full_text, **({"tool_calls": tool_calls} if tool_calls else {})},
             "done": True,
```

chatmock/routes_openai.py (24 additions, 0 deletions)

```diff
@@ -16,6 +16,7 @@
     build_reasoning_param,
     extract_reasoning_from_model_name,
 )
+from .service_tier import resolve_service_tier, service_tier_error_message
 from .upstream import normalize_model_name, start_upstream_request
 from .utils import (
     convert_chat_messages_to_responses_input,
@@ -93,6 +94,16 @@ def chat_completions() -> Response:
             _log_json("OUT POST /v1/chat/completions", err)
         return jsonify(err), 400
 
+    service_tier, invalid_service_tier = resolve_service_tier(
+        payload.get("service_tier"),
+        current_app.config.get("SERVICE_TIER"),
+    )
+    if invalid_service_tier:
+        err = {"error": {"message": service_tier_error_message()}}
+        if verbose:
+            _log_json("OUT POST /v1/chat/completions", err)
+        return jsonify(err), 400
+
     requested_model = payload.get("model")
     model = normalize_model_name(requested_model, debug_model)
     messages = payload.get("messages")
@@ -187,6 +198,7 @@ def chat_completions() -> Response:
         tool_choice=tool_choice,
         parallel_tool_calls=parallel_tool_calls,
         reasoning_param=reasoning_param,
+        service_tier=service_tier,
     )
     if error_resp is not None:
         if verbose:
@@ -224,6 +236,7 @@ def chat_completions() -> Response:
             tool_choice=safe_choice,
             parallel_tool_calls=parallel_tool_calls,
             reasoning_param=reasoning_param,
+            service_tier=service_tier,
         )
         record_rate_limits_from_response(upstream2)
         if err2 is None and upstream2 is not None and upstream2.status_code < 400:
@@ -391,6 +404,16 @@ def completions() -> Response:
             _log_json("OUT POST /v1/completions", err)
         return jsonify(err), 400
 
+    service_tier, invalid_service_tier = resolve_service_tier(
+        payload.get("service_tier"),
+        current_app.config.get("SERVICE_TIER"),
+    )
+    if invalid_service_tier:
+        err = {"error": {"message": service_tier_error_message()}}
+        if verbose:
+            _log_json("OUT POST /v1/completions", err)
+        return jsonify(err), 400
+
     requested_model = payload.get("model")
     model = normalize_model_name(requested_model, debug_model)
     prompt = payload.get("prompt")
@@ -418,6 +441,7 @@ def completions() -> Response:
         input_items,
         instructions=_instructions_for_model(model),
         reasoning_param=reasoning_param,
+        service_tier=service_tier,
     )
     if error_resp is not None:
         if verbose:
```

chatmock/service_tier.py (new file, 48 additions)

```python
from __future__ import annotations

from typing import Any


VALID_SERVICE_TIERS = ("fast",)
_VALID_SERVICE_TIERS = frozenset(VALID_SERVICE_TIERS)
_NORMAL_SERVICE_TIERS = frozenset(("auto", "default", "flex"))


def normalize_service_tier(value: Any) -> str | None:
    if not isinstance(value, str):
        return None
    tier = value.strip().lower()
    if tier == "priority":
        return "fast"
    if tier in _VALID_SERVICE_TIERS:
        return tier
    return None


def parse_service_tier(value: Any) -> tuple[str | None, bool]:
    if value is None:
        return None, False
    if isinstance(value, str) and value.strip().lower() in _NORMAL_SERVICE_TIERS:
        return None, False
    tier = normalize_service_tier(value)
    return tier, tier is None


def resolve_service_tier(request_value: Any, default_value: Any) -> tuple[str | None, bool]:
    if isinstance(request_value, str) and request_value.strip().lower() in _NORMAL_SERVICE_TIERS:
        return None, False
    request_tier, invalid = parse_service_tier(request_value)
    if invalid:
        return None, True
    return request_tier or normalize_service_tier(default_value), False


def upstream_service_tier(value: Any) -> str | None:
    tier = normalize_service_tier(value)
    if tier == "fast":
        return "priority"
    return tier


def service_tier_error_message() -> str:
    return "Invalid service_tier. Expected: fast"
```
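The resolution rules in the new module can be exercised standalone. The sketch below restates `normalize_service_tier` and a condensed `resolve_service_tier` (assumed equivalent to the module above) and shows how the server default, an explicit client override, and the OpenAI `priority` alias interact:

```python
from __future__ import annotations

from typing import Any

_VALID_SERVICE_TIERS = frozenset(("fast",))
_NORMAL_SERVICE_TIERS = frozenset(("auto", "default", "flex"))


def normalize_service_tier(value: Any) -> str | None:
    if not isinstance(value, str):
        return None
    tier = value.strip().lower()
    if tier == "priority":  # OpenAI's name for the faster tier
        return "fast"
    return tier if tier in _VALID_SERVICE_TIERS else None


def resolve_service_tier(request_value: Any, default_value: Any) -> tuple[str | None, bool]:
    # A client explicitly asking for a "normal" tier wins over a fast server default.
    if isinstance(request_value, str) and request_value.strip().lower() in _NORMAL_SERVICE_TIERS:
        return None, False
    if request_value is None:
        return normalize_service_tier(default_value), False
    tier = normalize_service_tier(request_value)
    return (None, True) if tier is None else (tier, False)


print(resolve_service_tier(None, "fast"))       # server default applies
print(resolve_service_tier("flex", "fast"))     # client opts back into normal mode
print(resolve_service_tier("priority", None))   # OpenAI alias maps to fast
print(resolve_service_tier("turbo", None))      # unknown tier -> invalid -> 400 in the routes
```

The second result is why `"flex"` never reaches the upstream: it resolves to `(None, False)` rather than an error, matching the commit's compatibility handling.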

chatmock/upstream.py (5 additions, 0 deletions)

```diff
@@ -11,6 +11,7 @@
 from .http import build_cors_headers
 from .model_registry import normalize_model_name
 from .session import ensure_session_id
+from .service_tier import upstream_service_tier
 from flask import request as flask_request
 from .utils import get_effective_chatgpt_auth
 
@@ -33,6 +34,7 @@ def start_upstream_request(
     tool_choice: Any | None = None,
     parallel_tool_calls: bool = False,
     reasoning_param: Dict[str, Any] | None = None,
+    service_tier: str | None = None,
 ):
     access_token, account_id = get_effective_chatgpt_auth()
     if not access_token or not account_id:
@@ -81,6 +83,9 @@ def start_upstream_request(
 
     if reasoning_param is not None:
         responses_payload["reasoning"] = reasoning_param
+    upstream_tier = upstream_service_tier(service_tier)
+    if upstream_tier is not None:
+        responses_payload["service_tier"] = upstream_tier
 
     verbose = False
     try:
```
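The payload step added above is the last hop: the internal tier name is translated back to OpenAI's wire name, and the key is only attached when a tier actually resolved. A simplified sketch (the `model` key is illustrative; the real `upstream_service_tier` also normalizes its input):

```python
def upstream_service_tier(value):
    # Mirrors chatmock/service_tier.py: internal "fast" goes upstream as "priority".
    return "priority" if value == "fast" else None


def attach_service_tier(responses_payload, service_tier):
    # Same conditional as the diff above: leave the payload untouched in normal mode.
    upstream_tier = upstream_service_tier(service_tier)
    if upstream_tier is not None:
        responses_payload["service_tier"] = upstream_tier
    return responses_payload


print(attach_service_tier({"model": "gpt-5.4-mini"}, "fast"))
print(attach_service_tier({"model": "gpt-5.4-mini"}, None))
```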
