
Commit 2935154

Ra's al Ghul committed

feat: Add GPT-5.4 mini and fast support.
Added GPT-5.4 mini with the correct reasoning levels, plus Fast mode support. Fast is translated to `priority` in the upstream payload. We also now handle clients that send standard OpenAI service tiers (flex/priority), with coverage tests across all endpoints.
1 parent d8ba913 commit 2935154

13 files changed

Lines changed: 515 additions & 8 deletions

DOCKER.md

Lines changed: 1 addition & 0 deletions

@@ -24,6 +24,7 @@ Set options in `.env` or pass environment variables:
 - `CHATGPT_LOCAL_REASONING_EFFORT`: minimal|low|medium|high|xhigh
 - `CHATGPT_LOCAL_REASONING_SUMMARY`: auto|concise|detailed|none
 - `CHATGPT_LOCAL_REASONING_COMPAT`: legacy|o3|think-tags|current
+- `CHATGPT_LOCAL_SERVICE_TIER`: fast to set the default upstream service tier / Fast mode
 - `CHATGPT_LOCAL_DEBUG_MODEL`: force model override (e.g., `gpt-5.4`)
 - `CHATGPT_LOCAL_CLIENT_ID`: OAuth client id override (rarely needed)
 - `CHATGPT_LOCAL_EXPOSE_REASONING_MODELS`: `true|false` to add reasoning model variants to `/v1/models`
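
The env value is run through the same normalizer as the CLI flag. A minimal sketch of how values are interpreted, importing the new `chatmock.service_tier` module added later in this commit:

```python
from chatmock.service_tier import normalize_service_tier

# Values are case-insensitive; the upstream name "priority" is accepted as an
# alias for "fast"; anything unrecognised means normal mode (None).
assert normalize_service_tier("FAST") == "fast"
assert normalize_service_tier("priority") == "fast"
assert normalize_service_tier("flex") is None
assert normalize_service_tier(None) is None
```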

README.md

Lines changed: 9 additions & 1 deletion

@@ -101,6 +101,7 @@ curl http://127.0.0.1:8000/v1/chat/completions \
 - Vision/Image understanding
 - Thinking summaries (through thinking tags)
 - Thinking effort
+- Fast mode / service tier

 ## Notes & Limits

@@ -110,6 +111,7 @@ curl http://127.0.0.1:8000/v1/chat/completions \

 # Supported models
 - `gpt-5.4`
+- `gpt-5.4-mini`
 - `gpt-5.2`
 - `gpt-5.1`
 - `gpt-5`

@@ -134,6 +136,12 @@ GPT-5 has a configurable amount of "effort" it can put into thinking, which may
 - `--reasoning-summary` (choice of auto,concise,detailed,none)<br>
 Models like GPT-5 do not return raw thinking content, but instead return thinking summaries. These can also be customised by you.

+### Fast mode / Service tier
+
+- `--service-tier` (choice of fast)<br>
+ChatMock can forward a default `service_tier` to the upstream ChatGPT/Codex backend. This mirrors Codex Fast mode, where `fast` requests the faster tier. You can also override the default per request by sending `"service_tier": "fast"` in either the OpenAI-compatible or Ollama-compatible request body.<br>
+This is also configurable through `CHATGPT_LOCAL_SERVICE_TIER`. ChatMock translates `fast` to the upstream tier name internally, but only forwards it for `gpt-5.4`. `gpt-5.4-mini` and Codex-family models fall back to normal mode. For client compatibility, request values like `"auto"`, `"default"`, and `"flex"` are also treated as normal mode and are not forwarded upstream.
+
 ### OpenAI Tools

 - `--enable-web-search`<br>

@@ -160,7 +168,7 @@ You can enable it by starting the server with this parameter, which will allow O
 If your preferred app doesn’t support selecting reasoning effort, or you just want a simpler approach, this parameter exposes each reasoning level as a separate, queryable model. Each reasoning level also appears individually under /v1/models, so model pickers in your favorite chat apps will list all reasoning options as distinct models you can switch between.

 ## Notes
-If you wish to have the fastest responses, I'd recommend setting `--reasoning-effort` to low, and `--reasoning-summary` to none. <br>
+If you wish to have the fastest responses, I'd recommend setting `--reasoning-effort` to low, `--reasoning-summary` to none, and enabling `--service-tier fast` on supported upstream combinations. <br>
 All parameters and choices can be seen by sending `python chatmock.py serve --h`<br>
 The context size of this route is also larger than what you get access to in the regular ChatGPT app.<br>
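
To illustrate the per-request override the new README section describes, here is a minimal sketch using the third-party `requests` package. The `127.0.0.1:8000` address and the OpenAI-style response shape are assumptions carried over from the README's own curl example; a ChatMock server must already be running:

```python
import requests

resp = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",
    json={
        "model": "gpt-5.4",  # Fast mode is only forwarded for gpt-5.4
        "messages": [{"role": "user", "content": "Say hello quickly."}],
        "service_tier": "fast",  # per-request override of the server default
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```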

chatmock/app.py

Lines changed: 3 additions & 0 deletions

@@ -6,6 +6,7 @@
 from .http import build_cors_headers
 from .routes_openai import openai_bp
 from .routes_ollama import ollama_bp
+from .service_tier import normalize_service_tier


 def create_app(
@@ -14,6 +15,7 @@ def create_app(
     reasoning_effort: str = "medium",
     reasoning_summary: str = "auto",
     reasoning_compat: str = "think-tags",
+    service_tier: str | None = None,
     debug_model: str | None = None,
     expose_reasoning_models: bool = False,
     default_web_search: bool = False,
@@ -26,6 +28,7 @@ def create_app(
         REASONING_EFFORT=reasoning_effort,
         REASONING_SUMMARY=reasoning_summary,
         REASONING_COMPAT=reasoning_compat,
+        SERVICE_TIER=normalize_service_tier(service_tier),
         DEBUG_MODEL=debug_model,
         BASE_INSTRUCTIONS=BASE_INSTRUCTIONS,
         GPT5_CODEX_INSTRUCTIONS=GPT5_CODEX_INSTRUCTIONS,
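
A small sketch of the new constructor parameter, assuming `create_app`'s remaining parameters are keyword-optional like the ones visible in this hunk. The point of the hunk is that the default tier is normalised once at construction time:

```python
from chatmock.app import create_app

# "priority" is normalised to "fast" before being stored in the app config,
# so routes can read a canonical value via current_app.config["SERVICE_TIER"].
app = create_app(service_tier="priority")
assert app.config["SERVICE_TIER"] == "fast"
```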

chatmock/cli.py

Lines changed: 13 additions & 0 deletions

@@ -12,6 +12,7 @@
 from .config import CLIENT_ID_DEFAULT
 from .limits import RateLimitWindow, compute_reset_at, load_rate_limit_snapshot
 from .oauth import OAuthHTTPServer, OAuthHandler, REQUIRED_PORT, URL_BASE
+from .service_tier import normalize_service_tier
 from .utils import eprint, get_home_dir, load_chatgpt_tokens, parse_jwt_claims, read_auth_file


@@ -267,6 +268,7 @@ def cmd_serve(
     reasoning_effort: str,
     reasoning_summary: str,
     reasoning_compat: str,
+    service_tier: str | None,
     debug_model: str | None,
     expose_reasoning_models: bool,
     default_web_search: bool,

@@ -277,6 +279,7 @@ def cmd_serve(
         reasoning_effort=reasoning_effort,
         reasoning_summary=reasoning_summary,
         reasoning_compat=reasoning_compat,
+        service_tier=service_tier,
         debug_model=debug_model,
         expose_reasoning_models=expose_reasoning_models,
         default_web_search=default_web_search,

@@ -330,6 +333,15 @@ def main() -> None:
             "'current' is accepted as an alias for 'legacy'"
         ),
     )
+    p_serve.add_argument(
+        "--service-tier",
+        choices=["fast"],
+        default=normalize_service_tier(os.getenv("CHATGPT_LOCAL_SERVICE_TIER")),
+        help=(
+            "Default service tier for upstream ChatGPT requests. "
+            "Set to 'fast' for Codex-style Fast mode."
+        ),
+    )
     p_serve.add_argument(
         "--expose-reasoning-models",
         action="store_true",

@@ -366,6 +378,7 @@ def main() -> None:
         reasoning_effort=args.reasoning_effort,
         reasoning_summary=args.reasoning_summary,
         reasoning_compat=args.reasoning_compat,
+        service_tier=args.service_tier,
         debug_model=args.debug_model,
         expose_reasoning_models=args.expose_reasoning_models,
         default_web_search=args.enable_web_search,
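
The flag's default comes from the environment, so `--service-tier` only needs to be passed to override it. A standalone sketch of the same argparse wiring (a minimal hypothetical parser, not the real `p_serve`):

```python
import argparse
import os

from chatmock.service_tier import normalize_service_tier

os.environ.pop("CHATGPT_LOCAL_SERVICE_TIER", None)  # start from a clean env
parser = argparse.ArgumentParser()
parser.add_argument(
    "--service-tier",
    choices=["fast"],
    default=normalize_service_tier(os.getenv("CHATGPT_LOCAL_SERVICE_TIER")),
)

# Env var unset: the default resolves to None, i.e. normal mode ...
assert parser.parse_args([]).service_tier is None
# ... and an explicit flag selects Fast mode.
assert parser.parse_args(["--service-tier", "fast"]).service_tier == "fast"
```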

chatmock/model_registry.py

Lines changed: 7 additions & 0 deletions

@@ -47,6 +47,13 @@ class ModelSpec:
         allowed_efforts=frozenset(("none", "low", "medium", "high", "xhigh")),
         variant_efforts=("xhigh", "high", "medium", "low", "none"),
     ),
+    ModelSpec(
+        public_id="gpt-5.4-mini",
+        upstream_id="gpt-5.4-mini",
+        aliases=("gpt5.4-mini", "gpt-5.4-mini-latest"),
+        allowed_efforts=frozenset(("none", "low", "medium", "high", "xhigh")),
+        variant_efforts=("xhigh", "high", "medium", "low", "none"),
+    ),
     ModelSpec(
         public_id="gpt-5.3-codex",
         upstream_id="gpt-5.3-codex",

chatmock/routes_ollama.py

Lines changed: 15 additions & 1 deletion

@@ -16,6 +16,7 @@
     build_reasoning_param,
     extract_reasoning_from_model_name,
 )
+from .service_tier import effective_service_tier_for_model, resolve_service_tier, service_tier_error_message
 from .transform import convert_ollama_messages, normalize_ollama_tools
 from .upstream import normalize_model_name, start_upstream_request
 from .utils import convert_chat_messages_to_responses_input, convert_tools_chat_to_responses

@@ -187,7 +188,18 @@ def ollama_chat() -> Response:
             _log_json("OUT POST /api/chat", err)
         return jsonify(err), 400

+    service_tier, invalid_service_tier = resolve_service_tier(
+        payload.get("service_tier"),
+        current_app.config.get("SERVICE_TIER"),
+    )
+    if invalid_service_tier:
+        err = {"error": service_tier_error_message()}
+        if verbose:
+            _log_json("OUT POST /api/chat", err)
+        return jsonify(err), 400
+
     model = payload.get("model")
+    service_tier = effective_service_tier_for_model(model, service_tier)
     raw_messages = payload.get("messages")
     messages = convert_ollama_messages(
         raw_messages, payload.get("images") if isinstance(payload.get("images"), list) else None

@@ -267,6 +279,7 @@ def ollama_chat() -> Response:
             model_reasoning,
             allowed_efforts=allowed_efforts_for_model(model),
         ),
+        service_tier=service_tier,
     )
     if error_resp is not None:
         if verbose:

@@ -307,6 +320,7 @@ def ollama_chat() -> Response:
                 model_reasoning,
                 allowed_efforts=allowed_efforts_for_model(model),
             ),
+            service_tier=service_tier,
         )
         record_rate_limits_from_response(upstream2)
         if err2 is None and upstream2 is not None and upstream2.status_code < 400:

@@ -558,7 +572,7 @@ def _gen():
             full_text = f"<think>{rtxt}</think>" + (full_text or "")

         out_json = {
-            "model": normalize_model_name(model),
+            "model": model_out,
             "created_at": created_at,
             "message": {"role": "assistant", "content": full_text, **({"tool_calls": tool_calls} if tool_calls else {})},
             "done": True,

chatmock/routes_openai.py

Lines changed: 26 additions & 0 deletions

@@ -16,6 +16,7 @@
     build_reasoning_param,
     extract_reasoning_from_model_name,
 )
+from .service_tier import effective_service_tier_for_model, resolve_service_tier, service_tier_error_message
 from .upstream import normalize_model_name, start_upstream_request
 from .utils import (
     convert_chat_messages_to_responses_input,

@@ -93,8 +94,19 @@ def chat_completions() -> Response:
             _log_json("OUT POST /v1/chat/completions", err)
         return jsonify(err), 400

+    service_tier, invalid_service_tier = resolve_service_tier(
+        payload.get("service_tier"),
+        current_app.config.get("SERVICE_TIER"),
+    )
+    if invalid_service_tier:
+        err = {"error": {"message": service_tier_error_message()}}
+        if verbose:
+            _log_json("OUT POST /v1/chat/completions", err)
+        return jsonify(err), 400
+
     requested_model = payload.get("model")
     model = normalize_model_name(requested_model, debug_model)
+    service_tier = effective_service_tier_for_model(model, service_tier)
     messages = payload.get("messages")
     if messages is None and isinstance(payload.get("prompt"), str):
         messages = [{"role": "user", "content": payload.get("prompt") or ""}]

@@ -187,6 +199,7 @@ def chat_completions() -> Response:
         tool_choice=tool_choice,
         parallel_tool_calls=parallel_tool_calls,
         reasoning_param=reasoning_param,
+        service_tier=service_tier,
     )
     if error_resp is not None:
         if verbose:

@@ -224,6 +237,7 @@ def chat_completions() -> Response:
             tool_choice=safe_choice,
             parallel_tool_calls=parallel_tool_calls,
             reasoning_param=reasoning_param,
+            service_tier=service_tier,
         )
         record_rate_limits_from_response(upstream2)
         if err2 is None and upstream2 is not None and upstream2.status_code < 400:

@@ -391,8 +405,19 @@ def completions() -> Response:
             _log_json("OUT POST /v1/completions", err)
         return jsonify(err), 400

+    service_tier, invalid_service_tier = resolve_service_tier(
+        payload.get("service_tier"),
+        current_app.config.get("SERVICE_TIER"),
+    )
+    if invalid_service_tier:
+        err = {"error": {"message": service_tier_error_message()}}
+        if verbose:
+            _log_json("OUT POST /v1/completions", err)
+        return jsonify(err), 400
+
     requested_model = payload.get("model")
     model = normalize_model_name(requested_model, debug_model)
+    service_tier = effective_service_tier_for_model(model, service_tier)
     prompt = payload.get("prompt")
     if isinstance(prompt, list):
         prompt = "".join([p if isinstance(p, str) else "" for p in prompt])

@@ -418,6 +443,7 @@ def completions() -> Response:
         input_items,
         instructions=_instructions_for_model(model),
         reasoning_param=reasoning_param,
+        service_tier=service_tier,
     )
     if error_resp is not None:
         if verbose:
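
Both OpenAI-compatible endpoints validate the tier before contacting the upstream. A sketch of the rejection path (same assumed local server as above; the error body shape comes straight from this hunk):

```python
import requests

resp = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",
    json={
        "model": "gpt-5.4",
        "messages": [{"role": "user", "content": "hi"}],
        "service_tier": "turbo",  # not fast/priority and not a normal-mode alias
    },
)
assert resp.status_code == 400
assert resp.json()["error"]["message"] == "Invalid service_tier. Expected: fast"
```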

chatmock/service_tier.py

Lines changed: 59 additions & 0 deletions

@@ -0,0 +1,59 @@
+from __future__ import annotations
+
+from typing import Any
+
+from .model_registry import normalize_model_name
+
+
+VALID_SERVICE_TIERS = ("fast",)
+_VALID_SERVICE_TIERS = frozenset(VALID_SERVICE_TIERS)
+_NORMAL_SERVICE_TIERS = frozenset(("auto", "default", "flex"))
+
+
+def normalize_service_tier(value: Any) -> str | None:
+    if not isinstance(value, str):
+        return None
+    tier = value.strip().lower()
+    if tier == "priority":
+        return "fast"
+    if tier in _VALID_SERVICE_TIERS:
+        return tier
+    return None
+
+
+def parse_service_tier(value: Any) -> tuple[str | None, bool]:
+    if value is None:
+        return None, False
+    if isinstance(value, str) and value.strip().lower() in _NORMAL_SERVICE_TIERS:
+        return None, False
+    tier = normalize_service_tier(value)
+    return tier, tier is None
+
+
+def resolve_service_tier(request_value: Any, default_value: Any) -> tuple[str | None, bool]:
+    if isinstance(request_value, str) and request_value.strip().lower() in _NORMAL_SERVICE_TIERS:
+        return None, False
+    request_tier, invalid = parse_service_tier(request_value)
+    if invalid:
+        return None, True
+    return request_tier or normalize_service_tier(default_value), False
+
+
+def effective_service_tier_for_model(model: str | None, value: Any) -> str | None:
+    tier = normalize_service_tier(value)
+    if tier != "fast":
+        return tier
+    if normalize_model_name(model) == "gpt-5.4":
+        return tier
+    return None
+
+
+def upstream_service_tier(model: str | None, value: Any) -> str | None:
+    tier = effective_service_tier_for_model(model, value)
+    if tier == "fast":
+        return "priority"
+    return tier
+
+
+def service_tier_error_message() -> str:
+    return "Invalid service_tier. Expected: fast"

chatmock/upstream.py

Lines changed: 5 additions & 0 deletions

@@ -11,6 +11,7 @@
 from .http import build_cors_headers
 from .model_registry import normalize_model_name
 from .session import ensure_session_id
+from .service_tier import upstream_service_tier
 from flask import request as flask_request
 from .utils import get_effective_chatgpt_auth

@@ -33,6 +34,7 @@ def start_upstream_request(
     tool_choice: Any | None = None,
     parallel_tool_calls: bool = False,
     reasoning_param: Dict[str, Any] | None = None,
+    service_tier: str | None = None,
 ):
     access_token, account_id = get_effective_chatgpt_auth()
     if not access_token or not account_id:

@@ -81,6 +83,9 @@ def start_upstream_request(

     if reasoning_param is not None:
         responses_payload["reasoning"] = reasoning_param
+    upstream_tier = upstream_service_tier(model, service_tier)
+    if upstream_tier is not None:
+        responses_payload["service_tier"] = upstream_tier

     verbose = False
     try:
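
The payload hunk above reduces to a small pure function; a sketch that mirrors it (`apply_service_tier` is a hypothetical helper for illustration, not part of the commit):

```python
from chatmock.service_tier import upstream_service_tier

def apply_service_tier(responses_payload: dict, model: str | None, tier: str | None) -> dict:
    # Mirrors the hunk in start_upstream_request: attach the tier only when it
    # survives model gating, translated to the upstream name ("priority").
    upstream_tier = upstream_service_tier(model, tier)
    if upstream_tier is not None:
        responses_payload["service_tier"] = upstream_tier
    return responses_payload

assert apply_service_tier({}, "gpt-5.4", "fast") == {"service_tier": "priority"}
assert apply_service_tier({}, "gpt-5.4-mini", "fast") == {}
```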
