Preserve HTTPException responses in error handlers#491
Conversation
raullenchai
left a comment
There was a problem hiding this comment.
Thanks for the PR! Useful improvement — the OpenAI-style envelope is a real compatibility gap. Supply-chain audit is clean (2 files, no deps / CI / install hooks / network / exec / encoded payloads). A few changes needed before this can merge.
Blocking
1. Lint fails — ruff check reports 4 errors
tests/test_config_and_middleware.py:358 I001 — import block is unsorted
tests/test_config_and_middleware.py:416 W292 — no newline at end of file
vllm_mlx/server.py:40 I001 — import block is unsorted
vllm_mlx/server.py:404 I001 — import block is unsorted
All four are auto-fixable: ruff check --fix vllm_mlx/server.py tests/test_config_and_middleware.py.
2. ruff format --check would reformat both files
Run ruff format vllm_mlx/server.py tests/test_config_and_middleware.py after the lint fix above.
3. Test asserts on a copy of the handler, not the production handler
tests/test_config_and_middleware.py:357-398 (TestHTTPExceptionHandler._make_app_with_handler) defines a fresh FastAPI() app and registers a hand-copied duplicate of the production handler (lines 363-388 are a verbatim copy of vllm_mlx/server.py:407-432). The test then asserts on that copy, not the real handler.
This means if someone changes the production handler (adds a new status code, changes the envelope shape, drops param), this test still passes. The test is asserting that the test's own helper works.
Two clean fixes:
Option A — register the real handler on a test app:
from vllm_mlx.server import _http_exception_handler
def _make_app_with_handler(self):
app = FastAPI()
app.add_exception_handler(HTTPException, _http_exception_handler)
@app.get("/rate-limit")
async def rate_limit():
raise HTTPException(429, "Rate limit exceeded", headers={"Retry-After": "60"})
return appOption B — exercise an existing production route that raises HTTPException (e.g., one of the vllm_mlx/middleware/auth.py:155 401 paths) via TestClient(app) from vllm_mlx.server. This also covers middleware-raised exceptions.
I'd lean Option A — it's the smallest delta to your current code and pins the production handler directly. Either is acceptable.
NITs (nice to have, not blockers)
4. Import placement
The original from fastapi import FastAPI, Request was removed from vllm_mlx/server.py:46 and re-added at line ~132 (after from .tool_parsers import ToolParserManager). It works at runtime because app = FastAPI() is at line 365 (after the new import location), but:
- It splits the FastAPI imports across two locations —
from fastapi.middleware.cors import CORSMiddlewarestays at line 47, but the mainfrom fastapi import ...is now 80+ lines later. - It's a PEP 8 violation (E402); the repo
ignores E402 in ruff config so it's silent, but consistency is better.
Suggest: put HTTPException and JSONResponse into the existing top-of-file import block (line 46-47 area) alongside the existing fastapi imports.
5. Anthropic clients still get the wrong shape
This handler applies globally, so /v1/messages HTTPException paths (e.g. vllm_mlx/routes/anthropic.py:211, 219) will now return OpenAI-shaped errors. Anthropic SDKs expect {"type": "error", "error": {"type": "...", "message": "..."}}. This isn't a regression (the previous {"detail": "..."} was wrong for Anthropic too), but the PR description should call out this limitation, and a follow-up PR could add route-aware formatting based on request.url.path.startswith("/v1/messages").
6. Only one of the seven status codes is tested
The handler maps 6 codes (400/401/403/404/409/429) plus a fallback api_error. The test only covers 429. At minimum, add coverage for one mapped code (e.g., 401 with an Authorization: Bearer wrong-key request that goes through real auth middleware) and the fallback api_error (e.g., an HTTPException with a code not in the map).
7. No headers=None test
HTTPException without an explicit headers arg gives exc.headers = None. JSONResponse(headers=None) works correctly (verified locally), but worth one assertion to pin that contract — most HTTPException raises in the codebase don't pass headers.
Other observations (no action required from you)
- CI hasn't run on this PR yet (
statusCheckRollup: []) — external PRs need a maintainer to approve the first workflow run. Once you push the fixes above I'll trigger it. - After CI passes and the test-quality issue is addressed, I can run our pr_validate (full lint + diff-aware tests + DeepSeek review) on the branch.
Once items 1-3 are addressed I'm happy to merge. Thanks again!
What does this PR do?
Adds a dedicated FastAPI
HTTPExceptionhandler that returns OpenAI-style error envelopes for client-facing errors while preserving status codes and headers.Examples:
Before:
{"detail":"messages must not be empty"}After:
{ "error": { "message": "messages must not be empty", "type": "invalid_request_error", "code": null, "param": null } }The change also preserves response headers such as
Retry-Afterfor rate-limited requests.Additionally, adds tests covering the new HTTP exception response behavior.
Why is this needed?
Partially addresses the consolidated onboarding issue regarding OpenAI API compatibility.
Rapid-MLX currently returns FastAPI's default error format (
{"detail": ...}) forHTTPExceptionpaths such as validation failures, authentication errors, and rate-limit responses. Many OpenAI-compatible SDKs expect errors under anerrorobject and may inspect fields such aserror.typeorerror.code.This change standardizes HTTPException responses into an OpenAI-style envelope while preserving existing status codes and headers.
AI assistance disclosure
ChatGPT assisted with reviewing the implementation approach, identifying the appropriate FastAPI exception handling pattern, and suggesting test coverage.
I implemented the changes, reviewed all generated suggestions, cleaned up imports, manually inspected the final diff, and verified that the modified files contained only the intended changes.
Files touched:
vllm_mlx/server.pytests/test_config_and_middleware.pyTest plan
python3 -m py_compile vllm_mlx/server.pygit diff --checkChecklist
python3 -m pytest tests/ -x)ruff check && ruff format --check)python3 -m scripts.pr_validate.pr_validate <PR#>