Preserve HTTPException responses in error handlers by parth-951 · Pull Request #491 · raullenchai/Rapid-MLX

parth-951 · 2026-05-31T10:48:21Z

What does this PR do?

Adds a dedicated FastAPI HTTPException handler that returns OpenAI-style error envelopes for client-facing errors while preserving status codes and headers.

Examples:

Before:

{"detail":"messages must not be empty"}

After:

{
  "error": {
    "message": "messages must not be empty",
    "type": "invalid_request_error",
    "code": null,
    "param": null
  }
}

The change also preserves response headers such as Retry-After for rate-limited requests.

Additionally, adds tests covering the new HTTP exception response behavior.

Why is this needed?

Partially addresses the consolidated onboarding issue regarding OpenAI API compatibility.

Rapid-MLX currently returns FastAPI's default error format ({"detail": ...}) for HTTPException paths such as validation failures, authentication errors, and rate-limit responses. Many OpenAI-compatible SDKs expect errors under an error object and may inspect fields such as error.type or error.code.

This change standardizes HTTPException responses into an OpenAI-style envelope while preserving existing status codes and headers.

AI assistance disclosure

ChatGPT assisted with reviewing the implementation approach, identifying the appropriate FastAPI exception handling pattern, and suggesting test coverage.

I implemented the changes, reviewed all generated suggestions, cleaned up imports, manually inspected the final diff, and verified that the modified files contained only the intended changes.

Files touched:

vllm_mlx/server.py
tests/test_config_and_middleware.py

Test plan

Added tests covering OpenAI-style HTTPException responses.
Ran python3 -m py_compile vllm_mlx/server.py
Ran git diff --check
Reviewed final diff for unintended file modifications.
Attempted to run pytest locally on Windows, but the project depends on Apple's MLX package which is not available on Windows.

Checklist

Tests pass locally (python3 -m pytest tests/ -x)
Lint passes (ruff check && ruff format --check)
Self-validated with python3 -m scripts.pr_validate.pr_validate <PR#>
If new tests touch a critical code path (parser / scheduler / security), I've spot-checked that they fail when the corresponding production line is broken
Updated README/docs if applicable
No breaking changes to existing API

raullenchai

Thanks for the PR! Useful improvement — the OpenAI-style envelope is a real compatibility gap. Supply-chain audit is clean (2 files, no deps / CI / install hooks / network / exec / encoded payloads). A few changes needed before this can merge.

Blocking

1. Lint fails — `ruff check` reports 4 errors

tests/test_config_and_middleware.py:358 I001 — import block is unsorted
tests/test_config_and_middleware.py:416 W292 — no newline at end of file
vllm_mlx/server.py:40                I001 — import block is unsorted
vllm_mlx/server.py:404               I001 — import block is unsorted

All four are auto-fixable: ruff check --fix vllm_mlx/server.py tests/test_config_and_middleware.py.

2. `ruff format --check` would reformat both files

Run ruff format vllm_mlx/server.py tests/test_config_and_middleware.py after the lint fix above.

3. Test asserts on a copy of the handler, not the production handler

tests/test_config_and_middleware.py:357-398 (TestHTTPExceptionHandler._make_app_with_handler) defines a fresh FastAPI() app and registers a hand-copied duplicate of the production handler (lines 363-388 are a verbatim copy of vllm_mlx/server.py:407-432). The test then asserts on that copy, not the real handler.

This means if someone changes the production handler (adds a new status code, changes the envelope shape, drops param), this test still passes. The test is asserting that the test's own helper works.

Two clean fixes:

Option A — register the real handler on a test app:

from vllm_mlx.server import _http_exception_handler

def _make_app_with_handler(self):
    app = FastAPI()
    app.add_exception_handler(HTTPException, _http_exception_handler)
    @app.get("/rate-limit")
    async def rate_limit():
        raise HTTPException(429, "Rate limit exceeded", headers={"Retry-After": "60"})
    return app

Option B — exercise an existing production route that raises HTTPException (e.g., one of the vllm_mlx/middleware/auth.py:155 401 paths) via TestClient(app) from vllm_mlx.server. This also covers middleware-raised exceptions.

I'd lean Option A — it's the smallest delta to your current code and pins the production handler directly. Either is acceptable.

NITs (nice to have, not blockers)

4. Import placement

The original from fastapi import FastAPI, Request was removed from vllm_mlx/server.py:46 and re-added at line ~132 (after from .tool_parsers import ToolParserManager). It works at runtime because app = FastAPI() is at line 365 (after the new import location), but:

It splits the FastAPI imports across two locations — from fastapi.middleware.cors import CORSMiddleware stays at line 47, but the main from fastapi import ... is now 80+ lines later.
It's a PEP 8 violation (E402); the repo ignores E402 in ruff config so it's silent, but consistency is better.

Suggest: put HTTPException and JSONResponse into the existing top-of-file import block (line 46-47 area) alongside the existing fastapi imports.

5. Anthropic clients still get the wrong shape

This handler applies globally, so /v1/messages HTTPException paths (e.g. vllm_mlx/routes/anthropic.py:211, 219) will now return OpenAI-shaped errors. Anthropic SDKs expect {"type": "error", "error": {"type": "...", "message": "..."}}. This isn't a regression (the previous {"detail": "..."} was wrong for Anthropic too), but the PR description should call out this limitation, and a follow-up PR could add route-aware formatting based on request.url.path.startswith("/v1/messages").

6. Only one of the seven status codes is tested

The handler maps 6 codes (400/401/403/404/409/429) plus a fallback api_error. The test only covers 429. At minimum, add coverage for one mapped code (e.g., 401 with an Authorization: Bearer wrong-key request that goes through real auth middleware) and the fallback api_error (e.g., an HTTPException with a code not in the map).

7. No `headers=None` test

HTTPException without an explicit headers arg gives exc.headers = None. JSONResponse(headers=None) works correctly (verified locally), but worth one assertion to pin that contract — most HTTPException raises in the codebase don't pass headers.

Other observations (no action required from you)

CI hasn't run on this PR yet (statusCheckRollup: []) — external PRs need a maintainer to approve the first workflow run. Once you push the fixes above I'll trigger it.
After CI passes and the test-quality issue is addressed, I can run our pr_validate (full lint + diff-aware tests + DeepSeek review) on the branch.

Once items 1-3 are addressed I'm happy to merge. Thanks again!

Preserve HTTPException responses in error handlers

f6b5b56

raullenchai requested changes May 31, 2026

View reviewed changes

Address review feedback on HTTPException handler

3704f32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Preserve HTTPException responses in error handlers#491

Preserve HTTPException responses in error handlers#491
parth-951 wants to merge 2 commits into
raullenchai:mainfrom
parth-951:feat/request-size-and-openai-errors

parth-951 commented May 31, 2026

Uh oh!

raullenchai left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

parth-951 commented May 31, 2026

What does this PR do?

Why is this needed?

AI assistance disclosure

Test plan

Checklist

Uh oh!

raullenchai left a comment

Choose a reason for hiding this comment

Blocking

1. Lint fails — ruff check reports 4 errors

2. ruff format --check would reformat both files

3. Test asserts on a copy of the handler, not the production handler

NITs (nice to have, not blockers)

4. Import placement

5. Anthropic clients still get the wrong shape

6. Only one of the seven status codes is tested

7. No headers=None test

Other observations (no action required from you)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

1. Lint fails — `ruff check` reports 4 errors

2. `ruff format --check` would reformat both files

7. No `headers=None` test