Skip to content

Preserve HTTPException responses in error handlers#491

Open
parth-951 wants to merge 2 commits into
raullenchai:mainfrom
parth-951:feat/request-size-and-openai-errors
Open

Preserve HTTPException responses in error handlers#491
parth-951 wants to merge 2 commits into
raullenchai:mainfrom
parth-951:feat/request-size-and-openai-errors

Conversation

@parth-951
Copy link
Copy Markdown

What does this PR do?

Adds a dedicated FastAPI HTTPException handler that returns OpenAI-style error envelopes for client-facing errors while preserving status codes and headers.

Examples:

Before:

{"detail":"messages must not be empty"}

After:

{
  "error": {
    "message": "messages must not be empty",
    "type": "invalid_request_error",
    "code": null,
    "param": null
  }
}

The change also preserves response headers such as Retry-After for rate-limited requests.

Additionally, adds tests covering the new HTTP exception response behavior.

Why is this needed?

Partially addresses the consolidated onboarding issue regarding OpenAI API compatibility.

Rapid-MLX currently returns FastAPI's default error format ({"detail": ...}) for HTTPException paths such as validation failures, authentication errors, and rate-limit responses. Many OpenAI-compatible SDKs expect errors under an error object and may inspect fields such as error.type or error.code.

This change standardizes HTTPException responses into an OpenAI-style envelope while preserving existing status codes and headers.

AI assistance disclosure

ChatGPT assisted with reviewing the implementation approach, identifying the appropriate FastAPI exception handling pattern, and suggesting test coverage.

I implemented the changes, reviewed all generated suggestions, cleaned up imports, manually inspected the final diff, and verified that the modified files contained only the intended changes.

Files touched:

  • vllm_mlx/server.py
  • tests/test_config_and_middleware.py

Test plan

  • Added tests covering OpenAI-style HTTPException responses.
  • Ran python3 -m py_compile vllm_mlx/server.py
  • Ran git diff --check
  • Reviewed final diff for unintended file modifications.
  • Attempted to run pytest locally on Windows, but the project depends on Apple's MLX package which is not available on Windows.

Checklist

  • Tests pass locally (python3 -m pytest tests/ -x)
  • Lint passes (ruff check && ruff format --check)
  • Self-validated with python3 -m scripts.pr_validate.pr_validate <PR#>
  • If new tests touch a critical code path (parser / scheduler / security), I've spot-checked that they fail when the corresponding production line is broken
  • Updated README/docs if applicable
  • No breaking changes to existing API

Copy link
Copy Markdown
Owner

@raullenchai raullenchai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Useful improvement — the OpenAI-style envelope is a real compatibility gap. Supply-chain audit is clean (2 files, no deps / CI / install hooks / network / exec / encoded payloads). A few changes needed before this can merge.

Blocking

1. Lint fails — ruff check reports 4 errors

tests/test_config_and_middleware.py:358 I001 — import block is unsorted
tests/test_config_and_middleware.py:416 W292 — no newline at end of file
vllm_mlx/server.py:40                I001 — import block is unsorted
vllm_mlx/server.py:404               I001 — import block is unsorted

All four are auto-fixable: ruff check --fix vllm_mlx/server.py tests/test_config_and_middleware.py.

2. ruff format --check would reformat both files

Run ruff format vllm_mlx/server.py tests/test_config_and_middleware.py after the lint fix above.

3. Test asserts on a copy of the handler, not the production handler

tests/test_config_and_middleware.py:357-398 (TestHTTPExceptionHandler._make_app_with_handler) defines a fresh FastAPI() app and registers a hand-copied duplicate of the production handler (lines 363-388 are a verbatim copy of vllm_mlx/server.py:407-432). The test then asserts on that copy, not the real handler.

This means if someone changes the production handler (adds a new status code, changes the envelope shape, drops param), this test still passes. The test is asserting that the test's own helper works.

Two clean fixes:

Option A — register the real handler on a test app:

from vllm_mlx.server import _http_exception_handler

def _make_app_with_handler(self):
    app = FastAPI()
    app.add_exception_handler(HTTPException, _http_exception_handler)
    @app.get("/rate-limit")
    async def rate_limit():
        raise HTTPException(429, "Rate limit exceeded", headers={"Retry-After": "60"})
    return app

Option B — exercise an existing production route that raises HTTPException (e.g., one of the vllm_mlx/middleware/auth.py:155 401 paths) via TestClient(app) from vllm_mlx.server. This also covers middleware-raised exceptions.

I'd lean Option A — it's the smallest delta to your current code and pins the production handler directly. Either is acceptable.

NITs (nice to have, not blockers)

4. Import placement

The original from fastapi import FastAPI, Request was removed from vllm_mlx/server.py:46 and re-added at line ~132 (after from .tool_parsers import ToolParserManager). It works at runtime because app = FastAPI() is at line 365 (after the new import location), but:

  • It splits the FastAPI imports across two locations — from fastapi.middleware.cors import CORSMiddleware stays at line 47, but the main from fastapi import ... is now 80+ lines later.
  • It's a PEP 8 violation (E402); the repo ignores E402 in ruff config so it's silent, but consistency is better.

Suggest: put HTTPException and JSONResponse into the existing top-of-file import block (line 46-47 area) alongside the existing fastapi imports.

5. Anthropic clients still get the wrong shape

This handler applies globally, so /v1/messages HTTPException paths (e.g. vllm_mlx/routes/anthropic.py:211, 219) will now return OpenAI-shaped errors. Anthropic SDKs expect {"type": "error", "error": {"type": "...", "message": "..."}}. This isn't a regression (the previous {"detail": "..."} was wrong for Anthropic too), but the PR description should call out this limitation, and a follow-up PR could add route-aware formatting based on request.url.path.startswith("/v1/messages").

6. Only one of the seven status codes is tested

The handler maps 6 codes (400/401/403/404/409/429) plus a fallback api_error. The test only covers 429. At minimum, add coverage for one mapped code (e.g., 401 with an Authorization: Bearer wrong-key request that goes through real auth middleware) and the fallback api_error (e.g., an HTTPException with a code not in the map).

7. No headers=None test

HTTPException without an explicit headers arg gives exc.headers = None. JSONResponse(headers=None) works correctly (verified locally), but worth one assertion to pin that contract — most HTTPException raises in the codebase don't pass headers.


Other observations (no action required from you)

  • CI hasn't run on this PR yet (statusCheckRollup: []) — external PRs need a maintainer to approve the first workflow run. Once you push the fixes above I'll trigger it.
  • After CI passes and the test-quality issue is addressed, I can run our pr_validate (full lint + diff-aware tests + DeepSeek review) on the branch.

Once items 1-3 are addressed I'm happy to merge. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants