feat: cli OpenAI-compatible API response_format support #884

markstur wants to merge 5 commits into generative-computing:main

Conversation
- Added `JsonSchemaFormat` model to represent JSON schema definitions
- Extended `ResponseFormat` to support `json_schema` type (in addition to existing `text` and `json_object`)
- Used field alias to avoid conflict with Pydantic's `schema` method
- Added `_json_schema_to_pydantic()` utility function to dynamically convert JSON schemas to Pydantic models (a simplified sketch of the conversion appears below)
- Updated `_build_model_options()` to exclude `response_format` from model options (handled separately)
- Modified `make_chat_endpoint()` to:
- Parse `response_format` from requests
- Convert `json_schema` type to Pydantic models using the utility function
- Detect if the serve function accepts a `format` parameter using `inspect.signature()`
- Pass the generated Pydantic model as `format=` parameter to serve functions that support it
- Handle backward compatibility with serve functions that don't accept `format`
- Added proper error handling for invalid schemas
- Test json_schema format is converted to Pydantic model and passed to serve
- Test json_object format doesn't pass a schema
- Test text format doesn't pass a schema
- Test error handling for missing json_schema field
- Test error handling for invalid JSON schemas
- Test backward compatibility with serve functions without format parameter
- Test optional fields in JSON schemas
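For illustration, here is a minimal sketch of the kind of conversion `_json_schema_to_pydantic()` performs. The real utility in this PR also handles enums, nested objects, `$ref`, and error reporting; the names, type map, and coverage below are simplified assumptions, not the PR's code.

```python
from typing import Any

from pydantic import BaseModel, create_model

# Rough mapping of JSON Schema primitive types to Python types (sketch only).
_TYPE_MAP = {"string": str, "integer": int, "number": float, "boolean": bool}


def schema_to_model(schema: dict[str, Any], name: str = "ResponseModel") -> type[BaseModel]:
    """Build a Pydantic model from a flat object schema (illustrative only)."""
    required = set(schema.get("required", []))
    fields: dict[str, Any] = {}
    for prop, prop_schema in schema.get("properties", {}).items():
        py_type = _TYPE_MAP.get(prop_schema.get("type", "string"), str)
        if prop in required:
            fields[prop] = (py_type, ...)          # required field
        else:
            fields[prop] = (py_type | None, None)  # optional field defaults to None
    return create_model(name, **fields)


# The generated model is what gets handed to Mellea as format= for constrained decoding.
Answer = schema_to_model({
    "type": "object",
    "properties": {"title": {"type": "string"}, "year": {"type": "integer"}},
    "required": ["title"],
})
Answer.model_validate({"title": "Dune"})  # year is optional
```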
When a client sends a request with `response_format.type = "json_schema"`, the server:
1. Extracts the JSON schema from `response_format.json_schema.schema`
2. Dynamically creates a Pydantic model from the schema
3. Passes it as the `format=` parameter to the serve function
4. The serve function can then use this for constrained decoding via Mellea's `instruct()` method
This maps OpenAI's `response_format` API to Mellea's native `format=` parameter for structured output.
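A condensed sketch of that flow follows. Attribute and helper names are illustrative (`schema_to_model` is the sketch above, `schema_dict` stands in for the aliased schema field), and the actual PR caches the signature check in `make_chat_endpoint` rather than re-inspecting on every request.

```python
import inspect


async def handle_chat(request, serve_fn):
    """Sketch of the response_format -> format= mapping described above."""
    serve_kwargs = {}
    response_format = request.response_format

    if response_format is not None and response_format.type == "json_schema":
        # 1-2. Extract the schema and build a Pydantic model from it.
        #      (The real field uses an alias to avoid Pydantic's `schema` method.)
        format_model = schema_to_model(response_format.json_schema.schema_dict)
        # 3. Only pass format= to serve functions that declare the parameter,
        #    keeping older serve functions working unchanged.
        if "format" in inspect.signature(serve_fn).parameters:
            serve_kwargs["format"] = format_model

    # 4. Inside serve, format= can be forwarded to Mellea's instruct()
    #    for constrained decoding.
    return await serve_fn(request.messages, **serve_kwargs)
```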
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
The PR description has been updated. Please fill out the template for your PR to be reviewed.
jakelorocco left a comment
I have a broader question that is touched on in my comments below: If we trust our backend provider to properly handle our structured output requests, why do we do any validation on our side? (Because module.serve might do something funky?)
* elif/pass not needed. Comment
* set finish reason from result

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
OpenAI will return the response even if the output is not valid JSON. If we're trying to be OpenAI compatible (which is the intent here), my default response would be that we should emulate that behaviour -- though I am somewhat uncomfortable with returning syntactically invalid JSON. I guess the rationale is "we returned what we could". In some ways it comes down to our "contract" with callers.

For streaming this is a key difference, since we fail validation after the data has already been streamed. We also have an inconsistency between streaming (here) and non-streaming: for the latter we're already returning the same as OpenAI, since we missed doing any validation.

I'd say remove the validation -- then we have the same behaviour in both paths and it comes down to the format parameter and how the backend's constrained decoding works. I think this is what @jakelorocco describes. It also means my earlier comment about ADDING streaming validation is something I no longer feel is the right way forward. In fact the opposite -- remove it.
+1 on emulating openai as faithfully as possible
* inspect server() to set accepts_format and is_async in make_chat_endpoint instead of on every request
* improved support for json_schema that can be converted to pydantic
* no output validation. The module is given a format when applicable, but instead of throwing an error if the formatting is incorrect, the results are sent to the client.
* consolidated redundant blocks of kwargs by building a dict of kwargs needed.
* improved setting of finish_reason
* added tests
* improved comments/docstrings where clarification needed

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Removed the validation, so now it is more consistent with what you might expect from an OpenAI API server, which will return what it has even if it is truncated or otherwise broken.
```python
        value.upper() if value and value[0].isalpha() else f"VALUE_{index}"
    ): value
    for index, value in enumerate(enum_values)
}
```
String-enum members collide when two values upper-case to the same name. The dict comprehension at L126-131 keys each member on value.upper(), so ["open", "OPEN"] produces a 1-member enum — one value is silently dropped, and validating the dropped value fails with ValidationError even though the schema declared it.
Reproduced against the branch:
```python
m = json_schema_to_pydantic(
    {"type": "object",
     "properties": {"s": {"type": "string", "enum": ["open", "OPEN"]}},
     "required": ["s"]},
)
m.model_validate({"s": "open"})  # ValidationError
```

Case-variant enums are uncommon in hand-written OpenAI schemas but routine in schemas migrated from other sources (OpenAPI, Java/TS codebases, ["AM", "PM"], etc.). The failure is silent at conversion time and only surfaces as a mismatched validation later.
Two options, either is fine:

- Dedupe member names — track seen names, append `_1`, `_2`, … on collision. Minimal change, preserves the named-`Enum` form.
- Collapse to `Literal` — the non-string branch at L134 already uses `Literal[tuple(enum_values)]`, which is case-exact, needs no name generation, and would let L122-132 disappear. No public contract depends on the `Enum` class being named.
Worth a regression test asserting both "open" and "OPEN" validate when both are declared.
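A minimal sketch of the dedupe option, assuming the comprehension quoted above is replaced by an explicit loop (the helper name is hypothetical):

```python
from enum import Enum


def _string_enum_members(enum_values: list[str]) -> dict[str, str]:
    """Generate unique member names, appending _1, _2, ... on upper-case collisions."""
    members: dict[str, str] = {}
    for index, value in enumerate(enum_values):
        base = value.upper() if value and value[0].isalpha() else f"VALUE_{index}"
        name, suffix = base, 1
        while name in members:          # "open" and "OPEN" both upper-case to OPEN
            name = f"{base}_{suffix}"   # second occurrence becomes OPEN_1
            suffix += 1
        members[name] = value
    return members


StatusEnum = Enum("StatusEnum", _string_enum_members(["open", "OPEN"]))
assert StatusEnum("open") is not StatusEnum("OPEN")  # both declared values survive
```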
```python
    raise ValueError(f"{_format_path(path)} enum must be an array")
return _enum_annotation(enum_values, path)
```

```python
field_type = normalized_schema.get("type", "string")
```
L286: field_type = normalized_schema.get("type", "string") silently defaults any typeless schema node to str. JSON Schema with no type keyword is valid and means "any type"; defaulting to str narrows that silently.
Reproduced:
```python
json_schema_to_pydantic({
    "type": "object",
    "properties": {"data": {"description": "anything"}},
    "required": ["data"],
}).model_validate({"data": 42})  # ValidationError: expected str
```

`{"description": "..."}` without `type` is a real pattern in schemas pasted from OpenAPI specs where types are contextual. The converter already raises ValueError for unrecognised types (`{"type": "integr"}` → "unsupported JSON schema type"), so this is the only silently-wrong fallthrough left in the type dispatch.
Two options:

- Raise on missing `type` — strictest and matches OpenAI's own strict-mode requirement that every node declares a type. Most aligned with the project's fail-fast policy.
- Default to `Any` — spec-faithful; treats missing type as "any JSON value" and lets Pydantic validate nothing specific.
Either is a clear improvement over the current silent narrowing. Lower severity than the enum collision — OpenAI-style schemas usually do declare type on every node — but worth fixing while the converter is fresh.
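A sketch of what the dispatch could look like with the `Any` default; the strict alternative would raise instead of returning `Any`. The dispatch table and helper name below are assumptions, not the converter's actual internals.

```python
from typing import Any

_TYPE_DISPATCH = {"string": str, "integer": int, "number": float, "boolean": bool}


def _field_annotation(normalized_schema: dict):
    """Resolve a schema node's annotation without silently narrowing to str."""
    if "type" not in normalized_schema:
        # JSON Schema with no "type" keyword means "any JSON value".
        # For OpenAI strict-mode parity, raise ValueError(...) here instead.
        return Any
    field_type = normalized_schema["type"]
    try:
        return _TYPE_DISPATCH[field_type]
    except KeyError:
        raise ValueError(f"unsupported JSON schema type: {field_type!r}") from None
```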
```python
if normalized_schema.get("type") != "object":
    raise ValueError(f"{_format_path(path)} must be an object schema")
```

```python
cache_key = f"{current_model_name}:{id(object_schema)}"
```
L354: cache_key = f"{current_model_name}:{id(object_schema)}" — the cache never protects against recursive $ref. _normalize_schema returns a fresh dict on every ref resolution, so each re-entry to _object_schema_to_model for the same $def gets a new id(), a new cache miss, and another recursion.
Reproduced:
```python
json_schema_to_pydantic({
    "type": "object",
    "$defs": {"Node": {
        "type": "object",
        "properties": {"val": {"type": "integer"},
                       "child": {"$ref": "#/$defs/Node"}},
        "required": ["val"],
    }},
    "properties": {"root": {"$ref": "#/$defs/Node"}},
    "required": ["root"],
})  # RecursionError
```

Recursive schemas are rare in OpenAI-style structured output (strict mode historically disallowed them) but a valid JSON Schema pattern for tree/linked-list outputs. The current behaviour is a server-side crash — caught by the endpoint's `except Exception` and surfaced as a generic 500.
Minimum fix: detect cycles by tracking in-flight ref names in a set and raise ValueError("recursive $ref is not supported") — turns a 500 into a clean 400 and adds one line to the "Still unsupported" list in the docstring. Supporting recursion properly would need Pydantic forward refs + model_rebuild(); probably not worth it for v1.
Severity: lower than findings 1 and 2 — unlikely to hit OpenAI users in practice — but trivial to fix and improves the error surface.
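One way to implement the minimum fix is a cheap pre-pass that walks `$ref`s with an in-flight set and fails before any model building starts. This is only a sketch; the real converter could equally thread the set through its existing recursion.

```python
def _reject_recursive_refs(schema: dict, defs: dict, in_flight: frozenset = frozenset()) -> None:
    """Raise ValueError on $ref cycles instead of letting conversion hit RecursionError."""
    ref = schema.get("$ref")
    if ref is not None:
        name = ref.rsplit("/", 1)[-1]          # "#/$defs/Node" -> "Node"
        if name in in_flight:
            raise ValueError("recursive $ref is not supported")
        _reject_recursive_refs(defs.get(name, {}), defs, in_flight | {name})
        return
    for prop_schema in schema.get("properties", {}).values():
        _reject_recursive_refs(prop_schema, defs, in_flight)
    if isinstance(schema.get("items"), dict):
        _reject_recursive_refs(schema["items"], defs, in_flight)
```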
```python
if choices and len(choices) > 0:
    finish_reason = choices[0].get("finish_reason")
    if finish_reason in valid_reasons:
        return finish_reason
```
extract_finish_reason handles Ollama (chat_response.done_reason) and OpenAI-direct (oai_chat_response["choices"][0]["finish_reason"]) but silently skips LiteLLM and returns the default "stop" for every LiteLLM-backed request.
LiteLLM stores its response at a different key and in a different shape — mellea/backends/litellm.py:443,693:
```python
mot._meta["litellm_chat_response"] = chunk.choices[0].model_dump()
```

This is a flat Choice dict (`{"finish_reason": "length", "index": 0, "message": {...}}`), not wrapped in a `choices` array, so neither existing branch matches.
Not a regression — pre-PR the code hardcoded "stop" for every backend, so LiteLLM users are no worse off. But the PR's stated goal is "use backend finish_reason", and LiteLLM is a first-class backend in Mellea; leaving it defaulting to "stop" means clients that branch on finish_reason == "length" to paginate will loop forever on LiteLLM-fronted models.
Proposed addition (after L46):
```python
# LiteLLM backend stores a flat Choice.model_dump() dict
litellm_response = output._meta.get("litellm_chat_response")
if isinstance(litellm_response, dict):
    finish_reason = litellm_response.get("finish_reason")
    if finish_reason in valid_reasons:
        return finish_reason
```

The HuggingFace backend is also uncovered but has no OpenAI-equivalent finish signal to extract, so fall-through to "stop" is reasonable there. Worth calling out in the docstring.
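Putting the three covered backends together, the helper might end up roughly like this. It is a sketch that assumes the `_meta` keys named in this thread and a `valid_reasons` set like the one in the diff; HuggingFace deliberately falls through to the default.

```python
VALID_REASONS = {"stop", "length", "tool_calls", "content_filter"}


def extract_finish_reason(output) -> str:
    """Best-effort finish_reason from backend metadata; defaults to "stop"."""
    meta = getattr(output, "_meta", None) or {}

    # Ollama backend: done_reason attribute on the raw chat response object.
    done_reason = getattr(meta.get("chat_response"), "done_reason", None)
    if done_reason in VALID_REASONS:
        return done_reason

    # OpenAI-direct backend: standard {"choices": [{"finish_reason": ...}]} dict.
    oai_response = meta.get("oai_chat_response")
    if isinstance(oai_response, dict):
        choices = oai_response.get("choices") or []
        if choices:
            finish_reason = choices[0].get("finish_reason")
            if finish_reason in VALID_REASONS:
                return finish_reason

    # LiteLLM backend: flat Choice.model_dump() dict, not wrapped in "choices".
    litellm_response = meta.get("litellm_chat_response")
    if isinstance(litellm_response, dict):
        finish_reason = litellm_response.get("finish_reason")
        if finish_reason in VALID_REASONS:
            return finish_reason

    # HuggingFace (and anything unrecognised): no OpenAI-equivalent signal.
    return "stop"
```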
```python
    assert call_args.kwargs["format"] is None
```

```python
# Verify response is successful
assert isinstance(response, ChatCompletion)
```
L608-609 (and L635-636 in the text-format sibling test) use a conditional assert that never fires:
if "format" in call_args.kwargs:
assert call_args.kwargs["format"] is NoneThe shared mock_module fixture uses Mock(), and inspect.signature(Mock().serve) returns (*args, **kwargs) with parameters ["args", "kwargs"] — "format" is never in sig.parameters, so accepts_format in make_chat_endpoint is always False for this fixture, and format is never added to serve_kwargs. The if branch is unreachable; the assert proves nothing.
Reproduced:
```python
>>> inspect.signature(Mock().serve).parameters
mappingproxy({"args": <Parameter "*args">, "kwargs": <Parameter "**kwargs">})
>>> "format" in inspect.signature(Mock().serve).parameters
False
```

Net effect: these two tests only verify that endpoint() returns a ChatCompletion when response_format is json_object / text — the "no format model is passed" behaviour they claim to test is not tested at all.
Fix options:
- Unconditional assert — `assert "format" not in call_args.kwargs`. Locks in the mock's actual behaviour (no `format` in params → no `format` in call).
- Use a real function with a `format=` param — e.g. `def fake_serve(input, requirements=None, model_options=None, format=None): ...` and assert `format is None` unconditionally. More faithful to the real code path.
Option 2 is closer to what the test docstring claims; option 1 is a one-line fix.
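A sketch of option 2: a capturing fake with a real signature, so `inspect.signature()` actually reports a `format` parameter. The fixture wiring and return shape are assumptions about the test harness, not its real code.

```python
captured: dict = {}


async def fake_serve(input, requirements=None, model_options=None, format=None):
    # Real parameters, so inspect.signature(fake_serve) includes "format" and
    # make_chat_endpoint's accepts_format check is exercised for real.
    captured["format"] = format
    return "ok"  # return shape depends on the real serve contract


# Wire fake_serve in as the module's serve function in the fixture, then the
# json_object / text tests can assert unconditionally:
#     assert captured["format"] is None
```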
```python
return create_openai_error_response(
    status_code=400,
    message=f"Invalid JSON schema: {e!s}",
    error_type="invalid_request_error",
```
Scoped nit on the schema-conversion error handling (the broader except Exception handler at L252 is pre-existing, not in scope for this PR).
L181 catches only ValueError from json_schema_to_pydantic, but the converter can also surface non-ValueError exceptions for pathological schemas that should still be client errors:
- `create_model(...)` can raise `PydanticUserError` / `PydanticSchemaGenerationError` on schemas that produce an internally-inconsistent model.
- `Enum(...)` in `_enum_annotation` can raise `TypeError` for certain member-name edge cases (see the enum-collision comment above).
These currently fall through to the generic 500 handler and surface to the client as "Internal server error: ..." rather than a clean 400 "Invalid JSON schema".
Minimal fix — widen this block only:
```python
except (ValueError, TypeError) as e:
    return create_openai_error_response(
        status_code=400,
        message=f"Invalid JSON schema: {e!s}",
        error_type="invalid_request_error",
        param="response_format.json_schema.schema",
    )
```

Or import the Pydantic error types explicitly and include them in the tuple. Either way keeps the scope tight to schema-conversion failures and leaves the outer handler alone.
| """JSON Schema definition when type is 'json_schema'.""" | ||
|
|
||
|
|
||
| class StreamOptions(BaseModel): |
Minor: ResponseFormat accepts {"type": "json_schema", "json_schema": null} at Pydantic-parse time and defers the "json_schema is required when type=json_schema" check to the endpoint at app.py:169-175. The endpoint does catch it and returns a clean 400, so this is a code-organisation concern rather than a functional bug.
A model_validator on ResponseFormat would co-locate the invariant with the type it constrains and shrink the endpoint body by a branch:
```python
from pydantic import model_validator

class ResponseFormat(BaseModel):
    type: Literal["text", "json_object", "json_schema"]
    json_schema: JsonSchemaFormat | None = None

    @model_validator(mode="after")
    def _require_json_schema(self) -> "ResponseFormat":
        if self.type == "json_schema" and self.json_schema is None:
            raise ValueError("json_schema is required when type is 'json_schema'")
        return self
```

The endpoint's existing except RequestValidationError handler (via validation_exception_handler) will then turn the ValueError into a 400 with the same shape as other validation errors, and the explicit block at app.py:169-175 can go away.
Not blocking — happy to ship as-is if the author prefers endpoint-level validation.
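As a quick check of the behaviour (illustrative; the exact error text comes from Pydantic's wrapping of the ValueError):

```python
from pydantic import ValidationError

try:
    ResponseFormat.model_validate({"type": "json_schema", "json_schema": None})
except ValidationError as exc:
    # Rejected at parse time, so it flows through validation_exception_handler
    # like any other malformed request field.
    print(exc.errors()[0]["msg"])
```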
planetf1 left a comment
Suggesting some changes. I think the main one is around the enum clash -- the others add value over what was there before, so those comments are more about completeness than regression. It would be good to fix them up, but I'm minded to approve when you are comfortable with the changes; nothing else is a blocker.