You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
docs: update constrained grammars with vLLM structured output support
Update the compatibility notice to include vLLM alongside llama.cpp.
Add a vLLM-specific section with examples for all three supported
methods: json_schema, json_object, and grammar (via xgrammar).
Ref: #6857
The `chat` endpoint supports the `grammar` parameter, which allows users to specify a grammar in Backus-Naur Form (BNF). This feature enables the Large Language Model (LLM) to generate outputs adhering to a user-defined schema, such as `JSON`, `YAML`, or any other format that can be defined using BNF. For more details about BNF, see [Backus-Naur Form on Wikipedia](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form).
11
11
12
12
{{% notice note %}}
13
-
**Compatibility Notice:** This feature is only supported by models that use the [llama.cpp](https://github.com/ggerganov/llama.cpp) backend. For a complete list of compatible models, refer to the [Model Compatibility]({{%relref "reference/compatibility-table" %}}) page. For technical details, see the related pull requests: [PR #1773](https://github.com/ggerganov/llama.cpp/pull/1773) and [PR #1887](https://github.com/ggerganov/llama.cpp/pull/1887).
13
+
**Compatibility Notice:** Grammar and structured output support is available for the following backends:
14
+
-**llama.cpp** — supports the `grammar` parameter (GBNF syntax) and `response_format` with `json_schema`/`json_object`
15
+
-**vLLM** — supports the `grammar` parameter (via xgrammar), `response_format` with `json_schema` (native JSON schema enforcement), and `json_object`
16
+
17
+
For a complete list of compatible models, refer to the [Model Compatibility]({{%relref "reference/compatibility-table" %}}) page.
14
18
{{% /notice %}}
15
19
16
20
## Setup
@@ -66,6 +70,59 @@ For more complex grammars, you can define multi-line BNF rules. The grammar pars
66
70
- Character classes (`[a-z]`)
67
71
- String literals (`"text"`)
68
72
73
+
## vLLM Backend
74
+
75
+
The vLLM backend supports structured output via three methods:
76
+
77
+
### JSON Schema (recommended)
78
+
79
+
Use the OpenAI-compatible `response_format` parameter with `json_schema` to enforce a specific JSON structure:
0 commit comments