Commit 9f4f01f

docs: split max_tokens field and add provider mapping section

1 parent 963d1e9

2 files changed: 29 additions & 4 deletions

docs/en/latest/plugins/ai-proxy-multi.md (3 additions & 2 deletions)

@@ -81,8 +81,9 @@ In addition, the Plugin also supports logging LLM request information in the acc
 | logging.payloads | boolean | False | false | | If true, log request and response payload. |
 | instances.override | object | False | | | Override setting. |
 | instances.override.endpoint | string | False | | | LLM provider endpoint to replace the default endpoint with. If not configured, the Plugin uses the default OpenAI endpoint `https://api.openai.com/v1/chat/completions`. |
-| instances.override.request_body | object | False | | | Request body overrides. Supports `max_tokens` (integer) to set the maximum number of output tokens. APISIX automatically maps this to the correct field name for each provider (e.g. `max_completion_tokens` for OpenAI, `max_output_tokens` for Responses API, `max_tokens` for most other providers). By default, client request fields take priority and override values only fill in missing fields. Set `request_body_force_override` to `true` to let override values forcefully overwrite client fields. |
-| instances.override.request_body_force_override | boolean | False | false | | When `false` (default), client request body fields take priority and override values only fill in missing fields. When `true`, override values forcefully overwrite client request body fields. |
+| instances.override.request_body | object | False | | | Request body overrides. See [Provider-aware `max_tokens` mapping](./ai-proxy.md#provider-aware-max_tokens-mapping) in the `ai-proxy` documentation for how the contained fields are forwarded to each provider. |
+| instances.override.request_body.max_tokens | integer | False | | ≥ 1 | Maximum number of output tokens. APISIX automatically maps this to the provider-specific field name (e.g. `max_completion_tokens` for OpenAI Chat Completions, `max_output_tokens` for OpenAI Responses API, `max_tokens` for most other providers). By default, client request fields take priority and the override value only fills in when the client did not set it; set `instances.override.request_body_force_override` to `true` to forcefully overwrite the client value. |
+| instances.override.request_body_force_override | boolean | False | false | | When `false` (default), client request body fields take priority and `instances.override.request_body` values only fill in missing fields. When `true`, `instances.override.request_body` values forcefully overwrite client request body fields. |
 | instances.checks | object | False | | | Health check configurations. Note that at the moment, OpenAI, DeepSeek, and AIMLAPI do not provide an official health check endpoint. Other LLM services that you can configure under `openai-compatible` provider may have available health check endpoints. |
 | instances.checks.active | object | True | | | Active health check configurations. |
 | instances.checks.active.type | string | False | http | [http, https, tcp] | Type of health check connection. |

docs/en/latest/plugins/ai-proxy.md (26 additions & 2 deletions)

@@ -66,8 +66,9 @@ In addition, the Plugin also supports logging LLM request information in the acc
 | options.model | string | False | | | Name of the LLM model, such as `gpt-4` or `gpt-3.5`. Refer to the LLM provider's API documentation for available models. |
 | override | object | False | | | Override setting. |
 | override.endpoint | string | False | | | Custom LLM provider endpoint, required when `provider` is `openai-compatible`. |
-| override.request_body | object | False | | | Request body overrides. Supports `max_tokens` (integer) to set the maximum number of output tokens. APISIX automatically maps this to the correct field name for each provider (e.g. `max_completion_tokens` for OpenAI, `max_output_tokens` for Responses API, `max_tokens` for most other providers). By default, client request fields take priority and override values only fill in missing fields. Set `request_body_force_override` to `true` to let override values forcefully overwrite client fields. |
-| override.request_body_force_override | boolean | False | false | | When `false` (default), client request body fields take priority and override values only fill in missing fields. When `true`, override values forcefully overwrite client request body fields. |
+| override.request_body | object | False | | | Request body overrides. See [Provider-aware `max_tokens` mapping](#provider-aware-max_tokens-mapping) for how the contained fields are forwarded to each provider. |
+| override.request_body.max_tokens | integer | False | | ≥ 1 | Maximum number of output tokens. APISIX automatically maps this to the provider-specific field name (e.g. `max_completion_tokens` for OpenAI Chat Completions, `max_output_tokens` for OpenAI Responses API, `max_tokens` for most other providers). By default, client request fields take priority and the override value only fills in when the client did not set it; set `override.request_body_force_override` to `true` to forcefully overwrite the client value. |
+| override.request_body_force_override | boolean | False | false | | When `false` (default), client request body fields take priority and `override.request_body` values only fill in missing fields. When `true`, `override.request_body` values forcefully overwrite client request body fields. |
 | logging | object | False | | | Logging configurations. Does not affect `error.log`. |
 | logging.summaries | boolean | False | false | | If true, logs request LLM model, duration, request, and response tokens. |
 | logging.payloads | boolean | False | false | | If true, logs request and response payload. |
@@ -77,6 +78,29 @@ In addition, the Plugin also supports logging LLM request information in the acc
 | keepalive_pool | integer | False | 30 | | Keepalive pool size for the LLM service connection. |
 | ssl_verify | boolean | False | true | | If true, verifies the LLM service's certificate. |
 
+## Provider-aware `max_tokens` mapping
+
+LLM providers and API endpoints disagree on the field name used to cap the number of output tokens. Configuring `override.request_body.max_tokens` lets you set a single value in APISIX and have it forwarded under the field name expected by each provider.
+
+The table below shows the upstream field name APISIX rewrites `max_tokens` to for each `provider` and endpoint:
+
+| Provider | OpenAI Chat Completions | OpenAI Responses API |
+| ------------------- | -------------------------- | -------------------- |
+| `openai` | `max_completion_tokens` | `max_output_tokens` |
+| `openai-compatible` | `max_tokens` | `max_output_tokens` |
+| `deepseek` | `max_tokens` ||
+| `anthropic` | `max_tokens` ||
+| `gemini` | `max_completion_tokens` ||
+| `azure-openai` | `max_tokens` ||
+| `openrouter` | `max_tokens` ||
+| `aimlapi` | `max_tokens` ||
+| `vertex-ai` | `max_completion_tokens` ||
+
+Priority between client request and override is controlled by `override.request_body_force_override`:
+
+- `false` (default): if the client request body already contains the provider-specific field, it is preserved; the override value only fills in when the field is missing.
+- `true`: the override value forcefully overwrites the field in the client request body.
+
 ## Examples
 
 The examples below demonstrate how you can configure `ai-proxy` for different scenarios.
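Building on the `max_tokens` override this commit documents, the following is one hedged sketch of an `ai-proxy` plugin configuration using the new field. The model name, token cap, and API key placeholder are illustrative and not taken from the diff:

```json
{
  "ai-proxy": {
    "provider": "openai",
    "auth": {
      "header": {
        "Authorization": "Bearer <your-api-key>"
      }
    },
    "options": {
      "model": "gpt-4"
    },
    "override": {
      "request_body": {
        "max_tokens": 512
      },
      "request_body_force_override": false
    }
  }
}
```

With `request_body_force_override` left at its default `false`, the cap of 512 only applies when the client request omits the field; per the mapping table, the `openai` provider forwards it upstream as `max_completion_tokens`.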

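The mapping and priority rules described in the diff above can be restated as a short Python sketch. The function name `apply_max_tokens`, the dictionary name, and the `responses_api` flag are illustrative, not APISIX source code; the field names come from the docs table in the commit:

```python
# Upstream field name per provider for the Chat Completions endpoint,
# transcribed from the commit's mapping table (illustrative restatement).
CHAT_COMPLETIONS_FIELD = {
    "openai": "max_completion_tokens",
    "openai-compatible": "max_tokens",
    "deepseek": "max_tokens",
    "anthropic": "max_tokens",
    "gemini": "max_completion_tokens",
    "azure-openai": "max_tokens",
    "openrouter": "max_tokens",
    "aimlapi": "max_tokens",
    "vertex-ai": "max_completion_tokens",
}


def apply_max_tokens(body: dict, provider: str, max_tokens: int,
                     force_override: bool = False,
                     responses_api: bool = False) -> dict:
    """Fill in (or, if forced, overwrite) the provider-specific token cap."""
    if responses_api:
        # OpenAI Responses API uses max_output_tokens per the table.
        field = "max_output_tokens"
    else:
        field = CHAT_COMPLETIONS_FIELD[provider]
    # Default priority: the client-supplied field wins; the override
    # only fills in a missing field unless force_override is true.
    if force_override or field not in body:
        body[field] = max_tokens
    return body
```

For example, `apply_max_tokens({}, "openai", 256)` fills in `max_completion_tokens`, while a body that already carries `max_tokens` for `deepseek` is left untouched unless `force_override=True`.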