  | instances.override.endpoint | string | False ||| LLM provider endpoint to replace the default endpoint with. If not configured, the Plugin uses the default OpenAI endpoint `https://api.openai.com/v1/chat/completions`. |
- | instances.override.request_body | object | False ||| Request body overrides. Supports `max_tokens` (integer) to set the maximum number of output tokens. APISIX automatically maps this to the correct field name for each provider (e.g. `max_completion_tokens` for OpenAI, `max_output_tokens` for Responses API, `max_tokens` for most other providers). By default, client request fields take priority and override values only fill in missing fields. Set `request_body_force_override` to `true` to let override values forcefully overwrite client fields. |
- | instances.override.request_body_force_override | boolean | False | false || When `false` (default), client request body fields take priority and override values only fill in missing fields. When `true`, override values forcefully overwrite client request body fields. |
+ | instances.override.request_body | object | False ||| Request body overrides. See [Provider-aware `max_tokens` mapping](./ai-proxy.md#provider-aware-max_tokens-mapping) in the `ai-proxy` documentation for how the contained fields are forwarded to each provider. |
+ | instances.override.request_body.max_tokens | integer | False || ≥ 1 | Maximum number of output tokens. APISIX automatically maps this to the provider-specific field name (e.g. `max_completion_tokens` for OpenAI Chat Completions, `max_output_tokens` for OpenAI Responses API, `max_tokens` for most other providers). By default, client request fields take priority and the override value only fills in when the client did not set it; set `instances.override.request_body_force_override` to `true` to forcefully overwrite the client value. |
+ | instances.override.request_body_force_override | boolean | False | false || When `false` (default), client request body fields take priority and `instances.override.request_body` values only fill in missing fields. When `true`, `instances.override.request_body` values forcefully overwrite client request body fields. |
  | instances.checks | object | False ||| Health check configurations. Note that at the moment, OpenAI, DeepSeek, and AIMLAPI do not provide an official health check endpoint. Other LLM services that you can configure under the `openai-compatible` provider may have available health check endpoints. |
  | instances.checks.active | object | True ||| Active health check configurations. |
  | instances.checks.active.type | string | False | http | [http, https, tcp] | Type of health check connection. |
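As a hedged illustration of how the `instances.*` fields above nest, here is a minimal config fragment. It assumes this file documents the `ai-proxy-multi` plugin and that each instance carries a `name` and `provider` field; authentication and load-balancing settings are omitted, and the instance name is a placeholder:

```json
{
  "ai-proxy-multi": {
    "instances": [
      {
        "name": "openai-instance",
        "provider": "openai",
        "override": {
          "request_body": { "max_tokens": 512 },
          "request_body_force_override": false
        },
        "checks": {
          "active": { "type": "https" }
        }
      }
    ]
  }
}
```

With `request_body_force_override` left at `false`, the `512` cap only applies when the client request does not already set its own token limit.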
docs/en/latest/plugins/ai-proxy.md (+26 −2)
@@ -66,8 +66,9 @@ In addition, the Plugin also supports logging LLM request information in the acc
  | options.model | string | False ||| Name of the LLM model, such as `gpt-4` or `gpt-3.5`. Refer to the LLM provider's API documentation for available models. |
  | override.endpoint | string | False ||| Custom LLM provider endpoint, required when `provider` is `openai-compatible`. |
- | override.request_body | object | False ||| Request body overrides. Supports `max_tokens` (integer) to set the maximum number of output tokens. APISIX automatically maps this to the correct field name for each provider (e.g. `max_completion_tokens` for OpenAI, `max_output_tokens` for Responses API, `max_tokens` for most other providers). By default, client request fields take priority and override values only fill in missing fields. Set `request_body_force_override` to `true` to let override values forcefully overwrite client fields. |
- | override.request_body_force_override | boolean | False | false || When `false` (default), client request body fields take priority and override values only fill in missing fields. When `true`, override values forcefully overwrite client request body fields. |
+ | override.request_body | object | False ||| Request body overrides. See [Provider-aware `max_tokens` mapping](#provider-aware-max_tokens-mapping) for how the contained fields are forwarded to each provider. |
+ | override.request_body.max_tokens | integer | False || ≥ 1 | Maximum number of output tokens. APISIX automatically maps this to the provider-specific field name (e.g. `max_completion_tokens` for OpenAI Chat Completions, `max_output_tokens` for OpenAI Responses API, `max_tokens` for most other providers). By default, client request fields take priority and the override value only fills in when the client did not set it; set `override.request_body_force_override` to `true` to forcefully overwrite the client value. |
+ | override.request_body_force_override | boolean | False | false || When `false` (default), client request body fields take priority and `override.request_body` values only fill in missing fields. When `true`, `override.request_body` values forcefully overwrite client request body fields. |
  | logging | object | False ||| Logging configurations. Does not affect `error.log`. |
  | logging.payloads | boolean | False | false || If true, logs request and response payload. |
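A minimal `ai-proxy` config fragment sketching the fields above. Only fields documented in this table are used; the endpoint URL is a placeholder and required authentication settings are omitted for brevity:

```json
{
  "ai-proxy": {
    "provider": "openai-compatible",
    "options": { "model": "gpt-4" },
    "override": {
      "endpoint": "https://llm.example.com/v1/chat/completions",
      "request_body": { "max_tokens": 256 },
      "request_body_force_override": true
    }
  }
}
```

Because `request_body_force_override` is `true` here, the `256` cap overwrites any token limit the client sends.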
@@ -77,6 +78,29 @@ In addition, the Plugin also supports logging LLM request information in the acc
  | keepalive_pool | integer | False | 30 || Keepalive pool size for the LLM service connection. |
  | ssl_verify | boolean | False | true || If true, verifies the LLM service's certificate. |
+
+ ## Provider-aware `max_tokens` mapping
+
+ LLM providers and API endpoints disagree on the field name used to cap the number of output tokens. Configuring `override.request_body.max_tokens` lets you set a single value in APISIX and have it forwarded under the field name expected by each provider.
+
+ The table below shows the upstream field name APISIX rewrites `max_tokens` to for each `provider` and endpoint:
+
+ Priority between client request and override is controlled by `override.request_body_force_override`:
+
+ - `false` (default): if the client request body already contains the provider-specific field, it is preserved; the override value only fills in when the field is missing.
+ - `true`: the override value forcefully overwrites the field in the client request body.
+
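The fill-vs-overwrite behavior described above can be sketched in a few lines of Python. This is an illustrative model, not APISIX source code: the provider-to-field mapping uses only the field names documented in this diff, and the `openai-responses` key is an assumed name for the Responses API case.

```python
# Illustrative sketch of provider-aware max_tokens mapping (not APISIX source).
# Upstream field name each provider expects for the output-token cap;
# "openai-responses" is a hypothetical key standing in for the Responses API.
FIELD_BY_PROVIDER = {
    "openai": "max_completion_tokens",        # OpenAI Chat Completions
    "openai-responses": "max_output_tokens",  # OpenAI Responses API
}
DEFAULT_FIELD = "max_tokens"  # most other providers


def apply_max_tokens(provider, client_body, override_max_tokens,
                     force_override=False):
    """Return the request body with the provider-specific token cap applied.

    force_override=False: the client's value wins; the override only fills
    in the field when the client did not set it.
    force_override=True: the override forcefully overwrites the client value.
    """
    field = FIELD_BY_PROVIDER.get(provider, DEFAULT_FIELD)
    body = dict(client_body)
    if force_override or field not in body:
        body[field] = override_max_tokens
    return body
```

For example, `apply_max_tokens("openai", {"model": "gpt-4"}, 256)` fills in `max_completion_tokens`, while a client-supplied `max_completion_tokens` survives unless `force_override=True`.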
  ## Examples

  The examples below demonstrate how you can configure `ai-proxy` for different scenarios.