| instances.override.endpoint | string | False ||| LLM provider endpoint to replace the default endpoint with. If not configured, the Plugin uses the default OpenAI endpoint `https://api.openai.com/v1/chat/completions`. |
- | instances.override.request_body | object | False ||| Request body overrides. See [Provider-aware `max_tokens` mapping](./ai-proxy.md#provider-aware-max_tokens-mapping) in the `ai-proxy` documentation for how the contained fields are forwarded to each provider. |
- | instances.override.request_body.max_tokens | integer | False || ≥ 1 | Maximum number of output tokens. APISIX automatically maps this to the provider-specific field name (e.g. `max_completion_tokens` for OpenAI Chat Completions, `max_output_tokens` for OpenAI Responses API, `max_tokens` for most other providers). By default, client request fields take priority and the override value only fills in when the client did not set it; set `instances.override.request_body_force_override` to `true` to forcefully overwrite the client value. |
- | instances.override.request_body_force_override | boolean | False | false || When `false` (default), client request body fields take priority and `instances.override.request_body` values only fill in missing fields. When `true`, `instances.override.request_body` values forcefully overwrite client request body fields. |
+ | instances.override.llm_options | object | False ||| Provider-aware LLM options. See [Provider-aware `max_tokens` mapping](./ai-proxy.md#provider-aware-max_tokens-mapping) in the `ai-proxy` documentation. |
+ | instances.override.llm_options.max_tokens | integer | False || ≥ 1 | Maximum number of output tokens. APISIX automatically maps this to the provider-specific field name. Always force-overwrites the client value. |
+ | instances.override.request_body | object | False ||| Per target-protocol request body overrides. See [Per-protocol request body override](./ai-proxy.md#per-protocol-request-body-override) in the `ai-proxy` documentation. |
+ | instances.override.request_body_force_override | boolean | False | false || When `false` (default), client request body fields take priority and `instances.override.request_body` values only fill in missing fields. When `true`, `instances.override.request_body` values forcefully overwrite client fields. Does not affect `instances.override.llm_options`. |
| instances.checks | object | False ||| Health check configurations. Note that at the moment, OpenAI, DeepSeek, and AIMLAPI do not provide an official health check endpoint. Other LLM services that you can configure under `openai-compatible` provider may have available health check endpoints. |
| instances.checks.active | object | True ||| Active health check configurations. |
| instances.checks.active.type | string | False | http |[http, https, tcp]| Type of health check connection. |
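To see how the `instances.override` fields fit together, here is a minimal sketch of an `ai-proxy-multi` plugin fragment. The instance name, key placeholder, and values are illustrative, not defaults; the field shapes follow the table above:

```json
{
  "ai-proxy-multi": {
    "instances": [
      {
        "name": "openai-primary",
        "provider": "openai",
        "weight": 1,
        "auth": {
          "header": { "Authorization": "Bearer <your-api-key>" }
        },
        "options": { "model": "gpt-4" },
        "override": {
          "llm_options": { "max_tokens": 512 },
          "request_body": {
            "openai-chat": { "temperature": 0.2 }
          },
          "request_body_force_override": false
        }
      }
    ]
  }
}
```

In this sketch, `max_tokens` is mapped to the provider-specific field name and always wins over the client value, while `temperature` only fills in when the client omits it, since `request_body_force_override` is `false`.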
docs/en/latest/plugins/ai-proxy.md (18 additions & 6 deletions)
@@ -66,9 +66,10 @@ In addition, the Plugin also supports logging LLM request information in the acc
| options.model | string | False ||| Name of the LLM model, such as `gpt-4` or `gpt-3.5`. Refer to the LLM provider's API documentation for available models. |
| override.endpoint | string | False ||| Custom LLM provider endpoint, required when `provider` is `openai-compatible`. |
- | override.request_body | object | False ||| Request body overrides. See [Provider-aware `max_tokens` mapping](#provider-aware-max_tokens-mapping) for how the contained fields are forwarded to each provider. |
- | override.request_body.max_tokens | integer | False || ≥ 1 | Maximum number of output tokens. APISIX automatically maps this to the provider-specific field name (e.g. `max_completion_tokens` for OpenAI Chat Completions, `max_output_tokens` for OpenAI Responses API, `max_tokens` for most other providers). By default, client request fields take priority and the override value only fills in when the client did not set it; set `override.request_body_force_override` to `true` to forcefully overwrite the client value. |
- | override.request_body_force_override | boolean | False | false || When `false` (default), client request body fields take priority and `override.request_body` values only fill in missing fields. When `true`, `override.request_body` values forcefully overwrite client request body fields. |
+ | override.llm_options | object | False ||| Provider-aware LLM options. See [Provider-aware `max_tokens` mapping](#provider-aware-max_tokens-mapping). |
+ | override.llm_options.max_tokens | integer | False || ≥ 1 | Maximum number of output tokens. APISIX automatically maps this to the provider-specific field name (e.g. `max_completion_tokens` for OpenAI Chat Completions, `max_output_tokens` for OpenAI Responses API, `max_tokens` for most other providers). Always force-overwrites the client value. |
+ | override.request_body | object | False ||| Per target-protocol request body overrides. Keys are target protocol names (`openai-chat`, `openai-responses`, `openai-embeddings`, `anthropic-messages`); values are partial request bodies that are deep-merged into the outgoing body (objects merged recursively, arrays and scalars replaced wholesale). See [Per-protocol request body override](#per-protocol-request-body-override). |
+ | override.request_body_force_override | boolean | False | false || When `false` (default), client request body fields take priority and `override.request_body` values only fill in missing fields. When `true`, `override.request_body` values forcefully overwrite client fields. Does not affect `override.llm_options`, which always force-overwrites. |
| logging | object | False ||| Logging configurations. Does not affect `error.log`. |
| logging.payloads | boolean | False | false || If true, logs request and response payload. |
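As a quick illustration of `override.endpoint` alongside the new fields, here is a hypothetical `ai-proxy` fragment pointing at a self-hosted OpenAI-compatible server. The endpoint URL and model name are made up:

```json
{
  "ai-proxy": {
    "provider": "openai-compatible",
    "auth": {
      "header": { "Authorization": "Bearer <your-api-key>" }
    },
    "options": { "model": "llama-3-8b-instruct" },
    "override": {
      "endpoint": "http://llm.internal:8080/v1/chat/completions",
      "llm_options": { "max_tokens": 1024 }
    }
  }
}
```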
@@ -82,7 +83,7 @@ In addition, the Plugin also supports logging LLM request information in the acc
## Provider-aware `max_tokens` mapping

- LLM providers and API endpoints disagree on the field name used to cap the number of output tokens. Configuring `override.request_body.max_tokens` lets you set a single value in APISIX and have it forwarded under the field name expected by each provider/endpoint.
+ LLM providers and API endpoints disagree on the field name used to cap the number of output tokens. Configuring `override.llm_options.max_tokens` lets you set a single value in APISIX and have it forwarded under the field name expected by each provider/endpoint. `llm_options` always force-overwrites the client value.
The table below shows, for each `provider` and target API endpoint, the upstream field name APISIX rewrites `max_tokens` to. A `—` means the provider does not expose that endpoint.
@@ -100,10 +101,21 @@ The table below shows, for each `provider` and target API endpoint, the upstream
¹ When `provider` is `openai` and the target is the Chat Completions endpoint, APISIX always rewrites to `max_completion_tokens` and removes any `max_tokens` field from the request body — `max_tokens` has been deprecated in favor of `max_completion_tokens` by OpenAI.
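As a concrete example (a sketch assuming `provider` is `openai`, the Chat Completions endpoint, and `override.llm_options.max_tokens: 512`), a client body such as:

```json
{
  "model": "gpt-4",
  "messages": [{ "role": "user", "content": "Hello" }],
  "max_tokens": 100
}
```

would leave the gateway as:

```json
{
  "model": "gpt-4",
  "messages": [{ "role": "user", "content": "Hello" }],
  "max_completion_tokens": 512
}
```

The deprecated `max_tokens` field is removed per the footnote above, and the override value replaces the client's `100` because `llm_options` always force-overwrites.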
+ ## Per-protocol request body override
+
+ `override.request_body` provides fine-grained, per-protocol control over the outgoing request body. Keys are target protocol names (`openai-chat`, `openai-responses`, `openai-embeddings`, `anthropic-messages`); values are partial JSON objects that are deep-merged into the outgoing body after protocol conversion.
+
+ Merge semantics:
+
+ - Both sides are plain objects (string-keyed) → recursive merge.
+ - Otherwise (scalar, array, type mismatch) → patch value replaces target value wholesale.
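To make the merge semantics concrete, here is a hypothetical override for the `openai-chat` protocol; the field values are purely illustrative:

```json
{
  "override": {
    "request_body": {
      "openai-chat": {
        "response_format": { "type": "json_object" },
        "stop": ["\n\n"]
      }
    }
  }
}
```

If the outgoing body already carries a `response_format` object, the two objects merge key by key; `stop`, being an array, is treated as a single value and fills in or replaces wholesale, subject to the priority rules described next.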
Priority between client request and override is controlled by `override.request_body_force_override`:
- - `false` (default): if the client request body already sets the corresponding field, it is preserved; the override value only fills in when the field is missing.
- - `true`: the override value forcefully overwrites the field in the client request body.
+ - `false` (default): if the client request body already sets the field, it is preserved; the override value only fills in when the field is missing.
+ - `true`: the override value forcefully overwrites the client field.
+ When both `llm_options` and `request_body` are configured, `llm_options` is applied first (always force), then `request_body` deep-merges on top. This means `request_body` can override fields set by `llm_options`.
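As a sketch of this ordering (assuming `provider` is `openai` and the Chat Completions target; the values are illustrative):

```json
{
  "override": {
    "llm_options": { "max_tokens": 512 },
    "request_body": {
      "openai-chat": { "max_completion_tokens": 256 }
    },
    "request_body_force_override": true
  }
}
```

`llm_options` first writes `max_completion_tokens: 512` into the outgoing body; the `openai-chat` patch then deep-merges on top, and with `request_body_force_override` set to `true` the final body carries `max_completion_tokens: 256`.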