Commit a8a9ca3

Ai gateway (#2937)
* feat: add AI API usage tracking and management components

* feat: implement OpenAI API interceptor with usage tracking and token limits

* feat: improve concurrency for API rate limits and enhance error handling
- Introduced thread-safe `AtomicLong` for token management.
- Synchronized reset logic in `AiApiLimit`.
- Improved error handling and null checks in OpenAI API interactions.
- Default-initialized user list in `SimpleAiApiStore`.
- Fixed getter for `AiApiStore` in interceptor.

* refactor: make `tokens` in `AiApiLimit` final to improve immutability

* feat: modularize AI providers and enhance OpenAI interceptor
- Removed `AiUtil` and replaced with modular `AiProvider` interface.
- Added provider implementations: `Claude`, `OpenAI`.
- Updated `OpenAIAPIInterceptor` to use configurable providers and enforce model restrictions.
- Introduced `NoAiApiLimit` for simplified limit management.
- Enhanced error handling with model validation in `OpenAiApiUtil`.

* feat: implement modular AI provider framework with request/response abstraction
- Added `AiApiRequest` and `AiApiResponse` abstractions for request/response handling.
- Introduced `AbstractAiApiRequest` and `AbstractAiApiResponse` as base classes.
- Implemented providers: `Google`, `OpenAI`, and `Claude` with concrete request/response handling.
- Updated `AiProvider` to handle request/response creation.
- Refactored `OpenAIAPIInterceptor` to leverage request/response abstraction and enforce contract restrictions.
- Enhanced `JsonUtil` with helper methods for JSON body parsing and updates.
- Updated `AiApiStore` and related classes for improved usage tracking and user abstraction.

* feat: enhance token limit management and error handling across AI components
- Updated `checkLimit` method to consider input and output tokens.
- Improved token calculation logic in AI request providers.
- Enhanced JSON parsing in `JsonUtil` with Optional for safer operations.
- Added detailed error handling in `OpenAIAPIInterceptor` for invalid requests.
- Refined token estimation logic with safety margins and JSON structure considerations.

* feat: improve concurrency and logging for API rate limit management
- Synchronized token management methods in `AiApiLimit` to ensure thread safety.
- Adjusted log levels for `SimpleAiApiStore` to reduce verbosity.
- Added PostgreSQL dependency to the distribution.
- Updated logging configuration to set debug level for AI interceptors.

* docs: clarify parameter description in `checkLimit` method

* feat: add SSE event parsing and improve token handling
- Introduced `SSEUtil` for parsing Server-Sent Events (SSE) from chunks.
- Enhanced `AbstractAiApiRequest` to handle JSON requests conditionally.
- Deprecated and replaced `max_output_tokens` usage in specific providers.
- Improved stream support in `OpenAiAiRequest` with response usage tracking.
- Refactored token limit logic in `OpenAIAPIInterceptor` for better flow.

* refactor: rename AI classes and interfaces for consistency
- Renamed `AiApiRequest` to `LLMRequest` and `AiApiResponse` to `LLMResponse`.
- Updated providers (`Google`, `OpenAI`, `Claude`) to align with `LLMProvider` interface.
- Refactored `OpenAIAPIInterceptor` to `LLMGatewayInterceptor` and related utilities.
- Removed `SSEUtil` and replaced with `SSEParser`.
- Improved streaming and token usage handling in `AbstractLLMResponse`.

* refactor: remove unused `terminalEvent` method from `LLMApiUtil`

* feat: refactor LLM APIs and improve SSE-driven event handling
- Modularized LLM responses for `Claude` and `OpenAI` providers.
- Replaced `LLMResponse` interface and `AbstractLLMResponse` with updated abstractions.
- Added `ChatCompletionsSSEParser` for advanced SSE chunk handling.
- Introduced specific SSE event classes: `ChatCompletionEvent`, `ChatCompletionDoneEvent`, `ResponsesApiEvent`.
- Renamed and restructured classes for consistency in AI namespace.
- Improved token usage tracking and event-based streaming.

* refactor: remove `ChatCompletionsSSEParser` and unused token usage tracking logic
- Deleted `ChatCompletionsSSEParser` and related classes/methods.
- Simplified `ChatCompletionEvent` by removing token usage parsing.
- Updated tool extraction logic in `AbstractLLMRequest` to handle function-specific tools.

* chore: add TODO for handling client-provided API key if config key is absent

* feat: enhance LLM response handling with modular classes and improved usage tracking
- Added `OpenAiLLMResponsesAPIResponse` for handling OpenAI Responses API.
- Refactored `OpenAiLLMResponse` to `OpenAiChatCompletionsLLMResponse`.
- Improved token usage calculations and SSE event processing.
- Updated `OpenAIProvider` to differentiate between Responses API and Chat Completions.

* feat: add logging for non-JSON requests in `AbstractLLMRequest`
- Introduced SLF4J logger to `AbstractLLMRequest`.
- Added log message for handling non-JSON requests.
- Improved exception handling with informative runtime error.

* refactor: replace `System.out.println` with proper debug logging in OpenAI response handlers

* feat: introduce Basic LLM Gateway tutorial and enhance OpenAI LLM handling
- Added `10-Basic-LLM-Gateway.yaml` tutorial for setting up a basic LLM gateway.
- Introduced new classes `AbstractOpenAiLLMRequest` and `OpenAiLLMChatCompletionsRequest` for modularizing token estimation and API handling.
- Improved token usage tracking with client-requested max output tokens.
- Added detailed inline documentation across AI-related classes for better maintainability.
- Updated `membrane.cmd` and `membrane.sh` for enhanced gateway setup.

* refactor: remove redundant semicolon in `processTerminalEvent` method

* refactor: simplify logic for API response handling and token usage tracking
- Removed redundant `isResponsesAPI` variable in `OpenAIProvider`.
- Optimized tool extraction in `OpenAiLLMResponsesRequest` and `OpenAiLLMChatCompletionsRequest`.
- Updated `AiApiLimit` to support unlimited tokens with `MAX_VALUE`.
- Replaced `token` with `apiKey` in `AiApiUser` along with added `tokens` field.
- Improved JSON parsing logic in `JsonUtil` with better exception handling and logging.
- Adjusted output token parameter naming in multiple request classes for consistency.

* refactor: remove outdated AI API limit management and centralize error handling
- Deleted `AiApiLimit` and `NoAiApiLimit` classes, consolidating token management into `SimpleAiApiStore`.
- Introduced `LLMErrorCreator` and its implementations (`OpenAiErrorCreator`, etc.) for reusable error generation.
- Refactored `LLMGatewayInterceptor` to utilize provider-specific error creators, simplifying token and model validation.
- Enhanced `SimpleAiApiStore` with token reset functionality and user-specific token tracking.
- Updated tutorials and examples to align with this refactored approach.

* refactor: enhance token usage tracking and improve inline documentation
- Added `resetTokensUsedInPeriod` for user-specific token reset in `SimpleAiApiStore`.
- Improved inline documentation for methods across `AiApiUser`, `AiApiStore`, and `LLMGatewayInterceptor`.
- Updated parameter descriptions for clarity and consistency.

* feat: enhance error handling, synchronization, and token tracking
- Added synchronized blocks to `SimpleAiApiStore` for thread-safe access to user data.
- Introduced `invalidRequestError` to `LLMErrorCreator` and implemented it in `OpenAiErrorCreator`.
- Allowed unlimited tokens for users with `MAX_VALUE` in `AiApiUser`.
- Simplified logic in `LLMGatewayInterceptor` for token and model validation.
- Updated tutorial JSON with test input for validation.

* feat: extend LLM Gateway with Claude support and improved error handling
- Added Claude-specific error handling with `ClaudeErrorCreator` and `ClaudeErrorResponse`.
- Introduced `10-Basic-LLM-Gateway.yaml` tutorial for Claude integration.
- Enhanced token usage tracking in `ClaudeLLMResponse`.
- Updated examples and tutorials to support both OpenAI and Claude.

* refactor: improve parameter documentation in `LLMErrorCreator`

* chore: add Apache 2.0 license headers to core files

* feat: add Google Gemini and enhance Claude tutorials with API key sharing and token limit examples
- Added `10-Basic-LLM-Gateway.yaml` and `20-Sharing-API-Keys.yaml` tutorials for Google Gemini.
- Enhanced Claude tutorials with improved key handling and token limit examples.
- Introduced `GoogleErrorCreator` for detailed error handling in Google LLM Gateway.
- Updated `LLMGatewayInterceptor` and token tracking logic to reflect effective max token handling.
- Modified existing OpenAI and Claude examples for consistency and clarity.

* feat: add AI LLM Gateway tests for Claude, OpenAI, and Google Gemini tutorials
- Introduced `AbstractAiTutorialTest` base class and provider-specific extensions for easier test creation.
- Added integration tests for basic gateway setups and API key sharing for Claude, OpenAI, and Google Gemini.
- Simulated upstream mock APIs to enable testing token limits, key handling, and input/output transformations.

* feat: improve token handling, configuration validation, and examples for LLM Gateway
- Ensure thread-safe access to users in `SimpleAiApiStore` with `List.copyOf`.
- Introduce `visibleRemaining` to handle non-negative token values in `GoogleErrorCreator`.
- Add configuration validation in `LLMGatewayInterceptor` to enforce API key substitution.
- Enhance token limit handling to adjust output tokens dynamically in `LLMGatewayInterceptor`.
- Update Google and Claude tutorials with clearer instructions for API key usage and token limits.

* feat: add streaming integration tests for OpenAI in LLM Gateway
- Introduced `StreamingOpenAiLLMGatewayTutorialTest` with SSE mocking and validation.
- Added JSON fixtures (`stream.json`, `max-output-stream.json`) for testing streaming requests.
- Enhanced base test framework to support `text/event-stream` responses.
- Updated `LLMGatewayInterceptor` to handle streaming scenarios with capped tokens.

* feat: standardize API key handling and logging in LLM Gateway tests
- Replaced raw API key placeholders with `TEST_API_KEY` constant in tutorial tests to ensure consistency.
- Added `TEST_API_KEY` to `AbstractAiTutorialTest` for upstream key substitution verification.
- Updated `log4j2.xml` to limit logging to `com.predic8.membrane.core.interceptor.ai`.
- Introduced PostgreSQL dependency in `pom.xml` for future enhancements.

* Update tutorial to use Anthropic-specific API key and headers

* Fix handling of invalid max output token requests in LLMGatewayInterceptor

* refactor: migrate OpenAI provider to Chat Completions framework and add policies support
- Unified OpenAI and Chat Completions error handling under `ChatCompletionsErrorCreator`.
- Deprecated older OpenAI-specific classes in favor of `ChatCompletions` equivalents.
- Introduced detailed usage policies handling in `LLMGatewayInterceptor`.
- Updated YAML tutorials to reflect the new `policies` configuration model.

* feat: introduce `Policies` class and update LLM Gateway to support policy-based token and model restrictions
- Added `Policies` class for defining restrictions on models, input tokens, and output tokens in the LLM Gateway.
- Replaced `maxInputTokens` and `maxOutputTokens` fields in `LLMGatewayInterceptor` with `Policies`.
- Updated YAML tutorials (OpenAI, Claude, Google) to use the new `policies` configuration.

* feat: refactor policies and introduce system prompt support
- Replaced `Policies` class implementation with `DefaultPolicies` and `NullPolicies` for enhanced flexibility.
- Added `SystemPrompt` class to support dynamic system prompt management in LLM Gateway.
- Updated `LLMGatewayInterceptor` to delegate policy enforcement and system prompt handling to respective components.
- Extended providers (OpenAI, Claude, Google Gemini) with standardized system prompt methods (`getSystemPrompt`, `setSystemPrompt`, `removeSystemPrompt`).
- Enhanced test coverage with `AbstractLLMRequestTest` for API key handling and bearer token case insensitivity.

* feat: extend `SystemPrompt` with new actions and update tests
- Added `setSystemPrompt`, `removeSystemPrompt`, and `isChatCompletion` methods for enhanced prompt management.
- Refactored `SystemPrompt.Action` to remove unused `REJECT` action.
- Updated `AbstractLLMRequestTest` to validate new `SystemPrompt` behaviors.

---------

Co-authored-by: Christian Gördes <christian.goerdes@outlook.de>
Co-authored-by: Christian Gördes <118011644+christiangoerdes@users.noreply.github.com>
1 parent 2f20108 commit a8a9ca3

115 files changed

Lines changed: 6149 additions & 48 deletions

annot/src/main/java/com/predic8/membrane/annot/yaml/parsing/binding/ObjectBinder.java

Lines changed: 11 additions & 7 deletions
@@ -33,11 +33,7 @@
 import java.util.List;
 import java.util.Objects;
 
-import static com.predic8.membrane.annot.yaml.McYamlIntrospector.findRequiredSetters;
-import static com.predic8.membrane.annot.yaml.McYamlIntrospector.findSingleSetterOrNullForAnnotation;
-import static com.predic8.membrane.annot.yaml.McYamlIntrospector.getSingleChildSetter;
-import static com.predic8.membrane.annot.yaml.McYamlIntrospector.isCollapsed;
-import static com.predic8.membrane.annot.yaml.McYamlIntrospector.isNoEnvelope;
+import static com.predic8.membrane.annot.yaml.McYamlIntrospector.*;
 import static com.predic8.membrane.annot.yaml.NodeValidationUtils.ensureMappingStart;
 
 public final class ObjectBinder {
@@ -49,7 +45,8 @@ public final class ObjectBinder {
 
     public static <T> T bind(ParsingContext<?> pc, Class<T> clazz, JsonNode node) throws ConfigurationParsingException {
         try {
-            T configObj = clazz.getConstructor().newInstance();
+            T configObj = instantiate(clazz);
+
             BeanDefinition currentBeanDefinition = BeanDefinitionContext.current();
             if (currentBeanDefinition != null && pc.getRegistry() != null) {
                 pc.getRegistry().rememberBeanDefinition(configObj, currentBeanDefinition);
@@ -102,6 +99,14 @@ public static <T> T bind(ParsingContext<?> pc, Class<T> clazz, JsonNode node) th
         }
     }
 
+    private static <T> @NotNull T instantiate(Class<T> clazz) throws InvocationTargetException, InstantiationException, IllegalAccessException {
+        try {
+            return clazz.getConstructor().newInstance();
+        } catch (NoSuchMethodException e) {
+            throw new ConfigurationParsingException("Class %s does not have a public no-arg constructor.".formatted(clazz.getName()));
+        }
+    }
+
     private static <T> @NotNull T handleCollapsed(ParsingContext<?> ctx, Class<T> clazz, JsonNode node, T configObj) {
         if (node.isNull())
             throw new ConfigurationParsingException("Collapsed element must not be null.");
@@ -117,7 +122,6 @@ private static <T> T handleNoEnvelopeList(ParsingContext<?> pc, Class<T> clazz,
         return configObj;
     }
 
-    @SuppressWarnings("ConstantValue")
     private static <T> void applyCollapsedScalar(Class<T> clazz, JsonNode node, T target) {
         Method attributeSetter = findSingleSetterOrNullForAnnotation(clazz, MCAttribute.class);
         Method textSetter = findSingleSetterOrNullForAnnotation(clazz, MCTextContent.class);
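
Note on the `instantiate` helper above: it turns a missing public no-arg constructor into a readable configuration error instead of a raw `NoSuchMethodException`. The following is a minimal, self-contained sketch of that pattern, not the project's code. `InstantiateSketch`, `ConfigException`, `WithDefaultCtor` and `WithoutDefaultCtor` are hypothetical names; `ConfigException` stands in for `ConfigurationParsingException` and is assumed to be unchecked.

```java
import java.lang.reflect.InvocationTargetException;

final class InstantiateSketch {

    /** Hypothetical stand-in for ConfigurationParsingException (assumed unchecked). */
    static final class ConfigException extends RuntimeException {
        ConfigException(String message) {
            super(message);
        }
    }

    /** Has a public no-arg constructor, so reflective instantiation succeeds. */
    public static final class WithDefaultCtor {
        public WithDefaultCtor() {
        }
    }

    /** Offers only a one-arg constructor, so getConstructor() finds nothing. */
    public static final class WithoutDefaultCtor {
        public WithoutDefaultCtor(String ignored) {
        }
    }

    static <T> T instantiate(Class<T> clazz)
            throws InvocationTargetException, InstantiationException, IllegalAccessException {
        try {
            // getConstructor() only considers public constructors.
            return clazz.getConstructor().newInstance();
        } catch (NoSuchMethodException e) {
            // Replace the reflection error with a message a config author can act on.
            throw new ConfigException(String.format(
                    "Class %s does not have a public no-arg constructor.", clazz.getName()));
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(instantiate(WithDefaultCtor.class).getClass().getSimpleName());
        try {
            instantiate(WithoutDefaultCtor.class);
        } catch (ConfigException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The other reflective failures (`InvocationTargetException`, `InstantiationException`, `IllegalAccessException`) are left to the caller here, as in the diff.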
Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
+/* Copyright 2026 predic8 GmbH, www.predic8.com
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License. */
+
+package com.predic8.membrane.core.interceptor.llmgateway;
+
+import com.fasterxml.jackson.databind.JsonNode;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.predic8.membrane.core.util.http.SSEParser;
+import com.predic8.membrane.core.util.json.JsonUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public abstract class AbstractLLMEvent {
+
+    private static final Logger log = LoggerFactory.getLogger(AbstractLLMEvent.class);
+
+    protected static final ObjectMapper om = new ObjectMapper();
+
+    protected final JsonNode json;
+
+    protected AbstractLLMEvent(JsonNode json) {
+        this.json = json;
+    }
+
+    public abstract String getType();
+
+    public JsonNode getJson() {
+        return json;
+    }
+
+    public static AbstractLLMEvent create(SSEParser.SSEEvent sse) {
+
+        if ("[DONE]".equals(sse.data())) {
+            return new ChatCompletionDoneEvent();
+        }
+
+        var opt = JsonUtil.getJsonObject(sse.data());
+        if (opt.isEmpty()) {
+            log.info("Unknown event format: {}", sse.data());
+            return null; // Not a JSON object; calling opt.get() below would throw.
+        }
+        var json = opt.get();
+
+        // Responses API
+        if (json.has("type")) {
+            return new ResponsesApiEvent(json);
+        }
+
+        // Chat Completions API
+        if ("chat.completion.chunk".equals(json.path("object").asText())) {
+            return new ChatCompletionEvent(json);
+        }
+
+        log.debug("Unknown event format: {}", json);
+
+        return null;
+    }
+}
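
Note on `AbstractLLMEvent.create`: the factory checks the payload in a fixed order, first the literal `[DONE]` terminator, then a `type` field (Responses API), then `object == "chat.completion.chunk"` (Chat Completions), and returns `null` for anything else. A minimal sketch of that decision order follows, using plain JDK types in place of Jackson. `EventDispatchSketch`, `Kind` and `classify` are hypothetical names, the `Optional<Map<...>>` argument stands in for the result of `JsonUtil.getJsonObject`, and returning `Optional.empty()` instead of `null` is a choice made for this sketch only.

```java
import java.util.Map;
import java.util.Optional;

final class EventDispatchSketch {

    enum Kind { DONE, RESPONSES_API, CHAT_COMPLETION_CHUNK }

    /**
     * @param data   raw SSE data field
     * @param parsed the data field parsed as a JSON object, or empty if it is not one
     *               (hypothetical stand-in for the parser result)
     */
    static Optional<Kind> classify(String data, Optional<Map<String, Object>> parsed) {
        // 1. The stream terminator is not JSON, so it is checked before parsing matters.
        if ("[DONE]".equals(data)) {
            return Optional.of(Kind.DONE);
        }
        // 2. Anything else that is not a JSON object is unknown.
        if (parsed.isEmpty()) {
            return Optional.empty();
        }
        Map<String, Object> json = parsed.get();
        // 3. A "type" field marks a Responses API event.
        if (json.containsKey("type")) {
            return Optional.of(Kind.RESPONSES_API);
        }
        // 4. Chat Completions chunks identify themselves via "object".
        if ("chat.completion.chunk".equals(json.get("object"))) {
            return Optional.of(Kind.CHAT_COMPLETION_CHUNK);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(classify("[DONE]", Optional.empty()));
        System.out.println(classify("{...}",
                Optional.of(Map.<String, Object>of("object", "chat.completion.chunk"))));
    }
}
```

Because the `type` check runs first, a payload that carries both fields is treated as a Responses API event.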
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+/* Copyright 2026 predic8 GmbH, www.predic8.com
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License. */
+
+package com.predic8.membrane.core.interceptor.llmgateway;
+
+import com.fasterxml.jackson.databind.node.NullNode;
+
+public class ChatCompletionDoneEvent extends AbstractLLMEvent {
+
+    public ChatCompletionDoneEvent() {
+        super(NullNode.getInstance());
+    }
+
+    @Override
+    public String getType() {
+        return "chat.completion.done";
+    }
+}
Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+/* Copyright 2026 predic8 GmbH, www.predic8.com
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License. */
+
+package com.predic8.membrane.core.interceptor.llmgateway;
+
+import com.fasterxml.jackson.databind.JsonNode;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class ChatCompletionEvent extends AbstractLLMEvent {
+
+    private static final Logger log = LoggerFactory.getLogger(ChatCompletionEvent.class);
+
+    public ChatCompletionEvent(JsonNode json) {
+        super(json);
+
+        parseChoices(json);
+
+    }
+
+
+    private static void parseChoices(JsonNode json) {
+        for (JsonNode choice : json.path("choices")) {
+
+            JsonNode delta = choice.path("delta");
+
+            if (delta.has("content")) {
+                log.debug("Content delta: {}",
+                        delta.path("content").asText());
+            }
+
+            if (delta.has("tool_calls")) {
+
+                for (JsonNode tc : delta.path("tool_calls")) {
+
+                    JsonNode fn = tc.path("function");
+
+                    if (fn.has("name")) {
+                        log.debug("Tool call name delta: {}",
+                                fn.path("name").asText());
+                    }
+
+                    if (fn.has("arguments")) {
+                        log.debug("Tool call arguments delta: {}",
+                                fn.path("arguments").asText());
+                    }
+                }
+            }
+
+            String finishReason = choice.path("finish_reason").asText(null);
+
+            if (finishReason != null && !"null".equals(finishReason)) {
+                log.debug("Finish reason: {}", finishReason);
+            }
+        }
+    }
+
+    @Override
+    public String getType() {
+        return "chat.completion.chunk";
+    }
+
+    public JsonNode getChoices() {
+        return json.path("choices");
+    }
+}
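
Note on `ChatCompletionEvent`: the class only logs the `delta.content` fragments it finds in `choices`; it does not assemble them. As an illustration of how a consumer might rebuild the streamed text from `getChoices()`, here is a small sketch that uses nested `Map`/`List` values in place of `JsonNode`. `DeltaAccumulatorSketch` and `appendContent` are hypothetical names and are not part of this commit.

```java
import java.util.List;
import java.util.Map;

final class DeltaAccumulatorSketch {

    /** Appends every string-valued delta.content found in one chunk's choices array. */
    static void appendContent(StringBuilder buffer, List<Map<String, Object>> choices) {
        for (Map<String, Object> choice : choices) {
            Object delta = choice.get("delta");
            if (!(delta instanceof Map)) {
                continue; // This choice carries no delta object.
            }
            Object content = ((Map<?, ?>) delta).get("content");
            if (content instanceof String) {
                buffer.append((String) content);
            }
        }
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder();
        // Three chunks: two content fragments, then a final chunk with an empty delta.
        appendContent(text, List.of(Map.of("delta", Map.of("content", "Hel"))));
        appendContent(text, List.of(Map.of("delta", Map.of("content", "lo"))));
        appendContent(text, List.of(Map.of("delta", Map.of(), "finish_reason", "stop")));
        System.out.println(text); // prints "Hello"
    }
}
```

Tool-call deltas (`delta.tool_calls[].function.arguments`) arrive as string fragments in the same way and could be concatenated per tool call with the same approach.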
Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
+/* Copyright 2026 predic8 GmbH, www.predic8.com
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License. */
+
+package com.predic8.membrane.core.interceptor.llmgateway;
+
+import com.predic8.membrane.annot.MCAttribute;
+import com.predic8.membrane.annot.MCElement;
+import com.predic8.membrane.core.exchange.Exchange;
+import com.predic8.membrane.core.interceptor.Outcome;
+import com.predic8.membrane.core.interceptor.llmgateway.provider.LLMErrorCreator;
+import com.predic8.membrane.core.interceptor.llmgateway.provider.LLMRequest;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.List;
+
+import static com.predic8.membrane.core.interceptor.Outcome.CONTINUE;
+import static com.predic8.membrane.core.interceptor.Outcome.RETURN;
+
+/**
+ * @description LLM Gateway policies for token usage and model restrictions.
+ */
+@MCElement(name = "policies", id="llm-gateway-policies")
+public class DefaultPolicies implements Policies {
+
+    private static final Logger log = LoggerFactory.getLogger(LLMGatewayInterceptor.class);
+
+    private LLMErrorCreator errorCreator;
+
+    private List<String> models;
+    private int maxOutputTokens;
+    private int maxInputTokens;
+
+    public void init(LLMErrorCreator errorCreator) {
+        this.errorCreator = errorCreator;
+    }
+
+    public Outcome handleRequest(LLMRequest aiReq, Exchange exc) {
+
+        var requestedMaxOutputTokens = aiReq.getRequestedMaxOutputTokens();
+        var inputTokens = aiReq.estimateInputTokens();
+
+        if (maxOutputTokens > 0) {
+            if (requestedMaxOutputTokens <= 0) {
+                log.info("No max. output requested. Setting limit to {}.", maxOutputTokens);
+                aiReq.setMaxOutputTokens(maxOutputTokens);
+            } else if (requestedMaxOutputTokens > maxOutputTokens) {
+                log.info("Requested max. output tokens {} exceed the limit. Setting limit to {}.", requestedMaxOutputTokens, maxOutputTokens);
+                aiReq.setMaxOutputTokens(maxOutputTokens);
+            }
+        }
+
+        if (maxInputTokens != 0) {
+            if (inputTokens > maxInputTokens) {
+                log.info("Input tokens {} exceed the limit of {}.", inputTokens, maxInputTokens);
+                exc.setResponse(errorCreator.inputTokensExceeded(maxInputTokens, inputTokens));
+                return RETURN;
+            }
+        }
+
+        if (models != null) {
+            var model = aiReq.getModel();
+            if (!models.contains(model)) {
+                exc.setResponse(errorCreator.modelNotAllowed(model, models));
+                return RETURN;
+            }
+        }
+
+        return CONTINUE;
+    }
+
+    public List<String> getModels() {
+        return models;
+    }
+
+    /**
+     * @param models List of models that can be used by the gateway.
+     * @description Restricts the models that can be used by the gateway.
+     * @default null (no restriction)
+     */
+    @MCAttribute
+    public void setModels(List<String> models) {
+        this.models = models;
+    }
+
+
+    public int getMaxOutputTokens() {
+        return maxOutputTokens;
+    }
+
+    /**
+     * @param maxOutputTokens Maximum number of tokens the LLM should use to generate a response.
+     * @description Maximum number of tokens the LLM should use to generate a response. This is just a hint that the gateway
+     * sends to the LLM provider. The provider may use a different limit.
+     * @default 0 (unlimited)
+     */
+    @MCAttribute
+    public void setMaxOutputTokens(int maxOutputTokens) {
+        this.maxOutputTokens = maxOutputTokens;
+    }
+
+    public int getMaxInputTokens() {
+        return maxInputTokens;
+    }
+
+    /**
+     * @param maxInputTokens Maximum number of tokens that a request can use.
+     * @description Restricts token usage for the input. The size of the input is estimated by the gateway based on the request size.
+     * Actual token usage may deviate from this value.
+     */
+    @MCAttribute
+    public void setMaxInputTokens(int maxInputTokens) {
+        this.maxInputTokens = maxInputTokens;
+    }
+}
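
Note on `DefaultPolicies.handleRequest`: the method applies three independent rules. It caps the output-token budget, rejects requests whose estimated input exceeds `maxInputTokens`, and rejects models outside the allow-list. The sketch below restates those rules as pure functions so the edge cases (`0` meaning "no limit", `null` meaning "no restriction") are easy to see. `PolicySketch` and its method names are hypothetical; the conditions are copied from the diff above.

```java
import java.util.List;

final class PolicySketch {

    /**
     * Output cap the gateway would forward: the policy limit applies when the client
     * requested nothing (<= 0) or more than the limit. A limit <= 0 leaves the request as is.
     */
    static int effectiveMaxOutputTokens(int requested, int limit) {
        if (limit <= 0) {
            return requested; // No output policy configured.
        }
        if (requested <= 0) {
            return limit; // Client set no cap: impose the policy limit.
        }
        return Math.min(requested, limit);
    }

    /** Mirrors the input check: a limit of 0 disables it. */
    static boolean inputAllowed(int estimatedInputTokens, int limit) {
        return limit == 0 || estimatedInputTokens <= limit;
    }

    /** Mirrors the model check: a null list means no restriction. */
    static boolean modelAllowed(String model, List<String> allowedModels) {
        return allowedModels == null || allowedModels.contains(model);
    }

    public static void main(String[] args) {
        System.out.println(effectiveMaxOutputTokens(0, 100));   // prints 100
        System.out.println(effectiveMaxOutputTokens(500, 100)); // prints 100
        System.out.println(effectiveMaxOutputTokens(50, 100));  // prints 50
        System.out.println(inputAllowed(11, 10));               // prints false
        System.out.println(modelAllowed("model-a", null));      // prints true
    }
}
```

One asymmetry worth noting in the original: the output rule is guarded by `> 0`, the input rule by `!= 0`, so a negative `maxInputTokens` would reject every request with a non-negative estimate.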
