refactor(gatekeeper): Drop LiteLLM#494
Conversation
55df4de to
7551529
Compare
|
99% of the difficulty here is figuring out what the config schema should be! We have prior art with authentication: class Config(BaseSettings):
auth: AuthConfig | None = None
class AuthConfig(BaseSettings):
"""Authentication configuration."""
provider: AuthProvider | None = None
google: GoogleAuthConfig | None = None
github: GitHubAuthConfig | None = None
jwt: JWTAuthConfig | None = None
introspection: IntrospectionAuthConfig | None = None
class GoogleAuthConfig(BaseSettings):
"""Google OAuth authentication configuration."""
client_id: str
client_secret: SecretStrSo one writes: LINUX_MCP_AUTH__PROVIDER=google
LINUX_MCP_AUTH__GOOGLE__CLIENT_ID="<...>"
LINUX_MCP_AUTH__GOOGLE__SECRET="<...>"Or potentially: LINUX_MCP_AUTH='{ "provider": "google", "google": { "client_id": "<...>", "client_secret": "<...>" }}'Which is slightly different than what you did here because the google specific options move under the nested google object, while you instead document what settings are specific to particular providers or backends. The extra factors for the gatekeeper model are:
I'm also not in love with the name "provider" and "backend" - how is openrouter a "backend" when it routes things to somewhere else? I think "inference_gateway" or "gateway" would be a pretty descriptive name, but when I tried to figure out how that maps into config, I couldn't figure out a good handling for "openai compatible". The proposal I'm going to make is that we just have a provider - which we autodetect where possible (
And some provides have provider-specific options. Bunch of worked examples: LINUX_MCP_GATEKEEPER__MODEL="gpt-5.4"
LINUX_MCP_GATEKEEPER__PROVIDER="openai" # would be autodetected
LINUX_MCP_GATEKEEPER__MODEL="openai/gpt-5.4"
LINUX_MCP_GATEKEEPER__PROVIDER="openrouter"
LINUX_MCP_GATEKEEPER__MODEL="gemini-3.5-flash"
LINUX_MCP_GATEKEEPER__REASONING_EFFORT="minimal"
LINUX_MCP_GATEKEEPER__PROVIDER="vertex_ai"
LINUX_MCP_GATEKEEPER__VERTEX_AI__PROJECT="rhel-lightspeed"
LINUX_MCP_GATEKEEPER__MODEL="qwen/qwen3.5-8b"
LINUX_MCP_GATEKEEPER__TEMPERATURE="0"
LINUX_MCP_GATEKEEPER__PROVIDER="openrouter"
LINUX_MCP_GATEKEEPER__OPENROUTER__QUANTIZATION="fp4"
LINUX_MCP_GATEKEEPER__MODEL="qwen/qwen3.5-8b"
LINUX_MCP_GATEKEEPER__PROVIDER="openai_compat"
LINUX_MCP_GATEKEEPER__OPENAI_COMPAT__BASE_URL="http://localhost:8080/v1"
LINUX_MCP_GATEKEEPER__OPENAI_COMPAT__TEMPLATE_KWARGS='{ "enable_thinking": true }'The last is clunky, but clear. Maybe better as "generic" rather than "openai_compat"? What about things like needing to use the anthropic API when using claude models through vertex_ai? My feeling is that we just autodetect and implement that internally. Because it's not exactly the same - there are often a few quirks. (Anthropic docs]). We could have a config parameter What if different endpoints want to share the same config - let's say that some other backend supported template_kwargs. Is that an argument for putting this type of thing on the base class - and documenting restrictions. Perhaps - I'm convincible - but IMO nesting parameters is clearer. We can always duplicate. I definitely want to keep something like the vertex AI location nested. |
owtaylor
left a comment
There was a problem hiding this comment.
Some comments. Not a full review to let you figure out what you want for config and implement.
| LINUX_MCP_GATEKEEPER__BACKEND=vertex | ||
| vertex_location="${VERTEXAI_LOCATION:-global}" | ||
| vertex_openapi_base="https://aiplatform.googleapis.com/v1/projects/${VERTEXAI_PROJECT}/locations/${vertex_location}/endpoints/openapi" |
There was a problem hiding this comment.
I think we can do this generically for vertexai + openai API instead of having it here. Since we do a very similar thing for the google api and the anthropic api. (Do we need to support API base overrides for vertex_ai ... there are likely use cases, but maybe it can wait until someone asks for it?)
|
I rewrote the config, added support for cost back in, switched to async HTTP client, restored max token capping, and removed chat completion API. They are all additional commits on this PR, since we squash on merge anyway. Hopefully they help with the review. |
37cc909 to
442d35f
Compare
670dc91 to
674c62b
Compare
Drop LightLLM and replace with custom, in-tree clients. The combination of custom clients maintain support for OpenAI compatible providers with both Responses and Chat Completion endpoints, Anthropic Messages API, and Gemini’s generateContent. Both direct API and GCP/Vertex AI backends are supported for APIs that are provided by Vertex AI. There is a slight regression in that we no longer support OpenRouter’s native API and custom settings.
Add first-class support for the Vertex AI provider in the gatekeeper system. The configuration structure has been refactored to accommodate separate settings for each provider type.
Move provider-specific methods out of http_utils into their relevant client file.
Passing it down into each provider's completion function.
- Introduced a new pricing module to compute costs based on token usage across different providers. - Added functionality to extract usage statistics from responses for OpenAI, Anthropic, Gemini, and OpenRouter. - Updated GatekeeperCompletion and GatekeeperStats models to include token counts and cost sources. - Updated documentation to clarify cost estimation logic and configuration options. - Added tests to ensure accurate cost calculations and usage extraction across various clients.
- Changed dependency from `requests` to `httpx`. - Updated HTTP client from `requests` to `httpx` for asynchronous capabilities. - Converted completion functions for OpenAI, Anthropic, Gemini, OpenRouter, and Vertex AI to async. - Added a new check in documentation to ensure async functions are only decorated when `asyncio_mode` is not set to auto.
674c62b to
e85f661
Compare
Drop LightLLM and replace with custom, in-tree clients. The combination of custom clients maintain support for OpenAI compatible providers with both Responses and Chat Completion endpoints, Anthropic Messages API, and Gemini’s generateContent. Both direct API and GCP/Vertex AI backends are supported for APIs that are provided by Vertex AI.
There is a slight regression in that we no longer support OpenRouter’s native API and custom settings.