refactor(gatekeeper): Drop LiteLLM #494

Open
subpop wants to merge 7 commits into rhel-lightspeed:main from subpop:native-openai-api-client

@subpop subpop commented Jun 2, 2026

Member

Drop LiteLLM and replace it with custom, in-tree clients. Together, the custom clients maintain support for OpenAI-compatible providers with both Responses and Chat Completions endpoints, the Anthropic Messages API, and Gemini’s generateContent. Both direct API and GCP/Vertex AI backends are supported for APIs that are provided by Vertex AI.

There is a slight regression in that we no longer support OpenRouter’s native API and custom settings.

@subpop subpop requested a review from a team as a code owner June 2, 2026 13:49
@subpop subpop force-pushed the native-openai-api-client branch 2 times, most recently from 55df4de to 7551529 Compare June 2, 2026 16:08

owtaylor commented Jun 3, 2026

Contributor

99% of the difficulty here is figuring out what the config schema should be!

We have prior art with authentication:

class Config(BaseSettings):
    auth: AuthConfig | None = None

class AuthConfig(BaseSettings):
    """Authentication configuration."""

    provider: AuthProvider | None = None
    google: GoogleAuthConfig | None = None
    github: GitHubAuthConfig | None = None
    jwt: JWTAuthConfig | None = None
    introspection: IntrospectionAuthConfig | None = None

class GoogleAuthConfig(BaseSettings):
    """Google OAuth authentication configuration."""

    client_id: str
    client_secret: SecretStr

So one writes:

LINUX_MCP_AUTH__PROVIDER=google
LINUX_MCP_AUTH__GOOGLE__CLIENT_ID="<...>"
LINUX_MCP_AUTH__GOOGLE__CLIENT_SECRET="<...>"

Or potentially:

LINUX_MCP_AUTH='{ "provider": "google", "google": { "client_id": "<...>", "client_secret": "<...>" }}'

This is slightly different from what you did here: the Google-specific options move under the nested google object, whereas you instead document which settings are specific to particular providers or backends.
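
To make the mapping between the flat variable names and the nested settings concrete, here is a rough stdlib-only sketch. It assumes the `LINUX_MCP_` prefix and `__` nesting delimiter that the variable names above suggest; `nest_env` is a hypothetical helper for illustration only, not the project's loader, which relies on its settings library for the real parsing.

```python
def nest_env(environ, prefix="LINUX_MCP_", delimiter="__"):
    """Expand PREFIX + A__B__C=value entries into {"a": {"b": {"c": value}}}.

    Simplified illustration: values stay strings and JSON-valued
    variables (such as a single LINUX_MCP_AUTH='{...}') are not handled.
    """
    result = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue  # unrelated variable, e.g. HOME
        keys = name[len(prefix):].lower().split(delimiter)
        node = result
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return result


env = {
    "LINUX_MCP_AUTH__PROVIDER": "google",
    "LINUX_MCP_AUTH__GOOGLE__CLIENT_ID": "<...>",
    "LINUX_MCP_AUTH__GOOGLE__CLIENT_SECRET": "<...>",
    "HOME": "/root",
}
config = nest_env(env)
print(config)
```

The provider-specific values end up under their own `google` key, which is the property the nested layout is meant to provide.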

The extra factors for the gatekeeper model are:

  • There are some options that are shared between different models (temperature, reasoning_effort)
  • We have the split between the "provider" and "backend"

I'm also not in love with the names "provider" and "backend" - how is openrouter a "backend" when it routes things to somewhere else? I think "inference_gateway" or "gateway" would be a pretty descriptive name, but when I tried to figure out how that maps into config, I couldn't figure out a good handling for "openai compatible".

The proposal I'm going to make is that we just have a provider - which we autodetect where possible (gpt-*, claude-*)

  • openai
  • anthropic
  • openai_compat
  • vertex_ai
  • openrouter

And some providers have provider-specific options. A bunch of worked examples:

LINUX_MCP_GATEKEEPER__MODEL="gpt-5.4"
LINUX_MCP_GATEKEEPER__PROVIDER="openai"  # would be autodetected

LINUX_MCP_GATEKEEPER__MODEL="openai/gpt-5.4"
LINUX_MCP_GATEKEEPER__PROVIDER="openrouter"

LINUX_MCP_GATEKEEPER__MODEL="gemini-3.5-flash"
LINUX_MCP_GATEKEEPER__REASONING_EFFORT="minimal"
LINUX_MCP_GATEKEEPER__PROVIDER="vertex_ai"
LINUX_MCP_GATEKEEPER__VERTEX_AI__PROJECT="rhel-lightspeed"

LINUX_MCP_GATEKEEPER__MODEL="qwen/qwen3.5-8b"
LINUX_MCP_GATEKEEPER__TEMPERATURE="0"
LINUX_MCP_GATEKEEPER__PROVIDER="openrouter"
LINUX_MCP_GATEKEEPER__OPENROUTER__QUANTIZATION="fp4"

LINUX_MCP_GATEKEEPER__MODEL="qwen/qwen3.5-8b"
LINUX_MCP_GATEKEEPER__PROVIDER="openai_compat"
LINUX_MCP_GATEKEEPER__OPENAI_COMPAT__BASE_URL="http://localhost:8080/v1"
LINUX_MCP_GATEKEEPER__OPENAI_COMPAT__TEMPLATE_KWARGS='{ "enable_thinking": true }'

The last is clunky, but clear. Maybe better as "generic" rather than "openai_compat"?
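
As a rough sketch of the "autodetect where possible" idea, assuming an explicitly configured provider always wins and that detection is a simple pattern match on the model name. Only the `gpt-*` and `claude-*` patterns come from the proposal above; `AUTODETECT_PATTERNS` and `resolve_provider` are hypothetical names, not code from this PR.

```python
from fnmatch import fnmatchcase

# Hypothetical pattern table; only gpt-* and claude-* come from the proposal.
AUTODETECT_PATTERNS = [
    ("gpt-*", "openai"),
    ("claude-*", "anthropic"),
]


def resolve_provider(model, provider=None):
    """Return the explicit provider if set, else guess it from the model name."""
    if provider is not None:
        return provider
    for pattern, detected in AUTODETECT_PATTERNS:
        if fnmatchcase(model, pattern):
            return detected
    raise ValueError(
        f"cannot autodetect a provider for model {model!r}; set PROVIDER explicitly"
    )


print(resolve_provider("gpt-5.4"))
print(resolve_provider("openai/gpt-5.4", provider="openrouter"))
```

Failing loudly for names such as `qwen/qwen3.5-8b` is one possible design choice; it avoids silently sending a request to the wrong service.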

What about things like needing to use the Anthropic API when using Claude models through vertex_ai? My feeling is that we just autodetect and implement that internally, because it's not exactly the same - there are often a few quirks (see the Anthropic docs). We could have a config parameter LINUX_MCP_GATEKEEPER__API=anthropic - but I'm not sure there is any real use case.
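
A minimal sketch of what "autodetect and implement that internally" could look like, assuming the wire API is derived from the provider and the model-name prefix. The return labels and the `select_api` name are hypothetical, and the fallback to an OpenAI-style API for everything else is an assumption rather than something this PR confirms.

```python
def select_api(provider, model):
    """Pick a wire API label from the provider and model name (illustrative only)."""
    if provider == "anthropic":
        return "anthropic_messages"
    if provider == "vertex_ai":
        # Vertex AI hosts several model families, each behind its own API shape.
        if model.startswith("claude-"):
            return "anthropic_messages"
        if model.startswith("gemini-"):
            return "gemini_generate_content"
    # Assumption: every other combination speaks an OpenAI-style API.
    return "openai"


print(select_api("vertex_ai", "gemini-3.5-flash"))
```

With this shape, an explicit `API` override would only be needed for combinations the prefix check gets wrong.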

What if different endpoints want to share the same config - let's say that some other backend supported template_kwargs? Is that an argument for putting this type of thing on the base class and documenting restrictions? Perhaps - I'm convincible - but IMO nesting parameters is clearer. We can always duplicate. I definitely want to keep something like the Vertex AI location nested.

@owtaylor owtaylor left a comment

Contributor

Some comments. This is not a full review, so you can first figure out what you want for config and implement it.

Comment thread eval/gatekeeper/standard-evals.sh Outdated
Comment on lines +320 to +322
LINUX_MCP_GATEKEEPER__BACKEND=vertex
vertex_location="${VERTEXAI_LOCATION:-global}"
vertex_openapi_base="https://aiplatform.googleapis.com/v1/projects/${VERTEXAI_PROJECT}/locations/${vertex_location}/endpoints/openapi"

Contributor

I think we can do this generically for vertexai + openai API instead of having it here. Since we do a very similar thing for the google api and the anthropic api. (Do we need to support API base overrides for vertex_ai ... there are likely use cases, but maybe it can wait until someone asks for it?)
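
For illustration, the URL construction from the shell snippet could move into the client code roughly like this. The sketch mirrors the URL shape and the `global` default shown in the script above; `vertex_openai_base_url` is a hypothetical name, not a function from this PR.

```python
from urllib.parse import quote


def vertex_openai_base_url(project, location="global"):
    """Build the Vertex AI OpenAI-compatible base URL, mirroring the eval script."""
    if not project:
        raise ValueError("a Vertex AI project is required")
    return (
        "https://aiplatform.googleapis.com/v1"
        f"/projects/{quote(project, safe='')}"
        f"/locations/{quote(location, safe='')}"
        "/endpoints/openapi"
    )


print(vertex_openai_base_url("rhel-lightspeed"))
```

Keeping this in one helper would let the eval script set only the project and location instead of assembling the URL itself.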

Comment thread src/linux_mcp_server/gatekeeper/anthropic_client.py Outdated
Comment thread src/linux_mcp_server/gatekeeper/check_run_script.py
Comment thread src/linux_mcp_server/gatekeeper/check_run_script.py Outdated
Comment thread src/linux_mcp_server/gatekeeper/http_utils.py Outdated
Comment thread src/linux_mcp_server/gatekeeper/openai_client.py Outdated

subpop commented Jun 18, 2026

Member Author

I rewrote the config, added support for cost back in, switched to async HTTP client, restored max token capping, and removed chat completion API. They are all additional commits on this PR, since we squash on merge anyway. Hopefully they help with the review.

@subpop subpop force-pushed the native-openai-api-client branch 2 times, most recently from 37cc909 to 442d35f Compare June 18, 2026 16:55

codecov Bot commented Jun 18, 2026


Codecov Report

❌ Patch coverage is 92.19298% with 89 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/linux_mcp_server/gatekeeper/openai_client.py | 71.42% | 14 Missing and 2 partials ⚠️ |
| src/linux_mcp_server/gatekeeper/pricing.py | 87.70% | 7 Missing and 8 partials ⚠️ |
| ...rc/linux_mcp_server/gatekeeper/anthropic_client.py | 76.59% | 4 Missing and 7 partials ⚠️ |
| src/linux_mcp_server/gatekeeper/gcp_auth.py | 54.16% | 11 Missing ⚠️ |
| src/linux_mcp_server/gatekeeper/llm.py | 79.48% | 4 Missing and 4 partials ⚠️ |
| src/linux_mcp_server/gatekeeper/usage.py | 72.41% | 4 Missing and 4 partials ⚠️ |
| src/linux_mcp_server/gatekeeper/schema.py | 82.05% | 3 Missing and 4 partials ⚠️ |
| src/linux_mcp_server/gatekeeper/gemini_client.py | 86.04% | 3 Missing and 3 partials ⚠️ |
| ...c/linux_mcp_server/gatekeeper/openrouter_client.py | 91.80% | 2 Missing and 3 partials ⚠️ |
| ...rc/linux_mcp_server/gatekeeper/vertex_ai_client.py | 98.63% | 0 Missing and 1 partial ⚠️ |

... and 1 more

| Flag | Coverage Δ |
|---|---|
| unittests | 96.33% <92.19%> (-1.05%) ⬇️ |

| Files with missing lines | Coverage Δ |
|---|---|
| src/linux_mcp_server/config.py | 99.30% <100.00%> (+0.10%) ⬆️ |
| ...rc/linux_mcp_server/gatekeeper/check_run_script.py | 100.00% <100.00%> (ø) |
| src/linux_mcp_server/gatekeeper/http_utils.py | 100.00% <100.00%> (ø) |
| src/linux_mcp_server/models.py | 100.00% <100.00%> (ø) |
| tests/conftest.py | 95.50% <100.00%> (+0.05%) ⬆️ |
| tests/gatekeeper/test_anthropic_client.py | 100.00% <100.00%> (ø) |
| tests/gatekeeper/test_gcp_auth.py | 100.00% <100.00%> (ø) |
| tests/gatekeeper/test_gemini_client.py | 100.00% <100.00%> (ø) |
| tests/gatekeeper/test_http_utils.py | 100.00% <100.00%> (ø) |
| tests/gatekeeper/test_llm.py | 100.00% <100.00%> (ø) |

... and 19 more

... and 1 file with indirect coverage changes

@subpop subpop force-pushed the native-openai-api-client branch 2 times, most recently from 670dc91 to 674c62b Compare June 22, 2026 14:46
subpop added 7 commits June 23, 2026 16:30
Drop LiteLLM and replace it with custom, in-tree clients. Together, the custom clients maintain support for OpenAI-compatible providers with both Responses and Chat Completions endpoints, the Anthropic Messages API, and Gemini’s generateContent. Both direct API and GCP/Vertex AI backends are supported for APIs that are provided by Vertex AI.

There is a slight regression in that we no longer support OpenRouter’s native API and custom settings.
Add first-class support for the Vertex AI provider in the gatekeeper system. The configuration structure has been refactored to accommodate separate settings for each provider type.
Move provider-specific methods out of http_utils into their relevant client file.
Passing it down into each provider's completion function.
- Introduced a new pricing module to compute costs based on token usage across different providers.
- Added functionality to extract usage statistics from responses for OpenAI, Anthropic, Gemini, and OpenRouter.
- Updated GatekeeperCompletion and GatekeeperStats models to include token counts and cost sources.
- Updated documentation to clarify cost estimation logic and configuration options.
- Added tests to ensure accurate cost calculations and usage extraction across various clients.
- Changed dependency from `requests` to `httpx`.
- Updated HTTP client from `requests` to `httpx` for asynchronous capabilities.
- Converted completion functions for OpenAI, Anthropic, Gemini, OpenRouter, and Vertex AI to async.
- Added a new check in documentation to ensure async functions are only decorated when `asyncio_mode` is not set to auto.
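
As a loose illustration of computing cost from token usage, the sketch below assumes per-million-token prices for input and output. The `Price` fields, the `estimate_cost` name, and the numbers in the usage line are placeholders for illustration, not the PR's pricing module or real provider prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Hypothetical price entry, in currency units per million tokens."""

    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens, output_tokens, price):
    """Estimate the cost of one completion from its token counts."""
    total = (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    )
    return total / 1_000_000


# Placeholder prices, chosen only to keep the arithmetic easy to follow.
example_price = Price(input_per_million=2.0, output_per_million=8.0)
print(estimate_cost(1_000_000, 500_000, example_price))
```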
@subpop subpop force-pushed the native-openai-api-client branch from 674c62b to e85f661 Compare June 23, 2026 20:32

2 participants