refactor(gatekeeper): Drop LiteLLM by subpop · Pull Request #494 · rhel-lightspeed/linux-mcp-server

subpop · 2026-06-02T13:49:35Z

Drop LiteLLM and replace it with custom, in-tree clients. Together, the custom clients maintain support for OpenAI-compatible providers with both Responses and Chat Completions endpoints, the Anthropic Messages API, and Gemini’s generateContent. Both direct API and GCP/Vertex AI backends are supported for APIs that are provided by Vertex AI.

There is a slight regression in that we no longer support OpenRouter’s native API and custom settings.

owtaylor · 2026-06-03T19:26:56Z

99% of the difficulty here is figuring out what the config schema should be!

We have prior art with authentication:

class Config(BaseSettings):
    auth: AuthConfig | None = None

class AuthConfig(BaseSettings):
    """Authentication configuration."""

    provider: AuthProvider | None = None
    google: GoogleAuthConfig | None = None
    github: GitHubAuthConfig | None = None
    jwt: JWTAuthConfig | None = None
    introspection: IntrospectionAuthConfig | None = None

class GoogleAuthConfig(BaseSettings):
    """Google OAuth authentication configuration."""

    client_id: str
    client_secret: SecretStr

So one writes:

LINUX_MCP_AUTH__PROVIDER=google
LINUX_MCP_AUTH__GOOGLE__CLIENT_ID="<...>"
LINUX_MCP_AUTH__GOOGLE__CLIENT_SECRET="<...>"

Or potentially:

LINUX_MCP_AUTH='{ "provider": "google", "google": { "client_id": "<...>", "client_secret": "<...>" }}'

Which is slightly different from what you did here, because the Google-specific options move under the nested google object, while you instead document which settings are specific to particular providers or backends.
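
To make the mapping concrete, here is a minimal standard-library sketch of how double-underscore variables could fold into the nested object. It assumes a `LINUX_MCP_` prefix and `__` as the nesting delimiter, as the examples above suggest. `nest_env` is a hypothetical helper for illustration only, not how the settings library works internally, and it does not handle a variable that is set both as a JSON blob and as nested keys.

```python
def nest_env(environ, prefix="LINUX_MCP_", delimiter="__"):
    """Fold PREFIX_A__B__C=value entries into {"a": {"b": {"c": value}}}."""
    nested = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue  # unrelated environment variable
        # "AUTH__GOOGLE__CLIENT_ID" -> ["auth", "google", "client_id"]
        path = name[len(prefix):].lower().split(delimiter)
        node = nested
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return nested


env = {
    "LINUX_MCP_AUTH__PROVIDER": "google",
    "LINUX_MCP_AUTH__GOOGLE__CLIENT_ID": "<...>",
    "LINUX_MCP_AUTH__GOOGLE__CLIENT_SECRET": "<...>",
    "HOME": "/root",  # ignored: no LINUX_MCP_ prefix
}
config = nest_env(env)
```

The result has the same shape as the single JSON value shown above, which is why both spellings can populate the same nested model.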

The extra factors for the gatekeeper model are:

- There are some options that are shared between different models (temperature, reasoning_effort)
- We have the split between the "provider" and "backend"

I'm also not in love with the names "provider" and "backend" - how is openrouter a "backend" when it routes things somewhere else? I think "inference_gateway" or "gateway" would be a pretty descriptive name, but when I tried to figure out how that maps into config, I couldn't figure out a good way to handle "openai compatible".

The proposal I'm going to make is that we just have a provider - which we autodetect where possible (gpt-*, claude-*)

- openai
- anthropic
- openai_compat
- vertex_ai
- openrouter
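
The autodetection could be sketched roughly as below. Only the `gpt-*` and `claude-*` patterns come from this discussion; the `detect_provider` name, the pattern table, and the choice to raise when nothing matches are assumptions for illustration.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical pattern table; extend as more unambiguous prefixes are agreed on.
MODEL_PATTERNS = [
    ("gpt-*", "openai"),
    ("claude-*", "anthropic"),
]


def detect_provider(model: str, explicit: Optional[str] = None) -> str:
    """Return the explicit provider if given, else guess from the model name."""
    if explicit is not None:
        return explicit
    for pattern, provider in MODEL_PATTERNS:
        if fnmatchcase(model, pattern):
            return provider
    raise ValueError(
        f"cannot autodetect a provider for model {model!r}; set the provider explicitly"
    )
```

A name such as `openai/gpt-5.4` deliberately does not match `gpt-*` here, so an explicit provider is still needed for it.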

And some providers have provider-specific options. A bunch of worked examples:

LINUX_MCP_GATEKEEPER__MODEL="gpt-5.4"
LINUX_MCP_GATEKEEPER__PROVIDER="openai"  # would be autodetected

LINUX_MCP_GATEKEEPER__MODEL="openai/gpt-5.4"
LINUX_MCP_GATEKEEPER__PROVIDER="openrouter"

LINUX_MCP_GATEKEEPER__MODEL="gemini-3.5-flash"
LINUX_MCP_GATEKEEPER__REASONING_EFFORT="minimal"
LINUX_MCP_GATEKEEPER__PROVIDER="vertex_ai"
LINUX_MCP_GATEKEEPER__VERTEX_AI__PROJECT="rhel-lightspeed"

LINUX_MCP_GATEKEEPER__MODEL="qwen/qwen3.5-8b"
LINUX_MCP_GATEKEEPER__TEMPERATURE="0"
LINUX_MCP_GATEKEEPER__PROVIDER="openrouter"
LINUX_MCP_GATEKEEPER__OPENROUTER__QUANTIZATION="fp4"

LINUX_MCP_GATEKEEPER__MODEL="qwen/qwen3.5-8b"
LINUX_MCP_GATEKEEPER__PROVIDER="openai_compat"
LINUX_MCP_GATEKEEPER__OPENAI_COMPAT__BASE_URL="http://localhost:8080/v1"
LINUX_MCP_GATEKEEPER__OPENAI_COMPAT__TEMPLATE_KWARGS='{ "enable_thinking": true }'

The last is clunky, but clear. Maybe better as "generic" rather than "openai_compat"?
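
As a rough sketch of the shape these examples imply, shared options could sit on the top-level object and provider-specific ones in nested blocks. The version below uses plain dataclasses to stay self-contained; every class, field, and method name is a hypothetical stand-in, not the PR's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

# Names of the nested, provider-specific blocks in this sketch.
NESTED_BLOCKS = ("vertex_ai", "openai_compat")


@dataclass
class VertexAIOptions:
    project: str
    location: str = "global"


@dataclass
class OpenAICompatOptions:
    base_url: str
    template_kwargs: Optional[dict] = None


@dataclass
class GatekeeperConfig:
    model: str
    provider: str
    # Options shared between different models.
    temperature: Optional[float] = None
    reasoning_effort: Optional[str] = None
    # Provider-specific options, nested under the provider's name.
    vertex_ai: Optional[VertexAIOptions] = None
    openai_compat: Optional[OpenAICompatOptions] = None

    def provider_options(self):
        """Return the nested block for the selected provider, if it has one."""
        if self.provider in NESTED_BLOCKS:
            return getattr(self, self.provider)
        return None

    def stray_option_blocks(self):
        """Nested blocks that are set but belong to a provider not in use."""
        return [
            name
            for name in NESTED_BLOCKS
            if name != self.provider and getattr(self, name) is not None
        ]


cfg = GatekeeperConfig(
    model="gemini-3.5-flash",
    provider="vertex_ai",
    reasoning_effort="minimal",
    vertex_ai=VertexAIOptions(project="rhel-lightspeed"),
)
```

A check like `stray_option_blocks` is one way nesting can catch a block that was configured for a provider that is not selected, which is harder to do when all options are flat.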

What about things like needing to use the Anthropic API when using Claude models through vertex_ai? My feeling is that we just autodetect and implement that internally, because it's not exactly the same - there are often a few quirks (see the Anthropic docs). We could have a config parameter LINUX_MCP_GATEKEEPER__API=anthropic - but I'm not sure there is any real use case.
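
Handling this internally might look something like the sketch below, where the wire API is derived from the provider and model rather than configured. The function name, the returned labels, and the fallback to an OpenAI-style API are all assumptions; the only rule taken from the discussion is that Claude models on vertex_ai use the Anthropic API.

```python
def select_api(provider: str, model: str) -> str:
    """Pick the wire API for a provider/model pair (illustrative rules only)."""
    if provider == "anthropic":
        return "anthropic_messages"
    if provider == "vertex_ai":
        if model.startswith("claude-"):
            # Same API family as direct Anthropic, though with a few quirks.
            return "anthropic_messages"
        if model.startswith("gemini-"):
            return "gemini_generate_content"
    # Assumed default: everything else speaks an OpenAI-style API.
    return "openai"
```

Keeping this as an internal function leaves room to add an explicit override later if a real use case turns up.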

What if different endpoints want to share the same config - let's say that some other backend supported template_kwargs? Is that an argument for putting this type of thing on the base class and documenting restrictions? Perhaps - I'm convincible - but IMO nesting parameters is clearer. We can always duplicate. I definitely want to keep something like the Vertex AI location nested.

owtaylor

Some comments. Not a full review, so that you can figure out what you want for the config and implement it.

owtaylor · 2026-06-03T19:42:24Z

+        LINUX_MCP_GATEKEEPER__BACKEND=vertex
+        vertex_location="${VERTEXAI_LOCATION:-global}"
+        vertex_openapi_base="https://aiplatform.googleapis.com/v1/projects/${VERTEXAI_PROJECT}/locations/${vertex_location}/endpoints/openapi"


I think we can do this generically for vertexai + openai API instead of having it here. Since we do a very similar thing for the google api and the anthropic api. (Do we need to support API base overrides for vertex_ai ... there are likely use cases, but maybe it can wait until someone asks for it?)
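
Doing this generically could be as small as a helper that builds the base URL from the project and location. The sketch below mirrors the URL template in the diff above; the function name is hypothetical, and whether non-global locations need a different hostname is not addressed here.

```python
from urllib.parse import quote


def vertex_openai_base_url(project: str, location: str = "global") -> str:
    """Build the Vertex AI OpenAI-compatible base URL, following the diff's template."""
    if not project:
        raise ValueError("a GCP project is required")
    return (
        "https://aiplatform.googleapis.com/v1"
        f"/projects/{quote(project, safe='')}"
        f"/locations/{quote(location, safe='')}"
        "/endpoints/openapi"
    )
```

An API base override for vertex_ai, if someone asks for one, would then only need to bypass this helper.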

subpop · 2026-06-18T15:51:45Z

I rewrote the config, added support for cost back in, switched to an async HTTP client, restored max token capping, and removed the Chat Completions API. These are all additional commits on this PR, since we squash on merge anyway. Hopefully they help with the review.

codecov · 2026-06-18T16:58:16Z

Codecov Report

❌ Patch coverage is 92.19298% with 89 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/linux_mcp_server/gatekeeper/openai_client.py | 71.42% | 14 Missing and 2 partials ⚠️ |
| src/linux_mcp_server/gatekeeper/pricing.py | 87.70% | 7 Missing and 8 partials ⚠️ |
| ...rc/linux_mcp_server/gatekeeper/anthropic_client.py | 76.59% | 4 Missing and 7 partials ⚠️ |
| src/linux_mcp_server/gatekeeper/gcp_auth.py | 54.16% | 11 Missing ⚠️ |
| src/linux_mcp_server/gatekeeper/llm.py | 79.48% | 4 Missing and 4 partials ⚠️ |
| src/linux_mcp_server/gatekeeper/usage.py | 72.41% | 4 Missing and 4 partials ⚠️ |
| src/linux_mcp_server/gatekeeper/schema.py | 82.05% | 3 Missing and 4 partials ⚠️ |
| src/linux_mcp_server/gatekeeper/gemini_client.py | 86.04% | 3 Missing and 3 partials ⚠️ |
| ...c/linux_mcp_server/gatekeeper/openrouter_client.py | 91.80% | 2 Missing and 3 partials ⚠️ |
| ...rc/linux_mcp_server/gatekeeper/vertex_ai_client.py | 98.63% | 0 Missing and 1 partial ⚠️ |

... and 1 more

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `96.33% <92.19%> (-1.05%)` | ⬇️ |

| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/linux_mcp_server/config.py | `99.30% <100.00%> (+0.10%)` | ⬆️ |
| ...rc/linux_mcp_server/gatekeeper/check_run_script.py | `100.00% <100.00%> (ø)` | |
| src/linux_mcp_server/gatekeeper/http_utils.py | `100.00% <100.00%> (ø)` | |
| src/linux_mcp_server/models.py | `100.00% <100.00%> (ø)` | |
| tests/conftest.py | `95.50% <100.00%> (+0.05%)` | ⬆️ |
| tests/gatekeeper/test_anthropic_client.py | `100.00% <100.00%> (ø)` | |
| tests/gatekeeper/test_gcp_auth.py | `100.00% <100.00%> (ø)` | |
| tests/gatekeeper/test_gemini_client.py | `100.00% <100.00%> (ø)` | |
| tests/gatekeeper/test_http_utils.py | `100.00% <100.00%> (ø)` | |
| tests/gatekeeper/test_llm.py | `100.00% <100.00%> (ø)` | |

... and 19 more

... and 1 file with indirect coverage changes

- Introduced a new pricing module to compute costs based on token usage across different providers.
- Added functionality to extract usage statistics from responses for OpenAI, Anthropic, Gemini, and OpenRouter.
- Updated GatekeeperCompletion and GatekeeperStats models to include token counts and cost sources.
- Updated documentation to clarify cost estimation logic and configuration options.
- Added tests to ensure accurate cost calculations and usage extraction across various clients.
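
For orientation, a token-based cost estimate of the kind described above can be sketched as follows. The per-million-token price fields, the `Usage` and `Price` classes, and the "reported" / "estimated" / "unknown" labels are all assumptions for illustration; they are one reading of "cost sources", not the contents of the PR's pricing module.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Usage:
    input_tokens: int
    output_tokens: int
    # Cost returned by the provider itself, when the response includes one.
    reported_cost: Optional[float] = None


@dataclass(frozen=True)
class Price:
    input_per_million: float
    output_per_million: float


def compute_cost(usage: Usage, price: Optional[Price]):
    """Return (cost, source), preferring a provider-reported cost over an estimate."""
    if usage.reported_cost is not None:
        return usage.reported_cost, "reported"
    if price is None:
        return None, "unknown"  # no price table entry for this model
    cost = (
        usage.input_tokens * price.input_per_million
        + usage.output_tokens * price.output_per_million
    ) / 1_000_000
    return cost, "estimated"
```

Recording the source next to the number lets the stats distinguish an exact figure from an estimate based on a price table that may be out of date.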

- Changed dependency from `requests` to `httpx`.
- Updated HTTP client from `requests` to `httpx` for asynchronous capabilities.
- Converted completion functions for OpenAI, Anthropic, Gemini, OpenRouter, and Vertex AI to async.
- Added a new check in documentation to ensure async functions are only decorated when `asyncio_mode` is not set to auto.

subpop requested a review from a team as a code owner June 2, 2026 13:49

subpop force-pushed the native-openai-api-client branch 2 times, most recently from 55df4de to 7551529 Compare June 2, 2026 16:08

owtaylor reviewed Jun 3, 2026

subpop force-pushed the native-openai-api-client branch 2 times, most recently from 37cc909 to 442d35f Compare June 18, 2026 16:55

subpop force-pushed the native-openai-api-client branch 2 times, most recently from 670dc91 to 674c62b Compare June 22, 2026 14:46

subpop added 7 commits June 23, 2026 16:30

feat(gatekeeper): add Vertex AI support and refactor configuration

38913ed

Add first-class support for the Vertex AI provider in the gatekeeper system. The configuration structure has been refactored to accommodate separate settings for each provider type.

refactor: reorganize provider-specific functions

424325c

Move provider-specific methods out of http_utils into their relevant client file.

fix: Restore GATEKEEPER_MAX_TOKENS

66d4c32

Passing it down into each provider's completion function.

refactor(gatekeeper): Drop Chat Completions API from OpenAI client

e85f661

subpop force-pushed the native-openai-api-client branch from 674c62b to e85f661 Compare June 23, 2026 20:32

subpop wants to merge 7 commits into rhel-lightspeed:main from subpop:native-openai-api-client