
feat(config): Make the gatekeeper model more configurable#465

Merged
owtaylor merged 1 commit into
rhel-lightspeed:mainfrom
owtaylor:gatekeeper-model-configuration
May 24, 2026

Conversation

@owtaylor
Contributor

Move the gatekeeper configuration into a nested member off our config object, so LINUX_MCP_GATEKEEPER_MODEL becomes LINUX_MCP_GATEKEEPER__MODEL (LINUX_MCP_GATEKEEPER_MODEL is supported as a deprecated alias.)

Add controls for:
reasoning_effort: turning reasoning off or down often makes models
perform better for us.
structured_output: e.g. for gemma-4-31b-it, turning off
response_format is needed to keep the model from going into
an infinite loop.
temperature: Anthropic models need a non-zero temperature to
enable reasoning.
quantization: OpenRouter mixes together models with different
quantization in a single model name - specifying a specific
quantization is needed for clean benchmarking data.
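
The rename from LINUX_MCP_GATEKEEPER_MODEL to LINUX_MCP_GATEKEEPER__MODEL follows the convention where a double underscore separates nesting levels. Here is a minimal stdlib-only sketch of that mapping, assuming a `LINUX_MCP_` prefix and `__` as the delimiter. `nested_settings_from_env` is a hypothetical helper for illustration, not the project's code, which presumably leaves this to its settings library.

```python
from typing import Any, Mapping

PREFIX = "LINUX_MCP_"  # assumed common prefix
DELIMITER = "__"       # assumed nesting delimiter


def nested_settings_from_env(environ: Mapping[str, str]) -> dict[str, Any]:
    """Turn prefixed variables into a nested dict, splitting names on DELIMITER."""
    settings: dict[str, Any] = {}
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue  # unrelated variable
        path = name[len(PREFIX):].lower().split(DELIMITER)
        node = settings
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return settings


env = {
    "LINUX_MCP_GATEKEEPER__MODEL": "example/model",  # placeholder model name
    "LINUX_MCP_GATEKEEPER__TEMPERATURE": "0.2",
    "LINUX_MCP_GATEKEEPER_MODEL": "old-style",       # single underscore: stays flat
    "HOME": "/root",
}
result = nested_settings_from_env(env)
```

In this sketch the single-underscore name ends up as a flat `gatekeeper_model` key rather than under `gatekeeper`, which is why the old name needs explicit alias handling to keep working.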

@owtaylor owtaylor requested a review from a team as a code owner May 18, 2026 16:42
@github-actions

For team members: test commit 0b02672 in internal GitLab

@codecov

codecov Bot commented May 18, 2026

Codecov Report

❌ Patch coverage is 97.50000% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/linux_mcp_server/server.py | 0.00% | 1 Missing and 1 partial ⚠️ |
| src/linux_mcp_server/config.py | 96.87% | 0 Missing and 1 partial ⚠️ |

| Flag | Coverage Δ |
| --- | --- |
| unittests | 97.23% <97.50%> (-0.02%) ⬇️ |


| Files with missing lines | Coverage Δ |
| --- | --- |
| ...rc/linux_mcp_server/gatekeeper/check_run_script.py | 100.00% <100.00%> (ø) |
| tests/conftest.py | 95.45% <100.00%> (ø) |
| tests/gatekeeper/test_check_run_script.py | 100.00% <100.00%> (ø) |
| tests/test_config.py | 100.00% <100.00%> (ø) |
| src/linux_mcp_server/config.py | 99.09% <96.87%> (-0.91%) ⬇️ |
| src/linux_mcp_server/server.py | 92.99% <0.00%> (-0.80%) ⬇️ |

... and 2 files with indirect coverage changes


@Jazzcort Jazzcort self-requested a review May 19, 2026 21:00
Contributor

@Jazzcort Jazzcort left a comment


Apart from the duplicate conditions, the rest are just my thoughts and questions. 😁 Some conflicts also need to be resolved.

Comment thread eval/gatekeeper/run-eval.py Outdated
raise typer.Exit(code=1)

- if "LINUX_MCP_GATEKEEPER_MODEL" not in os.environ:
+ if "LINUX_MCP_GATEKEEPER__MODEL" not in os.environ and "LINUX_MCP_GATEKEEPER__MODEL" not in os.environ:
Contributor

Duplicate conditions

Contributor Author

Good catch!
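
The corrected check is not shown in the thread. Presumably the intent was to accept either the new variable name or the deprecated one. A minimal sketch of such a check, where `gatekeeper_model_is_set` is a hypothetical helper name:

```python
import os
from typing import Mapping

NEW_NAME = "LINUX_MCP_GATEKEEPER__MODEL"
DEPRECATED_NAME = "LINUX_MCP_GATEKEEPER_MODEL"


def gatekeeper_model_is_set(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True if the model is configured under either spelling."""
    return NEW_NAME in environ or DEPRECATED_NAME in environ
```

Naming the two spellings as constants makes a copy-paste slip like the one above easier to spot than two near-identical string literals.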


reasoning_effort = CONFIG.gatekeeper.reasoning_effort
if reasoning_effort is not None:
    if reasoning_effort == ReasoningEffort.ENABLE_THINKING:
Contributor

I'm confused by the terms used for these LLM parameters. I thought "thinking" and "reasoning" meant the same thing, but both appear here in different places.
Since the "thinking" and "reasoning" settings are mixed together in ReasoningEffort, does enabling "thinking" amount to enabling "reasoning" with a default "reasoning_effort.value" under the hood for those LLMs?

I ask because if users set LINUX_MCP_GATEKEEPER__reasoning_effort="enable_thinking", they can no longer change the reasoning effort. I suspect enabling "reasoning" means the same thing as enabling "thinking" for those LLMs, and if users want more fine-grained control over how deeply the model reasons, they should always use "reasoning" rather than "thinking".

Let me know if I'm wrong! Just want to clarify my doubts on these 😁

Contributor Author

thinking === reasoning ... just different words for the same thing. The "enable_thinking" here is a gruesome thing ... to quote my docs:

+| --gatekeeper.reasoning_effort LINUX_MCP_GATEKEEPER__REASONING_EFFORT | (model specific) | Reasoning effort to use for gatekeeper model (none, minimal, low, medium, high, xhigh). Not all values are supported for all models. Special values (enable_thinking, disable_thinking) set the enable_thinking chat template parameter. |

Basically, when using llama.cpp, the only way to turn reasoning on or off through the API is via chat template parameters. (There's a --reasoning on/off cli parameter.)

  • qwen3.5 and gemma4 - enable_thinking
  • gpt-oss - reasoning_effort
  • other models: ???

Other possible approaches:

  1. We could put this logic with specific model names right in our code. This has the advantage that it just works. But it's not extensible - if we ship with qwen3.5 in our code, there's no way to set it for qwen3.6
  2. We could make it generic ... LINUX_MCP_GATEKEEPER__TEMPLATE_KWARGS__ENABLE_THINKING=true ... LINUX_MCP_GATEKEEPER__TEMPLATE_KWARGS__REASONING_EFFORT=\"low\"

Note that using llama.cpp is a bit of an edge case ... it works well for our evals, but it's inherently not friendly to a user. So we don't need to make things pretty.
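
The generic approach in option 2 can be sketched roughly as follows: collect every variable under a TEMPLATE_KWARGS prefix and decode each value as JSON, so that `true` becomes a boolean and `"low"` (with its quotes) becomes a string. This is an illustration only. `template_kwargs_from_env` is a hypothetical helper, and the fallback to the raw string for non-JSON values is an assumption rather than the project's documented behavior.

```python
import json
from typing import Any, Mapping

KWARGS_PREFIX = "LINUX_MCP_GATEKEEPER__TEMPLATE_KWARGS__"


def template_kwargs_from_env(environ: Mapping[str, str]) -> dict[str, Any]:
    """Collect chat template parameters from prefixed environment variables."""
    kwargs: dict[str, Any] = {}
    for name, raw in environ.items():
        if not name.startswith(KWARGS_PREFIX):
            continue
        key = name[len(KWARGS_PREFIX):].lower()
        try:
            kwargs[key] = json.loads(raw)  # true -> True, "low" -> 'low'
        except json.JSONDecodeError:
            kwargs[key] = raw  # assumed fallback: keep the value as plain text
    return kwargs


template_kwargs = template_kwargs_from_env({
    "LINUX_MCP_GATEKEEPER__TEMPLATE_KWARGS__ENABLE_THINKING": "true",
    "LINUX_MCP_GATEKEEPER__TEMPLATE_KWARGS__REASONING_EFFORT": '"low"',
})
```

Because the keys are not hard-coded, a new model that needs a different template parameter can be handled without a code change, which is the extensibility point raised against option 1.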

Contributor

Thanks for the clarification! I couldn't agree more that llama.cpp is just not friendly to users. These settings are only there to make the evals work well, so I also agree that we needn't bother making things pretty here 😁

Comment thread src/linux_mcp_server/config.py Outdated

@model_validator(mode="before")
@classmethod
def handle_deprecated_aliases(cls, data: Any) -> Any:
Contributor

I'm wondering if we're not using the cls inside the function, can we just use @staticmethod instead?

Contributor Author

Should work [tries the test case], does work.
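
To illustrate why a static method suffices here: the alias handler only rewrites the raw input and never touches the class. The body of the real validator is not shown in this thread, so the sketch below is hypothetical. It omits the `@model_validator(mode="before")` decorator to stay stdlib-only, and the key names `gatekeeper_model` and `gatekeeper` are assumptions based on the environment variable names.

```python
from typing import Any


class ConfigSketch:
    """Stand-in for the real settings class; only the alias hook is shown."""

    @staticmethod
    def handle_deprecated_aliases(data: Any) -> Any:
        # A "before" hook may receive non-dict input; pass it through untouched.
        if not isinstance(data, dict) or "gatekeeper_model" not in data:
            return data
        data = dict(data)  # avoid mutating the caller's dict
        deprecated_value = data.pop("gatekeeper_model")
        gatekeeper = dict(data.get("gatekeeper") or {})
        # Assumption: an explicitly set new-style value wins over the alias.
        gatekeeper.setdefault("model", deprecated_value)
        data["gatekeeper"] = gatekeeper
        return data


migrated = ConfigSketch.handle_deprecated_aliases({"gatekeeper_model": "example/model"})
```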

@owtaylor owtaylor force-pushed the gatekeeper-model-configuration branch from 0b02672 to 771b4fd Compare May 20, 2026 19:17
@github-actions

For team members: test commit 771b4fd in internal GitLab

@owtaylor
Contributor Author

Pushed a new version:

  • rebased to main
  • with fixes for the things you commented on
  • with template_kwargs instead of the reasoning_effort=ENABLE_THINKING hack

@owtaylor owtaylor requested a review from Jazzcort May 20, 2026 19:18
Move the gatekeeper configuration into a nested member off
our config object, so LINUX_MCP_GATEKEEPER_MODEL becomes
LINUX_MCP_GATEKEEPER__MODEL (LINUX_MCP_GATEKEEPER_MODEL
is supported as a deprecated alias.)

Add controls for:
 reasoning_effort: turning reasoning off or down often makes models
   perform better for us.
 structured_output: e.g. for gemma-4-31b-it, turning off
   response_format is needed to keep the model from going into
   an infinite loop.
 temperature: Anthropic models need a non-zero temperature to
   enable reasoning.
 quantization: OpenRouter mixes together models with different
   quantization in a single model name - specifying a specific
   quantization is needed for clean benchmarking data.
 template_kwarg: Set model-specific values in the chat template -
   e.g. `{"enable_thinking": false}` is useful for llama.cpp.
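
As a rough illustration of how the controls listed above might feed into a chat completion request, here is a sketch under stated assumptions. `GatekeeperSettings` and `completion_kwargs` are hypothetical names, and the defaults are guesses. Passing provider-specific fields through `extra_body`, the OpenRouter-style `provider.quantizations` list, and the llama.cpp-style `chat_template_kwargs` field are all assumptions about the client and backends, not something this PR confirms.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class GatekeeperSettings:
    """Hypothetical mirror of the controls named in the commit message."""
    model: str
    reasoning_effort: Optional[str] = None
    structured_output: bool = True
    temperature: Optional[float] = None
    quantization: Optional[str] = None
    template_kwargs: dict[str, Any] = field(default_factory=dict)


def completion_kwargs(settings: GatekeeperSettings, schema: dict[str, Any]) -> dict[str, Any]:
    """Build keyword arguments for an OpenAI-style chat completion call."""
    kwargs: dict[str, Any] = {"model": settings.model}
    if settings.reasoning_effort is not None:
        kwargs["reasoning_effort"] = settings.reasoning_effort
    if settings.temperature is not None:
        kwargs["temperature"] = settings.temperature
    if settings.structured_output:
        # Omitted entirely when structured output is turned off.
        kwargs["response_format"] = {
            "type": "json_schema",
            "json_schema": {"name": "gatekeeper_result", "schema": schema},
        }
    extra_body: dict[str, Any] = {}
    if settings.quantization is not None:
        extra_body["provider"] = {"quantizations": [settings.quantization]}
    if settings.template_kwargs:
        extra_body["chat_template_kwargs"] = dict(settings.template_kwargs)
    if extra_body:
        kwargs["extra_body"] = extra_body
    return kwargs


settings = GatekeeperSettings(
    model="example/model",  # placeholder model name
    reasoning_effort="low",
    structured_output=False,
    temperature=1.0,
    quantization="fp8",
    template_kwargs={"enable_thinking": False},
)
kwargs = completion_kwargs(settings, schema={"type": "object"})
```

Leaving every unset control out of the request, instead of sending explicit nulls, keeps the backend's own defaults in effect for models that reject a given parameter.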
@owtaylor owtaylor force-pushed the gatekeeper-model-configuration branch from 771b4fd to 35ce97b Compare May 21, 2026 18:02
@github-actions

For team members: test commit 35ce97b in internal GitLab

@owtaylor
Contributor Author

New version fixes some typos in the docs, and cleans up the mess that prettier's new Markdown support made.

Contributor

@Jazzcort Jazzcort left a comment


LGTM! 😁

@owtaylor owtaylor merged commit 9ffa5e3 into rhel-lightspeed:main May 24, 2026
35 checks passed