feat(config): Make the gatekeeper model more configurable #465
Codecov Report ❌ Patch coverage is
... and 2 files with indirect coverage changes
raise typer.Exit(code=1)
- if "LINUX_MCP_GATEKEEPER_MODEL" not in os.environ:
+ if "LINUX_MCP_GATEKEEPER_MODEL" not in os.environ and "LINUX_MCP_GATEKEEPER__MODEL" not in os.environ:
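A minimal sketch of what this check is meant to do: accept either the new nested name or the deprecated alias, and prefer the new name when both are set. The helper name `gatekeeper_model_from_env` is hypothetical, and, like the original `not in os.environ` test, it treats an empty value as "set".

```python
import os
from typing import Mapping, Optional

# New nested-style name first, then the deprecated flat alias.
GATEKEEPER_MODEL_VARS = ("LINUX_MCP_GATEKEEPER__MODEL", "LINUX_MCP_GATEKEEPER_MODEL")


def gatekeeper_model_from_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the configured gatekeeper model, or None if neither variable is set."""
    for name in GATEKEEPER_MODEL_VARS:
        if name in environ:  # same semantics as `name in os.environ`
            return environ[name]
    return None
```

In the CLI, `if gatekeeper_model_from_env() is None: raise typer.Exit(code=1)` would then replace the two separate membership tests.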
reasoning_effort = CONFIG.gatekeeper.reasoning_effort
if reasoning_effort is not None:
    if reasoning_effort == ReasoningEffort.ENABLE_THINKING:
I'm so confused by the terms used for these LLM parameters. I thought "thinking" and "reasoning" meant the same thing, but both are used, in different places, at the same time.
Since the "thinking" and "reasoning" settings are mixed together in ReasoningEffort, does enabling "thinking" amount to enabling "reasoning" with some default reasoning_effort.value under the hood for those LLMs?
I ask because if users set LINUX_MCP_GATEKEEPER__reasoning_effort="enable_thinking", they can no longer choose a reasoning_effort level. I suspect that enabling "reasoning" means the same thing as enabling "thinking" for those LLMs, and that users who want finer-grained control over how deeply the model reasons should always use "reasoning" rather than "thinking".
Let me know if I'm wrong! Just want to clear up my doubts on these 😁
thinking === reasoning; they're just different words for the same thing. The "enable_thinking" value here is admittedly gruesome. To quote my docs:
+| --gatekeeper.reasoning_effort, LINUX_MCP_GATEKEEPER__REASONING_EFFORT | (model specific) | Reasoning effort to use for gatekeeper model (none, minimal, low, medium, high, xhigh). Not all values are supported for all models. Special values (enable_thinking, disable_thinking) set the enable_thinking chat template parameter. |
Basically, when using llama.cpp, the only way to turn reasoning on or off through the API is via chat template parameters, and which parameter works depends on the model. (There's also a --reasoning on/off CLI parameter.)
- qwen3.5 and gemma4 - enable_thinking
- gpt-oss - reasoning_effort
- other models: ???
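The behavior described in the quoted docs row can be sketched as a small translation step. This is an illustration under assumptions, not the PR's code: the enum member names, the helper `reasoning_request_params`, and the request field names (`reasoning_effort`, `chat_template_kwargs`, as accepted by OpenAI-compatible servers such as llama.cpp's) are assumed.

```python
from enum import Enum
from typing import Any, Dict, Optional


class ReasoningEffort(str, Enum):
    # Levels listed in the docs; member names are assumptions.
    NONE = "none"
    MINIMAL = "minimal"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    XHIGH = "xhigh"
    # Special values that only toggle the enable_thinking chat template parameter.
    ENABLE_THINKING = "enable_thinking"
    DISABLE_THINKING = "disable_thinking"


def reasoning_request_params(effort: Optional[ReasoningEffort]) -> Dict[str, Any]:
    """Translate a ReasoningEffort setting into extra chat-completion request fields."""
    if effort is None:
        return {}  # leave the model's default behavior alone
    if effort == ReasoningEffort.ENABLE_THINKING:
        return {"chat_template_kwargs": {"enable_thinking": True}}
    if effort == ReasoningEffort.DISABLE_THINKING:
        return {"chat_template_kwargs": {"enable_thinking": False}}
    # Ordinary levels go out as the standard reasoning_effort field.
    return {"reasoning_effort": effort.value}
```

Because one setting selects exactly one branch, "thinking on" and a specific effort level cannot be combined, which is the limitation raised in the question above.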
Other possible approaches:
- We could hard-code model-specific logic in our code. This has the advantage that it just works, but it's not extensible: if we ship with qwen3.5 in our code, there's no way to set it for qwen3.6.
- We could make it generic:
  LINUX_MCP_GATEKEEPER__TEMPLATE_KWARGS__ENABLE_THINKING=true
  LINUX_MCP_GATEKEEPER__TEMPLATE_KWARGS__REASONING_EFFORT=\"low\"
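A rough sketch of how the generic option could be read, assuming each variable under the `LINUX_MCP_GATEKEEPER__TEMPLATE_KWARGS__` prefix becomes one lowercase template kwarg and its value is decoded as JSON (so `true` becomes a boolean and the shell-escaped `"low"` becomes a string). If the config is presumably built with pydantic-settings and `env_nested_delimiter="__"`, the library would do much of this itself; `template_kwargs_from_env` is a hypothetical standalone helper.

```python
import json
from typing import Any, Dict, Mapping

PREFIX = "LINUX_MCP_GATEKEEPER__TEMPLATE_KWARGS__"


def template_kwargs_from_env(environ: Mapping[str, str]) -> Dict[str, Any]:
    """Collect chat template kwargs from LINUX_MCP_GATEKEEPER__TEMPLATE_KWARGS__<NAME> variables."""
    kwargs: Dict[str, Any] = {}
    for name, raw in environ.items():
        if not name.startswith(PREFIX):
            continue
        key = name[len(PREFIX):].lower()  # ENABLE_THINKING -> enable_thinking
        try:
            # JSON decoding turns `true` into True and `"low"` into "low".
            kwargs[key] = json.loads(raw)
        except json.JSONDecodeError:
            kwargs[key] = raw  # not valid JSON: keep the raw string
    return kwargs
```

The JSON step is what makes the escaped quotes in the example meaningful: without it every value would arrive as a string, and a template expecting a boolean `enable_thinking` would see `"true"` instead.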
Note that llama.cpp is a bit of an edge case: it works well for our evals, but it's inherently not user-friendly, so we don't need to make things pretty.
Thanks for the clarification! I couldn't agree more that llama.cpp is just not user-friendly. These settings exist only to make the evals work well, so I also agree that we shouldn't bother making things pretty here 😁
@model_validator(mode="before")
@classmethod
def handle_deprecated_aliases(cls, data: Any) -> Any:
Since we're not using cls inside the function, could we just use @staticmethod instead?
Should work [tries the test case], does work.
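For illustration, here is roughly what the body of such a before-mode validator might do, written as a plain function so it runs without pydantic. It needs neither `cls` nor `self`, consistent with the finding that `@staticmethod` works. The key names `gatekeeper_model` and `gatekeeper.model` are assumptions inferred from the environment variable names, not taken from the PR; in the real model the function would sit under `@model_validator(mode="before")`.

```python
import warnings
from typing import Any


def handle_deprecated_aliases(data: Any) -> Any:
    """Move a flat (hypothetical) `gatekeeper_model` key into the nested `gatekeeper` section.

    A mode="before" validator receives the raw input, which may not be a dict.
    """
    if not isinstance(data, dict) or "gatekeeper_model" not in data:
        return data  # nothing to translate
    data = dict(data)  # don't mutate the caller's input
    old_value = data.pop("gatekeeper_model")
    warnings.warn(
        "gatekeeper_model is deprecated; use gatekeeper.model",
        DeprecationWarning,
        stacklevel=2,
    )
    gatekeeper = dict(data.get("gatekeeper") or {})
    # If the new-style setting is present too, it wins over the alias.
    gatekeeper.setdefault("model", old_value)
    data["gatekeeper"] = gatekeeper
    return data
```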
Force-pushed from 0b02672 to 771b4fd.
Pushed a new version:
Move the gatekeeper configuration into a nested member of
our config object, so LINUX_MCP_GATEKEEPER_MODEL becomes
LINUX_MCP_GATEKEEPER__MODEL (LINUX_MCP_GATEKEEPER_MODEL
is still supported as a deprecated alias).
Add controls for:
- reasoning_effort: turning reasoning off or down often makes models perform better for us.
- structured_output: e.g. for gemma-4-31b-it, turning off response_format is needed to keep the model from going into an infinite loop.
- temperature: Anthropic models need a non-zero temperature to enable reasoning.
- quantization: OpenRouter mixes models with different quantizations under a single model name; specifying a particular quantization is needed for clean benchmarking data.
- template_kwargs: set model-specific values in the chat template, e.g. `{"enable_thinking": false}` is useful for llama.cpp.
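To make the new controls concrete, here is a sketch of how they might map onto an OpenAI-style chat completion request body. Everything here is an assumption for illustration: `GatekeeperSettings` and `build_request` are hypothetical, the field is assumed to be `template_kwargs` to match the `TEMPLATE_KWARGS` environment variable, `provider.quantizations` is OpenRouter's provider-routing field, and `chat_template_kwargs` is the request field llama.cpp's server forwards to the chat template. The enable_thinking special values are not translated here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class GatekeeperSettings:
    # Hypothetical mirror of the nested gatekeeper config described in the commit message.
    model: str
    reasoning_effort: Optional[str] = None  # assumed to be a plain level such as "low"
    structured_output: bool = True
    temperature: Optional[float] = None
    quantization: Optional[str] = None
    template_kwargs: Dict[str, Any] = field(default_factory=dict)


def build_request(settings: GatekeeperSettings, messages: List[Dict[str, str]],
                  schema: Dict[str, Any]) -> Dict[str, Any]:
    """Assemble a chat completion request body, adding only the fields that are configured."""
    body: Dict[str, Any] = {"model": settings.model, "messages": messages}
    if settings.structured_output:
        # Some models loop forever with this on, hence the switch to turn it off.
        body["response_format"] = {
            "type": "json_schema",
            "json_schema": {"name": "gatekeeper_result", "schema": schema},
        }
    if settings.reasoning_effort is not None:
        body["reasoning_effort"] = settings.reasoning_effort
    if settings.temperature is not None:
        body["temperature"] = settings.temperature
    if settings.quantization is not None:
        # OpenRouter provider routing: only use providers serving this quantization.
        body["provider"] = {"quantizations": [settings.quantization]}
    if settings.template_kwargs:
        # Passed through to the model's chat template by llama.cpp's server.
        body["chat_template_kwargs"] = dict(settings.template_kwargs)
    return body
```

Omitting unset fields, rather than sending defaults, keeps each provider's own defaults in effect unless the user explicitly overrides them.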
Force-pushed from 771b4fd to 35ce97b.
The new version fixes some typos in the docs and cleans up the mess that Prettier's new Markdown support made.