feat(config): Make the gatekeeper model more configurable (#465)

owtaylor · web-flow · commit 9ffa5e3aa3a9 · 2026-05-24T16:43:42.000-04:00
Move the gatekeeper configuration into a nested member of
our config object, so LINUX_MCP_GATEKEEPER_MODEL becomes
LINUX_MCP_GATEKEEPER__MODEL (LINUX_MCP_GATEKEEPER_MODEL
is still supported as a deprecated alias).

Add controls for:
 reasoning_effort: turning reasoning off or down often makes models
   perform better for us.
 structured_output: e.g. for gemma-4-31b-it, turning off
   response_format is needed to keep the model from going into an
   infinite loop.
 temperature: Anthropic models need a non-zero temperature to
   enable reasoning.
 quantization: OpenRouter mixes together models with different
   quantizations under a single model name; specifying a particular
   quantization is needed for clean benchmarking data.
 template_kwargs: Set model-specific values in the chat template;
   for example, `{"enable_thinking": false}` is useful for llama.cpp.
diff --git a/.gitlab/ci/eval-gatekeeper.yml b/.gitlab/ci/eval-gatekeeper.yml
@@ -36,7 +36,6 @@ download-secure-files:
     paths:
       - securefiles/
 
-
 # ==========================================
 # EVAL WORKFLOW - TEMPLATES
 # ==========================================
@@ -68,7 +67,7 @@ download-secure-files:
     - export VERTEXAI_PROJECT="rhel-lightspeed-650189"
     - export VERTEXAI_LOCATION="global"
   artifacts:
-    paths: 
+    paths:
       - data/
     expire_in: 1 hour
 
@@ -77,23 +76,22 @@ download-secure-files:
 # ==========================================
 
 # Vertex AI
-gatekeeper-eval-gpt-oss-120b: 
+gatekeeper-eval-gpt-oss-120b:
   extends: .eval-base
   variables:
     MODEL_NAME: "gpt-oss-120b"
   script:
-    - export LINUX_MCP_GATEKEEPER_MODEL="vertex_ai/openai/$MODEL_NAME-maas"
+    - export LINUX_MCP_GATEKEEPER__MODEL="vertex_ai/openai/$MODEL_NAME-maas"
     - uv run --extra gcp eval/gatekeeper/run-eval.py --all -f json --output-all -o "$CI_PROJECT_DIR/data/$MODEL_NAME.json"
 
-gatekeeper-eval-gemini-3.1-pro-preview: 
+gatekeeper-eval-gemini-3.1-pro-preview:
   extends: .eval-base
   variables:
     MODEL_NAME: "gemini-3.1-pro-preview"
   script:
-    - export LINUX_MCP_GATEKEEPER_MODEL="vertex_ai/$MODEL_NAME"
+    - export LINUX_MCP_GATEKEEPER__MODEL="vertex_ai/$MODEL_NAME"
     - uv run --extra gcp eval/gatekeeper/run-eval.py --all -f json --output-all -o "$CI_PROJECT_DIR/data/$MODEL_NAME.json"
 
-
 # Models.corp
 gatekeeper-eval-models-corp-granite-4.0-h-small:
   extends: .eval-base
diff --git a/docs/config-reference.md b/docs/config-reference.md
@@ -56,10 +56,15 @@ See [Guarded Command Execution](guarded-command-execution.md) for details on the
 These are used when `LINUX_MCP_TOOLSET` is set to `run_script` or `both`.
 
 | Option / Env Var | Default | Description |
-|------------------|---------|-------------|
-| `--gatekeeper-model`<br>`LINUX_MCP_GATEKEEPER_MODEL` | *(none)* | Required: [LiteLLM model name](https://docs.litellm.ai/docs/providers) to use |
+| ---------------- | ------- | ----------- |
 | `--always-confirm-scripts` / `--no-always-confirm-scripts`<br>`LINUX_MCP_ALWAYS_CONFIRM_SCRIPTS` | `False` | All scripts must be confirmed by the user |
-| Other environment variables | *(none)* | As required by the LiteLLM provider, e.g. `OPENAI_API_KEY` |
+| `--gatekeeper.model`<br>`LINUX_MCP_GATEKEEPER__MODEL` | _(none)_ | Required: [LiteLLM model name](https://docs.litellm.ai/docs/providers) to use |
+| `--gatekeeper.quantization`<br>`LINUX_MCP_GATEKEEPER__QUANTIZATION` | _(model specific)_ | _Not usually needed_ - Particular model quantization to use (openrouter only) |
+| `--gatekeeper.reasoning_effort`<br>`LINUX_MCP_GATEKEEPER__REASONING_EFFORT` | _(model specific)_ | Reasoning effort to use for gatekeeper model (`none`, `minimal`, `low`, `medium`, `high`, `xhigh`). Not all values are supported for all models. |
+| `--gatekeeper.structured_output`<br>`LINUX_MCP_GATEKEEPER__STRUCTURED_OUTPUT` | _(autodetected)_ | _Not usually needed_ - Whether to use structured output generation for the model. Default is to use if detected as available. |
+| `--gatekeeper.temperature`<br>`LINUX_MCP_GATEKEEPER__TEMPERATURE` | 0.0 | _Not usually needed_ - Temperature to use for model - for some models, a non-zero value may be necessary when enabling reasoning. |
+| `--gatekeeper.template_kwargs`<br>`LINUX_MCP_GATEKEEPER__TEMPLATE_KWARGS` | _(none)_ | _Not usually needed_ - Extra arguments for the model's chat template, formatted as a JSON string. Example: `{ "enable_thinking": false }` |
+| Other environment variables | _(none)_ | As required by the LiteLLM provider, e.g. `OPENAI_API_KEY` |
 
 ## Logging Configuration
 
diff --git a/docs/guarded-command-execution.md b/docs/guarded-command-execution.md
@@ -106,14 +106,14 @@ LINUX_MCP_TOOLSET=run_script
 
 **Configure a Gatekeeper Model**
 
-Set `LINUX_MCP_GATEKEEPER_MODEL` to the name of the model you want to use. Additional environment
+Set `LINUX_MCP_GATEKEEPER__MODEL` to the name of the model you want to use. Additional environment
 variables may be needed to configure credentials. See the
 [LiteLLM documentation](https://docs.litellm.ai/docs/providers) for details on how to configure your provider.
 
 Example:
 
 ```sh
-LINUX_MCP_GATEKEEPER_MODEL=openai/chatgpt-5.2
+LINUX_MCP_GATEKEEPER__MODEL=openai/chatgpt-5.2
 OPENAI_API_KEY=<....>
 ```
 
diff --git a/eval/gatekeeper/README.md b/eval/gatekeeper/README.md
@@ -64,7 +64,7 @@ Runs test cases through the gatekeeper and reports results.
 
 ```bash
 # Set the gatekeeper model
-export LINUX_MCP_GATEKEEPER_MODEL="openrouter/anthropic/claude-3.5-sonnet"
+export LINUX_MCP_GATEKEEPER__MODEL="openrouter/anthropic/claude-3.5-sonnet"
 
 # Run evaluation on a single file
 uv run eval/gatekeeper/run-eval.py testcases/selinux-port-denial.yaml -o results.yaml
diff --git a/eval/gatekeeper/run-eval-models-corp.sh b/eval/gatekeeper/run-eval-models-corp.sh
@@ -38,8 +38,8 @@ get_MC_base_url() {
 }
 
 [[ -z "${MODEL}" ]] && echo "$MODEL must be set" && exit 1
-export LINUX_MCP_GATEKEEPER_MODEL
-LINUX_MCP_GATEKEEPER_MODEL="openai/$MODEL"
+export LINUX_MCP_GATEKEEPER__MODEL
+LINUX_MCP_GATEKEEPER__MODEL="openai/$MODEL"
 export OPENAI_API_BASE
 OPENAI_API_BASE="$(get_MC_base_url "$MODEL")"
 
diff --git a/eval/gatekeeper/run-eval.py b/eval/gatekeeper/run-eval.py
@@ -219,9 +219,9 @@ def main(
         typer.echo("Must specify either a test case file or --all.", err=True)
         raise typer.Exit(code=1)
 
-    if "LINUX_MCP_GATEKEEPER_MODEL" not in os.environ:
+    if "LINUX_MCP_GATEKEEPER__MODEL" not in os.environ and "LINUX_MCP_GATEKEEPER_MODEL" not in os.environ:
         typer.echo(
-            "Please set the LINUX_MCP_GATEKEEPER_MODEL environment variable to specify the Gatekeeper model to use."
+            "Please set the LINUX_MCP_GATEKEEPER__MODEL environment variable to specify the Gatekeeper model to use."
         )
         raise typer.Exit(code=1)
 
diff --git a/src/linux_mcp_server/config.py b/src/linux_mcp_server/config.py
@@ -1,10 +1,14 @@
 """Settings for linux-mcp-server"""
 
+import logging
+import os
 import sys
 
 from pathlib import Path
+from typing import Any
 
 from pydantic import Field
+from pydantic import model_validator
 from pydantic import SecretStr
 from pydantic_settings import BaseSettings
 from pydantic_settings import SettingsConfigDict
@@ -13,6 +17,9 @@
 from linux_mcp_server.utils.types import UpperCase
 
 
+logger = logging.getLogger(__name__)
+
+
 class Transport(StrEnum):
     stdio = "stdio"
     http = "http"
@@ -27,6 +34,18 @@ class Toolset(StrEnum):
     BOTH = "both"
 
 
+class ReasoningEffort(StrEnum):
+    """Reasoning effort levels for the gatekeeper model."""
+
+    NONE = "none"
+    MINIMAL = "minimal"
+    LOW = "low"
+    MEDIUM = "medium"
+    HIGH = "high"
+    XHIGH = "xhigh"
+    DEFAULT = "default"
+
+
 class AuthProvider(StrEnum):
     """Authentication provider types."""
 
@@ -78,6 +97,27 @@ class AuthConfig(BaseSettings):
     introspection: IntrospectionAuthConfig | None = None
 
 
+class GatekeeperConfig(BaseSettings):
+    """Gatekeeper Model configuration"""
+
+    model: str | None = None
+
+    # model quantization (e.g. fp8, bf16 - only supported for openrouter)
+    quantization: str | None = None
+
+    # reasoning effort
+    reasoning_effort: ReasoningEffort | None = None
+
+    # Whether we should use structured output (default, autodetect support)
+    structured_output: bool | None = None
+
+    # dict of extra template keyword arguments
+    template_kwargs: dict[str, Any] = Field(default_factory=dict)
+
+    # Temperature for gatekeeper model
+    temperature: float = 0.0
+
+
 class Config(BaseSettings):
     # The '_' is required in the env_prefix, otherwise, pydantic would
     # interpret the prefix as LINUX_MCPLOG_DIR, instead of LINUX_MCP_LOG_DIR
@@ -127,8 +167,7 @@ class Config(BaseSettings):
     # What tools are available
     toolset: Toolset = Toolset.FIXED
 
-    # Gatekeeper model (required for run_script tools)
-    gatekeeper_model: str | None = None
+    gatekeeper: GatekeeperConfig = Field(default_factory=GatekeeperConfig)
 
     # Command execution timeout (applies to both local and remote commands)
     command_timeout: int = 30  # Timeout in seconds; prevents hung commands
@@ -165,9 +204,25 @@ def transport_kwargs(self):
     #
     # @model_validator(mode="after")
     # def validate_gatekeeper_model(self):
-    #     if self.toolset != Toolset.FIXED and self.gatekeeper_model is None:
-    #         raise ValueError('gatekeeper_model must be set unless the toolset is "fixed"')
+    #     if self.toolset != Toolset.FIXED and self.gatekeeper.model is None:
+    #         raise ValueError('gatekeeper.model must be set unless the toolset is "fixed"')
     #     return self
 
+    @model_validator(mode="before")
+    @staticmethod
+    def handle_deprecated_aliases(data: Any) -> Any:
+        if isinstance(data, dict):
+            old_value = os.environ.get("LINUX_MCP_GATEKEEPER_MODEL")
+            if old_value is not None:
+                logger.warning(
+                    "LINUX_MCP_GATEKEEPER_MODEL is deprecated. Please use LINUX_MCP_GATEKEEPER__MODEL instead.",
+                )
+
+                gatekeeper_data = data.setdefault("gatekeeper", {})
+                if isinstance(gatekeeper_data, dict) and "model" not in gatekeeper_data:
+                    gatekeeper_data["model"] = old_value
+
+        return data
+
 
 CONFIG = Config()
diff --git a/src/linux_mcp_server/gatekeeper/check_run_script.py b/src/linux_mcp_server/gatekeeper/check_run_script.py
@@ -1,5 +1,7 @@
 import logging
 
+from typing import Any
+
 import litellm
 
 from litellm import Choices
@@ -9,6 +11,7 @@
 from pydantic import BaseModel
 
 from linux_mcp_server.config import CONFIG
+from linux_mcp_server.config import ReasoningEffort
 from linux_mcp_server.utils import StrEnum
 
 
@@ -23,10 +26,10 @@
 
 
 def get_model() -> str:
-    if CONFIG.gatekeeper_model is None:
-        raise ValueError("To use run_script tools, you must set gatekeeper_model in the linux-mcp-server config")
+    if CONFIG.gatekeeper.model:
+        return CONFIG.gatekeeper.model
     else:
-        return CONFIG.gatekeeper_model
+        raise ValueError("To use run_script tools, you must set LINUX_MCP_GATEKEEPER__MODEL")
 
 
 READONLY_INSTRUCTION = """
@@ -184,6 +187,42 @@ def parse_from_description(cls, description: str) -> "GatekeeperResult":
             return cls(status=status, detail=detail)
 
 
+def _build_completion_kwargs():
+    extra_kwargs: dict[str, Any] = {}
+    model = get_model()
+
+    structured_output = CONFIG.gatekeeper.structured_output
+    if structured_output is None:
+        params = get_supported_openai_params(model=model)
+        structured_output = params is not None and "response_format" in params
+
+    if structured_output:
+        extra_kwargs["response_format"] = GatekeeperResult
+
+    reasoning_effort = CONFIG.gatekeeper.reasoning_effort
+    if reasoning_effort is not None:
+        if model.startswith("openrouter/"):
+            if reasoning_effort == ReasoningEffort.NONE:
+                extra_kwargs["reasoning"] = {"enabled": False}
+            else:
+                extra_kwargs["reasoning"] = {"enabled": True, "effort": reasoning_effort.value}
+        else:
+            extra_kwargs["reasoning_effort"] = reasoning_effort.value
+
+    if model.startswith("openrouter/"):
+        provider: dict[str, Any] = {
+            "require_parameters": True,
+        }
+        extra_kwargs["provider"] = provider
+        if CONFIG.gatekeeper.quantization:
+            provider["quantizations"] = [CONFIG.gatekeeper.quantization]
+
+    if CONFIG.gatekeeper.template_kwargs:
+        extra_kwargs["chat_template_kwargs"] = CONFIG.gatekeeper.template_kwargs
+
+    return extra_kwargs
+
+
 def check_run_script(description: str, script_type: str, script: str, *, readonly: bool) -> GatekeeperResult:
     # Check that the script does what is described
     if "start_of_script" in script.lower() or "end_of_script" in script.lower():
@@ -207,13 +246,14 @@ def check_run_script(description: str, script_type: str, script: str, *, readonl
 
     messages = [{"role": "user", "content": prompt}]
 
-    params = get_supported_openai_params(model=get_model())
-    if params is not None and "response_format" in params:
-        response_format = GatekeeperResult
-    else:
-        response_format = None
+    extra_kwargs = _build_completion_kwargs()
 
-    response = completion(model=get_model(), messages=messages, response_format=response_format, temperature=0)
+    response = completion(
+        model=get_model(),
+        messages=messages,
+        temperature=CONFIG.gatekeeper.temperature,
+        **extra_kwargs,
+    )
     assert isinstance(response, ModelResponse)
     assert isinstance(response.choices[0], Choices)
     response_text = (response.choices[0].message.content or "").strip()
diff --git a/src/linux_mcp_server/server.py b/src/linux_mcp_server/server.py
@@ -178,8 +178,8 @@ def _current_toolset():
 
 
 def _check_gatekeeper_model():
-    if CONFIG.toolset != Toolset.FIXED and CONFIG.gatekeeper_model is None:
-        logger.error("LINUX_MCP_GATEKEEPER_MODEL not set, this is needed for run_script tools")
+    if CONFIG.toolset != Toolset.FIXED and CONFIG.gatekeeper.model is None:
+        logger.error("LINUX_MCP_GATEKEEPER__MODEL not set, this is needed for run_script tools")
         sys.exit(1)
 
 
diff --git a/tests/conftest.py b/tests/conftest.py
@@ -8,7 +8,7 @@
 # Register script tools on the in-process MCP instance used by ``mcp_client``.
 # Default CLI/config is FIXED-only; tests need validate_script / run_script / etc.
 os.environ.setdefault("LINUX_MCP_TOOLSET", "both")
-os.environ.setdefault("LINUX_MCP_GATEKEEPER_MODEL", "test/gatekeeper-placeholder")
+os.environ.setdefault("LINUX_MCP_GATEKEEPER__MODEL", "test/gatekeeper-placeholder")
 
 import pytest
 
diff --git a/tests/gatekeeper/test_check_run_script.py b/tests/gatekeeper/test_check_run_script.py
diff --git a/tests/test_config.py b/tests/test_config.py