Feat/sear xng tool integration by illuminate97 · Pull Request #953 · it-at-m/mucgpt

illuminate97 · 2026-05-08T08:15:58Z

Summary

What changed?

Added a new InternetSearch tool for the core service.

Searches via the configured SearXNG instance.
Returns sourced results with title, URL, and snippet.
Adds INTERNET_SEARCH configuration support.
Registers the tool only when INTERNET_SEARCH.SEARXNG_URL is configured.
Configured the stack example/local config for https://searxng-test.muenchen.de/.
Added focused unit tests for URL building, placeholder handling, result formatting, and tool metadata.
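
As a rough illustration of the search-and-format behaviour listed above, the sketch below builds a SearXNG JSON query URL and renders results as numbered entries. The helper names `build_search_url` and `format_results`, the host `searx.example`, and the output layout are assumptions made for this example, not the code in `internet_search.py`. Only the `/search?format=json` endpoint and the title/URL/snippet fields come from this description. Reading the snippet from a `content` key reflects how SearXNG JSON results are commonly shaped and should be checked against the actual instance.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, language: str = "de", safesearch: int = 1) -> str:
    """Build a SearXNG JSON search URL (hypothetical helper)."""
    params = {"q": query, "format": "json", "language": language, "safesearch": safesearch}
    # rstrip avoids a double slash when the configured URL ends with "/".
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"


def format_results(results: list[dict], max_results: int = 5) -> str:
    """Render results as numbered title/URL/snippet entries (assumed layout)."""
    lines = []
    # Skip entries without a URL so every listed result stays attributable.
    usable = [r for r in results if r.get("url")][:max_results]
    for index, result in enumerate(usable, start=1):
        title = result.get("title") or result["url"]
        snippet = (result.get("content") or "").strip()
        lines.append(f"{index}. {title}\n   {result['url']}\n   {snippet}".rstrip())
    return "\n\n".join(lines) if lines else "No results found."
```

Keeping URL construction and formatting as pure functions is one way to make them unit-testable without network access, which matters here because the test endpoint is only reachable from the corporate network.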

Why

Why is this change needed?

The assistant currently has no built-in way to retrieve current external information. This tool enables internet search through the organization-controlled SearXNG instance, allowing answers to include up-to-date sourced information without relying only on model training data.

Validation

How was this verified?

Automated tests
Manual verification
Not applicable

Verified with:

uv run ruff check app/agent/tools/internet_search.py app/agent/tools/tools.py app/config/settings.py tests/unit/test_internet_search_tool.py
uv run pytest tests/unit

Result: 38 passed, 5 skipped.

Live endpoint verification was not completed because the SearXNG instance is only reachable from the corporate network.

UI Changes (If applicable)

Please add screenshots or screencasts of any visual changes.

No direct UI changes. The tool appears through the existing tool discovery/selection mechanism once configured.

Change Areas

Risks / Rollout Notes

Is there anything reviewers or operators should pay attention to?
Examples: breaking changes, config changes, migration order, deployment considerations, rollback concerns.

New optional config section: INTERNET_SEARCH.
The tool is only registered when INTERNET_SEARCH.SEARXNG_URL is set.
The configured SearXNG instance must expose JSON search via /search?format=json.
The configured test endpoint is only reachable from the corporate network.
Search results depend on SearXNG availability, network access, and enabled engines.
No database migration required.
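
The opt-in registration rule above can be sketched as follows. `InternetSearchSettings` and `collect_tools` are hypothetical stand-ins written for this example (the PR itself uses a Pydantic model in `settings.py`), and the placeholder callable does not perform a real search.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class InternetSearchSettings:
    """Stand-in for the INTERNET_SEARCH section; field names mirror the PR's config keys."""

    SEARXNG_URL: str = ""
    TIMEOUT: float = 10.0
    MAX_RESULTS: int = 5


def collect_tools(settings: InternetSearchSettings, base_tools: dict[str, Callable]) -> dict[str, Callable]:
    """Return the tool registry, adding InternetSearch only when a URL is configured."""
    tools = dict(base_tools)  # copy so the caller's registry is not mutated
    if settings.SEARXNG_URL.strip():
        # Placeholder callable; the real tool would query the SearXNG instance.
        tools["InternetSearch"] = lambda query: f"search:{query}"
    return tools
```

With an empty `SEARXNG_URL` the registry is returned unchanged, so deployments that never set the option see no new tool.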

References

Related: #
Closes: #

Summary by CodeRabbit

New Features
- Internet search (SearXNG-backed) added as an agent tool with streaming support
- Tools listing endpoint gains a force_reload option to refresh tool metadata
Improvements
- More robust tool loader with improved cache/lock handling, partial-result caching, and per-source failure tracking
- Tool collection now conditionally includes internet search and refines agent state selection
Tests
- Unit tests added for the internet search tool
Chores
- Added internet search settings and updated config examples

coderabbitai · 2026-05-08T08:18:31Z

Warning

Review limit reached

@Meteord, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 12 minutes and 11 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: eb84ee59-e104-4b77-ba60-f1babcc4ec9e

📥 Commits

Reviewing files that changed from the base of the PR and between f4e9b61 and cc5e3f7.

📒 Files selected for processing (4)

mucgpt-core-service/app/agent/prompt_pool/atlassian_scope_router.md
mucgpt-core-service/app/agent/prompt_pool/default_instructions.md
mucgpt-core-service/app/agent/tools/internet_search.py
mucgpt-core-service/tests/unit/test_internet_search_tool.py

📝 Walkthrough

Adds a SearXNG-backed InternetSearch tool with Pydantic config and cached accessor, integrates it into the tool collection and API (force_reload), refactors MCP tool loading/caching with bounded cache-warmup and partial-result short-TTL, and adds unit tests and config examples.

Changes

Internet Search Feature and Tool System Enhancements

| Layer / File(s) | Summary |
|---|---|
| Internet Search Configuration & Models `mucgpt-core-service/app/config/settings.py` | `InternetSearchConfig` model with SearXNG parameters, `MCPToolDescription` model, `MCPConfig.FORCE_RELOAD` flag, and `get_internet_search_settings()` cached accessor. |
| Internet Search Core Implementation `mucgpt-core-service/app/agent/tools/internet_search.py` | Configuration validation, result formatting, core `internet_search()` with HTTP request/error handling and optional LangGraph streaming, and `make_internet_search_tool()` LangChain factory with MCP metadata. |
| Internet Search Unit Tests `mucgpt-core-service/tests/unit/test_internet_search_tool.py` | Mocks (`FakeClient`, `FakeResponse`), tests for unconfigured placeholder handling, HTTP request parameter assertions, returned formatting, and tool metadata default (`mcp_group: "default"`). |
| MCP Tool Loading: Force Reload & Caching Strategy `mucgpt-core-service/app/agent/tools/mcp.py` | Adds `force_reload` to `load_mcp_tools()`, centralizes fetching, tracks per-source failures, caches partial results with short TTL on partial failure, and waits (bounded retries) for cache warm-up on lock contention rather than returning empty. |
| Tool Collection: Internet Search Integration `mucgpt-core-service/app/agent/tools/tools.py` | Conditionally constructs and surfaces InternetSearch, `list_tool_metadata()` accepts `force_reload`, localized metadata entries for InternetSearch, and MCP loading gets `force_reload`. |
| API Endpoint & Configuration Examples `mucgpt-core-service/app/api/routers/tools_router.py`, `mucgpt-core-service/config.yaml.example`, `stack/core.config.yaml.example` | `list_tools` endpoint adds `force_reload` query param; example configs include `INTERNET_SEARCH` block (SEARXNG URL, timeout, max results, language, safesearch). |
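
The MCP loading strategy summarized in the table (force reload, per-source failure tracking, a shorter TTL for partial results) might look roughly like the sketch below. It is a simplified, single-threaded illustration: `load_tools`, the TTL values, and the module-level `_cache` are assumptions made here, and the lock handling with bounded warm-up retries is left out entirely.

```python
import time
from typing import Callable

FULL_TTL = 300.0    # assumed TTL for complete results, in seconds
PARTIAL_TTL = 15.0  # assumed shorter TTL when some sources failed

_cache: dict = {}   # holds "tools", "failed" and "expires_at"


def load_tools(
    sources: dict[str, Callable[[], list[str]]],
    force_reload: bool = False,
    now: Callable[[], float] = time.monotonic,
) -> tuple[list[str], list[str]]:
    """Return (tools, failed_sources), reusing the cache unless expired or force_reload is set."""
    if not force_reload and _cache and now() < _cache["expires_at"]:
        return _cache["tools"], _cache["failed"]

    tools: list[str] = []
    failed: list[str] = []
    for name, fetch in sources.items():
        try:
            tools.extend(fetch())
        except Exception:
            # Track the failure per source instead of discarding everything.
            failed.append(name)

    # Partial results expire sooner so a recovered source is picked up quickly.
    ttl = PARTIAL_TTL if failed else FULL_TTL
    _cache.update(tools=tools, failed=failed, expires_at=now() + ttl)
    return tools, failed
```

Passing the clock as `now` is only a testing convenience in this sketch; it lets expiry be exercised without sleeping.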

Sequence Diagram

sequenceDiagram
  participant Client
  participant internet_search as internet_search()
  participant SearXNG as SearXNG_HTTP
  participant Writer as StreamWriter
  Client->>internet_search: call with query, max_results, language
  internet_search->>Writer: emit STARTED (if writer present)
  internet_search->>SearXNG: GET /search with params (format=json, q, language, safesearch)
  SearXNG-->>internet_search: JSON results
  internet_search->>internet_search: parse, filter, format numbered results
  internet_search->>Writer: emit ENDED (if writer present)
  internet_search->>Client: return formatted string
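
The sequence above can be approximated in a few lines. This is a sketch under stated assumptions: `internet_search_flow` and the injected `fetch_json` callable are hypothetical, the event dictionaries are invented for illustration, and emitting ENDED in a `finally` block (so it also fires on errors) is a design choice of this example rather than confirmed behaviour of the PR.

```python
from typing import Callable, Optional


def internet_search_flow(
    query: str,
    fetch_json: Callable[[dict], dict],
    writer: Optional[Callable[[dict], None]] = None,
    max_results: int = 5,
    language: str = "de",
) -> str:
    """Mirror the diagram: emit STARTED, query SearXNG, format results, emit ENDED."""
    if writer:
        writer({"state": "STARTED", "query": query})
    try:
        payload = fetch_json({"q": query, "format": "json", "language": language})
        # Filter out entries without a URL, then cap the list.
        hits = [r for r in payload.get("results", []) if r.get("url")][:max_results]
        text = "\n".join(f"{i}. {r.get('title', '')} - {r['url']}" for i, r in enumerate(hits, 1))
        return text or "No results found."
    finally:
        # Emit ENDED even if the request fails, so a stream consumer is not left waiting.
        if writer:
            writer({"state": "ENDED", "query": query})
```

Injecting the HTTP call as `fetch_json` keeps the flow testable with a fake client, similar in spirit to the `FakeClient` mocks mentioned in the walkthrough.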

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through configs and code so neat,

SEARXNG traces the web for crumbs to eat,
Cache waits a heartbeat, tools line the way,
Results neatly formatted, ready to relay,
A rabbit's small cheer for a feature complete.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 22.39% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'Feat/sear xng tool integration' is partially related to the changeset but contains a formatting issue and lacks clarity about the primary change being a SearXNG-backed InternetSearch tool. | Clarify the title to be more specific and professional, e.g., 'Add InternetSearch tool with SearXNG integration' or 'Integrate SearXNG for internet search capabilities'. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | The pull request description is well-structured, follows the template, and comprehensively covers all major sections including Summary, Why, Validation, Change Areas, Risks, and References. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 18

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@mucgpt-core-service/app/agent/agent_executor.py`:
- Line 143: The code is injecting a hardcoded "agent_state": {"current_scope":
"general"} into the request payload for both streaming and non-streaming paths,
which makes scope routing never run in the non-streaming (graph-bypassed) path
and creates a false impression of scope awareness; update the logic around where
the payload/dict is assembled (the place that currently adds the "agent_state"
key in agent_executor.py and the duplicate occurrence later) to only include
agent_state when the streaming/graph path is taken (or remove it entirely for
non-streaming), so non-streaming requests do not carry a misleading scope and
scope routing behavior remains correct.
- Around line 276-279: The non-streaming branch is bypassing the
MUCGPTReActAgent graph and therefore skips tools, data-source injection, scope
routing and ContextMiddleware; replace the raw LLM sync call (the block using
self.agent.model.with_config(config).invoke(msgs)) with a synchronous graph call
so the full ReAct loop runs (use self.agent.graph.invoke or the agent's sync
invocation API with the same config and msgs), ensuring enabled_tools,
data_sources, and agent_state/context middleware are honored during
non-streaming execution.

In `@mucgpt-core-service/app/agent/middleware.py`:
- Around line 216-221: The code is currently echoing raw exception text into the
ToolMessage (using str(exc)), which can leak sensitive info; update the two
spots where ToolMessage is constructed after exceptions (the block using
logger.exception(...) and returning ToolMessage(...
tool_call_id=request.tool_call["id"]) around the logger.exception call and the
similar block at lines 232-237) to: keep logger.exception(...) as-is to record
full details, but change the ToolMessage content to a generic error string
(e.g., "Tool execution failed" or "An internal error occurred while executing
the tool") without including exc or any exception details, while still
preserving tool_call id from request.tool_call["id"].
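
A minimal sketch of the pattern described for `middleware.py`: log the full exception server-side, return only a generic message to the model. `ToolResult` and `run_tool_safely` are hypothetical stand-ins so the example runs with the standard library alone; they are not the LangChain `ToolMessage` class or the project's middleware.

```python
import logging
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger(__name__)


@dataclass
class ToolResult:
    """Minimal stand-in for a tool message; not the LangChain ToolMessage class."""

    content: str
    tool_call_id: str


def run_tool_safely(tool: Callable[[dict], str], tool_call: dict) -> ToolResult:
    """Run a tool and return a generic error message if it raises."""
    try:
        return ToolResult(content=tool(tool_call.get("args", {})), tool_call_id=tool_call["id"])
    except Exception:
        # Full details (including the traceback) go to the server log only.
        logger.exception("Tool %s failed", tool_call.get("name", "<unknown>"))
        return ToolResult(content="Tool execution failed.", tool_call_id=tool_call["id"])
```

The tool call id is preserved in both branches, so the caller can still match the result to its request.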

In `@mucgpt-core-service/app/agent/prompt_pool/atlassian_scope_router.md`:
- Around line 14-15: The classifier instruction string in
atlassian_scope_router.md contains a typo "fokus" that should be corrected to
"focus"; update the sentence "Important: fokus on the latest scope of the
conversation..." to "Important: focus on the latest scope of the
conversation..." so the user-facing agent instruction is correct and consistent.

In `@mucgpt-core-service/app/agent/prompt_pool/default_instructions.md`:
- Line 6: Fix the typo in the default instruction string "Responde in the same
language as the request." by changing "Responde" to "Respond" so the line reads
"Respond in the same language as the request." Update the text inside
default_instructions.md where that exact sentence appears to ensure generated
instructions use the correct spelling.

In `@mucgpt-core-service/app/agent/react_agent.py`:
- Around line 143-145: The code reads messages[0].content without ensuring
messages is non-empty in _prepare_run; update the logic where system_prompt is
set (the block using DEFAULT_INSTRUCTIONS and messages[0].content) and the
similar block at the later location so you first check that messages is
truthy/len(messages) > 0 before accessing messages[0]; if messages is empty,
treat the system_prompt branch as None (or the safe default) to avoid
IndexError. Ensure you reference _prepare_run, the messages variable, and
DEFAULT_INSTRUCTIONS when making the change.
- Around line 69-73: The _select_tools method currently returns an empty list
when enabled_tools is None, dropping all tools; change it so that when
enabled_tools is None it returns the full toolset (e.g., self.tools or a shallow
copy) instead of [], and keep the existing filtering behavior when enabled_tools
is provided; update the method _select_tools in react_agent.py to check for None
specifically and return the agent's tools attribute in that case so callers that
omit enabled_tools retain InternetSearch/MCP access.
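
The two `react_agent.py` findings above could be addressed along these lines. Both functions are simplified, hypothetical stand-ins: tools are represented by their names and messages by plain dicts with a `content` key, which is an assumption of this sketch rather than the agent's actual types.

```python
from typing import Optional, Sequence


def select_tools(all_tools: Sequence[str], enabled_tools: Optional[Sequence[str]]) -> list[str]:
    """Return every tool when enabled_tools is None, otherwise only the enabled ones."""
    if enabled_tools is None:
        # Caller did not restrict tools: keep the full set (shallow copy).
        return list(all_tools)
    enabled = set(enabled_tools)
    return [tool for tool in all_tools if tool in enabled]


def first_system_prompt(messages: Sequence[dict]) -> Optional[str]:
    """Read the first message's content only if the list is non-empty."""
    if not messages:
        return None  # avoids the IndexError from messages[0] on an empty list
    return messages[0].get("content")
```

Note the distinction between `None` (no restriction given) and an empty list (explicitly no tools), which the sketch keeps separate.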

In `@mucgpt-core-service/app/agent/state_models/atlassian_state.py`:
- Around line 11-14: The four required fields (current_scope, locked_scope,
initial_scope_checked, scope_confidence) in the Atlassian state model are
causing ValidationError when partial updates are supplied by
AtlassianScopePolicy; make them safe by providing sensible defaults: set
current_scope and locked_scope default to "general", initial_scope_checked
default to False, and scope_confidence default to 0.0 on the model (the
attributes current_scope, locked_scope, initial_scope_checked, scope_confidence)
so partial construction succeeds.

In `@mucgpt-core-service/app/agent/state_models/default_state.py`:
- Line 13: AgentState currently declares data_sources as a required field which
causes TypedDict validation failures; change the declaration of data_sources in
the AgentState TypedDict to be optional by wrapping it with
typing_extensions.NotRequired (or typing.NotRequired if available) so the field
is not required in all state dicts—update the symbol data_sources in the
AgentState definition in default_state.py to use NotRequired to fix the
TypedDict violation.
- Line 3: Update the import so it uses LangChain's public re-export: replace the
current import of AgentState from langchain.agents.middleware with importing
AgentState from langchain.agents (i.e., ensure the top of default_state.py
imports AgentState via "from langchain.agents import AgentState") to follow
LangChain 1.2.x standard conventions and avoid fragile internal import paths.

In `@mucgpt-core-service/app/agent/tools/internet_search.py`:
- Around line 171-172: The InternetSearch tool is incorrectly placed in the
Atlassian tool group via internet_search_tool.metadata = {"mcp_group":
"atlassian"}; remove or change that metadata so InternetSearch is not assigned
to the "atlassian" group (e.g., delete the metadata assignment or set a neutral
group) to prevent it from opting into AtlassianAgentState and the Atlassian
scope router.

In `@mucgpt-core-service/app/agent/tools/policies.py`:
- Around line 132-139: _tool_scope currently reads metadata["mcp_scope"] but the
loader (McpLoader) and make_internet_search_tool() populate
metadata["mcp_group"], so update _tool_scope() to read the same key(s): first
check metadata.get("mcp_group") and if missing fallback to
metadata.get("mcp_scope"); normalize the value (str(...).strip().lower()) and
return it when it matches the allowed set {"jira","confluence"} so tools grouped
via mcp_group are correctly scope-filtered.
- Around line 165-167: The current logger.info(f"Request: {request}") prints the
entire ModelRequest (including conversation text and injected docs); replace
that full-request logging with a minimal summary: log the message count (e.g.,
len(request.messages)) and any relevant state flags such as whether injected
documents are present (e.g., bool(getattr(request, "documents", None)) or a
similar field), and remove or redact any direct conversation/content fields;
keep the surrounding flow (the early return and the existing logger.info("No
messages in request; skipping initial scope check.") and return request) intact.
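
For the `_tool_scope` finding above, a sketch of the suggested key fallback and normalization is shown below. `tool_scope` is a hypothetical free function written for this example, and treating metadata as a plain dict is an assumption; the real method lives in `policies.py`.

```python
from typing import Optional

ALLOWED_SCOPES = {"jira", "confluence"}


def tool_scope(metadata: Optional[dict]) -> Optional[str]:
    """Return the normalized scope of a tool, preferring mcp_group over mcp_scope."""
    if not metadata:
        return None
    # Read the key the loader populates first, then fall back to the older key.
    raw = metadata.get("mcp_group") or metadata.get("mcp_scope")
    if raw is None:
        return None
    scope = str(raw).strip().lower()
    return scope if scope in ALLOWED_SCOPES else None
```

Groups outside the allowed set (for example `default`) yield `None`, so such tools are not scope-filtered in this sketch.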

In `@mucgpt-core-service/app/agent/tools/tools.py`:
- Around line 37-44: The function select_agent_state_schema collects tool
metadata groups but doesn't handle the case where no tools have metadata,
causing list(tool_groups)[0] to crash; update select_agent_state_schema to first
check if tool_groups is empty and immediately return DefaultAgentState when so,
otherwise preserve the existing logic that returns DefaultAgentState when
multiple groups exist or looks up AGENT_STATE_SCHEMA_REGISTRY for the single
group key.
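
The empty-group guard described above might be sketched like this. The two state classes and the registry contents are hypothetical stand-ins, and falling back to the default schema for an unregistered group is an assumption of this example, not something the review comment specifies.

```python
from typing import Iterable, Optional


class DefaultAgentState(dict):
    """Stand-in for the default state schema."""


class AtlassianAgentState(dict):
    """Stand-in for the Atlassian-specific state schema."""


# Hypothetical registry contents; the real registry lives in state_models/registry.py.
AGENT_STATE_SCHEMA_REGISTRY = {"atlassian": AtlassianAgentState}


def select_agent_state_schema(tool_metadata: Iterable[Optional[dict]]) -> type:
    """Pick a state schema from the tools' mcp_group values, defaulting safely."""
    groups = {meta["mcp_group"] for meta in tool_metadata if meta and meta.get("mcp_group")}
    if len(groups) != 1:
        # Zero groups (no metadata) or mixed groups both fall back to the default.
        return DefaultAgentState
    (group,) = groups
    return AGENT_STATE_SCHEMA_REGISTRY.get(group, DefaultAgentState)
```

Checking `len(groups) != 1` covers the empty case and the mixed case in one branch, so indexing into an empty collection never happens.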

In `@mucgpt-core-service/config.yaml.example`:
- Around line 76-85: The INTERNET_SEARCH example block is active by default and
points to a corporate SearXNG URL; update the example so the entire
INTERNET_SEARCH block (keys SEARXNG_URL, TIMEOUT, MAX_RESULTS, LANGUAGE,
SAFESEARCH) is commented out and add a short inline note that this feature is
optional and only enabled when SEARXNG_URL is set (or via
MUCGPT_CORE_INTERNET_SEARCH__SEARXNG_URL env var). Locate the INTERNET_SEARCH
block in the example config and convert each line to a comment variant so users
copying the file don't unknowingly enable a broken corporate URL.

In `@mucgpt-core-service/tests/unit/test_internet_search_tool.py`:
- Around line 8-12: Add explicit type hints to the untyped test function and
mock methods: annotate raise_for_status() as -> None and json() as -> Dict[str,
Any]; import typing names (Dict, Any) at top of the test module. Also annotate
all test function signatures referenced (the functions around lines 27-37 and
42-76) with appropriate parameter and return types (e.g., def test_xxx() ->
None) to comply with the repository rule. Ensure the mock class methods keep
self parameter types implicit but include return types exactly as above and
update any other untyped functions in that file similarly.
- Around line 75-79: The unit test
test_make_internet_search_tool_has_default_metadata expects metadata
{"mcp_group": "default"} but the current
internet_search.make_internet_search_tool produces {"mcp_group": "atlassian"},
so update the assertion or the tool constructor to be consistent; either change
the test's expected value to {"mcp_group": "atlassian"} (update the assertion on
tool.metadata) or modify make_internet_search_tool to set metadata to
{"mcp_group": "default"} so tool.metadata matches the test.

In `@stack/core.config.yaml.example`:
- Line 72: Replace the hard-coded corporate URL in the example config for
SEARXNG_URL with a neutral placeholder or empty string so the example does not
enable an environment-specific search endpoint by default; update the
SEARXNG_URL entry in stack/core.config.yaml.example to something like an empty
value or "https://example.local/" and add a short comment indicating users must
explicitly set/opt-in to a real search endpoint for their environment.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 64d884af-ea0d-44ba-b548-1afd3732a53f

📥 Commits

Reviewing files that changed from the base of the PR and between d3b47c2 and 3aa1a81.

⛔ Files ignored due to path filters (1)

mucgpt-core-service/uv.lock is excluded by !**/*.lock

📒 Files selected for processing (27)

mucgpt-core-service/app/agent/agent.py
mucgpt-core-service/app/agent/agent_executor.py
mucgpt-core-service/app/agent/middleware.py
mucgpt-core-service/app/agent/prompt_pool/atlassian_confluence.md
mucgpt-core-service/app/agent/prompt_pool/atlassian_general.md
mucgpt-core-service/app/agent/prompt_pool/atlassian_jira.md
mucgpt-core-service/app/agent/prompt_pool/atlassian_scope_router.md
mucgpt-core-service/app/agent/prompt_pool/default_instructions.md
mucgpt-core-service/app/agent/prompt_pool/tool_instructions.md
mucgpt-core-service/app/agent/react_agent.py
mucgpt-core-service/app/agent/state_models/__init__.py
mucgpt-core-service/app/agent/state_models/atlassian_state.py
mucgpt-core-service/app/agent/state_models/default_state.py
mucgpt-core-service/app/agent/state_models/registry.py
mucgpt-core-service/app/agent/tools/internet_search.py
mucgpt-core-service/app/agent/tools/mcp.py
mucgpt-core-service/app/agent/tools/policies.py
mucgpt-core-service/app/agent/tools/tools.py
mucgpt-core-service/app/api/routers/tools_router.py
mucgpt-core-service/app/config/settings.py
mucgpt-core-service/app/init_app.py
mucgpt-core-service/config.yaml.example
mucgpt-core-service/pyproject.toml
mucgpt-core-service/tests/unit/test_agent.py
mucgpt-core-service/tests/unit/test_init_app.py
mucgpt-core-service/tests/unit/test_internet_search_tool.py
stack/core.config.yaml.example

💤 Files with no reviewable changes (2)

mucgpt-core-service/app/agent/agent.py
mucgpt-core-service/tests/unit/test_agent.py

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

mucgpt-core-service/app/agent/tools/tools.py (1)
206-236: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

InternetSearch metadata missing for français, bairisch, and ukrainisch.

Brainstorming and Vereinfachen have entries in every supported language map, but InternetSearch is only present in deutsch and english. Users with the other locales will fall back to the raw tool name/description from make_internet_search_tool, producing inconsistent UX compared to other tools.

Add InternetSearch entries (or document the intended fallback) so localized labels render correctly across all listed languages.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mucgpt-core-service/app/agent/tools/tools.py` around lines 206 - 236, The
language maps are missing localized entries for the InternetSearch tool for the
locales "français", "bairisch", and "ukrainisch", causing those locales to fall
back to the raw make_internet_search_tool labels; add an "InternetSearch" key
alongside "Brainstorming" and "Vereinfachen" in each of those locale
dictionaries with appropriate localized "name" and "description" strings (or
alternatively add a clear comment documenting the intended fallback behavior),
updating the same structure used for other tools so the localized
label/description for InternetSearch renders consistently across the locales
referenced in tools.py.

♻️ Duplicate comments (1)

stack/core.config.yaml.example (1)
70-77: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Replace hard-coded corporate SearXNG URL with a placeholder.

The example still ships https://searxng-test.muenchen.de/ as the default SEARXNG_URL. Outside the corporate network this either fails resolution or blocks tool calls until the timeout expires, which is a poor first-run experience. Keep the example opt-in by using a placeholder.
♻️ Suggested change
 INTERNET_SEARCH:
-  SEARXNG_URL: "https://searxng-test.muenchen.de/"
+  SEARXNG_URL: "<your-searxng-url>"  # Leave empty or set to your SearXNG instance to enable
   TIMEOUT: 10.0
   MAX_RESULTS: 5
   LANGUAGE: "de"
   SAFESEARCH: 1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@stack/core.config.yaml.example` around lines 70 - 77, The example config
currently hard-codes a corporate SearXNG URL under the
INTERNET_SEARCH.SEARXNG_URL key which causes failures for external users; change
SEARXNG_URL to a neutral placeholder (e.g. "<YOUR_SEARXNG_URL>" or empty string)
and keep TIMEOUT, MAX_RESULTS, LANGUAGE, SAFESEARCH as-is so the InternetSearch
tool remains opt-in; update any nearby comment to instruct users to replace the
placeholder with their own SearXNG instance if they want to enable the tool.

🧹 Nitpick comments (3)

mucgpt-core-service/app/agent/tools/tools.py (2)

39-43: ⚡ Quick win

Add type hints to _metadata_value signature.

metadata and default are untyped. Both parameters can accept arbitrary tool-metadata containers and default values, so Any is appropriate here.
♻️ Proposed type hints
-def _metadata_value(metadata, key: str, default=None):
+def _metadata_value(metadata: Any, key: str, default: Any = None) -> Any:
     if isinstance(metadata, dict):
         return metadata.get(key, default)
     return getattr(metadata, key, default)
You'll also need from typing import Any (currently absent from this file's imports).
As per coding guidelines: "Use mandatory type hints for all function signatures in Python code."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mucgpt-core-service/app/agent/tools/tools.py` around lines 39 - 43, Add
explicit type hints to the _metadata_value signature: annotate metadata: Any,
default: Any, and the return type: Any, and import Any from typing at the top of
the module; update the function declaration for _metadata_value to use these
types so both callers and static checkers understand that metadata and default
can be arbitrary values and the function returns Any.
147-160: 💤 Low value

Document the new force_reload parameter in the docstring.

list_tool_metadata gained a force_reload argument that propagates into McpLoader.load_mcp_tools, but the docstring still only describes user_info and lang. Adding a brief note prevents confusion for callers tracing this from the API layer.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mucgpt-core-service/app/agent/tools/tools.py` around lines 147 - 160, Update
the docstring for list_tool_metadata to include the new force_reload parameter:
state that force_reload (bool, default False) forces reloading of MCP tools from
the source by propagating into McpLoader.load_mcp_tools, describe its effect on
caching/refresh behavior, and add it to the Args section alongside user_info and
lang so callers can see its purpose and default value.

mucgpt-core-service/app/config/settings.py (1)

370-377: ⚡ Quick win

Consider validating SAFESEARCH and TIMEOUT bounds.

SearXNG accepts safesearch as 0, 1, or 2; arbitrary integers will be silently sent to the engine and may produce undefined behavior. Likewise TIMEOUT should be strictly positive. Tightening these at the config layer surfaces misconfiguration at startup rather than at request time.

♻️ Proposed tightening

 from pydantic import (
     BaseModel,
     Field,
     HttpUrl,
+    PositiveFloat,
     PositiveInt,
     PrivateAttr,
     SecretStr,
     TypeAdapter,
     field_validator,
     model_validator,
 )
@@
 class InternetSearchConfig(BaseModel):
     """Internet search configuration (nested under INTERNET_SEARCH key in YAML)."""

     SEARXNG_URL: str = ""
-    TIMEOUT: float = 10.0
+    TIMEOUT: PositiveFloat = 10.0
     MAX_RESULTS: PositiveInt = 5
     LANGUAGE: str = "de"
-    SAFESEARCH: int = 1
+    SAFESEARCH: int = Field(default=1, ge=0, le=2)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mucgpt-core-service/app/config/settings.py` around lines 370 - 377, The
InternetSearchConfig model should validate SAFESEARCH and TIMEOUT bounds: change
SAFESEARCH from a plain int to a constrained type (e.g., Literal[0,1,2] or
conint(ge=0, le=2)) so only 0/1/2 are accepted, and make TIMEOUT a strictly
positive float (e.g., PositiveFloat or confloat(gt=0)) and/or add pydantic
validators on InternetSearchConfig to raise clear validation errors with
contextual messages when values are out of range; reference the
InternetSearchConfig class and the SAFESEARCH and TIMEOUT fields when
implementing these changes.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 63a2f922-d5ad-4085-954e-dad1df20897d

📥 Commits

Reviewing files that changed from the base of the PR and between 3aa1a81 and 8f53583.

📒 Files selected for processing (4)

mucgpt-core-service/app/agent/tools/tools.py
mucgpt-core-service/app/config/settings.py
mucgpt-core-service/config.yaml.example
stack/core.config.yaml.example

✅ Files skipped from review due to trivial changes (1)

mucgpt-core-service/config.yaml.example

coderabbitai · 2026-05-28T14:27:51Z

Actionable comments posted: 0

coderabbitai · 2026-05-28T14:33:00Z

Actionable comments posted: 0

Meteord

LGTM! I changed the group for the web search tool. Please review again and confirm that this is okay, @illuminate97.

Sebastian.Berger added 3 commits May 8, 2026 08:40

3dd8d21 · feat: enhance MCP tool retrieval with force reload option and improved caching logic
5852689 · Merge remote-tracking branch 'origin/feat/agent-architecture-react-upgrade' into feat/enhance-mcp-tool-retrieval
3aa1a81 · feat: integrate SearXNG internet search tool with configuration and tests

coderabbitai Bot reviewed May 8, 2026

8f53583 · Merge branch 'main' into feat/searXNG-tool-integration

coderabbitai Bot reviewed May 21, 2026

Meteord and others added 2 commits May 28, 2026 16:21

17b4675 · Merge branch 'main' into feat/searXNG-tool-integration
9e08696 · Apply suggestions from code review (Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>)

Meteord added 2 commits May 28, 2026 16:30

dadfc8e · Make internet search a default tool
f4e9b61 · Merge branch 'feat/searXNG-tool-integration' of https://github.com/it-at-m/mucgpt into feat/searXNG-tool-integration

Meteord and others added 5 commits May 28, 2026 16:38

31080c4 · type this
671cac5 · Update mucgpt-core-service/app/agent/prompt_pool/default_instructions.md (Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>)
9995525 · Update mucgpt-core-service/app/agent/prompt_pool/atlassian_scope_router.md (Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>)
c1ec987 · Group internet
cc5e3f7 · Merge branch 'feat/searXNG-tool-integration' of https://github.com/it-at-m/mucgpt into feat/searXNG-tool-integration

Meteord approved these changes May 28, 2026
