Conversation
Hey - I've found 2 issues, and left some high-level feedback:
- In `_cache_favicon`, accessing `sp.temporary_cache["_ws_favicon"]` assumes the sub-dict already exists; consider using `setdefault` or an existence check to avoid a `KeyError` on first use.
- The API key presence checks in the tool `call` methods (e.g., Tavily/BoCha/Brave) are duplicated by the `_KeyRotator.get` validation; you could simplify by centralizing the validation in `_KeyRotator.get` and removing the redundant checks in each tool.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `_cache_favicon`, accessing `sp.temporary_cache["_ws_favicon"]` assumes the sub-dict already exists; consider using `setdefault` or an existence check to avoid a `KeyError` on first use.
- The API key presence checks in the tool `call` methods (e.g., Tavily/BoCha/Brave) are duplicated by the `_KeyRotator.get` validation; you could simplify by centralizing the validation in `_KeyRotator.get` and removing the redundant checks in each tool.
## Individual Comments
### Comment 1
<location path="astrbot/core/tools/web_search_tools.py" line_range="149-135" />
<code_context>
+ ]
+
+
+async def _tavily_extract(provider_settings: dict, payload: dict) -> list[dict]:
+ tavily_key = await _TAVILY_KEY_ROTATOR.get(provider_settings)
+ header = {
+ "Authorization": f"Bearer {tavily_key}",
+ "Content-Type": "application/json",
+ }
+ async with aiohttp.ClientSession(trust_env=True) as session:
+ async with session.post(
+ "https://api.tavily.com/extract",
+ json=payload,
+ headers=header,
+ ) as response:
+ if response.status != 200:
+ reason = await response.text()
+ raise Exception(
+ f"Tavily web search failed: {reason}, status: {response.status}",
+ )
+ data = await response.json()
</code_context>
<issue_to_address>
**nitpick:** Error message for Tavily extract endpoint still refers to "web search" which can be misleading.
Since this helper targets the Tavily extract endpoint, using "Tavily web search failed" in the exception is misleading and makes it harder to distinguish extract vs search issues in logs. Please update the message to mention extract (or the specific endpoint) so failures are clearly identifiable.
</issue_to_address>
### Comment 2
<location path="astrbot/core/tools/web_search_tools.py" line_range="117" />
<code_context>
+ return json.dumps({"results": ret_ls}, ensure_ascii=False)
+
+
+async def _tavily_search(
+ provider_settings: dict,
+ payload: dict,
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting shared HTTP/JSON handling and common provider key/result checks into small helpers to reduce duplication and make each web search function and tool call easier to read.
You can keep all functionality as-is and still trim a lot of complexity by extracting a couple of tight internal helpers. Two places where the code is currently quite duplicated:
---
### 1. Centralize HTTP request/JSON handling
The search helpers (`_tavily_search`, `_tavily_extract`, `_bocha_search`, `_brave_search`, `_baidu_search`) all duplicate `ClientSession`/status checking/JSON parsing.
You can extract a small internal helper and keep provider-specific mapping logic local:
```python
async def _request_json(
method: str,
url: str,
*,
headers: dict,
json_payload: dict | None = None,
params: dict | None = None,
error_prefix: str,
) -> dict:
async with aiohttp.ClientSession(trust_env=True) as session:
request = session.post if method.lower() == "post" else session.get
async with request(url, json=json_payload, params=params, headers=headers) as resp:
if resp.status != 200:
reason = await resp.text()
raise Exception(f"{error_prefix} failed: {reason}, status: {resp.status}")
return await resp.json()
```
Then each provider becomes much smaller, e.g.:
```python
async def _tavily_search(provider_settings: dict, payload: dict) -> list[SearchResult]:
tavily_key = await _TAVILY_KEY_ROTATOR.get(provider_settings)
headers = {
"Authorization": f"Bearer {tavily_key}",
"Content-Type": "application/json",
}
data = await _request_json(
"post",
"https://api.tavily.com/search",
headers=headers,
json_payload=payload,
error_prefix="Tavily web search",
)
return [
SearchResult(
title=item.get("title"),
url=item.get("url"),
snippet=item.get("content"),
favicon=item.get("favicon"),
)
for item in data.get("results", [])
]
```
And similarly for `_tavily_extract`, `_bocha_search`, `_brave_search`, `_baidu_search`. This pulls out the noisy, repeated error/HTTP boilerplate and lets you focus on payload building + result mapping.
---
### 2. Normalize “provider key missing” & empty-result handling
Each tool’s `call` repeats:
- Get `provider_settings`.
- Check for presence of API key and return similar error strings.
- After calling the search function, check `if not results: return "Error: … does not return any results."`.
You can hide this pattern behind very small helpers per provider without changing behavior:
```python
def _ensure_provider_key(provider_settings: dict, key_name: str, label: str) -> str | None:
value = provider_settings.get(key_name)
if not value:
# Preserve exact error wording
return f"Error: {label} API key is not configured in AstrBot."
    return None


def _ensure_results(results: list[SearchResult], label: str) -> str | None:
if not results:
return f"Error: {label} web searcher does not return any results."
return None
```
Usage example in `BochaWebSearchTool.call`:
```python
async def call(self, context, **kwargs) -> ToolExecResult:
_, provider_settings, _ = _get_runtime(context)
err = _ensure_provider_key(provider_settings, "websearch_bocha_key", "BoCha")
if err:
return err
payload = {
"query": kwargs["query"],
"count": kwargs.get("count", 10),
"summary": bool(kwargs.get("summary", False)),
}
if kwargs.get("freshness"):
payload["freshness"] = kwargs["freshness"]
if kwargs.get("include"):
payload["include"] = kwargs["include"]
if kwargs.get("exclude"):
payload["exclude"] = kwargs["exclude"]
results = await _bocha_search(provider_settings, payload)
err = _ensure_results(results, "BoCha")
if err:
return err
return _search_result_payload(results)
```
You can do the same for Tavily/Brave/Baidu, keeping provider-specific messages intact while cutting repeated scaffolding and making the happy-path logic easier to scan.
---
These two small extractions keep everything in one module, don’t change observable behavior, and remove a good chunk of duplication that contributes to the “600+ line blob” feel.
</issue_to_address>
Code Review
This pull request refactors the web search functionality by removing the legacy "default" search engine scrapers and introducing a modular tool-based system for API providers including Tavily, BoCha, Brave, and Baidu AI Search. The changes include a new web_search_tools.py file, configuration migration logic, and updates to the dashboard and documentation to support these providers. Feedback focuses on optimizing the configuration migration path to avoid redundant disk writes during LLM requests, improving error handling for API responses in the BoCha tool, and removing unnecessary query truncation in the Baidu search implementation.
```python
    plugin_context: Context,
) -> None:
    cfg = plugin_context.get_config(umo=event.unified_msg_origin)
    normalize_legacy_web_search_config(cfg)
```
```python
                f"BoCha web search failed: {reason}, status: {response.status}",
            )
        data = await response.json()
        rows = data["data"]["webPages"]["value"]
```
Accessing data["data"]["webPages"]["value"] directly might raise a KeyError if the BoCha API returns an unexpected response structure or an error message in a successful HTTP response. Using .get() or verifying the structure before access would be safer.
Suggested change:

```diff
-rows = data["data"]["webPages"]["value"]
+rows = data.get("data", {}).get("webPages", {}).get("value", [])
```
```python
top_k = 50
# ...
payload = {
    "messages": [{"role": "user", "content": str(kwargs["query"])[:72]}],
```
The Baidu AI Search query is truncated to 72 characters. This limit seems quite low for modern LLM-driven search queries and might lead to loss of context. If there isn't a strict API requirement for this specific length, consider increasing it or removing the truncation.
Suggested change:

```diff
-"messages": [{"role": "user", "content": str(kwargs["query"])[:72]}],
+"messages": [{"role": "user", "content": str(kwargs["query"])}],
```
* refactor: remove default web search
* chore: ruff format
Modifications / 改动点
Screenshots or Test Results / 运行截图或测试结果
Checklist / 检查清单
😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
/ 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
/ 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”。
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in `requirements.txt` and `pyproject.toml`.
/ 我确保没有引入新依赖库，或者引入了新依赖库的同时将其添加到 `requirements.txt` 和 `pyproject.toml` 文件相应位置。
😮 My changes do not introduce malicious code.
/ 我的更改没有引入恶意代码。
Summary by Sourcery
Refactor web search integration to use a new built-in tool set and drop the legacy default web search provider.
New Features:
Enhancements:
Chores: