Skip to content

fix: 改进知识库的初始化错误处理#7243

Merged
RC-CHN merged 6 commits intoAstrBotDevs:masterfrom
piexian:fix/kb-init-error-handlin
Apr 2, 2026
Merged

fix: 改进知识库的初始化错误处理#7243
RC-CHN merged 6 commits intoAstrBotDevs:masterfrom
piexian:fix/kb-init-error-handlin

Conversation

@piexian
Copy link
Copy Markdown
Contributor

@piexian piexian commented Mar 31, 2026

本次修改主要解决知识库在模型加载失败时的错误降级问题(对应 issue #7218)。

此前当模型提供方异常时,知识库可能出现不理想行为: 因重排序模型异常导致知识库整体不可用。
本次改动将行为调整为更符合预期且更易排查:重排序模型不可用时,跳过重排序并继续检索(知识库仍可用),在日志中输出清晰的警告/错误信息,便于定位问题。

Modifications / 改动点

  1. 调整知识库初始化与检索流程中的异常处理策略。
  2. 增加对“重排序不可用”场景的降级逻辑:
    • 记录 Warning 日志;
    • 跳过重排序;
    • 保留嵌入检索结果继续返回。
  3. 调整“嵌入不可用”场景的处理:
    • 检索阶段直接抛错;
    • 避免误返回空结果。
  4. 优化知识库实例状态管理:
    • 初始化失败状态可被记录并透出;
    • 相关生命周期方法增加防御性处理,避免二次异常。
  • This is NOT a breaking change. / 这不是一个破坏性变更。

Screenshots or Test Results / 运行截图或测试结果

image image

Checklist / 检查清单

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
    / 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
    / 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”

  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
    / 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到 requirements.txtpyproject.toml 文件相应位置。

  • 😮 My changes do not introduce malicious code.
    / 我的更改没有引入恶意代码。

Copilot AI review requested due to automatic review settings March 31, 2026 16:26
@auto-assign auto-assign bot requested review from Raven95676 and Soulter March 31, 2026 16:27
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Mar 31, 2026
@dosubot dosubot bot added the feature:knowledge-base The bug / feature is about knowledge base label Mar 31, 2026
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • In KBManager.retrieve, failing the whole request when any kb_helper.init_error is set might be too aggressive for multi-KB queries; consider skipping only the unavailable knowledge bases while still returning results from the remaining ones.
  • In list_kbs, calling await kb_manager.get_kb(kb.kb_id) inside the loop introduces an N+1 access pattern; consider exposing init_error in a way that can be fetched in bulk or cached to avoid repeated awaits per KB.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `KBManager.retrieve`, failing the whole request when any `kb_helper.init_error` is set might be too aggressive for multi-KB queries; consider skipping only the unavailable knowledge bases while still returning results from the remaining ones.
- In `list_kbs`, calling `await kb_manager.get_kb(kb.kb_id)` inside the loop introduces an N+1 access pattern; consider exposing `init_error` in a way that can be fetched in bulk or cached to avoid repeated awaits per KB.

## Individual Comments

### Comment 1
<location path="astrbot/dashboard/routes/knowledge_base.py" line_range="317-322" />
<code_context>
             kb_list = []
             for kb in kbs:
-                kb_list.append(kb.model_dump())
+                kb_dict = kb.model_dump()
+                # include init_error from KBHelper if present
+                kb_helper = await kb_manager.get_kb(kb.kb_id)
+                if kb_helper and kb_helper.init_error:
+                    kb_dict["init_error"] = kb_helper.init_error
+                kb_list.append(kb_dict)

             return (
</code_context>
<issue_to_address>
**suggestion (performance):** Per-KB `get_kb` calls in a loop can introduce N+1 async performance overhead.

`await kb_manager.get_kb(kb.kb_id)` inside the loop creates an N+1 I/O pattern for the list endpoint, which will add latency as the number of KBs grows or if `get_kb` hits DB/disk. To avoid this, consider exposing `init_error` directly on `KnowledgeBase`, adding a bulk `kb_manager` API to fetch all needed helpers in one call, or collecting the coroutines and using `asyncio.gather` to run them in parallel.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +317 to +322
kb_dict = kb.model_dump()
# include init_error from KBHelper if present
kb_helper = await kb_manager.get_kb(kb.kb_id)
if kb_helper and kb_helper.init_error:
kb_dict["init_error"] = kb_helper.init_error
kb_list.append(kb_dict)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (performance): Per-KB get_kb calls in a loop can introduce N+1 async performance overhead.

await kb_manager.get_kb(kb.kb_id) inside the loop creates an N+1 I/O pattern for the list endpoint, which will add latency as the number of KBs grows or if get_kb hits DB/disk. To avoid this, consider exposing init_error directly on KnowledgeBase, adding a bulk kb_manager API to fetch all needed helpers in one call, or collecting the coroutines and using asyncio.gather to run them in parallel.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces improved error handling and status tracking for knowledge base initialization. Key changes include the addition of an init_error field to track failures, graceful degradation when reranking providers are unavailable, and exposing these initialization errors to the dashboard. Review feedback suggests adopting standard Python logging practices by using exc_info=True instead of manual traceback formatting. Additionally, there is a recommendation to reconsider the new behavior of raising exceptions when a single knowledge base fails during retrieval, suggesting that skipping faulty nodes might better preserve system availability in multi-knowledge base scenarios.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

该 PR 针对知识库在模型/Provider 初始化或执行失败时的行为进行了“可用性降级 + 可观测性增强”的调整:当重排序模型不可用时继续检索并记录明确日志;当嵌入检索不可用时在检索阶段明确抛错,避免静默返回空结果;同时透出知识库实例初始化失败状态,便于在 Dashboard 排查。

Changes:

  • 知识库列表接口新增透出 init_error(来自 KBHelper 实例初始化状态)。
  • 检索流程中 rerank 失败改为降级跳过(保留融合结果),并调整稠密检索失败为直接抛错。
  • 知识库加载/更新时捕获初始化异常并记录到实例状态(init_error),检索前对不可用 KB 明确报错。

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
astrbot/dashboard/routes/knowledge_base.py KB 列表接口在返回项中附加初始化错误信息,提升可排查性
astrbot/core/knowledge_base/retrieval/manager.py rerank 失败降级跳过;稠密检索失败改为抛错,避免静默吞错
astrbot/core/knowledge_base/kb_mgr.py KB 初始化失败状态记录与透出;更新 KB 后尝试重新初始化并更新 init_error
astrbot/core/knowledge_base/kb_helper.py 新增 init_error 字段;rerank provider 不可用时降级为不启用 rerank;terminate 增强防御性

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a3fba97b71

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 164640a9c9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

display a dedicated error state for knowledge base cards that fail
initialization, including a visible badge and error details

prevent navigation and edit actions for failed cards while keeping
delete available, and hide normal stats/description for error items

add list.initError locale strings for en-US, ru-RU, and zh-CN
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Apr 2, 2026
@RC-CHN
Copy link
Copy Markdown
Member

RC-CHN commented Apr 2, 2026

image 注意到后端的init_error字段在前端未使用,添加了相关显示组件

RC-CHN added 2 commits April 2, 2026 15:40
Initialize a new KB helper before swapping instances so a failed re-init
does not break the active knowledge base service.

If initialization fails, restore in-memory KB settings and keep the
existing helper and previous init error state.

Also clear stale init_error after successful vector DB initialization to
prevent outdated error reporting.
cover initialization failure and recovery scenarios to guard
against regressions in kb error handling

include reference assets under refs for test validation
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Apr 2, 2026
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Apr 2, 2026
@RC-CHN RC-CHN merged commit 9d4472c into AstrBotDevs:master Apr 2, 2026
7 checks passed
@piexian piexian deleted the fix/kb-init-error-handlin branch April 5, 2026 01:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature:knowledge-base The bug / feature is about knowledge base lgtm This PR has been approved by a maintainer size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants