Fix(kb_helper):数据库选择的提供商消失,导致的数据库不显示问题#7244
Fix(kb_helper):数据库选择的提供商消失,导致的数据库不显示问题#7244Li-shi-ling wants to merge 10 commits intoAstrBotDevs:masterfrom
Conversation
There was a problem hiding this comment.
Hey - I've found 4 issues, and left some high level feedback:
- In
get_ep/_ensure_vec_db, whenembedding_provider_idis missing you now only log an error but still callget_provider_by_idwith a falsy ID and continue; consider either returning a placeholder immediately or keeping the earlyraiseto avoid unexpected behavior downstream. - The temporary provider classes (
TempEmbeddingProvider,TempRerankProvider) are defined inside the methods and re-created on each call; consider moving them to module scope (or a shared helper) to avoid duplication and make their behavior easier to maintain. - In
TempRerankProvider.rerankthe error message mentionsEmbedding Providerinstead ofRerank Provider, which is likely a copy-paste mistake and may confuse debugging when the placeholder is triggered.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `get_ep` / `_ensure_vec_db`, when `embedding_provider_id` is missing you now only log an error but still call `get_provider_by_id` with a falsy ID and continue; consider either returning a placeholder immediately or keeping the early `raise` to avoid unexpected behavior downstream.
- The temporary provider classes (`TempEmbeddingProvider`, `TempRerankProvider`) are defined inside the methods and re-created on each call; consider moving them to module scope (or a shared helper) to avoid duplication and make their behavior easier to maintain.
- In `TempRerankProvider.rerank` the error message mentions `Embedding Provider` instead of `Rerank Provider`, which is likely a copy-paste mistake and may confuse debugging when the placeholder is triggered.
## Individual Comments
### Comment 1
<location path="astrbot/core/knowledge_base/kb_helper.py" line_range="137-138" />
<code_context>
async def get_ep(self) -> EmbeddingProvider:
if not self.kb.embedding_provider_id:
- raise ValueError(f"知识库 {self.kb.kb_name} 未配置 Embedding Provider")
+ logger.error(f"知识库 {self.kb.kb_name} 未配置 Embedding Provider")
ep: EmbeddingProvider = await self.prov_mgr.get_provider_by_id(
self.kb.embedding_provider_id,
</code_context>
<issue_to_address>
**issue (bug_risk):** After logging missing embedding_provider_id, the code still proceeds and may pass an invalid ID into get_provider_by_id.
Previously this branch raised and prevented use of an unset `embedding_provider_id`. Now it logs and then calls `get_provider_by_id(self.kb.embedding_provider_id)`, which will be `None` here, risking undefined or inconsistent behavior depending on `get_provider_by_id`’s implementation. Please either short‑circuit when the ID is missing (e.g., return a placeholder provider) or restore an early failure (raise or return a clearly invalid provider) so downstream code never relies on a `None`/invalid ID.
</issue_to_address>
### Comment 2
<location path="astrbot/core/knowledge_base/kb_helper.py" line_range="186-194" />
<code_context>
+ super().__init__(provider_config, provider_settings)
+ self.rerank_provider_id = rerank_provider_id
+
+ async def rerank(
+ self,
+ query: str,
+ documents: list[str],
+ top_n: int | None = None,
+ ):
+ raise ValueError(
+ f"无法找到 ID 为 {self.rerank_provider_id} 的 Embedding Provider"
+ )
+
</code_context>
<issue_to_address>
**issue (typo):** TempRerankProvider error message refers to an Embedding Provider instead of a Rerank Provider.
The `TempRerankProvider.rerank` error message currently says `Embedding Provider` instead of `Rerank Provider`, which can be confusing when both provider types exist. Please update the message (and its localized text) to reference the rerank provider to match the actual failing dependency.
```suggestion
class TempRerankProvider(RerankProvider):
def __init__(
self,
provider_config: dict,
provider_settings: dict,
rerank_provider_id: str,
) -> None:
super().__init__(provider_config, provider_settings)
self.rerank_provider_id = rerank_provider_id
async def rerank(
self,
query: str,
documents: list[str],
top_n: int | None = None,
):
raise ValueError(
f"无法找到 ID 为 {self.rerank_provider_id} 的 Rerank Provider"
)
```
</issue_to_address>
### Comment 3
<location path="astrbot/core/knowledge_base/kb_helper.py" line_range="167-168" />
<code_context>
+ f"无法找到 ID 为 {self.embedding_provider_id} 的 Embedding Provider"
+ )
+
+ def get_dim(self) -> int:
+ return 512
+
+ ep: EmbeddingProvider = TempEmbeddingProvider(
</code_context>
<issue_to_address>
**issue (bug_risk):** Hardcoded embedding dimension 512 in TempEmbeddingProvider may not match the actual FAISS index or configuration.
`TempEmbeddingProvider.get_dim` always returns 512. If the existing FAISS index or config uses a different dimension, this will lead to runtime errors or index inconsistencies when `_ensure_vec_db` accesses the vector DB. Consider deriving the dimension from the knowledge base config (when available) or raising an explicit error instead of returning a hardcoded value.
</issue_to_address>
### Comment 4
<location path="astrbot/core/knowledge_base/kb_helper.py" line_range="136" />
<code_context>
async def get_ep(self) -> EmbeddingProvider:
if not self.kb.embedding_provider_id:
- raise ValueError(f"知识库 {self.kb.kb_name} 未配置 Embedding Provider")
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting the placeholder provider classes and error-handling into shared, top-level helpers so the provider methods stay short, consistent, and easier to maintain.
You can keep the new “placeholder provider + logging” behavior while reducing complexity and duplication by:
* Moving the temp providers out of the methods.
* Sharing construction logic.
* Fixing the error message typo.
* Keeping `_ensure_vec_db` simple.
For example:
```python
# module-level helpers
class TempEmbeddingProvider(EmbeddingProvider):
def __init__(self, provider_id: str) -> None:
super().__init__({}, {})
self._provider_id = provider_id
async def get_embedding(self, text: str) -> list[float]:
raise ValueError(f"无法找到 ID 为 {self._provider_id} 的 Embedding Provider")
async def get_embeddings(self, texts: list[str]) -> list[list[float]]:
raise ValueError(f"无法找到 ID 为 {self._provider_id} 的 Embedding Provider")
def get_dim(self) -> int:
return 512
class TempRerankProvider(RerankProvider):
def __init__(self, provider_id: str) -> None:
super().__init__({}, {})
self._provider_id = provider_id
async def rerank(
self,
query: str,
documents: list[str],
top_n: int | None = None,
):
raise ValueError(f"无法找到 ID 为 {self._provider_id} 的 Rerank Provider")
```
Then the methods become straightforward, with no inner classes and consistent behavior:
```python
async def get_ep(self) -> EmbeddingProvider:
if not self.kb.embedding_provider_id:
logger.error(f"知识库 {self.kb.kb_name} 未配置 Embedding Provider")
return TempEmbeddingProvider("<missing>")
ep: EmbeddingProvider | None = await self.prov_mgr.get_provider_by_id(
self.kb.embedding_provider_id, # type: ignore[arg-type]
)
if not ep:
logger.error(
f"无法找到 ID 为 {self.kb.embedding_provider_id} 的 Embedding Provider, 使用占位 Embedding Provider"
)
return TempEmbeddingProvider(self.kb.embedding_provider_id)
return ep
```
```python
async def get_rp(self) -> RerankProvider | None:
if not self.kb.rerank_provider_id:
return None
rp: RerankProvider | None = await self.prov_mgr.get_provider_by_id(
self.kb.rerank_provider_id, # type: ignore[arg-type]
)
if not rp:
logger.error(
f"无法找到 ID 为 {self.kb.rerank_provider_id} 的 Rerank Provider, 使用占位 Rerank Provider"
)
return TempRerankProvider(self.kb.rerank_provider_id)
return rp
```
`_ensure_vec_db` can stay focused on constructing the vector DB, delegating all error semantics to `get_ep` / `get_rp`:
```python
async def _ensure_vec_db(self) -> FaissVecDB:
if not self.kb.embedding_provider_id:
logger.error(f"知识库 {self.kb.kb_name} 未配置 Embedding Provider")
ep = await self.get_ep()
rp = await self.get_rp()
vec_db = FaissVecDB(
doc_store_path=str(self.kb_dir / "doc.db"),
index_store_path=str(self.kb_dir / "index.faiss"),
embedding_provider=ep,
rerank_provider=rp,
)
...
```
This keeps the new behavior (logging + placeholder providers + delayed error), but:
* Removes inner classes from methods.
* Centralizes and deduplicates the temp-provider logic.
* Fixes the “Embedding Provider”/“Rerank Provider” message inconsistency.
* Makes `get_ep` / `get_rp` easier to read and maintain.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 76259ed2be
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Code Review
This pull request modifies the knowledge base helper to use placeholder providers instead of raising immediate exceptions when an embedding or rerank provider is missing. Feedback includes addressing a hardcoded vector dimension of 512 that may cause mismatches, moving the placeholder class definitions to the top level for better efficiency and readability, correcting a copy-paste error in an exception message, and removing redundant error logging in the vector database initialization logic.
fixes #7218
修理了数据库选择的提供商消失,导致的数据库不显示问题
Modifications / 改动点
在AstrBot\astrbot\core\knowledge_base\kb_helper.py的文件给get_ep和get_rp添加了临时占位的提供商,该提供商实现的方法被调用将抛出报错,通过这样的方法将报错延迟到实际调用时,防止数据库在初始化时报错导致的数据库不显示。
Screenshots or Test Results / 运行截图或测试结果
这是一个两个提供商都被删除的数据库,正常显示了
使用占位提供商的后台日志
测试截图
Checklist / 检查清单
/ 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。
没有新功能
/ 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”。
有进行测试
requirements.txtandpyproject.toml./ 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到
requirements.txt和pyproject.toml文件相应位置。没有引入
/ 我的更改没有引入恶意代码。
没有恶意代码
Summary by Sourcery
Bug Fixes: