-
Notifications
You must be signed in to change notification settings - Fork 664
🐛 Bugfix: knowledge_base_search_tool called with TypeError #3259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -198,11 +198,16 @@ def create_local_tool(self, tool_config: ToolConfig): | |
| raise ValueError(f"{class_name} not found in local") | ||
| else: | ||
| if class_name == "KnowledgeBaseSearchTool": | ||
| # Filter out conflicting parameters from params to avoid conflicts | ||
| # These parameters have exclude=True and cannot be passed to __init__ | ||
| # due to smolagents.tools.Tool wrapper restrictions | ||
| # Filter out conflicting parameters from params to avoid conflicts. | ||
| # Parameters declared with exclude=True cannot be passed to __init__ | ||
| # due to smolagents.tools.Tool wrapper restrictions; they are set as | ||
| # attributes on the instance after construction, sourced from metadata. | ||
| # `document_paths` is intentionally hidden from the LLM and only | ||
| # populated via tool_params from the northbound interface. | ||
| filtered_params = {k: v for k, v in params.items() | ||
| if k not in ["vdb_core", "embedding_model", "observer", "rerank_model", "display_name_to_index_map"]} | ||
| if k not in ["vdb_core", "embedding_model", "observer", | ||
| "rerank_model", "display_name_to_index_map", | ||
| "document_paths"]} | ||
| # Create instance with only non-excluded parameters | ||
| tools_obj = tool_class(**filtered_params) | ||
| # Set excluded parameters directly as attributes after instantiation | ||
|
|
@@ -216,6 +221,13 @@ def create_local_tool(self, tool_config: ToolConfig): | |
| "rerank_model", None) if tool_config.metadata else None | ||
| tools_obj.display_name_to_index_map = tool_config.metadata.get( | ||
| "display_name_to_index_map", {}) if tool_config.metadata else {} | ||
| # Internal access control: restrict results to documents whose | ||
| # path_or_url is in the allow list. Only the northbound interface | ||
| # may populate this; never the LLM. | ||
| tools_obj.set_document_paths( | ||
| tool_config.metadata.get( | ||
| "document_paths") if tool_config.metadata else None | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [P1] metadata 里的 document_paths 被直接送进访问控制过滤器,没有校验来源和类型。只要 metadata 被错误构造为字符串或混合 URL,就会影响检索授权;这里应在 agent 侧先 validate/canonicalize。
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 事后审查补充:[P1] metadata 里的 document_paths 被直接送进访问控制过滤器,没有校验来源和类型。只要 metadata 被错误构造为字符串或混合 URL,就会影响检索授权;这里应在 agent 侧先 validate/canonicalize。 影响:这个问题合入后会在对应部署、运行或权限场景中留下真实故障/安全风险,后续排查成本较高。
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 二次事后审查补充:[P1] metadata 里的 document_paths 被直接送进访问控制过滤器,没有校验来源和类型。只要 metadata 被错误构造为字符串或混合 URL,就会影响检索授权;这里应在 agent 侧先 validate/canonicalize。 影响:该问题合入后仍可能在真实部署、运行、权限或测试场景中形成回归风险。 |
||
| ) | ||
| elif class_name in ["DifySearchTool", "DataMateSearchTool"]: | ||
| # These parameters have exclude=True and cannot be passed to __init__ | ||
| filtered_params = {k: v for k, v in params.items() | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -21,6 +21,21 @@ | |
| logger = logging.getLogger("knowledge_base_search_tool") | ||
|
|
||
|
|
||
| def _unwrap_field_info(value): | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
| """Resolve a value that may be wrapped in a Pydantic FieldInfo. | ||
|
|
||
| Parameters declared with `Field(...)` and `exclude=True` are not expanded by | ||
| smolagents' Tool wrapper, so they arrive at `__init__` as raw FieldInfo | ||
| instances instead of their declared defaults. This helper extracts the | ||
| concrete value so callers can safely treat the result as plain data. | ||
| """ | ||
| if isinstance(value, FieldInfo): | ||
| if value.default_factory is not None: | ||
| return value.default_factory() | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [P2] default_factory 每次 unwrap 都会执行,包括过滤阶段再次调用时。如果 factory 生成的是可变列表或有副作用,允许列表会在一次请求内变化;建议在 init/set_document_paths 时解析一次并缓存具体值。
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [P2] default_factory 每次 unwrap 都会执行,包括过滤阶段再次调用时。如果 factory 生成的是可变列表或有副作用,允许列表会在一次请求内变化;建议在 init/set_document_paths 时解析一次并缓存具体值。
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 事后审查补充:[P2] default_factory 每次 unwrap 都会执行,包括过滤阶段再次调用时。如果 factory 生成的是可变列表或有副作用,允许列表会在一次请求内变化;建议在 init/set_document_paths 时解析一次并缓存具体值。 影响:这个问题合入后会在对应部署、运行或权限场景中留下真实故障/安全风险,后续排查成本较高。
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 二次事后审查补充:[P2] default_factory 每次 unwrap 都会执行,包括过滤阶段再次调用时。如果 factory 生成的是可变列表或有副作用,允许列表会在一次请求内变化;建议在 init/set_document_paths 时解析一次并缓存具体值。 影响:该问题合入后仍可能在真实部署、运行、权限或测试场景中形成回归风险。 |
||
| return value.default | ||
| return value | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [P1] 这里直接返回原值,没有保证 document_paths 是 list[str]。如果外部传入字符串,后续 membership 会变成对子串的检查,可能错误放行或丢弃结果;需要显式校验/转换类型。
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [P1] 这里直接返回原值,没有保证 document_paths 是 list[str]。如果外部传入字符串,后续 membership 会变成对子串的检查,可能错误放行或丢弃结果;需要显式校验/转换类型。
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 事后审查补充:[P1] 这里直接返回原值,没有保证 document_paths 是 list[str]。如果外部传入字符串,后续 membership 会变成对子串的检查,可能错误放行或丢弃结果;需要显式校验/转换类型。 影响:这个问题合入后会在对应部署、运行或权限场景中留下真实故障/安全风险,后续排查成本较高。
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 二次事后审查补充:[P1] 这里直接返回原值,没有保证 document_paths 是 list[str]。如果外部传入字符串,后续 membership 会变成对子串的检查,可能错误放行或丢弃结果;需要显式校验/转换类型。 影响:该问题合入后仍可能在真实部署、运行、权限或测试场景中形成回归风险。 |
||
|
|
||
|
|
||
| class KnowledgeBaseSearchTool(Tool): | ||
| """Knowledge base search tool""" | ||
|
|
||
|
|
@@ -129,7 +144,10 @@ def __init__( | |
| self.rerank_model = rerank_model | ||
| self.data_process_service = os.getenv("DATA_PROCESS_SERVICE") | ||
| self.display_name_to_index_map = display_name_to_index_map | ||
| self._internal_document_paths = document_paths | ||
| # `document_paths` is declared with `exclude=True` so smolagents passes the | ||
| # raw FieldInfo default when no value is supplied. Unwrap it here so the | ||
| # internal filter is always a concrete list (or None), never a FieldInfo. | ||
| self._internal_document_paths = _unwrap_field_info(document_paths) | ||
|
|
||
| self.record_ops = 1 | ||
| self.running_prompt_zh = "知识库检索中..." | ||
|
|
@@ -144,7 +162,7 @@ def set_document_paths(self, document_paths: Optional[List[str]]) -> None: | |
| Args: | ||
| document_paths: List of allowed document path_or_urls. If None, no filtering is applied. | ||
| """ | ||
| self._internal_document_paths = document_paths | ||
| self._internal_document_paths = _unwrap_field_info(document_paths) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [P1] set_document_paths 仍然接受任意值并直接保存 unwrap 结果。这个字段承担访问控制,不能只处理 FieldInfo;应该拒绝非 list[str],并把空列表保留为“禁止所有文档”而不是 None。
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [P1] set_document_paths 仍然接受任意值并直接保存 unwrap 结果。这个字段承担访问控制,不能只处理 FieldInfo;应该拒绝非 list[str],并把空列表保留为“禁止所有文档”而不是 None。
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 事后审查补充:[P1] set_document_paths 仍然接受任意值并直接保存 unwrap 结果。这个字段承担访问控制,不能只处理 FieldInfo;应该拒绝非 list[str],并把空列表保留为“禁止所有文档”而不是 None。 影响:这个问题合入后会在对应部署、运行或权限场景中留下真实故障/安全风险,后续排查成本较高。
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 二次事后审查补充:[P1] set_document_paths 仍然接受任意值并直接保存 unwrap 结果。这个字段承担访问控制,不能只处理 FieldInfo;应该拒绝非 list[str],并把空列表保留为“禁止所有文档”而不是 None。 影响:该问题合入后仍可能在真实部署、运行、权限或测试场景中形成回归风险。 |
||
|
|
||
| def _convert_to_index_names(self, names: List[str]) -> List[str]: | ||
| """Convert display names (knowledge_name) to index names if necessary. | ||
|
|
@@ -188,7 +206,7 @@ def _filter_by_document_paths(self, results: List[dict]) -> List[dict]: | |
| Returns: | ||
| Filtered list containing only results with allowed document paths | ||
| """ | ||
| allowed_paths = self._internal_document_paths | ||
| allowed_paths = _unwrap_field_info(self._internal_document_paths) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [代码规范]
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 在
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [P1] 过滤前没有把 path_or_url 和 allowed_paths 规范到同一种对象路径。s3://、/nexent/、presigned URL 三种形式会互相不匹配,访问控制结果取决于上游返回格式。
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [P1] 过滤前没有把 path_or_url 和 allowed_paths 规范到同一种对象路径。s3://、/nexent/、presigned URL 三种形式会互相不匹配,访问控制结果取决于上游返回格式。
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 事后审查补充:[P1] 过滤前没有把 path_or_url 和 allowed_paths 规范到同一种对象路径。s3://、/nexent/、presigned URL 三种形式会互相不匹配,访问控制结果取决于上游返回格式。 影响:这个问题合入后会在对应部署、运行或权限场景中留下真实故障/安全风险,后续排查成本较高。
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 二次事后审查补充:[P1] 过滤前没有把 path_or_url 和 allowed_paths 规范到同一种对象路径。s3://、/nexent/、presigned URL 三种形式会互相不匹配,访问控制结果取决于上游返回格式。 影响:该问题合入后仍可能在真实部署、运行、权限或测试场景中形成回归风险。 |
||
| if not allowed_paths: | ||
| return results | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[P1] metadata 里的 document_paths 被直接送进访问控制过滤器,没有校验来源和类型。只要 metadata 被错误构造为字符串或混合 URL,就会影响检索授权;这里应在 agent 侧先 validate/canonicalize。