Refactor(ai.common): consolidate tool-call envelope and index helpers (stage 2) #935

@asclearuc

Problem Statement

  • After the recent ai.common.utils consolidation, three duplicates remain in the node-side code.
  • _tool_call_protocol_prompt is byte-identical between nodes/src/nodes/agent_langchain/langchain.py:322 and nodes/src/nodes/agent_deepagent/deepagent.py:472.
  • _parse_tool_call_envelope is a near-duplicate between the same two drivers (langchain.py:338, deepagent.py:495). The deepagent version is strictly better — it uses _extract_first_json_object to tolerate trailing prose, fenced markdown, and stacked JSON objects.
  • _build_highlight_config is byte-identical inside one node: nodes/src/nodes/index_search/opensearch_client.py:50 and elasticsearch_store.py:52.
  • None of the four helpers involved (the three duplicated functions plus _extract_first_json_object) has direct unit tests today.
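
To illustrate why the strict parser is the weaker of the two, here is a minimal sketch of how plain json.loads reacts to the three kinds of noisy LLM output listed above. The envelope shape (`name` / `arguments`) and the helper name `strict_parse` are assumptions made for illustration only; they are not taken from langchain.py.

```python
import json
from typing import Any, Optional


def strict_parse(text: str) -> Optional[Any]:
    """Hypothetical stand-in for a strict json.loads-based parser."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# Assumed envelope shape, for illustration only.
clean = '{"name": "search", "arguments": {"q": "cats"}}'

noisy_inputs = {
    "trailing prose": clean + "\nLet me know if you need anything else.",
    "fenced markdown": "```json\n" + clean + "\n```",
    "stacked JSON": clean + clean,
}

strict_results = {label: strict_parse(text) for label, text in noisy_inputs.items()}

for label, result in strict_results.items():
    # Each noisy variant is rejected outright by json.loads.
    print(f"{label}: {'parsed' if result is not None else 'rejected'}")
```

All three noisy variants are rejected even though each contains a perfectly valid JSON object, which is the gap a balanced-brace walker closes.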

Proposed Solution

  • Move _tool_call_protocol_prompt, _parse_tool_call_envelope, and _extract_first_json_object into a new packages/ai/src/ai/common/agent/_internal/protocol.py. Public names: build_tool_call_protocol_prompt, parse_tool_call_envelope, extract_first_json_object. Keep the trio together — they describe one protocol (LLM emits a JSON envelope; we parse it).
  • Move _build_highlight_config into a new ai.common.index namespace at packages/ai/src/ai/common/index/highlight.py, with public name build_highlight_config. The new namespace leaves room for future shared index helpers.
  • Add unit tests next to each consolidated helper. For the envelope parser, include cases for trailing prose, fenced markdown, stacked JSON, escaped quotes, malformed input, and the LangChain-compat additional_kwargs fallback path. For the walker (extract_first_json_object), add direct tests of balanced-brace edge cases.
  • Behaviour changes accepted for agent_langchain: the unified envelope parser uses the smarter walker (parses noisy LLM output that previously returned None), and tool-call IDs use uuid.uuid4().hex[:12] instead of id(obj). Call IDs are opaque tokens; the format change is invisible to the model.
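
The walker-plus-parser protocol described above could look roughly like the following. This is a sketch under stated assumptions, not the existing deepagent implementation: the envelope keys (`name`, `arguments`), the returned dict shape, and the helper `new_call_id` are hypothetical, and the LangChain-compat additional_kwargs fallback is not covered.

```python
import json
import uuid
from typing import Any, Dict, Optional


def extract_first_json_object(text: str) -> Optional[str]:
    """Return the first balanced {...} span in text, or None if there is none."""
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            # Inside a string literal, braces do not count; honour escapes.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None  # Braces never balanced (truncated output).


def new_call_id() -> str:
    """Opaque 12-character call ID, as proposed (uuid.uuid4().hex[:12])."""
    return uuid.uuid4().hex[:12]


def parse_tool_call_envelope(text: str) -> Optional[Dict[str, Any]]:
    """Parse an assumed {"name": ..., "arguments": {...}} envelope from noisy text."""
    raw = extract_first_json_object(text)
    if raw is None:
        return None
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or not isinstance(obj.get("name"), str):
        return None
    arguments = obj.get("arguments", {})
    if not isinstance(arguments, dict):
        return None
    return {"id": new_call_id(), "name": obj["name"], "arguments": arguments}
```

Keeping the walker ignorant of JSON validity (it only balances braces outside string literals) and leaving validation to json.loads in the parser keeps both functions small enough to test directly against the edge cases listed above.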

Alternatives Considered

  • Placing all three in ai.common.utils.agent_tools (the namespace established by the parent PR). Rejected — the agent-protocol trio is conceptually part of the agent module, not generic utilities.
  • Splitting the parser and the walker into different modules. Rejected — the parser depends on the walker and they share one purpose.
  • Keeping LangChain's strict json.loads parsing and id(obj) call IDs via a strict_json / id_format parameter. Rejected as over-engineering — the deepagent behaviour is strictly better.
  • Putting build_highlight_config inside index_search/ (the only current user). Rejected to anticipate future Elasticsearch/OpenSearch-related nodes.
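
For context on the last alternative, a shared highlight helper might look like the sketch below. The signature and defaults are assumptions for illustration (the issue does not show the current _build_highlight_config); the returned keys (`pre_tags`, `post_tags`, `fields`, `fragment_size`, `number_of_fragments`) follow the highlight section of the search DSL accepted by both Elasticsearch and OpenSearch.

```python
from typing import Any, Dict, Iterable


def build_highlight_config(
    fields: Iterable[str],
    pre_tag: str = "<em>",      # hypothetical default
    post_tag: str = "</em>",    # hypothetical default
    fragment_size: int = 150,   # hypothetical default
    number_of_fragments: int = 3,  # hypothetical default
) -> Dict[str, Any]:
    """Build the 'highlight' section of a search request body."""
    return {
        "pre_tags": [pre_tag],
        "post_tags": [post_tag],
        "fields": {
            field: {
                "fragment_size": fragment_size,
                "number_of_fragments": number_of_fragments,
            }
            for field in fields
        },
    }


# The same dict can be attached to a request body for either backend.
body = {
    "query": {"match": {"content": "cats"}},
    "highlight": build_highlight_config(["content"]),
}
```

Because the helper returns a plain dict and imports neither client library, it can live outside index_search/ without pulling in backend dependencies.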

Affected Modules

  • server (C++ engine)
  • client-typescript
  • client-python
  • client-mcp
  • nodes (pipeline)
  • ai
  • chat-ui
  • dropper-ui
  • vscode
  • tika
