
Commit e8dff76

fix(level_4): replace gahmen McpToolset with FunctionTool wrappers + chart artifact passthrough (W9.3)
Fixes the W9.2 Telegram E2E failure where Level 4 returned [empty] — gahmen tools were wired in source but never registered at runtime.

Smoking gun (verified 2026-05-01 via deployed-engine probe): every Level 4 invocation hit `Tool 'gahmen_singstat_search_resources' not found. Available tools: consult_level_1`.

Root cause: Smithery's hosted MCP endpoint requires `?api_key=<KEY>` query-param auth, NOT an `Authorization: Bearer <KEY>` header. McpToolset's StreamableHTTPConnectionParams sent the Bearer header → silent 401 on the lazy tools/list fetch → 0 gahmen tools in the agent's tools_dict → the LLM saw the tool names in the instruction but couldn't call them. The same configuration ghost has been present since the W8.2 NBS swarm Strategist wiring; that integration has likely never actually served gahmen tools either (worth a follow-up probe).

level_4_agent/gahmen_tools.py (new file)
- Eight async function wrappers around Smithery's JSON-RPC-over-HTTP endpoint, each registered as ADK 2.0's FunctionTool primitive.
- Verified 2026-05-01 with a tools/list probe returning all 8 expected tools.
- Same gahmen_<original> naming, so the data_fetcher instruction is unchanged.

level_4_agent/agent.py
- Drop the McpToolset construction and the gahmen_toolset variable.
- data_fetcher_agent.tools = [FunctionTool(consult_level_1), *GAHMEN_TOOLS].
- report_writer_agent: dropped output_schema=Brief. With it, the response was packaged as a JSON function-response that didn't surface as text in the A2A response. Markdown text via instruction-enforced headings is the contract now (same fix as the orchestrator's writer_agent in W9.1).
- analyst_agent: appended a CHART DESCRIPTION block to the instruction. Required text after every chart, leading with the headline finding plus 3-5 specific numbers. The coordinator quotes this in the brief so A2A consumers (orchestrator, bot) get the chart's narrative even when the binary artifact doesn't propagate.
deploy_a2a.py
- Register google.adk.a2a.executor.interceptors.include_artifacts_in_a2a_event_interceptor in A2aAgentExecutorConfig. The default execute_interceptors=None meant pre-W9.3 deploys never copied session artifacts into the A2A response — that's why the orchestrator's chart_agent images died in their containers. The interceptor reads actions.artifact_delta (which BuiltInCodeExecutor populates via _code_execution.py:299-311) and emits TaskArtifactUpdateEvents that surface in Task.artifacts[*].

Local probe (scripts/local_smoke.py level_4_agent ...) confirms the full pipeline: data_fetcher → 3x gahmen_singstat_search_resources + gahmen_datagovsg_search_dataset + consult_level_1 → analyst_agent (chart, no gotcha-google#21 hang) → report_writer_agent (Markdown brief). 108s elapsed; 5138 chars of structured output with all 5 Markdown headings and the CHART DESCRIPTION block.

Vertex deployments:
- level_4: 7527384696859131904 (us-central1, appspot SA)
- a2a_orchestrator: 1748140475035942912 (us-central1, appspot SA)
- Old W9.2 engines (7335418762742464512, 3876654248921923584) deleted.

Two structural gaps surfaced in deployed testing, deferred to W9.4:
1. Vertex Agent Engine's ~180s per-request timeout aborts orchestrator runs on prompts that trigger 3+ cross-region Pro 2.5 Level consults ("thorough analysis"-class wording). Returns 400 with no body. Not configurable via agent_engines.create() parameters.
2. The orchestrator's chart_agent doesn't always fire — its instruction has an "if chart warranted" clause that Pro 2.5 sometimes declines. When it doesn't fire, no chart exists for the interceptor to ship.

Plan reference: new features/18-level-4-mcp-and-chart-return.md.
1 parent fd3c68c commit e8dff76

3 files changed

Lines changed: 343 additions & 58 deletions


deploy_a2a.py

Lines changed: 17 additions & 1 deletion
```diff
@@ -47,6 +47,10 @@
 import vertexai
 from a2a.types import AgentSkill
 from google.adk.a2a.executor.a2a_agent_executor import A2aAgentExecutor
+from google.adk.a2a.executor.config import A2aAgentExecutorConfig
+from google.adk.a2a.executor.interceptors.include_artifacts_in_a2a_event import (
+    include_artifacts_in_a2a_event_interceptor,
+)
 from google.adk.artifacts import InMemoryArtifactService
 from google.adk.memory import InMemoryMemoryService
 from google.adk.runners import Runner
@@ -97,7 +101,19 @@ def _build() -> A2aAgentExecutor:
             artifact_service=InMemoryArtifactService(),
             memory_service=InMemoryMemoryService(),
         )
-        return A2aAgentExecutor(runner=runner)  # keyword-only kwarg
+        # W9.3 (2026-05-01): register the include_artifacts interceptor so
+        # session artifacts produced by BuiltInCodeExecutor (matplotlib charts
+        # from chart_agent / analyst_agent) get packaged into the A2A
+        # response's TaskArtifactUpdateEvents — visible to A2A callers as
+        # FileParts in Task.artifacts. Without this, charts live and die in
+        # the engine's session and never reach the orchestrator/bot.
+        # Default config has execute_interceptors=None; we explicitly set
+        # the artifact passthrough one. Source: ADK 2.0
+        # google/adk/a2a/executor/interceptors/include_artifacts_in_a2a_event.py.
+        config = A2aAgentExecutorConfig(
+            execute_interceptors=[include_artifacts_in_a2a_event_interceptor],
+        )
+        return A2aAgentExecutor(runner=runner, config=config)
     return _build
```
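The interceptor registered in deploy_a2a.py surfaces charts to A2A callers as file parts inside `Task.artifacts`. As a hedged illustration (not part of this commit), a caller could pull the inline bytes out of a task's JSON form like this; the field names (`artifacts`, `parts`, `file`, `bytes`, `mimeType`) follow the A2A wire format, and `extract_file_artifacts` is a hypothetical helper:

```python
import base64


def extract_file_artifacts(task_json: dict) -> list[tuple[str, bytes]]:
    """Collect (mime_type, raw_bytes) for every inline file part in
    Task.artifacts — the parts the include_artifacts interceptor adds
    for BuiltInCodeExecutor charts.  Hypothetical helper, not this repo's
    extract_a2a_text."""
    found = []
    for artifact in task_json.get("artifacts") or []:
        for part in artifact.get("parts", []):
            file_info = part.get("file")
            # Skip text/data parts and URI-only files.
            if not file_info or "bytes" not in file_info:
                continue
            found.append((
                file_info.get("mimeType", "application/octet-stream"),
                base64.b64decode(file_info["bytes"]),
            ))
    return found
```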

level_4_agent/agent.py

Lines changed: 66 additions & 57 deletions
```diff
@@ -109,60 +109,31 @@
 from google.adk.code_executors.built_in_code_executor import BuiltInCodeExecutor
 from google.adk.planners.built_in_planner import BuiltInPlanner
 from google.adk.planners.plan_re_act_planner import PlanReActPlanner
-from google.adk.tools.mcp_tool import McpToolset, StreamableHTTPConnectionParams
 from google.genai import types
 from pydantic import BaseModel
 from pydantic import Field
 
 from google.adk.tools import FunctionTool
 
 from .creator_tools import create_specialist
+from .gahmen_tools import GAHMEN_TOOLS
 from .registry import hydrate_capabilities
 from .remote_tools import consult_level_1
 
 
-# --- Optional Singapore-government MCP toolset (gahmen-mcp) ---------------
-# Same Smithery-hosted MCP server attached to the NBS swarm's Strategist.
-# Exposes data.gov.sg + SingStat tables. Conditional on SMITHERY_API_KEY:
-# in deploys without the env var the data fetcher runs google_search-only.
-# Attached to `data_fetcher_agent` (NOT `analyst_agent`, which uses
-# `BuiltInCodeExecutor` — Gemini's built-in + function-tool mutex would
-# conflict). data_fetcher_agent already has `bypass_multi_tools_limit=True`
-# so adding more function tools is friction-free.
+# --- Singapore-government data tools (gahmen-mcp via Smithery) ------------
+# W9.3 (2026-05-01): switched from McpToolset to plain FunctionTool wrappers.
+# Root cause of W9.2 failure: Smithery uses ?api_key=<KEY> query-param auth,
+# not Authorization: Bearer header. McpToolset's silent 401 left the agent's
+# tools_dict empty so the LLM tried to call gahmen_* tools that the framework
+# couldn't dispatch. FunctionTool wrappers in level_4_agent/gahmen_tools.py
+# call Smithery's JSON-RPC endpoint directly and return text — predictable,
+# no MCP session lifecycle, no auth scheme drama.
 #
-# Excluded tools: datagovsg_initiate_download / datagovsg_poll_download
-# (async server-side jobs that don't fit a single-turn fetcher).
+# GAHMEN_TOOLS is an empty list when SMITHERY_API_KEY is unset (graceful
+# degradation: data_fetcher runs consult_level_1-only).
 
 _SMITHERY_API_KEY = os.environ.get("SMITHERY_API_KEY", "")
-_SMITHERY_GAHMEN_URL = os.environ.get(
-    "SMITHERY_GAHMEN_URL",
-    "https://server.smithery.ai/aniruddha-adhikary/gahmen-mcp",
-)
-_GAHMEN_TOOL_FILTER = [
-    "datagovsg_list_collections",
-    "datagovsg_get_collection",
-    "datagovsg_list_datasets",
-    "datagovsg_get_dataset_metadata",
-    "datagovsg_search_dataset",
-    "singstat_search_resources",
-    "singstat_get_metadata",
-    "singstat_get_table_data",
-]
-
-if _SMITHERY_API_KEY:
-    gahmen_toolset = McpToolset(
-        connection_params=StreamableHTTPConnectionParams(
-            url=_SMITHERY_GAHMEN_URL,
-            headers={"Authorization": f"Bearer {_SMITHERY_API_KEY}"},
-        ),
-        tool_filter=_GAHMEN_TOOL_FILTER,
-        # Prefix ⇒ tools surface as `gahmen_singstat_*` / `gahmen_datagovsg_*`.
-        # Same prefix the swarm bot's Telegram anchor uses to render visible
-        # tool calls. Future MCP additions won't collide with this namespace.
-        tool_name_prefix="gahmen",
-    )
-else:
-    gahmen_toolset = None
 
 
 # ---------------------------------------------------------------------------
@@ -270,18 +241,14 @@ class Brief(BaseModel):
 # through consult_level_1, paying one A2A roundtrip (~5-10s) for each
 # search. That's the trade-off of demonstrating delegation: slower than
 # inline google_search, but architecturally cleaner.
-_data_fetcher_tools: list = [
-    FunctionTool(consult_level_1),
-]
-if gahmen_toolset is not None:
-    _data_fetcher_tools.append(gahmen_toolset)
+_data_fetcher_tools: list = [FunctionTool(consult_level_1), *GAHMEN_TOOLS]
 
 
 # Build the data_fetcher instruction conditionally on whether gahmen
 # tools are actually wired in. This prevents the LLM from hallucinating
 # gahmen tool calls when SMITHERY_API_KEY is unset — the model can only
 # call tools the instruction names. Same content, two variants.
-if gahmen_toolset is not None:
+if GAHMEN_TOOLS:
     _data_fetcher_instruction = (
         "You are a business-data research specialist. You have NO"
         " built-in search of your own. You acquire data only via two"
@@ -442,6 +409,19 @@ class Brief(BaseModel):
 - Stateful across turns: do not re-initialise variables or re-load data.
 - Pre-imported: io, math, re, matplotlib.pyplot as plt, numpy as np, pandas as pd, scipy.
 - Do NOT run `pip install`.
+
+# After EVERY chart — REQUIRED chart description (W9.3)
+
+After every chart you produce, write a one-paragraph description of what
+the chart shows, leading with the headline finding. Format as:
+
+CHART DESCRIPTION: <one sentence saying what the chart shows>.
+Key data points: <list 3-5 specific numbers from the chart>.
+
+The coordinator quotes this description in the final brief, since the
+chart artifact itself does not propagate through A2A responses to the
+calling agent. The text description IS the chart for A2A consumers.
+Without this line, the brief will reference an invisible chart.
 """,
     code_executor=BuiltInCodeExecutor(),
     # Mandatory for the same reason as data_fetcher_agent — except here
@@ -457,22 +437,51 @@ class Brief(BaseModel):
     # us-central1 + Pro 2.5 (W9.2 — all A2A sub-agents on Pro per Simon 2026-05-01).
     model="gemini-2.5-pro",
     description=(
-        "Formats accumulated findings into a structured BI brief."
+        "Formats accumulated findings into a Markdown BI brief."
         " Output is the final answer — do not re-paraphrase."
     ),
     mode="single_turn",
     input_schema=WriterInput,
-    # SAFE to set output_schema here: this agent has no built-in tools,
-    # so set_model_response can be injected without conflict. Demonstrates
-    # the v2 typed-output contract for terminal nodes.
-    output_schema=Brief,
+    # W9.3 (2026-05-01): dropped output_schema=Brief. Same fix as
+    # a2a_orchestrator's writer_agent: structured-output forced JSON
+    # serialization that the A2A response packaged as a non-text part,
+    # surfacing as [empty] to the calling orchestrator's
+    # extract_a2a_text. Markdown text is what the orchestrator (and the
+    # bot's downstream renderer) expects. The Brief class above is kept
+    # as a documentation contract for what fields the writer produces;
+    # actual enforcement is now via the instruction's "use these EXACT
+    # Markdown headings" rule below.
     instruction=(
-        "Synthesise a Brief from the fetcher_findings (raw facts) and"
-        " analyst_findings (numeric summaries) for the topic. Quote"
-        " analyst numbers VERBATIM — do not round, re-format, or"
-        " re-interpret them. Weave fetcher source domains inline in the"
-        " analysis. Lead with the most important finding in"
-        " executive_summary."
+        "Synthesise a Markdown brief from the fetcher_findings (raw "
+        "facts) and analyst_findings (numeric summaries + chart "
+        "descriptions) for the topic. Use these EXACT Markdown "
+        "headings, in this order:\n\n"
+        "# {Concise Title — describes the question's topic}\n\n"
+        "## Executive Summary\n\n"
+        "[1-2 paragraph plain-English answer. Lead with the most "
+        "important finding. If data is incomplete, state that upfront.]\n\n"
+        "## Key Metrics\n\n"
+        "- Bullet list of headline numbers, quoted VERBATIM from "
+        "analyst_findings. Do not round or re-interpret.\n\n"
+        "## Analysis\n\n"
+        "[2-3 paragraph synthesis. Inline source attribution "
+        "throughout (e.g., 'According to SingStat (table M015711), ...' "
+        "or 'According to bloomberg.com (via Level 1), ...'). If a "
+        "chart description appears in analyst_findings, weave it into "
+        "this section explicitly: 'The chart shows ...'.]\n\n"
+        "## Sources\n\n"
+        "- Bullet list of sources cited inline. Include both gahmen / "
+        "SingStat / data.gov.sg references AND any external domains.\n\n"
+        "## Confidence and Gaps\n\n"
+        "[Where the brief is most/least confident. CALL OUT explicitly "
+        "any consult that returned [error] / [skip] / [empty] in the "
+        "fetcher_findings. Do not pretend a failed lookup succeeded.]\n\n"
+        "Rules:\n"
+        "1. Quote analyst numbers VERBATIM. Do not round, re-format, or "
+        "re-interpret.\n"
+        "2. Output ONLY the Markdown report. No preamble ('Here is your "
+        "brief:'), no JSON, no commentary. The Markdown above IS the "
+        "complete final response."
     ),
     # No built-in tools, so transfer_to_agent injection wouldn't
     # conflict — but consistency: terminal sub-agents shouldn't
```
