
Commit e8dff76

fix(level_4): replace gahmen McpToolset with FunctionTool wrappers + chart artifact passthrough (W9.3)
Fixes the W9.2 Telegram E2E failure where Level 4 returned [empty] — gahmen tools were wired in source but never registered at runtime.

Smoking gun (verified 2026-05-01 via deployed-engine probe): every Level 4 invocation hit `Tool 'gahmen_singstat_search_resources' not found. Available tools: consult_level_1`.

Root cause: Smithery's hosted MCP endpoint requires `?api_key=<KEY>` query-param auth, NOT an `Authorization: Bearer <KEY>` header. McpToolset's StreamableHTTPConnectionParams sent the Bearer header → silent 401 on the lazy tools/list fetch → 0 gahmen tools in the agent's tools_dict → the LLM saw the tool names in the instruction but couldn't call them. The same configuration ghost has been present since the W8.2 NBS swarm Strategist wiring; that integration has likely never actually served gahmen tools either (worth a follow-up probe).

level_4_agent/gahmen_tools.py (new file)
- Eight async function wrappers around Smithery's JSON-RPC-over-HTTP endpoint, each registered as ADK 2.0's FunctionTool primitive.
- Verified 2026-05-01 with a tools/list probe returning all 8 expected tools.
- Same gahmen_<original> naming, so the data_fetcher instruction is unchanged.

level_4_agent/agent.py
- Drop the McpToolset construction and the gahmen_toolset variable.
- data_fetcher_agent.tools = [FunctionTool(consult_level_1), *GAHMEN_TOOLS].
- report_writer_agent: dropped output_schema=Brief. With it, the response was packaged as a JSON function-response that didn't surface as text in the A2A response. Markdown text via instruction-enforced headings is the contract now (same fix as the orchestrator's writer_agent in W9.1).
- analyst_agent: appended a CHART DESCRIPTION block to the instruction. Required text after every chart, leading with the headline finding plus 3-5 specific numbers. The coordinator quotes this in the brief so A2A consumers (orchestrator, bot) get the chart's narrative even when the binary artifact doesn't propagate.
deploy_a2a.py
- Register google.adk.a2a.executor.interceptors.include_artifacts_in_a2a_event_interceptor in A2aAgentExecutorConfig. The default execute_interceptors=None meant pre-W9.3 deploys never copied session artifacts into the A2A response — that's why the orchestrator's chart_agent images died in their containers. The interceptor reads actions.artifact_delta (which BuiltInCodeExecutor populates via _code_execution.py:299-311) and emits TaskArtifactUpdateEvents that surface in Task.artifacts[*].

Local probe (scripts/local_smoke.py level_4_agent ...) confirms the full pipeline: data_fetcher → 3x gahmen_singstat_search_resources + gahmen_datagovsg_search_dataset + consult_level_1 → analyst_agent (chart, no gotcha-google#21 hang) → report_writer_agent (Markdown brief). 108s elapsed; 5138 chars of structured output with all 5 Markdown headings and the CHART DESCRIPTION block.

Vertex deployments:
- level_4: 7527384696859131904 (us-central1, appspot SA)
- a2a_orchestrator: 1748140475035942912 (us-central1, appspot SA)
- Old W9.2 engines (7335418762742464512, 3876654248921923584) deleted.

Two structural gaps surfaced in deployed testing, deferred to W9.4:
1. Vertex Agent Engine's ~180s per-request timeout aborts orchestrator runs on prompts that trigger 3+ cross-region Pro 2.5 Level consults ("thorough analysis"-class wording). Returns 400 with no body. Not configurable via agent_engines.create() parameters.
2. The orchestrator's chart_agent doesn't always fire — its instruction has an "if chart warranted" clause that Pro 2.5 sometimes declines. When it doesn't fire, no chart exists for the interceptor to ship.

Plan reference: new features/18-level-4-mcp-and-chart-return.md.
1 parent fd3c68c commit e8dff76

3 files changed

Lines changed: 343 additions & 58 deletions


deploy_a2a.py

Lines changed: 17 additions & 1 deletion
```diff
@@ -47,6 +47,10 @@
 import vertexai
 from a2a.types import AgentSkill
 from google.adk.a2a.executor.a2a_agent_executor import A2aAgentExecutor
+from google.adk.a2a.executor.config import A2aAgentExecutorConfig
+from google.adk.a2a.executor.interceptors.include_artifacts_in_a2a_event import (
+    include_artifacts_in_a2a_event_interceptor,
+)
 from google.adk.artifacts import InMemoryArtifactService
 from google.adk.memory import InMemoryMemoryService
 from google.adk.runners import Runner
@@ -97,7 +101,19 @@ def _build() -> A2aAgentExecutor:
             artifact_service=InMemoryArtifactService(),
             memory_service=InMemoryMemoryService(),
         )
-        return A2aAgentExecutor(runner=runner)  # keyword-only kwarg
+        # W9.3 (2026-05-01): register the include_artifacts interceptor so
+        # session artifacts produced by BuiltInCodeExecutor (matplotlib charts
+        # from chart_agent / analyst_agent) get packaged into the A2A
+        # response's TaskArtifactUpdateEvents — visible to A2A callers as
+        # FileParts in Task.artifacts. Without this, charts live and die in
+        # the engine's session and never reach the orchestrator/bot.
+        # Default config has execute_interceptors=None; we explicitly set
+        # the artifact passthrough one. Source: ADK 2.0
+        # google/adk/a2a/executor/interceptors/include_artifacts_in_a2a_event.py.
+        config = A2aAgentExecutorConfig(
+            execute_interceptors=[include_artifacts_in_a2a_event_interceptor],
+        )
+        return A2aAgentExecutor(runner=runner, config=config)
     return _build
```
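The interceptor registered in deploy_a2a.py surfaces charts to A2A callers as file parts inside `Task.artifacts`. As a hedged illustration (not part of this commit), a caller could pull the inline bytes out of a task's JSON form like this; the field names (`artifacts`, `parts`, `file`, `bytes`, `mimeType`) follow the A2A wire format, and `extract_file_artifacts` is a hypothetical helper:

```python
import base64


def extract_file_artifacts(task_json: dict) -> list[tuple[str, bytes]]:
    """Collect (mime_type, raw_bytes) for every inline file part in
    Task.artifacts — the parts the include_artifacts interceptor adds
    for BuiltInCodeExecutor charts.  Hypothetical helper, not this repo's
    extract_a2a_text."""
    found = []
    for artifact in task_json.get("artifacts") or []:
        for part in artifact.get("parts", []):
            file_info = part.get("file")
            # Skip text/data parts and URI-only files.
            if not file_info or "bytes" not in file_info:
                continue
            found.append((
                file_info.get("mimeType", "application/octet-stream"),
                base64.b64decode(file_info["bytes"]),
            ))
    return found
```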

level_4_agent/agent.py

Lines changed: 66 additions & 57 deletions
```diff
@@ -109,60 +109,31 @@
 from google.adk.code_executors.built_in_code_executor import BuiltInCodeExecutor
 from google.adk.planners.built_in_planner import BuiltInPlanner
 from google.adk.planners.plan_re_act_planner import PlanReActPlanner
-from google.adk.tools.mcp_tool import McpToolset, StreamableHTTPConnectionParams
 from google.genai import types
 from pydantic import BaseModel
 from pydantic import Field
 
 from google.adk.tools import FunctionTool
 
 from .creator_tools import create_specialist
+from .gahmen_tools import GAHMEN_TOOLS
 from .registry import hydrate_capabilities
 from .remote_tools import consult_level_1
 
 
-# --- Optional Singapore-government MCP toolset (gahmen-mcp) ---------------
-# Same Smithery-hosted MCP server attached to the NBS swarm's Strategist.
-# Exposes data.gov.sg + SingStat tables. Conditional on SMITHERY_API_KEY:
-# in deploys without the env var the data fetcher runs google_search-only.
-# Attached to `data_fetcher_agent` (NOT `analyst_agent`, which uses
-# `BuiltInCodeExecutor` — Gemini's built-in + function-tool mutex would
-# conflict). data_fetcher_agent already has `bypass_multi_tools_limit=True`
-# so adding more function tools is friction-free.
+# --- Singapore-government data tools (gahmen-mcp via Smithery) ------------
+# W9.3 (2026-05-01): switched from McpToolset to plain FunctionTool wrappers.
+# Root cause of W9.2 failure: Smithery uses ?api_key=<KEY> query-param auth,
+# not Authorization: Bearer header. McpToolset's silent 401 left the agent's
+# tools_dict empty so the LLM tried to call gahmen_* tools that the framework
+# couldn't dispatch. FunctionTool wrappers in level_4_agent/gahmen_tools.py
+# call Smithery's JSON-RPC endpoint directly and return text — predictable,
+# no MCP session lifecycle, no auth scheme drama.
 #
-# Excluded tools: datagovsg_initiate_download / datagovsg_poll_download
-# (async server-side jobs that don't fit a single-turn fetcher).
+# GAHMEN_TOOLS is an empty list when SMITHERY_API_KEY is unset (graceful
+# degradation: data_fetcher runs consult_level_1-only).
 
 _SMITHERY_API_KEY = os.environ.get("SMITHERY_API_KEY", "")
-_SMITHERY_GAHMEN_URL = os.environ.get(
-    "SMITHERY_GAHMEN_URL",
-    "https://server.smithery.ai/aniruddha-adhikary/gahmen-mcp",
-)
-_GAHMEN_TOOL_FILTER = [
-    "datagovsg_list_collections",
-    "datagovsg_get_collection",
-    "datagovsg_list_datasets",
-    "datagovsg_get_dataset_metadata",
-    "datagovsg_search_dataset",
-    "singstat_search_resources",
-    "singstat_get_metadata",
-    "singstat_get_table_data",
-]
-
-if _SMITHERY_API_KEY:
-    gahmen_toolset = McpToolset(
-        connection_params=StreamableHTTPConnectionParams(
-            url=_SMITHERY_GAHMEN_URL,
-            headers={"Authorization": f"Bearer {_SMITHERY_API_KEY}"},
-        ),
-        tool_filter=_GAHMEN_TOOL_FILTER,
-        # Prefix ⇒ tools surface as `gahmen_singstat_*` / `gahmen_datagovsg_*`.
-        # Same prefix the swarm bot's Telegram anchor uses to render visible
-        # tool calls. Future MCP additions won't collide with this namespace.
-        tool_name_prefix="gahmen",
-    )
-else:
-    gahmen_toolset = None
 
 
 # ---------------------------------------------------------------------------
@@ -270,18 +241,14 @@ class Brief(BaseModel):
 # through consult_level_1, paying one A2A roundtrip (~5-10s) for each
 # search. That's the trade-off of demonstrating delegation: slower than
 # inline google_search, but architecturally cleaner.
-_data_fetcher_tools: list = [
-    FunctionTool(consult_level_1),
-]
-if gahmen_toolset is not None:
-    _data_fetcher_tools.append(gahmen_toolset)
+_data_fetcher_tools: list = [FunctionTool(consult_level_1), *GAHMEN_TOOLS]
 
 
 # Build the data_fetcher instruction conditionally on whether gahmen
 # tools are actually wired in. This prevents the LLM from hallucinating
 # gahmen tool calls when SMITHERY_API_KEY is unset — the model can only
 # call tools the instruction names. Same content, two variants.
-if gahmen_toolset is not None:
+if GAHMEN_TOOLS:
     _data_fetcher_instruction = (
         "You are a business-data research specialist. You have NO"
         " built-in search of your own. You acquire data only via two"
@@ -442,6 +409,19 @@ class Brief(BaseModel):
 - Stateful across turns: do not re-initialise variables or re-load data.
 - Pre-imported: io, math, re, matplotlib.pyplot as plt, numpy as np, pandas as pd, scipy.
 - Do NOT run `pip install`.
+
+# After EVERY chart — REQUIRED chart description (W9.3)
+
+After every chart you produce, write a one-paragraph description of what
+the chart shows, leading with the headline finding. Format as:
+
+CHART DESCRIPTION: <one sentence saying what the chart shows>.
+Key data points: <list 3-5 specific numbers from the chart>.
+
+The coordinator quotes this description in the final brief, since the
+chart artifact itself does not propagate through A2A responses to the
+calling agent. The text description IS the chart for A2A consumers.
+Without this line, the brief will reference an invisible chart.
 """,
     code_executor=BuiltInCodeExecutor(),
     # Mandatory for the same reason as data_fetcher_agent — except here
@@ -457,22 +437,51 @@ class Brief(BaseModel):
     # us-central1 + Pro 2.5 (W9.2 — all A2A sub-agents on Pro per Simon 2026-05-01).
     model="gemini-2.5-pro",
     description=(
-        "Formats accumulated findings into a structured BI brief."
+        "Formats accumulated findings into a Markdown BI brief."
         " Output is the final answer — do not re-paraphrase."
     ),
     mode="single_turn",
     input_schema=WriterInput,
-    # SAFE to set output_schema here: this agent has no built-in tools,
-    # so set_model_response can be injected without conflict. Demonstrates
-    # the v2 typed-output contract for terminal nodes.
-    output_schema=Brief,
+    # W9.3 (2026-05-01): dropped output_schema=Brief. Same fix as
+    # a2a_orchestrator's writer_agent: structured-output forced JSON
+    # serialization that the A2A response packaged as a non-text part,
+    # surfacing as [empty] to the calling orchestrator's
+    # extract_a2a_text. Markdown text is what the orchestrator (and the
+    # bot's downstream renderer) expects. The Brief class above is kept
+    # as a documentation contract for what fields the writer produces;
+    # actual enforcement is now via the instruction's "use these EXACT
+    # Markdown headings" rule below.
     instruction=(
-        "Synthesise a Brief from the fetcher_findings (raw facts) and"
-        " analyst_findings (numeric summaries) for the topic. Quote"
-        " analyst numbers VERBATIM — do not round, re-format, or"
-        " re-interpret them. Weave fetcher source domains inline in the"
-        " analysis. Lead with the most important finding in"
-        " executive_summary."
+        "Synthesise a Markdown brief from the fetcher_findings (raw "
+        "facts) and analyst_findings (numeric summaries + chart "
+        "descriptions) for the topic. Use these EXACT Markdown "
+        "headings, in this order:\n\n"
+        "# {Concise Title — describes the question's topic}\n\n"
+        "## Executive Summary\n\n"
+        "[1-2 paragraph plain-English answer. Lead with the most "
+        "important finding. If data is incomplete, state that upfront.]\n\n"
+        "## Key Metrics\n\n"
+        "- Bullet list of headline numbers, quoted VERBATIM from "
+        "analyst_findings. Do not round or re-interpret.\n\n"
+        "## Analysis\n\n"
+        "[2-3 paragraph synthesis. Inline source attribution "
+        "throughout (e.g., 'According to SingStat (table M015711), ...' "
+        "or 'According to bloomberg.com (via Level 1), ...'). If a "
+        "chart description appears in analyst_findings, weave it into "
+        "this section explicitly: 'The chart shows ...'.]\n\n"
+        "## Sources\n\n"
+        "- Bullet list of sources cited inline. Include both gahmen / "
+        "SingStat / data.gov.sg references AND any external domains.\n\n"
+        "## Confidence and Gaps\n\n"
+        "[Where the brief is most/least confident. CALL OUT explicitly "
+        "any consult that returned [error] / [skip] / [empty] in the "
+        "fetcher_findings. Do not pretend a failed lookup succeeded.]\n\n"
+        "Rules:\n"
+        "1. Quote analyst numbers VERBATIM. Do not round, re-format, or "
+        "re-interpret.\n"
+        "2. Output ONLY the Markdown report. No preamble ('Here is your "
+        "brief:'), no JSON, no commentary. The Markdown above IS the "
+        "complete final response."
     ),
     # No built-in tools, so transfer_to_agent injection wouldn't
     # conflict — but consistency: terminal sub-agents shouldn't
```
