
Agent snapshots can become non-resumable when snapshot fallback stores empty tool_invoker/chat_generator payloads #11126

@shaun0927

Description


Describe the bug
_create_agent_snapshot() now avoids masking the original runtime error by falling back to {} when serializing chat_generator or tool_invoker inputs fails. However, that fallback can produce a saved agent snapshot that is no longer resumable.

I reproduced this on the current main branch by passing a non-serializable runtime-only callback (lambda) as streaming_callback. The agent still saves a snapshot file, but the snapshot contains empty {} payloads for chat_generator / tool_invoker, and agent.run(..., snapshot=...) then fails during resume.

This looks like a follow-up regression to the robustness fix in #11108: the original runtime error is preserved, but the saved snapshot violates the resume contract.

Error message
On resume:

DeserializationError: Invalid format of passed serialized payload. Expected a dictionary with keys 'serialization_schema' and 'serialized_data'. Got: {}

Warnings emitted when the snapshot is created:

Failed to serialize the agent's chat_generator inputs. The inputs in the snapshot will be replaced with an empty dictionary. Error: Serialization of lambdas is not supported.
Failed to serialize the agent's tool_invoker inputs. The inputs in the snapshot will be replaced with an empty dictionary. Error: Serialization of lambdas is not supported.
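For illustration, the error message implies the resume path rejects any payload that is not a dict carrying both expected keys. A minimal sketch of that structural check (hypothetical helper name, not Haystack's actual code):

```python
from typing import Any

EXPECTED_KEYS = {"serialization_schema", "serialized_data"}


def looks_resumable(payload: Any) -> bool:
    # Mirrors the structural check implied by the DeserializationError:
    # the payload must be a dict containing both expected keys.
    return isinstance(payload, dict) and EXPECTED_KEYS <= payload.keys()


assert not looks_resumable({})  # the {} fallback fails the check
assert looks_resumable({"serialization_schema": {}, "serialized_data": {}})
```

This is why the {} fallback, while preserving the original runtime error, produces a snapshot that cannot pass resume-time validation.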

Expected behavior
One of the following:

  1. The snapshot remains resumable: only the non-serializable, runtime-only fields (such as streaming_callback) are omitted.
  2. Haystack clearly marks the snapshot as non-resumable instead of saving structurally invalid {} payloads.
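Option 1 could look roughly like the following per-field fallback (a sketch with hypothetical names, not a patch against Haystack's actual _create_agent_snapshot):

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def serialize_inputs_dropping_failures(
    inputs: dict[str, Any], serialize: Callable[[Any], Any]
) -> dict[str, Any]:
    # Serialize each field independently so a single non-serializable,
    # runtime-only value (e.g. a lambda streaming_callback) is dropped
    # instead of replacing the whole payload with {}.
    serialized: dict[str, Any] = {}
    for name, value in inputs.items():
        try:
            serialized[name] = serialize(value)
        except Exception as exc:
            logger.warning("Dropping non-serializable input %r: %s", name, exc)
    return serialized
```

With a serializer that rejects callables, an input dict containing messages plus a lambda streaming_callback would keep the messages and drop only the callback, leaving a payload that still satisfies the resume contract.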

Additional context

To Reproduce
Minimal repro outline:

  1. Create an Agent with a ToolBreakpoint snapshot path.
  2. Pass a non-serializable runtime callback, for example streaming_callback=lambda chunk: None.
  3. Trigger a breakpoint or tool failure so the agent writes a snapshot.
  4. Load the saved snapshot with load_pipeline_snapshot(...).
  5. Call agent.run(..., snapshot=loaded.agent_snapshot).
  6. Resume fails with DeserializationError because chat_generator / tool_invoker were saved as {}.

A concrete version of the repro (validated locally on current main) is:

from pathlib import Path
import os

from haystack import component
from haystack.components.agents import Agent
from haystack.core.errors import BreakpointException
from haystack.core.pipeline.breakpoint import load_pipeline_snapshot
from haystack.dataclasses import ChatMessage, ToolCall
from haystack.dataclasses.breakpoints import AgentBreakpoint, ToolBreakpoint
from haystack.tools import Tool, Toolset


# Minimal chat generator: the first reply triggers the weather tool, the second answers.
@component
class MockChatGenerator:
    def __init__(self):
        self._counter = 0
        self.responses = [
            ChatMessage.from_assistant(tool_calls=[ToolCall(tool_name="weather_tool", arguments={"location": "Berlin"})]),
            ChatMessage.from_assistant(text="The weather in Berlin is sunny."),
        ]

    def to_dict(self):
        return {"type": "MockChatGenerator", "data": {}}

    @classmethod
    def from_dict(cls, data):
        return cls()

    @component.output_types(replies=list[ChatMessage])
    def run(self, messages, tools: list[Tool] | Toolset | None = None, **kwargs):
        result = self.responses[self._counter]
        self._counter += 1
        return {"replies": [result]}


def weather_tool(location: str):
    return {"weather": "sunny", "location": location}


# Enable snapshot saving and make sure the target directory exists.
os.environ["HAYSTACK_PIPELINE_SNAPSHOT_SAVE_ENABLED"] = "true"
debug_path = Path("debug_snapshots")
debug_path.mkdir(exist_ok=True)

agent = Agent(
    chat_generator=MockChatGenerator(),
    tools=[
        Tool(
            name="weather_tool",
            description="Weather tool",
            parameters={"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]},
            function=weather_tool,
        )
    ],
)

# Named agent_breakpoint to avoid shadowing the `breakpoint` builtin.
agent_breakpoint = AgentBreakpoint(
    break_point=ToolBreakpoint(component_name="tool_invoker", tool_name="weather_tool", snapshot_file_path=str(debug_path)),
    agent_name="test_agent",
)

try:
    agent.run(
        messages=[ChatMessage.from_user("What's the weather in Berlin?")],
        break_point=agent_breakpoint,
        # Non-serializable, runtime-only callback that triggers the {} fallback.
        streaming_callback=lambda chunk: None,
    )
except BreakpointException:
    pass

snapshot_file = max(debug_path.glob("test_agent_tool_invoker_*.json"), key=lambda p: p.stat().st_ctime)
loaded = load_pipeline_snapshot(snapshot_file)
agent.run(messages=[ChatMessage.from_user("ignored")], snapshot=loaded.agent_snapshot)
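Until this is fixed, a possible workaround is to pass a module-level named function instead of a lambda; the warning text suggests only lambdas are rejected (an assumption based on the emitted message, not verified across versions):

```python
def noop_streaming_callback(chunk) -> None:
    # Module-level named function: importable by its dotted path, so it is
    # more likely to survive serialization than an inline lambda.
    pass


# Hypothetical usage in the repro above:
# agent.run(..., streaming_callback=noop_streaming_callback)
```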

FAQ Check

System:

  • OS: macOS (darwin arm64)
  • GPU/CPU: CPU
  • Haystack version (commit or version number): 07e8c9bad898a466f0a91c97d817ba3e6ba13dc8 (main as of 2026-04-17)
  • DocumentStore: N/A
  • Reader: N/A
  • Retriever: N/A

Labels

P1 (High priority, add to the next sprint)