Commit e24614d

blog: Haystack persistent memory (drop-in tools + auto-recall wrapper) (#2147)
* blog: add Haystack persistent memory integration post

  Walkthrough of hindsight-haystack. Two integration modes:
  - create_hindsight_tools() returning a list[Tool] for an Agent
  - HindsightMemoryWrapper, a Toolset subclass with auto_recall and auto_retain that runs the memory work before/after each turn.

  Plus the three memory primitives (retain/recall/reflect) and the include_* flags to drop any subset.

  Every concrete claim verified against the README and hindsight_haystack/tools.py:
  - Package name + version (0.1.0)
  - Python >= 3.10, haystack-ai >= 2.12.0, hindsight-client >= 0.4.0
  - Exported names from __init__.py
  - create_hindsight_tools() and HindsightMemoryWrapper signatures
  - "Use toolset.run(agent, ...) not agent.run(...) for auto behavior"
  - configure() shape and acceptable kwargs

  Underlying integration unit tests: 83/83 passing (3 e2e skipped for lack of API keys in CI sandbox).

  Cover is a placeholder (Codex art) for now; swap before merging.
1 parent f5a6c30 commit e24614d

2 files changed

Lines changed: 193 additions & 0 deletions

@@ -0,0 +1,193 @@
---
title: "Haystack Persistent Memory: Drop-In Tools and Auto-Recall for Any Agent"
authors: [benfrank241]
slug: "2026/06/11/haystack-persistent-memory"
date: 2026-06-11T13:00
tags: [haystack, memory, persistent-memory, hindsight, agents, tutorial]
description: "Add persistent long-term memory to any Haystack agent with Hindsight. Three Haystack Tools (retain, recall, reflect) plus an optional HindsightMemoryWrapper that injects memories before each turn and stores the transcript after."
image: /img/blog/haystack-persistent-memory.png
hide_table_of_contents: true
---

![Haystack Persistent Memory with Hindsight](/img/blog/haystack-persistent-memory.png)

[Haystack](https://haystack.deepset.ai/) is deepset's open-source framework for building production LLM applications: pipelines, agents, RAG, the whole stack. The `Agent` component handles tool use and conversation control well, but it's stateless across runs. The next session starts cold.

This post is a walkthrough of `hindsight-haystack`, the integration that adds three drop-in Haystack `Tool` instances (retain, recall, reflect) and an optional `HindsightMemoryWrapper` toolset that does the memory work automatically without the agent having to decide when to call a tool.

## TL;DR

<!-- truncate -->

- `pip install hindsight-haystack`. Two entry points cover both styles of integration:
  - **`create_hindsight_tools()`** returns a `list[Tool]` you pass to any Haystack `Agent`. The agent decides when to call them.
  - **`HindsightMemoryWrapper`** is a `Toolset` subclass with `auto_recall` and `auto_retain` flags. Use `toolset.run(agent, ...)` and the memory work runs automatically before and after each turn.
- The same three memory primitives in both: **retain** stores content, **recall** searches, **reflect** synthesizes.
- Drop tools individually with `include_retain` / `include_recall` / `include_reflect`.
- Hindsight Cloud is the recommended path; self-hosted works the same way once you point the client URL at your instance.

## Why Persistent Memory Matters for Haystack Agents

Haystack's `Agent` component is fine for a single conversation: it carries the message history through the run and the chat model has it in context. What it can't do is remember anything once that run ends. The next call starts with whatever you pass in `messages`, nothing else.

For a one-shot RAG endpoint that's fine. For a customer-facing assistant that should remember preferences across sessions, a support agent that should accumulate fixes to known issues, or a research agent that should build up a project model over weeks, you need a layer that survives the run. That's the gap this integration fills.

## Two Integration Modes

The integration ships two entry points, and they cover the two ways teams typically want to wire memory in.

### Mode 1: Tools the Agent Decides To Call

`create_hindsight_tools()` returns a list of Haystack `Tool` instances. You pass them to `Agent(tools=...)`, and the model decides when to call them based on the tool descriptions and the user's request.

```python
from hindsight_client import Hindsight
from hindsight_haystack import create_hindsight_tools
from haystack.components.agents import Agent
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

client = Hindsight(base_url="http://localhost:8888")

tools = create_hindsight_tools(
    client=client,
    bank_id="user-123",
    mission="Track user preferences",
)

agent = Agent(
    chat_generator=OpenAIChatGenerator(model="gpt-4o-mini"),
    tools=tools,
    system_prompt=(
        "You are a helpful assistant with long-term memory. "
        "Use retain_memory to store important facts. "
        "Use recall_memory to search memory before answering."
    ),
)

result = agent.run(messages=[ChatMessage.from_user("Remember that I prefer dark mode")])
print(result["messages"][-1].text)
```

You get three tools by default: `retain_memory`, `recall_memory`, `reflect_memory`. The system prompt tells the agent when to use them (the example above covers retain and recall). This mode is the right fit when you want explicit control: the agent calls memory on the turns where it makes sense, and skips it on the turns where it doesn't.

The tradeoff is that the model has to remember to use the tools. A weaker model under a busy prompt sometimes won't.

### Mode 2: Automatic Memory With `HindsightMemoryWrapper`

`HindsightMemoryWrapper` is a `Toolset` subclass that handles the memory lifecycle so the agent doesn't have to. With `auto_recall=True`, every turn prepends relevant memories to the system prompt before the chat model runs. With `auto_retain=True`, the user and assistant messages get stored after each turn.

```python
# Imports and client repeated from the Mode 1 example so this block runs on its own
from hindsight_client import Hindsight
from hindsight_haystack import HindsightMemoryWrapper
from haystack.components.agents import Agent
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

client = Hindsight(base_url="http://localhost:8888")

toolset = HindsightMemoryWrapper(
    client=client,
    bank_id="user-123",
    mission="Track user preferences",
    auto_recall=True,  # Inject memories into system prompt before each turn
    auto_retain=True,  # Store user + assistant messages after each turn
)

agent = Agent(
    chat_generator=OpenAIChatGenerator(model="gpt-4o-mini"),
    tools=toolset,
    system_prompt="You are a helpful assistant with long-term memory.",
)

# Use toolset.run() instead of agent.run() to get the auto behavior
result = toolset.run(agent, messages=[ChatMessage.from_user("I prefer dark mode")])
```

The one detail to watch is that you have to call `toolset.run(agent, ...)` rather than `agent.run(...)` for the automatic recall/retain to fire. The toolset wraps the agent's run with the pre- and post-turn memory work; bypassing the wrapper bypasses the automation. The three explicit tools are still attached to the agent, so the model can also call them mid-turn if it wants finer control (e.g. `reflect_memory` for a synthesizing question).

This mode is the right fit when you want memory to "just work" without depending on the model's tool-routing.
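
To make the pre- and post-turn flow concrete, here is a minimal, library-free sketch of the recall-before, retain-after pattern. It is not the `HindsightMemoryWrapper` implementation: `ToyMemory`, `run_with_memory`, and the keyword-overlap recall are hypothetical stand-ins, and the "agent" is any callable that takes a system prompt and a user message.

```python
from typing import Callable, List


class ToyMemory:
    """Hypothetical in-memory stand-in for a memory bank (not the Hindsight client)."""

    def __init__(self) -> None:
        self.items: List[str] = []

    def retain(self, content: str) -> None:
        self.items.append(content)

    def recall(self, query: str) -> List[str]:
        # Naive keyword overlap; a real memory service ranks results far better.
        words = set(query.lower().split())
        return [m for m in self.items if words & set(m.lower().split())]


def run_with_memory(
    agent: Callable[[str, str], str],
    memory: ToyMemory,
    system_prompt: str,
    user_message: str,
) -> str:
    # Pre-turn: recall relevant memories and prepend them to the system prompt.
    recalled = memory.recall(user_message)
    if recalled:
        system_prompt = "Relevant memories:\n" + "\n".join(recalled) + "\n\n" + system_prompt
    reply = agent(system_prompt, user_message)
    # Post-turn: retain both sides of the exchange.
    memory.retain(f"user: {user_message}")
    memory.retain(f"assistant: {reply}")
    return reply


def echo_agent(system_prompt: str, user_message: str) -> str:
    # Stand-in "model": reports whether memories were injected into its prompt.
    return "with memories" if "Relevant memories:" in system_prompt else "no memories"


memory = ToyMemory()
first = run_with_memory(echo_agent, memory, "You are helpful.", "I prefer dark mode")
second = run_with_memory(echo_agent, memory, "You are helpful.", "Which mode do I prefer")
```

Calling `echo_agent` directly would skip both memory steps, which is the same reason the real wrapper needs `toolset.run(agent, ...)` rather than `agent.run(...)`.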

## The Three Memory Tools

Whichever mode you pick, the same three tools are available.

**`retain_memory`** stores free-text content in the bank. Hindsight extracts structured facts asynchronously after the call returns, so the tool returns quickly and the extraction happens server-side.

**`recall_memory`** searches the bank for content relevant to a query. The result is a ranked set of memories. Budget (`low` / `mid` / `high`) controls how deep the search goes.

**`reflect_memory`** asks Hindsight to synthesize an answer over the bank using an LLM, rather than returning raw memories. Good for "what do we know about X?" style questions where you want a paragraph back, not five separate snippets.

You can drop any of them at construction time:

```python
# Only retain + recall (no reflect)
tools = create_hindsight_tools(
    client=client,
    bank_id="user-123",
    include_reflect=False,
)
```

The same flags exist on `HindsightMemoryWrapper`: `include_retain`, `include_recall`, `include_reflect`. Default for all three is `True`.

## Configuration

For most apps, the same connection settings apply everywhere. `configure()` sets defaults once so every subsequent `create_hindsight_tools()` or `HindsightMemoryWrapper()` call only needs `bank_id`:

```python
from hindsight_haystack import configure, create_hindsight_tools

configure(
    hindsight_api_url="http://localhost:8888",
    api_key="your-api-key",
    budget="mid",
    tags=["source:haystack"],
    context="my-app",
    mission="Track user preferences",
)

# Now the bank_id is the only required argument
tools = create_hindsight_tools(bank_id="user-123")
```

You can still pass an explicit `client=` (already-configured `Hindsight` instance) when you need per-tenant routing or custom HTTP settings.
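
One way to keep the URL and key out of source code is to build the `configure()` arguments from the environment. The sketch below is an assumption about how you might wire that up: the variable names `HINDSIGHT_API_URL`, `HINDSIGHT_API_KEY`, and `HINDSIGHT_BUDGET` and the helper `hindsight_settings_from_env` are hypothetical, and only the keyword names (`hindsight_api_url`, `api_key`, `budget`) come from the `configure()` call shown above.

```python
import os
from typing import Mapping, Optional


def hindsight_settings_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for configure() from environment variables.

    The variable names are hypothetical choices for this sketch; the
    integration is not assumed to read them on its own.
    """
    env = os.environ if env is None else env
    settings = {
        "hindsight_api_url": env.get("HINDSIGHT_API_URL", "http://localhost:8888"),
        "budget": env.get("HINDSIGHT_BUDGET", "mid"),
    }
    api_key = env.get("HINDSIGHT_API_KEY")
    if api_key:
        # Only pass a key when one is actually set.
        settings["api_key"] = api_key
    return settings


settings = hindsight_settings_from_env(
    {
        "HINDSIGHT_API_URL": "https://api.hindsight.vectorize.io",
        "HINDSIGHT_API_KEY": "hsk_example",
    }
)
# configure(**settings)  # uncomment once hindsight_haystack is installed
```

Passing a mapping instead of reading `os.environ` directly keeps the helper easy to test.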
149+
150+
## Setup
151+
152+
You need a Hindsight account and an API key. Hindsight Cloud is the fastest path:
153+
154+
1. **Sign up** at [hindsight.vectorize.io](https://ui.hindsight.vectorize.io/signup). Free tier is enough to try it end to end.
155+
2. **Create an API key** from the dashboard. The format is `hsk_...`.
156+
3. **Point the client** at Cloud:
157+
```python
158+
from hindsight_client import Hindsight
159+
client = Hindsight(
160+
base_url="https://api.hindsight.vectorize.io",
161+
api_key="hsk_your_key",
162+
)
163+
```
164+
Or set the URL and key on `configure()` once and skip the explicit client.
165+
166+
Self-hosting works the same way once you point `base_url` at your local instance (typically `http://localhost:8888`).
167+
168+
## Tradeoffs
169+
170+
**Mode pick is meaningful.** Tools-only mode keeps the agent in control and saves a Hindsight call on turns where memory wasn't relevant. Auto mode is more reliable (every turn recalls and retains) but costs more Hindsight calls and a slightly longer time-to-first-token on each turn. For high-stakes production assistants, auto mode is usually the right default; for cheap RAG-style flows, tools-only is fine.
171+
172+
**Bank routing belongs in your app.** `bank_id` is the routing key for who owns the memory. Most production apps want one bank per user (or per project, or per tenant) consistently. Mixing different users' memories into the same bank works technically but undoes most of the point of persistent memory.
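
As an illustration of keeping that routing in one place, the sketch below derives a bank ID from a tenant and a user. The helper `bank_id_for` and the `tenant--user` format are assumptions made for this example, not a Hindsight naming requirement.

```python
import re


def bank_id_for(tenant: str, user: str) -> str:
    """Derive one stable bank ID per (tenant, user) pair.

    Hypothetical helper: the "tenant--user" format is a choice made for
    this sketch, not something Hindsight requires.
    """

    def slug(value: str) -> str:
        # Lowercase and collapse anything that is not a-z or 0-9 into one hyphen.
        cleaned = re.sub(r"[^a-z0-9]+", "-", value.strip().lower()).strip("-")
        if not cleaned:
            raise ValueError(f"cannot derive a bank ID from {value!r}")
        return cleaned

    # Slugs never contain "--", so the separator keeps the two parts unambiguous.
    return f"{slug(tenant)}--{slug(user)}"


bank_id = bank_id_for("Acme Corp", "user 123")
```

Because slugging can map different display names to the same string, pass stable identifiers from your auth system rather than free-form names.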

**Retain is asynchronous.** A `retain_memory` call returns when the content lands in the bank, not when the extractor finishes. Facts become recallable within seconds. For chat flows this is fine because the next turn usually arrives after extraction has completed. For automated scripts that retain-then-recall in the same run, add a short delay or use Reflect (which doesn't depend on extraction having finished).
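
For scripts, a bounded poll is often easier to reason about than a fixed sleep. The sketch below assumes nothing about the Hindsight client: `recall_when_ready` is a hypothetical helper, and `recall` is any zero-argument callable you write around whatever recall call you use. A fake recall that comes back empty twice stands in for extraction lag.

```python
import time
from typing import Callable, List


def recall_when_ready(
    recall: Callable[[], List[str]],
    timeout: float = 10.0,
    interval: float = 0.5,
) -> List[str]:
    """Call `recall` until it returns results or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        results = recall()
        if results or time.monotonic() >= deadline:
            # Either something came back or we ran out of time; return what we have.
            return results
        time.sleep(interval)


# Fake recall: empty on the first two calls, then one memory.
attempts = {"count": 0}


def fake_recall() -> List[str]:
    attempts["count"] += 1
    return ["prefers dark mode"] if attempts["count"] >= 3 else []


found = recall_when_ready(fake_recall, timeout=2.0, interval=0.01)
```

On timeout the helper returns the last (empty) result instead of raising, so the caller decides whether missing memories are an error.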

## Recap

| Capability | Haystack `Agent` default | With `hindsight-haystack` |
| --- | --- | --- |
| Memory across runs | None | Persistent, per bank |
| Memory setup | Manual context-passing | Three tools or auto-wrapper |
| Cross-tool sharing | n/a | Same bank readable from Claude Code, Cline, Flowise, API |
| Automatic recall before turn | n/a | `auto_recall=True` |
| Automatic retain after turn | n/a | `auto_retain=True` |
| Selective tool surface | n/a | `include_retain/recall/reflect` flags |
| Synthesized answers | n/a | `reflect_memory` returns LLM-synthesized text |

## Next Steps

- **Hindsight Cloud:** [ui.hindsight.vectorize.io](https://ui.hindsight.vectorize.io/signup)
- **Integration docs:** [Haystack + Hindsight](/sdks/integrations/haystack)
- **Source:** [`vectorize-io/hindsight/hindsight-integrations/haystack`](https://github.com/vectorize-io/hindsight/tree/main/hindsight-integrations/haystack)
- **Hindsight API reference:** [API quickstart](/developer/api/quickstart)