---
title: "What's new in Hindsight 0.8.4"
description: Multi-LLM failover and round-robin routing, finer recall control, scheduled mental-model refresh, richer token accounting, and a batch of data-integrity fixes in Hindsight 0.8.4
authors: [nicoloboschi]
date: 2026-06-30
hide_table_of_contents: true
tags: [release]
---

Hindsight 0.8.4 builds on [0.8.3](/blog/2026/06/18/version-0-8-3) with a focus on **reliability and control**. **Multi-LLM failover and round-robin routing** keeps memory operations running when a provider degrades. **Finer recall controls** let you tune retrieval more precisely. **Scheduled mental-model refresh** keeps knowledge current, and **richer token accounting** makes usage reporting more accurate. The release also lands a batch of **data-integrity and robustness fixes**, so **self-managed deployments should upgrade.**

<!-- truncate -->

<video controls muted loop playsinline width="100%" style={{borderRadius: "8px"}}>
  <source src="/img/blog/release084/hindsight-0-8-4-release-notes.mp4" type="video/mp4" />
</video>

- [**Multi-LLM Failover & Routing**](#multi-llm-failover--routing): Survive provider outages with failover and round-robin routing across LLMs.
- [**Finer Recall Control**](#finer-recall-control): Per-stage scores, two-level score filtering, configurable recency decay, and observation-aware dedup.
- [**Scheduled Mental-Model Refresh**](#scheduled-mental-model-refresh): Keep consolidated knowledge fresh on a cron schedule.
- [**Sharper LLM Control & Accounting**](#sharper-llm-control--accounting): Per-operation temperature, per-scope timeouts, and richer token usage.
- [**Data-Integrity & Robustness Fixes**](#data-integrity--robustness-fixes): Why you should upgrade.

## Multi-LLM Failover & Routing

You can now configure **multiple LLMs and have Hindsight route across them automatically**. **Failover** moves work to another member when one provider errors, and **round-robin** spreads load across members. A single provider outage or rate limit no longer has to stall memory operations.

Each member of a multi-LLM configuration can be tuned individually. That includes **LiteLLM Router settings**, **Vertex AI service account keys**, and **per-member Vertex project and region**, so you can mix providers and accounts with the controls each one needs.

This release also adds two new OpenAI-compatible providers, **Requesty** and **Atlas Cloud**, so you can point Hindsight at them directly.
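This post doesn't show Hindsight's configuration keys for multi-LLM setups, so the sketch below does not attempt them. It is a minimal, self-contained illustration of how failover and round-robin combine in general. Each request starts at the next member in rotation and falls through to the remaining members on error. `Member`, `RoundRobinFailover`, and `ProviderError` are hypothetical names, and members are plain callables standing in for real provider clients.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


class ProviderError(Exception):
    """Stands in for any provider failure: outage, rate limit, or timeout."""


@dataclass
class Member:
    name: str                   # hypothetical label for one configured LLM
    call: Callable[[str], str]  # stands in for a real provider client


class RoundRobinFailover:
    """Round-robin picks the starting member; failover walks the rest on error."""

    def __init__(self, members: Sequence[Member]):
        if not members:
            raise ValueError("need at least one member")
        self.members: List[Member] = list(members)
        self._starts = itertools.cycle(range(len(self.members)))

    def complete(self, prompt: str) -> Tuple[str, str]:
        first = next(self._starts)  # rotate the starting point on every request
        errors = []
        for offset in range(len(self.members)):
            member = self.members[(first + offset) % len(self.members)]
            try:
                return member.name, member.call(prompt)
            except ProviderError as exc:  # fail over to the next member
                errors.append(f"{member.name}: {exc}")
        raise ProviderError("all members failed: " + "; ".join(errors))


def rate_limited(prompt: str) -> str:
    raise ProviderError("rate limited")


router = RoundRobinFailover([
    Member("a", lambda p: "A:" + p),
    Member("b", rate_limited),
    Member("c", lambda p: "C:" + p),
])
for _ in range(3):
    print(router.complete("hi"))
# ('a', 'A:hi')
# ('c', 'C:hi')   <- "b" was next in rotation but failed, so "c" answered
# ('c', 'C:hi')
```

Rotating the starting member spreads load even when every provider is healthy. The fall-through loop means one failing member costs an extra attempt rather than a failed memory operation.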
## Finer Recall Control

Several changes give you more precise control over what recall returns and how it's ranked:

- **Per-stage scores and two-level filtering.** Recall now exposes structured per-stage scores, and `min_scores` supports two-level filtering. You can require a minimum relevance at each retrieval stage instead of applying a single blunt cutoff. The option is also correctly passed through the maintained Python and TypeScript SDK wrappers.
- **Configurable recency decay.** Choose how recency affects ranking by picking a decay mode: `linear`, `exponential`, or `none`.
- **Observation-aware dedup.** A new `prefer_observations` option drops raw facts that have already been superseded by consolidated observations, so results favor the synthesized view.
- **Exact filtering of global observations.** You can now filter for untagged (global) observations exactly.
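The release notes don't spell out the recall request schema, so the sketch below is illustrative only. It shows three things:

- what the three decay modes mean numerically;
- one plausible reading of a two-level `min_scores` check, namely a floor on the final score plus per-stage floors;
- how observation-aware dedup can drop facts that an observation already covers.

The `Hit` record, its field names, the `horizon_days` knob, and the `min_scores` dictionary layout are all assumptions, not Hindsight's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def recency_weight(age_days: float, mode: str, horizon_days: float = 30.0) -> float:
    """Ranking weight in [0, 1] for a memory that is `age_days` old.

    `horizon_days` is a hypothetical knob: the zero point for "linear",
    the half-life for "exponential", and ignored for "none".
    """
    if mode == "none":
        return 1.0
    if mode == "linear":
        return max(0.0, 1.0 - age_days / horizon_days)
    if mode == "exponential":
        return 0.5 ** (age_days / horizon_days)
    raise ValueError(f"unknown decay mode: {mode!r}")


@dataclass
class Hit:  # hypothetical shape of one recall result
    id: str
    kind: str                       # "fact" or "observation"
    stage_scores: Dict[str, float]  # per-stage scores, e.g. {"semantic": 0.9}
    final_score: float
    source_ids: List[str] = field(default_factory=list)  # facts an observation consolidates


def passes_min_scores(hit: Hit, min_scores: dict) -> bool:
    """Assumed two-level check: a floor on the final score, then per-stage floors.
    A stage missing from the hit counts as 0.0."""
    if hit.final_score < min_scores.get("final", 0.0):
        return False
    return all(hit.stage_scores.get(stage, 0.0) >= floor
               for stage, floor in min_scores.get("stages", {}).items())


def prefer_observations(hits: List[Hit]) -> List[Hit]:
    """Drop raw facts that an observation in the same result set already covers."""
    covered = {sid for h in hits if h.kind == "observation" for sid in h.source_ids}
    return [h for h in hits if not (h.kind == "fact" and h.id in covered)]


hits = [
    Hit("f1", "fact", {"semantic": 0.9, "rerank": 0.7}, 0.80),
    Hit("o1", "observation", {"semantic": 0.85, "rerank": 0.75}, 0.82, ["f1", "f2"]),
    Hit("f3", "fact", {"semantic": 0.4, "rerank": 0.2}, 0.30),
]
min_scores = {"final": 0.5, "stages": {"rerank": 0.5}}
kept = [h for h in prefer_observations(hits) if passes_min_scores(h, min_scores)]
print([h.id for h in kept])  # ['o1']
print(recency_weight(30, "exponential"), recency_weight(15, "linear"))  # 0.5 0.5
```

In this toy run, `f1` is dropped because observation `o1` consolidates it. `f3` fails the final-score floor.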
## Scheduled Mental-Model Refresh

Mental models can now be **refreshed on a cron schedule**. Instead of relying solely on activity-triggered refreshes, you can keep a bank's consolidated knowledge current on a cadence you define.
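The post doesn't show how a schedule is attached to a mental model. The sketch below therefore only illustrates what a cron cadence means: given a standard five-field expression (minute, hour, day of month, month, day of week with Sunday as 0), it finds the next minute a refresh would be due. `next_run` is a hypothetical helper that supports only `*` or a single number per field. A production scheduler would use a full cron parser.

```python
from datetime import datetime, timedelta


def _matches(field: str, value: int) -> bool:
    # Minimal subset: "*" or a single integer (no ranges, lists, or steps).
    return field == "*" or int(field) == value


def next_run(expr: str, after: datetime, max_days: int = 366) -> datetime:
    """Return the first whole minute strictly after `after` that matches `expr`."""
    minute, hour, dom, month, dow = expr.split()
    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    end = t + timedelta(days=max_days)
    while t < end:
        cron_dow = (t.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
        dom_ok, dow_ok = _matches(dom, t.day), _matches(dow, cron_dow)
        if dom != "*" and dow != "*":
            day_ok = dom_ok or dow_ok  # classic cron: either day field may match
        else:
            day_ok = dom_ok and dow_ok
        if (_matches(minute, t.minute) and _matches(hour, t.hour)
                and _matches(month, t.month) and day_ok):
            return t
        t += timedelta(minutes=1)  # brute-force scan, fine for a sketch
    raise ValueError(f"no match for {expr!r} within {max_days} days")


now = datetime(2026, 6, 30, 14, 5)
print(next_run("0 3 * * *", now))   # 2026-07-01 03:00:00  (daily at 03:00)
print(next_run("30 9 * * 1", now))  # 2026-07-06 09:30:00  (Mondays at 09:30)
```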
## Sharper LLM Control & Accounting

- **Per-operation temperature.** Set a different temperature for each operation (retain, recall, and reflect) for tighter control over response style.
- **Per-scope timeouts and retries.** Per-scope LLM timeout and retry policies are now applied consistently across providers.
- **Richer token accounting.** Usage totals now include cached and "thoughts" tokens where supported, and reasoning tokens are tracked for OpenAI-compatible providers, for more accurate cost reporting.
- **More reliable structured output.** Anthropic strict structured output now uses forced tool use, and reflect's structured-output retries are capped to avoid runaway retry loops.
- **Configurable upload size.** The maximum upload size in the control plane is now configurable.
- **Faster, more scalable bank stats.** Bank statistics now scale to large deployments, and an on-demand refresh lets you force an exact recount when you need it.
- **Full memory details in the explorer.** The control plane's memory explorer now shows the complete details of each memory, making inspection and debugging easier.
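As a rough illustration, and not Hindsight's internal code, the sketch below shows two ideas from this list:

- resolving per-operation settings by layering overrides on shared defaults;
- accumulating token usage from OpenAI-style `usage` payloads, whose `prompt_tokens_details.cached_tokens` and `completion_tokens_details.reasoning_tokens` fields break out cached and reasoning tokens.

`LLMSettings`, `settings_for`, `Usage`, the default values, and the override dictionary layout are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LLMSettings:  # hypothetical defaults, not Hindsight's
    temperature: float = 0.7
    timeout_s: float = 60.0
    max_retries: int = 2


def settings_for(operation: str, default: LLMSettings, overrides: dict) -> LLMSettings:
    """Per-operation fields (e.g. for "retain" or "reflect") override the shared defaults."""
    return replace(default, **overrides.get(operation, {}))


@dataclass
class Usage:
    input_tokens: int = 0
    output_tokens: int = 0
    cached_tokens: int = 0     # subset of input_tokens
    reasoning_tokens: int = 0  # subset of output_tokens

    def add_openai(self, usage: dict) -> None:
        """Accumulate one OpenAI-style `usage` object, given as a plain dict."""
        self.input_tokens += usage.get("prompt_tokens") or 0
        self.output_tokens += usage.get("completion_tokens") or 0
        prompt_details = usage.get("prompt_tokens_details") or {}
        completion_details = usage.get("completion_tokens_details") or {}
        self.cached_tokens += prompt_details.get("cached_tokens") or 0
        self.reasoning_tokens += completion_details.get("reasoning_tokens") or 0


overrides = {"retain": {"temperature": 0.0}, "reflect": {"temperature": 0.9, "timeout_s": 120.0}}
print(settings_for("retain", LLMSettings(), overrides))
# LLMSettings(temperature=0.0, timeout_s=60.0, max_retries=2)

totals = Usage()
totals.add_openai({"prompt_tokens": 1200, "completion_tokens": 300,
                   "prompt_tokens_details": {"cached_tokens": 1024},
                   "completion_tokens_details": {"reasoning_tokens": 128}})
totals.add_openai({"prompt_tokens": 50, "completion_tokens": 10})
print(totals)
# Usage(input_tokens=1250, output_tokens=310, cached_tokens=1024, reasoning_tokens=128)
```

OpenAI already counts cached tokens inside `prompt_tokens` and reasoning tokens inside `completion_tokens`. Keeping them as separate breakdown fields, rather than adding them to the totals, avoids double counting.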
## Data-Integrity & Robustness Fixes

These fixes are the main reason to upgrade: they protect what gets stored and keep background work healthy.

- **Append-mode chunking.** Retain append mode now merges JSON arrays, so conversation-aware chunking is preserved.
- **Observation search vectors.** Search vectors are now maintained when observations are inserted or updated, keeping search and consolidation correct.
- **Consolidation safety.** When a dedup action is missing, consolidation now keeps the item by default, preventing unintended drops.
- **Migration bootstrap.** Migration bootstrap now respects the configured vector extension.
- **Accurate bank stats.** The bank stats cache is now correctly invalidated after deletes and clears, so counts stay accurate.
- **Graph maintenance deadlocks.** Concurrent inserts no longer trigger enqueue deadlocks during graph maintenance.

Other notable fixes:

- **Token usage on parse failure.** Provider token usage is now preserved even when tool-call argument parsing or validation fails, so cost reporting stays accurate.
- **Reflect JSON envelopes.** Reflect now unwraps answers that some models return wrapped in JSON.
- **Cleaner error paths.** List endpoints now reject a negative `limit` or `offset` with a `422` instead of a server error. Async operations return `404` when the bank doesn't exist, and `PATCH bank` and dry-run extract no longer create banks unintentionally.
- **Retain error messages.** Retain error summaries now preserve the underlying exception message for easier debugging.
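Several of these fixes come down to small, defensive rules. The sketch below illustrates four of them in isolation, under stated assumptions:

- merging appended JSON arrays instead of concatenating strings;
- treating a missing dedup action as "keep";
- recording token usage before parsing a tool call;
- unwrapping a JSON envelope around a reflect answer.

Every function name, the `"keep"`/`"drop"` action values, the newline fallback for non-JSON content, and the `{"answer": ...}` envelope key are hypothetical. Hindsight's actual formats may differ.

```python
import json
from typing import Optional


def append_content(existing: str, new: str) -> str:
    """Append `new` to `existing`, merging the two when both are JSON arrays.

    Concatenating two arrays as strings ("[...][...]") is not valid JSON, so a
    chunker expecting one list of conversation turns could not parse it.
    """
    try:
        old_val, new_val = json.loads(existing), json.loads(new)
    except json.JSONDecodeError:
        return existing + "\n" + new  # hypothetical fallback for plain text
    if isinstance(old_val, list) and isinstance(new_val, list):
        return json.dumps(old_val + new_val)
    return existing + "\n" + new


def apply_dedup(items: list, actions: dict) -> list:
    """Drop an item only when explicitly told to; a missing action means keep."""
    return [item for item in items if actions.get(item, "keep") != "drop"]


def parse_tool_call(raw_args: str, usage: dict, ledger: list) -> Optional[dict]:
    """Record usage before parsing, so a malformed tool call is still accounted for."""
    ledger.append(usage)
    try:
        return json.loads(raw_args)
    except json.JSONDecodeError:
        return None


def unwrap_answer(text: str) -> str:
    """If a model returned {"answer": "..."} instead of plain text, unwrap it."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return text
    if isinstance(value, dict) and isinstance(value.get("answer"), str):
        return value["answer"]
    return text
```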