vectorize-io
diff --git a/hindsight-docs/blog/2026-06-30-version-0-8-4.md b/hindsight-docs/blog/2026-06-30-version-0-8-4.md
@@ -0,0 +1,72 @@
+---
+title: "What's new in Hindsight 0.8.4"
+description: Multi-LLM failover and round-robin routing, finer recall control, scheduled mental-model refresh, richer token accounting, and a batch of data-integrity fixes in Hindsight 0.8.4
+authors: [nicoloboschi]
+date: 2026-06-30
+hide_table_of_contents: true
+tags: [release]
+---
+
+Hindsight 0.8.4 builds on [0.8.3](/blog/2026/06/18/version-0-8-3) with a focus on **reliability and control**: keep memory operations running when a provider degrades with **multi-LLM failover and round-robin routing**, tune retrieval more precisely with **finer recall controls**, keep knowledge current with **scheduled mental-model refresh**, and get **more accurate token accounting**. It also lands a batch of **data-integrity and robustness fixes**. **Self-managed deployments should upgrade.**
+
+<!-- truncate -->
+
+<video controls muted loop playsinline width="100%" style={{borderRadius: "8px"}}>
+
+  <source src="/img/blog/release084/hindsight-0-8-4-release-notes.mp4" type="video/mp4" />
+</video>
+
+- [**Multi-LLM Failover & Routing**](#multi-llm-failover--routing): Survive provider outages with failover and round-robin across LLMs.
+- [**Finer Recall Control**](#finer-recall-control): Per-stage scores, two-level score filtering, configurable recency decay, and observation-aware dedup.
+- [**Scheduled Mental-Model Refresh**](#scheduled-mental-model-refresh): Keep consolidated knowledge fresh on a cron schedule.
+- [**Sharper LLM Control & Accounting**](#sharper-llm-control--accounting): Per-operation temperature, per-scope timeouts, and richer token usage.
+- [**Data-Integrity & Robustness Fixes**](#data-integrity--robustness-fixes): Why you should upgrade.
+
+## Multi-LLM Failover & Routing
+
+You can now configure **multiple LLMs and have Hindsight route across them automatically**, with **failover** when a provider returns errors and **round-robin** to spread load. A single provider outage or rate limit no longer has to stall memory operations.
+
+Each member of a multi-LLM configuration can be tuned individually — including **LiteLLM Router settings**, **Vertex AI service account keys**, and **per-member Vertex project and region** — so you can mix providers and accounts with the controls each one needs.
+
+This release also adds two new OpenAI-compatible providers, **Requesty** and **Atlas Cloud**, so you can point Hindsight at them directly.
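A minimal sketch of the general technique described in this section, assuming nothing about Hindsight's actual implementation or configuration format: round-robin rotates the starting member on each call, and failover moves on to the next member when one raises. `RoundRobinFailoverRouter`, `ProviderError`, and the two member functions are hypothetical names used only for illustration.

```python
from itertools import count
from typing import Callable, Optional, Sequence


class ProviderError(Exception):
    """Hypothetical error raised when one LLM member fails or is rate-limited."""


class RoundRobinFailoverRouter:
    """Illustrative router: rotate the starting member, fall through on errors."""

    def __init__(self, members: Sequence[Callable[[str], str]]) -> None:
        if not members:
            raise ValueError("at least one member is required")
        self._members = list(members)
        self._counter = count()  # advances once per call for round-robin

    def call(self, prompt: str) -> str:
        start = next(self._counter) % len(self._members)
        last_error: Optional[Exception] = None
        # Try every member once, beginning at the round-robin position.
        for offset in range(len(self._members)):
            member = self._members[(start + offset) % len(self._members)]
            try:
                return member(prompt)
            except ProviderError as exc:
                last_error = exc  # remember the failure and try the next member
        raise ProviderError("all members failed") from last_error


def flaky(prompt: str) -> str:
    # Stand-in for a provider that is down or rate-limited.
    raise ProviderError("simulated outage")


def healthy(prompt: str) -> str:
    # Stand-in for a provider that answers normally.
    return f"ok: {prompt}"


router = RoundRobinFailoverRouter([flaky, healthy])
results = [router.call("retain"), router.call("recall")]
```

In this sketch the first call starts at `flaky`, fails over to `healthy`, and the second call starts directly at `healthy`, so both calls succeed despite the simulated outage.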
+
+## Finer Recall Control
+
+Several changes give you more precise control over what recall returns and how it's ranked:
+
+- **Per-stage scores and two-level filtering.** Recall now exposes structured per-stage scores, and `min_scores` supports two-level filtering — so you can require minimum relevance at each retrieval stage instead of a single blunt cutoff. (Now correctly threaded through the maintained Python and TypeScript SDK wrappers, too.)
+- **Configurable recency decay.** Choose how strongly recency influences ranking — `linear`, `exponential`, or `none`.
+- **Observation-aware dedup.** A new `prefer_observations` option drops raw facts that have already been superseded by consolidated observations, so results favor the synthesized view.
+- **Exact filtering of global observations.** You can now exactly filter untagged/global observations.
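To make the three recency-decay modes concrete, here is a small sketch of one common way to define them. It is an assumption for illustration, not Hindsight's formula: `recency_weight` and the `horizon_days` parameter are hypothetical, with `horizon_days` acting as the cutoff age for `linear` and the half-life for `exponential`.

```python
def recency_weight(age_days: float, mode: str = "exponential",
                   horizon_days: float = 30.0) -> float:
    """Return a multiplier in [0, 1] for a memory that is `age_days` old.

    Hypothetical definitions:
      - "none":        recency is ignored, the weight is always 1.0
      - "linear":      weight falls in a straight line to 0 at `horizon_days`
      - "exponential": weight halves every `horizon_days`
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if mode == "none":
        return 1.0
    if mode == "linear":
        return max(0.0, 1.0 - age_days / horizon_days)
    if mode == "exponential":
        return 0.5 ** (age_days / horizon_days)
    raise ValueError(f"unknown decay mode: {mode!r}")


def rank(candidates, mode):
    """Sort (text, relevance, age_days) tuples by relevance * recency weight."""
    return sorted(
        candidates,
        key=lambda c: c[1] * recency_weight(c[2], mode),
        reverse=True,
    )


candidates = [("old but relevant", 0.9, 60.0), ("fresh", 0.6, 0.0)]
top_linear = rank(candidates, "linear")[0][0]
top_none = rank(candidates, "none")[0][0]
```

Under these assumptions, `linear` pushes the 60-day-old item to the bottom because its weight reaches zero, while `none` ranks purely on relevance and keeps it on top.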
+
+## Scheduled Mental-Model Refresh
+
+Mental models can now be **refreshed on a cron schedule**. Instead of relying solely on activity-triggered refreshes, you can keep a bank's consolidated knowledge current on a cadence you define.
+
+## Sharper LLM Control & Accounting
+
+- **Per-operation temperature.** Set different temperatures per operation for tighter control over response style across retain, recall, and reflect.
+- **Per-scope timeouts and retries.** Per-scope LLM timeout and retry policies are now applied consistently across providers.
+- **Richer token accounting.** Usage totals now include cached and "thoughts" tokens where supported, and reasoning tokens are tracked for OpenAI-compatible providers, which makes cost reporting more accurate.
+- **More reliable structured output.** Anthropic strict structured output now goes through forced tool use, and reflect's structured-output retries are capped to avoid runaway retry loops.
+- **Configurable upload size.** The maximum upload size is now configurable in the control plane.
+- **Faster, more scalable bank stats.** Bank statistics now scale to large deployments, and an on-demand refresh lets you force an exact recount when you need it.
+- **Full memory details in the explorer.** The control plane's memory explorer now shows the complete details of each memory, making inspection and debugging easier.
+
+## Data-Integrity & Robustness Fixes
+
+These fixes are the main reason to upgrade. They protect what gets stored and keep background work healthy:
+
+- **Append-mode chunking.** Retain append mode now merges JSON arrays so conversation-aware chunking is preserved.
+- **Observation search vectors.** Search vectors are now maintained on observation insert/update, keeping search and consolidation correct.
+- **Consolidation safety.** Consolidation now keeps items by default when a dedup action is missing, preventing unintended drops.
+- **Migration bootstrap.** Migration bootstrap now respects the configured vector extension.
+- **Accurate bank stats.** The bank stats cache is now correctly invalidated after deletes and clears, so counts stay accurate.
+- **Graph maintenance deadlocks.** Concurrent inserts no longer trigger enqueue deadlocks during graph maintenance.
+
+Other notable fixes:
+
+- **Token usage on parse failure.** Provider token usage is now preserved even when tool-call argument parsing or validation fails, so cost reporting stays accurate.
+- **Reflect JSON envelopes.** Reflect now unwraps JSON-wrapped answers returned by some models.
+- **Cleaner error paths.** List endpoints reject negative `limit`/`offset` with a `422` instead of a server error, async operations return `404` when a bank doesn't exist, and `PATCH bank` / dry-run extract no longer create banks unintentionally.
+- **Retain error messages.** Retain error summaries now preserve the underlying exception message for easier debugging.