
fix(mental-models): cap history array length to prevent jsonb overflow #1593

Open
cdbartholomew wants to merge 2 commits into main from fix/mental-model-history-cap

Conversation

cdbartholomew (Contributor) commented May 12, 2026

Problem

Each content-changing update to a mental model appends a full snapshot (previous_content + previous_reflect_response + changed_at) to the mental_models.history jsonb array. Two compounding issues:

1. Unbounded growth → jsonb 256 MB overflow. Without a cap, the array grows forever. PostgreSQL has a hard 256 MB limit on total jsonb array element size; once a row crosses it, every subsequent UPDATE fails:

ERROR: total size of jsonb array elements exceeds the maximum of 268435455 bytes
SQLSTATE: 54000

The mental model is permanently un-writable until the history column is manually trimmed at the DB level. Reachable in normal use: with reflect responses on the order of hundreds of KB and a workload that refreshes a small set of mental models repeatedly, the limit is hit in a few hundred refreshes. Once one tenant's row lands in this state, repeated UPDATE attempts materialize the 256 MB+ array on every retry — significant memory pressure on the primary, degraded availability for unrelated tenants on the same instance.

2. Per-update bloat → no HOT updates, persistent dead-tuple churn. Even with a cap in place, storing the full reflect_response payload in each snapshot pushes per-row size to ~22 MB at cap=50. That exceeds heap-page fit, so every UPDATE writes a full TOAST row and skips HOT. Every refresh leaves a ~22 MB dead tuple that has to be vacuumed; under sustained refresh load the dead-tuple backlog outruns autovacuum and the table balloons (observed: 570 MB / 43% dead tuples on a single tenant's mental_models table).
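A quick back-of-the-envelope check on the ~22 MB per-row figure (the 450 KB mid-range is an assumption drawn from the hundreds-of-KB reflect payloads described above):

```python
entry_kb = 450  # assumed mid-range size of one reflect_response snapshot
cap = 50        # default HINDSIGHT_API_MENTAL_MODEL_HISTORY_MAX_ENTRIES

# Per-row history size at the cap, in MB
row_mb = entry_kb * cap / 1024
print(f"{row_mb:.1f} MB")  # ~22.0 MB, matching the figure above
```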

Fix

(a) Cap history length at write time. Trim to the most recent N entries via a single subquery on COALESCE(history, '[]'::jsonb) || $new::jsonb:

history = (
  SELECT COALESCE(jsonb_agg(elem ORDER BY idx), '[]'::jsonb)
  FROM jsonb_array_elements(
    COALESCE(history, '[]'::jsonb) || $new::jsonb
  ) WITH ORDINALITY a(elem, idx)
  WHERE idx > GREATEST(
    jsonb_array_length(COALESCE(history, '[]'::jsonb)) + 1 - N, 0
  )
)

New env var HINDSIGHT_API_MENTAL_MODEL_HISTORY_MAX_ENTRIES controls N. Default 50 — well under the 256 MB ceiling even with hundreds-of-KB reflect responses, while preserving enough recent history for audit / rollback.
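A minimal sketch of how the cap might be read from the environment; the helper name and fallback handling are assumptions (the real config layer in hindsight-api may differ), but the variable name and default of 50 are as documented:

```python
import os

def history_max_entries(default: int = 50) -> int:
    """Read the history cap, falling back to the documented default of 50."""
    raw = os.environ.get("HINDSIGHT_API_MENTAL_MODEL_HISTORY_MAX_ENTRIES")
    if raw is None:
        return default
    return int(raw)

# Override, as exercised in the test plan:
os.environ["HINDSIGHT_API_MENTAL_MODEL_HISTORY_MAX_ENTRIES"] = "7"
print(history_max_entries())  # prints 7
```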

(b) Slim each history entry to {based_on: ...} only. The control-plane history view (mental-model-detail-modal.tsx) only reads previous_reflect_response.based_on; every other field in the reflect payload is unused. Store just that slice — per-entry size drops ~100x, rows fit on a heap page, HOT updates re-enable, dead tuples self-clean.
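A sketch of the slimming step (the function name is hypothetical; the semantics follow the test plan: payloads with no based_on field store None, and based_on: {} stores {based_on: {}}):

```python
def slim_snapshot(reflect_response):
    # Keep only the based_on slice of the reflect payload; everything else
    # is unused by the control-plane history view.
    if reflect_response is None or "based_on" not in reflect_response:
        return None
    return {"based_on": reflect_response["based_on"]}

slim_snapshot({"based_on": {}, "bulk": "x" * 500_000})  # -> {'based_on': {}}
slim_snapshot({"no_based_on_here": 1})                  # -> None
```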

Existing bulky rows rotate out naturally via the cap=50 ring buffer; no migration needed.

What this PR does NOT do

Rows already over the 256 MB ceiling pre-fix need a one-shot manual trim of their history column at the DB level. The SQL-side append in this PR cannot heal a row whose existing history is already too large to materialize in the jsonb engine — evaluating history || $new itself raises 54000. After the manual trim, this fix prevents recurrence.

A standalone migration that trims existing rows is intentionally NOT included here: the right cap depends on per-deployment tolerance and the migration would block on ACCESS EXCLUSIVE while trimming potentially-huge rows. Operators should do this as a targeted one-time DML if they've observed the issue.

Test plan

  • test_history_capped_to_max_entries: with max_entries=3, six content updates produce a 3-element history (most recent first: v5, v4, v3; v1 and v2 dropped).
  • test_history_snapshots_previous_reflect_response: assertions updated to expect the slim {based_on: ...} shape.
  • test_history_snapshots_omit_reflect_response_when_based_on_missing: covers reflect payloads with no based_on field (stored as None) and with based_on: {} (stored as {based_on: {}}).
  • Existing history tests cover unchanged ordering, snapshot, and gating behaviors — semantics untouched for short histories.
  • Lint clean on modified files.
  • Env-var override verified: HINDSIGHT_API_MENTAL_MODEL_HISTORY_MAX_ENTRIES=7 reads as 7.
  • CI green
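The cap behavior from test_history_capped_to_max_entries can be modeled in a few lines (a Python sketch of the ring-buffer semantics, not the production SQL path; the newest-first ordering follows the test's expectations):

```python
def append_capped(history, snapshot, max_entries):
    # Newest-first ring buffer: prepend the new snapshot, then trim to the cap.
    return ([snapshot] + history)[:max_entries]

history = []
for i in range(6):  # six content updates snapshot v0..v5
    history = append_capped(history, {"previous_content": f"v{i}"}, 3)

print([e["previous_content"] for e in history])  # ['v5', 'v4', 'v3']
```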

Commit messages

Each content-changing update to a mental model appends a full snapshot
(previous_content + previous_reflect_response + changed_at) to the
`mental_models.history` jsonb array. Without a cap the array grows
unboundedly. Postgres has a hard 256MB limit on the total size of jsonb
array elements; once a row crosses it, every subsequent UPDATE to that
row fails with SQLSTATE 54000 ("total size of jsonb array elements
exceeds the maximum of 268435455 bytes") — the mental model becomes
permanently un-writable until the history is manually trimmed at the DB
level.

This is reachable in normal use: with reflect responses on the order of
hundreds of KB (common when the bank has many memories) and a workload
that refreshes a small set of mental models repeatedly, the limit is
hit in a few hundred refreshes.

Fix
---
Trim history to the most recent N entries at write time. The append
becomes a single subquery that takes the last N elements of
`COALESCE(history, '[]'::jsonb) || $new::jsonb` ordered by their array
index. New env var `HINDSIGHT_API_MENTAL_MODEL_HISTORY_MAX_ENTRIES`
controls N; default 50 (well under the 256MB ceiling even with large
reflect responses, while preserving enough recent history for audit /
rollback).

Rows already over the limit pre-fix need a one-shot manual trim of
their `history` column — the SQL-side append in this PR cannot heal a
row whose existing `history` is already too large to materialize in
the jsonb engine, because evaluating `history || $new` itself raises
54000. After the manual trim, this fix prevents recurrence.

Tests
-----
New `test_history_capped_to_max_entries`: with max_entries=3, six
content updates produce a 3-element history (most recent first: v5,
v4, v3 — v1 and v2 dropped). Existing history tests cover the unchanged
ordering, snapshot, and gating behaviors.

Docs
----
New row in `configuration.md`.

Each history entry previously stored the full reflect_response payload
(~400-500 KB), pushing per-row size to ~22 MB at the cap. That exceeds
heap-page fit, so every UPDATE writes a full TOAST row and skips HOT,
leaving a dead tuple that must be vacuumed.

The control-plane history view only reads previous_reflect_response.based_on;
everything else in the payload is unused. Store just that slice — per-entry
size drops ~100x, rows fit on a heap page, HOT updates re-enable, dead
tuples self-clean.

Existing bulky rows rotate out naturally via the cap=50 ring buffer.
