Commit 6c51bce

cdeust and claude committed
fix(verif): consolidation cadence uses ingested_at instead of wall-clock created_at
Production-relevant bug surfaced by LoCoMo smoke (tasks/e1-v3-locomo-smoke-finding.md): backfilled memories with backdated created_at (e.g. a 2023 conversation imported in 2026 wall-clock) trigger immediate gist/tag compression on the first consolidation pass because (now - created_at) >> 7-day gist gate. The intended semantics is "memory has had time to be revisited in MY system" — elapsed since ingest, not elapsed since the original event.

Changes:
- pg_schema.py: add memories.ingested_at TIMESTAMPTZ NOT NULL DEFAULT NOW() in MEMORIES_DDL (fresh DBs) plus an idempotent migration block that ALTERs existing tables and backfills ingested_at = created_at for legacy rows. Comment kept semicolon-free (df14e16, 9f94bd3).
- pg_store.py: row normalizer surfaces ingested_at as ISO string. INSERT inherits DEFAULT NOW() automatically.
- compression.py: cadence gate reads ingested_at (fallback to created_at for in-memory dicts that never round-tripped through PG).
- decay_cycle.py: ACT-R lifetime L is ingest-relative; _hours_since_access fallback chain now last_accessed -> ingested_at -> created_at.
- write_post_store.py, write_gate.py: synaptic-tagging window and temporal-novelty signal both ask "in MY system" — also use ingested_at.
- Tests: regression in test_compression.py (backdated created_at + fresh ingested_at must stay level 0), in test_decay_cycle.py (ACT-R does not collapse on backfill), and a schema-shape test test_pg_ingested_at.py asserting the column declaration, migration guard, backfill statement, and semicolon-free comments.

Audit: synaptic_*, microglial_pruning, cascade_* contain no wall-clock cadence reads (none of them use datetime.now() / created_at). The PL/pgSQL effective_heat() and recall_memories() functions correctly use heat_base_set_at / last_accessed (last_accessed -> created_at fallback is for retrieval recency ranking, which IS event-time semantics — not cadence — and is left unchanged). replay_execution.py and sleep_compute.py sort by created_at for narrative/temporal ordering — correct, unchanged.

Verification:
- Full suite: 2656 passed.
- LoCoMo smoke (--with-consolidation, --limit 1): MRR 0.866 (matches baseline non-consolidation MRR exactly; pre-fix this collapsed to 0.222 because immediate compression destroyed retrievable content).

Sources: Anderson & Lebiere (1998) ACT-R Eq. 4.4 — lifetime L is elapsed time in the learner's system since acquisition, not elapsed time since the original-source event. Tse et al. (2007) and McClelland et al. (1995) describe consolidation timescales relative to acquisition in the system, not relative to the original-event date.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent b68c5ac commit 6c51bce
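
The failure mode described in the commit message can be reproduced in a few lines. The following is a minimal sketch, not the project's code: `GIST_GATE_HOURS`, `hours_for_cadence`, and `past_gist_gate` are hypothetical names, the 7-day value is taken from the commit message, and the per-memory resistance factor applied by the real gate is left out.

```python
from __future__ import annotations

from datetime import datetime, timezone

# Assumption: the 7-day gist gate quoted in the commit message, in hours.
GIST_GATE_HOURS = 7 * 24


def hours_for_cadence(memory: dict, now: datetime) -> float | None:
    """Hours since ingest, falling back to created_at when ingested_at is absent."""
    raw = memory.get("ingested_at") or memory.get("created_at")
    if not raw:
        return None
    dt = datetime.fromisoformat(raw)
    if dt.tzinfo is None:
        # Treat naive timestamps as UTC, as the diff does.
        dt = dt.replace(tzinfo=timezone.utc)
    return (now - dt).total_seconds() / 3600.0


def past_gist_gate(memory: dict, now: datetime) -> bool:
    """True when the memory is old enough, by ingest time, to be gist-compressed."""
    hours = hours_for_cadence(memory, now)
    return hours is not None and hours >= GIST_GATE_HOURS


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
imported = {
    "created_at": "2023-01-01T00:00:00+00:00",  # original event time
    "ingested_at": now.isoformat(),             # just imported
}
legacy = {"created_at": "2023-01-01T00:00:00+00:00"}  # no ingested_at recorded

imported_compresses = past_gist_gate(imported, now)  # False: fresh ingest
legacy_compresses = past_gist_gate(legacy, now)      # True: falls back to created_at
```

With `created_at` as the only signal, both dictionaries would pass the gate on the first consolidation run, which is the behaviour the commit removes.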

9 files changed

Lines changed: 305 additions & 27 deletions

mcp_server/core/compression.py

Lines changed: 32 additions & 14 deletions
@@ -31,18 +31,34 @@
 _CAMELCASE_RE = re.compile(r"\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b")
 
 
-def _parse_created_at(memory: dict) -> datetime | None:
-    """Parse created_at from memory, returning None on failure."""
-    created_at_str = memory.get("created_at", "")
-    if not created_at_str:
-        return None
-    try:
-        created_at = datetime.fromisoformat(created_at_str)
-    except (ValueError, TypeError):
+def _parse_ingested_at(memory: dict) -> datetime | None:
+    """Parse ingest timestamp for cadence reasoning.
+
+    Compression cadence asks "has this memory had time to be revisited
+    in MY system" — that is elapsed time since ingest, NOT elapsed time
+    since the original event. Backfilled / imported memories carry a
+    backdated created_at (e.g. a 2023 conversation imported in 2026);
+    using created_at would compress them on the first consolidation
+    pass, before retrieval ever runs (see tasks/e1-v3-locomo-smoke-finding.md).
+
+    Falls back to created_at for legacy rows that predate the
+    ingested_at column (the schema migration in pg_schema.py backfills
+    ingested_at = created_at in that case anyway, so the fallback only
+    matters for in-memory dicts that never round-tripped through PG).
+    """
+    raw = memory.get("ingested_at") or memory.get("created_at", "")
+    if not raw:
         return None
-    if created_at.tzinfo is None:
-        created_at = created_at.replace(tzinfo=timezone.utc)
-    return created_at
+    if isinstance(raw, datetime):
+        dt = raw
+    else:
+        try:
+            dt = datetime.fromisoformat(raw)
+        except (ValueError, TypeError):
+            return None
+    if dt.tzinfo is None:
+        dt = dt.replace(tzinfo=timezone.utc)
+    return dt
 
 
 def _compute_resistance(memory: dict) -> float:
@@ -74,11 +90,13 @@ def get_compression_schedule(
     if memory.get("store_type", "episodic") == "semantic":
         return 0
 
-    created_at = _parse_created_at(memory)
-    if created_at is None:
+    ingested_at = _parse_ingested_at(memory)
+    if ingested_at is None:
         return 0
 
-    hours_elapsed = (datetime.now(timezone.utc) - created_at).total_seconds() / 3600.0
+    # Cadence is measured from ingest, not from the original event.
+    # Source: tasks/e1-v3-locomo-smoke-finding.md.
+    hours_elapsed = (datetime.now(timezone.utc) - ingested_at).total_seconds() / 3600.0
     resistance = _compute_resistance(memory)
 
     if hours_elapsed < gist_age_hours * resistance:

mcp_server/core/decay_cycle.py

Lines changed: 43 additions & 9 deletions
@@ -75,8 +75,19 @@ def _parse_datetime(value) -> datetime | None:
 
 
 def _hours_since_access(mem: dict, now: datetime) -> float | None:
-    """Return hours elapsed since last access, or None if unparseable."""
-    last_accessed = mem.get("last_accessed", mem.get("created_at", ""))
+    """Return hours elapsed since last access, or None if unparseable.
+
+    Fallback chain: last_accessed → ingested_at → created_at. The
+    ingested_at fallback (added for the consolidation-cadence fix —
+    tasks/e1-v3-locomo-smoke-finding.md) ensures backfilled memories
+    with a backdated created_at do not falsely register as
+    "last-accessed years ago" when last_accessed is missing.
+    """
+    last_accessed = (
+        mem.get("last_accessed")
+        or mem.get("ingested_at")
+        or mem.get("created_at", "")
+    )
     last_dt = _parse_datetime(last_accessed)
     if last_dt is None:
         return None
@@ -85,12 +96,25 @@ def _hours_since_access(mem: dict, now: datetime) -> float | None:
 
 
 def _hours_since_creation(mem: dict, now: datetime) -> float | None:
-    """Return hours elapsed since creation."""
-    created = mem.get("created_at", "")
-    created_dt = _parse_datetime(created)
-    if created_dt is None:
+    """Return hours elapsed since the memory entered THIS system.
+
+    ACT-R "lifetime L" in B_i = ln(n) − d·ln(L) is lifetime in the
+    learner's system since acquisition (Anderson & Lebiere 1998), not
+    elapsed time since the original-source event. For Cortex this is
+    ``ingested_at``: backfilled / imported memories with backdated
+    ``created_at`` would otherwise return a wrongly-large L on first
+    decay pass and collapse heat to near-zero before any access.
+    Source: tasks/e1-v3-locomo-smoke-finding.md.
+
+    Falls back to ``created_at`` only for in-memory dicts that never
+    round-tripped through PG (schema backfills ingested_at=created_at
+    for legacy rows, so PG-sourced dicts always have the field).
+    """
+    raw = mem.get("ingested_at") or mem.get("created_at", "")
+    ingested_dt = _parse_datetime(raw)
+    if ingested_dt is None:
         return None
-    hours = (now - created_dt).total_seconds() / 3600.0
+    hours = (now - ingested_dt).total_seconds() / 3600.0
     return max(_MIN_LIFETIME_HOURS, hours)
 
 
@@ -263,8 +287,18 @@ def compute_decay_updates(
 
 
 def _parse_hours_since_access(record: dict, now: datetime) -> float | None:
-    """Parse hours since last access from a memory or entity record."""
-    last_accessed = record.get("last_accessed", record.get("created_at", ""))
+    """Parse hours since last access from a memory or entity record.
+
+    Fallback chain mirrors _hours_since_access: last_accessed →
+    ingested_at → created_at. Entities do not currently carry
+    ingested_at, so this is a no-op for entity records but defensive
+    for memory dicts that take this path.
+    """
+    last_accessed = (
+        record.get("last_accessed")
+        or record.get("ingested_at")
+        or record.get("created_at", "")
+    )
     last_dt = _parse_datetime(last_accessed)
     if last_dt is None:
         return None
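
Besides inserting `ingested_at` into the chain, these hunks also switch from a nested `dict.get(key, default)` to an `or` chain, and the two behave differently when a key is present but empty. The sketch below illustrates that difference with hypothetical helper names; whether real records ever carry a `None` `last_accessed` is an assumption, since the schema declares the column `NOT NULL`.

```python
from __future__ import annotations


def pick_with_default(mem: dict) -> str | None:
    """Pre-fix pattern: the default applies only when the key is absent."""
    return mem.get("last_accessed", mem.get("created_at", ""))


def pick_with_or_chain(mem: dict) -> str | None:
    """Post-fix pattern: falls through on missing, None, or empty values."""
    return (
        mem.get("last_accessed")
        or mem.get("ingested_at")
        or mem.get("created_at", "")
    )


# Hypothetical in-memory record: last_accessed is present but None.
row = {
    "last_accessed": None,
    "ingested_at": "2026-01-01T00:00:00+00:00",
    "created_at": "2023-01-01T00:00:00+00:00",
}

old_choice = pick_with_default(row)   # None: the key exists, so no fallback happens
new_choice = pick_with_or_chain(row)  # the ingested_at value
```

In the old form a `None` value reaches `_parse_datetime` unchanged, whereas the `or` chain continues to the next candidate.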

mcp_server/core/write_gate.py

Lines changed: 8 additions & 2 deletions
@@ -78,8 +78,14 @@ def compute_temporal_novelty(
     best_idx = similarities.index(max(similarities))
     if best_idx < len(vec_hits):
         best_mem = get_memory(vec_hits[best_idx][0])
-        if best_mem and best_mem.get("created_at"):
-            hours = _parse_hours_since(best_mem["created_at"])
+        # Temporal novelty asks "have we seen this in MY system
+        # recently?" — that is elapsed-since-ingest, not
+        # elapsed-since-the-original-event. Use ingested_at and fall
+        # back to created_at for legacy rows.
+        # Source: tasks/e1-v3-locomo-smoke-finding.md.
+        if best_mem and (best_mem.get("ingested_at") or best_mem.get("created_at")):
+            ts = best_mem.get("ingested_at") or best_mem["created_at"]
+            hours = _parse_hours_since(ts)
             return _compute_temporal_novelty(hours)
 

mcp_server/core/write_post_store.py

Lines changed: 8 additions & 1 deletion
@@ -131,7 +131,14 @@ def _build_tagging_candidates(
         if mem["id"] == exclude_id:
             continue
         mem_ents = _find_shared_entities(mem["id"], entity_names, store)
-        hours_ago = _hours_since_creation(mem.get("created_at", ""))
+        # Synaptic-tagging window is cadence-relative to ingest time, not
+        # original-event time. For backfilled memories with a backdated
+        # created_at this prevents the tagging window from collapsing
+        # immediately on first consolidation pass.
+        # Source: tasks/e1-v3-locomo-smoke-finding.md.
+        hours_ago = _hours_since_creation(
+            mem.get("ingested_at") or mem.get("created_at", "")
+        )
         candidates.append(
             {
                 "id": mem["id"],

mcp_server/infrastructure/pg_schema.py

Lines changed: 28 additions & 0 deletions
@@ -28,6 +28,7 @@
     domain TEXT DEFAULT '',
     directory_context TEXT DEFAULT '',
     created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+    ingested_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
     last_accessed TIMESTAMPTZ NOT NULL DEFAULT NOW(),
     heat_base REAL NOT NULL DEFAULT 1.0
         CHECK (heat_base >= 0.0 AND heat_base <= 1.0),
@@ -1283,6 +1284,33 @@
     END IF;
 END $$;
 
+-- Migration: add ingested_at for consolidation cadence reasoning.
+-- Source: tasks/e1-v3-locomo-smoke-finding.md.
+-- created_at = original event/utterance time (may be backdated on import).
+-- ingested_at = when the row entered THIS Cortex DB (always NOW at insert).
+-- Compression and decay cadence MUST use ingested_at: the mechanism asks
+-- "has this memory had time to be revisited in MY system" not "when did
+-- the original event happen". Backfill existing rows from created_at to
+-- preserve idempotency and pre-existing semantics.
+-- NOTE keep this comment free of semicolons (DDL is split on the literal
+-- character per _split_statements — df14e16 and 9f94bd3 are prior
+-- incidents of that class).
+DO $$
+BEGIN
+    IF NOT EXISTS (
+        SELECT 1 FROM information_schema.columns
+        WHERE table_name = 'memories' AND column_name = 'ingested_at'
+    ) THEN
+        ALTER TABLE memories ADD COLUMN ingested_at TIMESTAMPTZ NOT NULL DEFAULT NOW();
+        -- Backfill rows that pre-existed this column. They were created
+        -- before ingested_at was tracked, so the safest assumption is
+        -- ingested_at = created_at (i.e., they entered the system at the
+        -- time their content was authored). This block runs only inside
+        -- the IF NOT EXISTS guard, so it is naturally idempotent.
+        UPDATE memories SET ingested_at = created_at;
+    END IF;
+END $$;
+
 -- Migration: persist arousal and dominant_emotion from emotional tagging
 DO $$ BEGIN
     IF NOT EXISTS (SELECT 1 FROM information_schema.columns WHERE table_name='memories' AND column_name='arousal')

mcp_server/infrastructure/pg_store.py

Lines changed: 1 addition & 1 deletion
@@ -768,7 +768,7 @@ def _normalize_memory_row(self, row: dict[str, Any]) -> dict[str, Any]:
         except (json.JSONDecodeError, TypeError):
             d["tags"] = []
         # Convert datetime to ISO string for compatibility
-        for field in ("created_at", "last_accessed", "last_reconsolidated"):
+        for field in ("created_at", "ingested_at", "last_accessed", "last_reconsolidated"):
             if isinstance(d.get(field), datetime):
                 d[field] = d[field].isoformat()
         return d

tests_py/core/test_compression.py

Lines changed: 50 additions & 0 deletions
@@ -60,6 +60,56 @@ def test_no_created_at(self):
     def test_invalid_timestamp(self):
         assert get_compression_schedule({"created_at": "invalid"}) == 0
 
+    # ── Cadence relative to ingest, not original event ──────────────────
+    # Source: tasks/e1-v3-locomo-smoke-finding.md.
+
+    def test_backdated_created_at_with_fresh_ingest_stays_level_zero(self):
+        """Regression: backfilled memories must NOT compress immediately.
+
+        A user imports a 2023 conversation in 2026. created_at is the
+        original event time (years ago). ingested_at is now. The cadence
+        gate must read ingested_at — otherwise the memory gist-compresses
+        on the first consolidation pass before retrieval ever runs
+        (LoCoMo MRR collapse from 0.866 to 0.222 surfaced this bug).
+        """
+        years_ago = (datetime.now(timezone.utc) - timedelta(days=365 * 3)).isoformat()
+        now = datetime.now(timezone.utc).isoformat()
+        mem = {
+            "created_at": years_ago,
+            "ingested_at": now,
+            "importance": 0.3,
+            "store_type": "episodic",
+        }
+        # Fresh ingest → level 0 even though created_at says 3 years old.
+        assert get_compression_schedule(mem) == 0
+
+    def test_fresh_created_at_old_ingest_compresses(self):
+        """Inverse: an old ingest with a recent created_at still compresses.
+
+        Demonstrates that ingested_at — not created_at — is the cadence
+        signal. (Pathological dict; real INSERTs always set ingested_at
+        to NOW(), but the contract must hold.)
+        """
+        now = datetime.now(timezone.utc).isoformat()
+        sixty_days_ago = (datetime.now(timezone.utc) - timedelta(days=60)).isoformat()
+        mem = {
+            "created_at": now,
+            "ingested_at": sixty_days_ago,
+            "importance": 0.3,
+            "store_type": "episodic",
+        }
+        assert get_compression_schedule(mem) == 2
+
+    def test_missing_ingested_at_falls_back_to_created_at(self):
+        """Legacy rows (pre-migration) lack ingested_at → fall back."""
+        sixty_days_ago = (datetime.now(timezone.utc) - timedelta(days=60)).isoformat()
+        mem = {
+            "created_at": sixty_days_ago,
+            "importance": 0.3,
+            "store_type": "episodic",
+        }
+        assert get_compression_schedule(mem) == 2
+
 
 class TestExtractGist:
     def test_short_content_preserved(self):

tests_py/core/test_decay_cycle.py

Lines changed: 63 additions & 0 deletions
@@ -140,3 +140,66 @@ def test_cold_entities_skipped(self):
         entities = [{"id": 1, "heat": 0.01, "last_accessed": now.isoformat()}]
         updates = compute_entity_decay(entities, now=now, cold_threshold=0.05)
         assert len(updates) == 0
+
+
+class TestIngestRelativeCadence:
+    """Regression: cadence reasoning uses ingested_at, not created_at.
+
+    Source: tasks/e1-v3-locomo-smoke-finding.md.
+    """
+
+    def test_adaptive_decay_uses_ingested_at_not_created_at(self):
+        """Backfilled memory with backdated created_at must NOT collapse.
+
+        ACT-R lifetime L = elapsed time since acquisition by THIS system.
+        For a 3-year-old created_at but a fresh ingested_at, L should be
+        ~hours (just-ingested), not ~3-years. Otherwise the ACT-R base
+        level B = ln(n) - 0.5*ln(L) sinks far below threshold and heat
+        collapses to near-zero on first decay pass.
+        """
+        now = datetime.now(timezone.utc)
+        years_ago = (now - timedelta(days=365 * 3)).isoformat()
+        # Just-ingested 5 minutes ago.
+        fresh_ingest = (now - timedelta(minutes=5)).isoformat()
+        mems = [
+            {
+                "id": 1,
+                "heat": 0.8,
+                "created_at": years_ago,
+                "ingested_at": fresh_ingest,
+                "last_accessed": fresh_ingest,
+                "access_count": 1,
+                "importance": 0.5,
+                "emotional_valence": 0.0,
+                "confidence": 1.0,
+            }
+        ]
+        updates = compute_decay_updates(mems, now=now, adaptive_decay=True)
+        # If cadence used created_at, lifetime would be ~26 280 hours and
+        # heat would collapse to near-zero (<<0.1). With ingested_at it
+        # stays close to its initial 0.8.
+        if updates:
+            new_heat = updates[0][1]
+            assert new_heat > 0.4, (
+                f"freshly-ingested backdated memory collapsed to {new_heat}"
+            )
+
+    def test_legacy_dict_falls_back_to_created_at(self):
+        """Pre-migration rows (no ingested_at) keep the old behaviour."""
+        now = datetime.now(timezone.utc)
+        old = (now - timedelta(hours=48)).isoformat()
+        mems = [
+            {
+                "id": 1,
+                "heat": 1.0,
+                "last_accessed": old,
+                "created_at": old,
+                "importance": 0.3,
+                "emotional_valence": 0.0,
+                "confidence": 1.0,
+            }
+        ]
+        updates = compute_decay_updates(mems, now=now)
+        # Decays normally — fallback chain kept intact.
+        assert len(updates) == 1
+        assert updates[0][1] < 1.0
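
The size of the effect that the decay regression test guards against can be checked by evaluating the base-level formula quoted in its docstring. This is a rough sketch under stated assumptions: `base_level` is a hypothetical helper, d = 0.5 and n = 1 follow the test, lifetime is in hours, and the mapping from B to heat used in decay_cycle.py is not part of this diff and is not modelled here.

```python
import math


def base_level(n: int, lifetime_hours: float, d: float = 0.5) -> float:
    """ACT-R style base-level activation B = ln(n) - d * ln(L)."""
    return math.log(n) - d * math.log(lifetime_hours)


# Lifetime implied by a created_at three years in the past (the test's ~26 280 hours).
three_years_hours = 365 * 3 * 24

backdated = base_level(1, three_years_hours)  # roughly -5.09
one_hour = base_level(1, 1.0)                 # exactly 0.0
```

Reading L from `created_at` instead of `ingested_at` lowers B by about five units for a single-access memory before the system has had any chance to revisit it.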
