Summary
_process_memory_directory() has a cache intended to skip unchanged files, but it never hits: lookups use the UUID filename, while the cache is keyed by LLM-generated descriptive titles. As a result, every semantic refresh reprocesses all 1600+ memory files through the LLM and consumes a very large number of tokens.
Root Cause
Cache lookup (semantic_processor.py:467-474):
file_name = file_path.split("/")[-1]  # e.g. "mem_c4a0edcf-11b8-47fc-9c3b-c18fe0d38fb6.md"
if file_path not in changed_files and file_name in existing_summaries:
    # cache hit: reuse existing summary
Cache population (_parse_overview_md, semantic_processor.py:917-968):
header_match = re.match(r"^###\s+(.+?)\s*$", line)
# Extracts H3 heading text as key, e.g. "Session Context Management"
The LLM generates descriptive H3 headings like ### Session Context Management, but the lookup uses the actual filename mem_c4a0edcf-...md. These never match → reused 0 cached → all files reprocessed every time.
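The mismatch can be reproduced in isolation. The sketch below is not the project's code: it reuses only the regex and the lookup condition quoted above, while the overview text, the `memories/` directory, and the helper `parse_h3_keys` are hypothetical stand-ins.

```python
import re

# Hypothetical .overview.md content; the real file is LLM-generated.
overview_md = """\
## Overview
### Session Context Management
Summary of how session context is tracked.
### Token Budgeting
Summary of token budget rules.
"""


def parse_h3_keys(text: str) -> dict[str, str]:
    """Map each H3 heading to the text that follows it (illustrative helper)."""
    summaries: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        # Same pattern as the one quoted from _parse_overview_md.
        header_match = re.match(r"^###\s+(.+?)\s*$", line)
        if header_match:
            current = header_match.group(1)
            summaries[current] = ""
        elif current is not None:
            summaries[current] += line.strip()
    return summaries


existing_summaries = parse_h3_keys(overview_md)

# Hypothetical path; only the filename format comes from the report.
file_path = "memories/mem_c4a0edcf-11b8-47fc-9c3b-c18fe0d38fb6.md"
file_name = file_path.split("/")[-1]
changed_files: set[str] = set()

# Same condition as the cache lookup: the keys are titles, so this is False.
cache_hit = file_path not in changed_files and file_name in existing_summaries
print(sorted(existing_summaries), cache_hit)
```

Because the dictionary is keyed by heading text, the lookup fails even for a file that has not changed.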
Contributing Factor
overview_generation.yaml section 4 says to create "One H3 subsection for each file/subdirectory" but does not require using the exact filename as the H3 heading. The LLM is free to write descriptive titles.
Impact
- Every _process_memory_directory invocation generates LLM summaries for ALL files (O(n) LLM calls where n = total memory files)
- Combined with the 45s dedupe window (_MEMORY_PARENT_SEMANTIC_DEDUPE_SEC), any active conversation creates an effectively infinite reprocessing loop:
  - Processing 1600 files takes 10-30 minutes
  - During that time, new memories are written by the compressor
  - 45s window expires → new SemanticMsg enqueued
  - Previous run finishes → next full run starts immediately
- This was the root cause of the ~20B token consumption incident on 2026-04-05
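As a rough back-of-the-envelope check on why the loop never drains, the sketch below counts how many 45s dedupe windows expire during one full run. It uses only the figures reported above and assumes memories keep arriving for the whole run; it does not model the actual queue, and `windows_elapsed` is a hypothetical helper.

```python
DEDUPE_WINDOW_SEC = 45  # value reported for _MEMORY_PARENT_SEMANTIC_DEDUPE_SEC


def windows_elapsed(run_minutes: float, window_sec: int = DEDUPE_WINDOW_SEC) -> int:
    """Number of complete dedupe windows that expire during one full run."""
    return int(run_minutes * 60) // window_sec


low = windows_elapsed(10)   # shortest reported run
high = windows_elapsed(30)  # longest reported run
print(low, high)  # 13 40
```

With at least a dozen windows expiring per run, there is always time for a new SemanticMsg to be enqueued before the current run finishes.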
Suggested Fix
Option A: Sidecar cache file — use an independent .summary_cache.json per directory mapping {filename: {cache_key, summary}}, bypassing the unreliable .overview.md parsing entirely.
Option B: Fix the prompt + parser — require exact filenames in H3 headings in overview_generation.yaml and update _parse_overview_md to extract them reliably.
Option A is more robust as it decouples the cache from LLM output formatting.
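A minimal sketch of what Option A could look like, under stated assumptions: the cache key is a SHA-256 hash of the file contents, `summarize` is a hypothetical callable standing in for the LLM call, and `summarize_directory` is an illustrative name rather than an existing function in semantic_processor.py.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable

CACHE_FILE = ".summary_cache.json"


def content_key(path: Path) -> str:
    """Cache key derived from file contents (an assumption, not the project's scheme)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_cache(directory: Path) -> dict:
    """Read the sidecar cache, treating a missing or corrupt file as empty."""
    cache_path = directory / CACHE_FILE
    if not cache_path.exists():
        return {}
    try:
        return json.loads(cache_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return {}


def summarize_directory(
    directory: Path, summarize: Callable[[str], str]
) -> tuple[dict, int]:
    """Summarize *.md files, reusing cached entries whose content hash matches."""
    cache = load_cache(directory)
    fresh: dict = {}
    reused = 0
    for path in sorted(directory.glob("*.md")):
        if path.name.startswith("."):
            continue  # skip dotfiles such as .overview.md
        key = content_key(path)
        entry = cache.get(path.name)
        if entry and entry.get("cache_key") == key:
            fresh[path.name] = entry  # cache hit, keyed by exact filename
            reused += 1
        else:
            text = path.read_text(encoding="utf-8")
            fresh[path.name] = {"cache_key": key, "summary": summarize(text)}
    # Rewriting the whole mapping also drops entries for deleted files.
    (directory / CACHE_FILE).write_text(json.dumps(fresh, indent=2), encoding="utf-8")
    return fresh, reused
```

Keying on the exact filename and a content hash means that a hit no longer depends on how the LLM phrases its headings, and the `reused` count gives a direct signal that the cache is working.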
Related