Replies: 2 comments
All of this sounds good to me! How about we:
Anyway, in terms of priorities, I think union find, followed by the compression eval harness, would be my preference, but I'm open to discussing this if you have other thoughts! Thanks for putting together the union find PR, and driving improvements to compression!
For the eval harness, I would measure more than compression ratio and summary quality. Union-find compaction changes the structure of memory, so the key risk is whether provenance, conflict boundaries, and recovery semantics survive the compaction step.
Useful eval cases:
- two clusters that look semantically close but contain contradictory facts
- a stale fact superseded by a later fact
- a short-lived secret that should be tombstoned
- a long session where an early incorrect assumption must not be resurrected by summary text
Each case should check retrieval behavior after compaction, not just the summary output.
For persistence, I would store cluster lineage with stable IDs, parent links, tombstone state, compaction timestamp, source message IDs, and summary prompt/version. That makes rollback and debugging possible when a compressed context causes a bad answer.
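As a rough illustration of the lineage record suggested above, here is a minimal TypeScript sketch. The `ClusterRecord` shape, its field names, and the `lineage` helper are all hypothetical, chosen only to mirror the listed fields; they are not gemini-cli's actual schema.

```typescript
// Hypothetical record shape; field names are illustrative only.
interface ClusterRecord {
  id: string;                          // stable cluster ID
  parentId: string | null;             // parent link; null for a root
  tombstoned: boolean;                 // tombstone state
  compactedAt: string | null;          // ISO timestamp of the last compaction
  sourceMessageIds: string[];          // messages folded into this cluster
  summaryPromptVersion: string | null; // prompt/version that produced the summary
}

// Walk parent links from a cluster up to its root and return the chain.
// The `seen` set guards against cycles in corrupted data.
function lineage(records: Map<string, ClusterRecord>, id: string): ClusterRecord[] {
  const chain: ClusterRecord[] = [];
  const seen = new Set<string>();
  let current: string | null = id;
  while (current !== null && !seen.has(current)) {
    seen.add(current);
    const rec: ClusterRecord | undefined = records.get(current);
    if (rec === undefined) break;
    chain.push(rec);
    current = rec.parentId;
  }
  return chain;
}
```

With records like these, a bad answer could in principle be traced from the summary that was in context back through each ancestor cluster to the source message IDs.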
Union-Find Compaction: Roadmap
PR: #24736
Issue: #22877
Design: union-find-compaction
In this document:
Progress
- [...] `chatCompressionService.ts` (flat/union-find dispatch)
- [...] (`COMPRESSION_STRATEGY`, ID 45768880)
- [...] `packages/core/src/context/` [...] `ContextProcessor` interface (`generalistProfile` experiment)
- [...] (`compression.strategy` in `.gemini/settings.json`)
- [...] (`union-find.json` per directory)
Vision
Flat compaction is lossy at the wrong granularity and runs at the wrong time. It compresses the entire history into one summary, so facts, decisions, and constraints vanish unpredictably. It stalls the conversation while reprocessing. And when the session ends, everything is gone.
Union-find compaction addresses context rot: the progressive degradation of model coherence as flat compaction discards facts the model still needs. Episodes degrade gracefully (full detail → summary → tombstone → gone), and at every stage the model retains enough to stay useful.
Union-find compaction makes three bets.
- Topical lossiness over chunkwise lossiness: messages cluster into episodes ("the part where we debugged the race condition"), and when something is forgotten, it's a complete topic. The user can reason about what's gone.
- Continuous background compaction over foreground surprise: episodes summarize incrementally as messages graduate out of the hot window. No foreground stall, no sudden loss of context mid-conversation.
- Message-wise provenance plus summaries over summaries only: source messages remain addressable through parent pointers. `expand(rootId)` recovers the original messages behind any summary. Flat compaction replaces the source; once summarized, the original is gone.
The forest persists in a `union-find.json` file tied to the directory. A new session reads the forest and picks up where the last one left off. Multiple sessions in the same directory see the same episode history.
Union-find owns the middle layer. Simple LRU eviction on clusters is sufficient because the layers above and below handle the hard cases.
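To make the provenance bet concrete, here is a minimal sketch of a union-find forest with integer parent pointers and path compression. Only `expand(rootId)` is named in this document; the `Forest` class shape and the `add`, `find`, and `union` names are assumptions for illustration, and the real implementation clusters by embedding similarity rather than explicit `union` calls.

```typescript
// Minimal union-find forest with path compression.
// Only expand(rootId) is named in the roadmap; everything else is hypothetical.
class Forest {
  private parent: number[] = [];
  private messages: string[] = [];

  // Add a message as its own singleton cluster; returns its integer ID.
  add(text: string): number {
    const id = this.parent.length;
    this.parent.push(id);
    this.messages.push(text);
    return id;
  }

  // Find the cluster root, then point every node on the path at it.
  find(id: number): number {
    let root = id;
    while (this.parent[root] !== root) root = this.parent[root];
    while (this.parent[id] !== root) {
      const next = this.parent[id];
      this.parent[id] = root;
      id = next;
    }
    return root;
  }

  // Merge the clusters containing a and b; returns the surviving root.
  union(a: number, b: number): number {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra !== rb) this.parent[rb] = ra;
    return ra;
  }

  // Recover the source messages behind a cluster, in arrival order.
  expand(rootId: number): string[] {
    const root = this.find(rootId);
    return this.messages.filter((_, id) => this.find(id) === root);
  }
}
```

Because the parent pointers are plain integers and the messages stay in an array, a summary can sit on top of a cluster without replacing the messages underneath it, which is the property flat compaction lacks.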
Prework
The union-find compaction architecture was prototyped, experimentally validated, and ported to gemini-cli across three repos.
Prototype and standalone experiment (repo) — forest data structure, TF-IDF embedder, cluster summarizer. 7 trials on a synthetic 200-message conversation with 40 verifiable facts. Advantage comes from granularity: flat compresses 200 messages into one block, union-find operates on clusters of 5–20, preserving footnote facts that disappear in single-pass compression.
Directional but underpowered (n=40; one trial reached p=0.039).
Gemini-cli experiment — preregistered, 12 real GitHub issue conversations (120 messages each), evaluated with Flash Lite. Experiment harness ready to rerun with different parameters.
[...] `resolveDirty()` (p50) [...]
Underpowered at 96 questions. Recommended follow-up: 24+ conversations with Gemini Pro as summarizer.
Contribution history on gemini-cli — #2, #3, #4, #5. Prior PRs on a fork.
Current state
Implemented. ContextWindow + Forest (union-find with path compression), local TF-IDF embedder, async cluster summarizer. Integration in `chatCompressionService.ts` dispatches between `'flat'` and `'union-find'` strategies.
- `COMPRESSION_STRATEGY` (ID 45768880), default `'flat'`
- `graduateAt=26`: hot window size. Keeps ~13 user-assistant turn pairs fully expanded. Chosen to be large enough for an active debugging loop but small enough that graduation starts before the context window fills.
- `evictAt=30`: headroom between graduation and eviction so the summarizer can finish before the next eviction check.
- `maxColdClusters=10`: starting heuristic. Provides headroom for sessions with several topic changes. Tune via eval data on real conversation distributions.
- `mergeThreshold=0.15`: cosine similarity floor for clustering. Below this, messages start a new cluster. Initial value from the standalone experiment; needs validation against gemini-cli conversation patterns.
Eval criteria
Clustering quality (primary)
Clusters are the unit of recall and the unit of forgetting. If they align with how a human would chunk the conversation, everything downstream works. If they don't, no eviction policy saves you.
Measure: precision and recall of cluster boundaries against human-annotated topic boundaries. Target gemini-cli's actual conversation patterns (tool calls, file reads, multi-turn debugging, topic interleaving).
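One way this measure could be computed is sketched below, assuming each message carries a cluster label and a human-annotated topic label. The function names and the exact-match scoring are assumptions for illustration; a real harness might allow a tolerance window around each boundary.

```typescript
// A boundary is any index i where message i's label differs from message i-1's.
function boundaries(labels: string[]): number[] {
  const out: number[] = [];
  for (let i = 1; i < labels.length; i++) {
    if (labels[i] !== labels[i - 1]) out.push(i);
  }
  return out;
}

// Exact-match precision and recall of predicted boundaries against annotated ones.
// Convention (an assumption): an empty denominator scores 1.
function boundaryPrecisionRecall(
  predicted: string[],
  annotated: string[],
): { precision: number; recall: number } {
  const p = boundaries(predicted);
  const h = new Set(boundaries(annotated));
  const hits = p.filter((i) => h.has(i)).length;
  return {
    precision: p.length === 0 ? 1 : hits / p.length,
    recall: h.size === 0 ? 1 : hits / h.size,
  };
}
```

For example, cluster labels `A A B B B C` against annotations `x x x y y z` give predicted boundaries at 2 and 5 and annotated boundaries at 3 and 5, so precision and recall are both 0.5. Note that a linear boundary view is a simplification when topics interleave, since a cluster can then reappear later in the sequence.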
Summary and tombstone quality
Latency
`append()` + background `resolveDirty()`. Should be <100ms per graduation (TF-IDF is local; LLM call is fire-and-forget).
Token efficiency
Compressed-to-original token ratio at equivalent recall. Per-cluster summaries may use more tokens than a single flat summary, but the recall advantage should justify the budget.
Provenance (qualitative)
`expand(rootId)` recovers source messages while a cluster is live. After eviction, source messages are gone and only the tombstone remains. Flat cannot recover sources at any stage. Provenance enables downstream features: audit trails, selective re-expansion, multiplayer context.
Regression
All existing compression tests must pass unchanged. The flat path must remain functional as a fallback.
Compatibility
Union-find must coexist with existing session features (`/chat save`, `/chat resume`, session checkpointing). The forest is additive derived state alongside the conversation history, not a replacement. On `/chat resume`, the forest can be rebuilt from the restored message list by replaying graduation. If a persisted forest exists and matches the session, skip rebuild and load directly. Verify during Phase 2 implementation that save/resume serialization covers all necessary state.
Rollout phases
Phase 0: Experiment flag (current)
Union-find is available via the `COMPRESSION_STRATEGY` experiment flag or a local `GEMINI_EXP` override. No user-facing exposure.
Exit criteria: Review feedback addressed, CI green on the PR.
Phase 1: Evals + LRU eviction
Two PRs, independent of each other:
Compression evals. Add compression-specific benchmarks to the existing `evals/` framework:
Useful beyond union-find: any future compression change can be evaluated against these evals.
LRU eviction with tombstones. Replace closest-pair merge with cluster-level LRU. Evicted clusters compress to one-line tombstones in a session history section. Infrastructure (`removeDocument()`) already exists.
Exit criteria: Compression evals merged with baseline results for flat. LRU eviction merged and validated against the evals.
Phase 2: User-facing opt-in + session persistence
Settings. Expose compression strategy in `.gemini/settings.json`:
{ "coreTools": { "compression": { "strategy": "union-find" } } }
Power users opt in without experiment flags. Generates real-world usage data.
Session persistence. Serialize the forest to `.gemini/context/union-find.json` per directory. New sessions read the forest on startup. Parent pointers are integers; serialization is straightforward.
Persistence concerns:
- [...] `.gemini/context/` keeps it alongside other gemini-cli state. Should be excluded from version control (see open items on `.gitignore` approach).
- [...] `.gemini/` state files. Summaries inherit the sanitization already applied at the `<context_clusters>` wrap site.
- [...] `flock`) prevents mid-write corruption.
Exit criteria: Settings schema updated and documented. Forest persists across session restarts.
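The write-to-temp-then-rename pattern mentioned for persistence could look roughly like the sketch below. The `PersistedForest` shape and the `saveForest` and `loadForest` names are assumptions, not the PR's actual serialization format, and the sketch does not attempt cross-session locking or address the Windows concerns listed under open items.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical serialized shape: integer parent pointers plus per-root summaries.
interface PersistedForest {
  version: number;
  parent: number[];
  summaries: Record<string, string>;
}

// Write to a temp file in the same directory, then rename over the target.
// On POSIX filesystems the rename replaces the target in one step, so a
// reader sees either the old file or the new one, never a partial write.
function saveForest(file: string, forest: PersistedForest): void {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  const tmp = `${file}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(forest), "utf8");
  fs.renameSync(tmp, file);
}

// Return null when the file is missing or unparseable, so the caller can
// fall back to rebuilding the forest from the message list.
function loadForest(file: string): PersistedForest | null {
  try {
    return JSON.parse(fs.readFileSync(file, "utf8")) as PersistedForest;
  } catch {
    return null;
  }
}

// Usage: round-trip through a scratch directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "uf-demo-"));
const file = path.join(dir, ".gemini", "context", "union-find.json");
saveForest(file, {
  version: 1,
  parent: [0, 0, 2],
  summaries: { "0": "debugged the race condition" },
});
```

Keeping the temp file in the same directory as the target matters, because a rename across filesystems is not a single-step replacement.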
Phase 3: A/B at scale
Enable union-find for a percentage of users via the experiment flag infrastructure. Measure:
Exit criteria: Union-find matches or beats flat on primary metrics (fact retention, clustering quality, user-reported coherence). Acceptable tradeoffs: up to 15% higher token usage if recall improves by ≥5pp absolute. Latency p95 must not regress by more than 10% vs flat baseline.
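The numeric part of these exit criteria can be written as a small gate check. This is one possible reading, not a definition from the roadmap: it assumes recall is a fraction between 0 and 1, that any token increase is only acceptable alongside a recall gain of at least 5 percentage points, and that clustering quality and user-reported coherence are judged separately. The `Metrics` shape and `passesPhase3Gate` name are hypothetical.

```typescript
// Hypothetical aggregate metrics for one strategy.
interface Metrics {
  recall: number;       // fact retention as a fraction in [0, 1]
  tokens: number;       // mean compressed context size in tokens
  latencyP95Ms: number; // p95 latency in milliseconds
}

// One interpretation of the Phase 3 thresholds (see assumptions above).
function passesPhase3Gate(flat: Metrics, unionFind: Metrics): boolean {
  const recallGainPp = (unionFind.recall - flat.recall) * 100;
  const tokenIncrease = unionFind.tokens / flat.tokens - 1;
  const latencyIncrease = unionFind.latencyP95Ms / flat.latencyP95Ms - 1;

  // Recall must at least match flat.
  const recallOk = recallGainPp >= 0;
  // Extra tokens (up to 15%) are acceptable only with >= 5pp recall gain.
  const tokensOk = tokenIncrease <= 0 || (tokenIncrease <= 0.15 && recallGainPp >= 5);
  // Latency p95 may regress by at most 10%.
  const latencyOk = latencyIncrease <= 0.1;

  return recallOk && tokensOk && latencyOk;
}
```

For instance, with flat at 70% recall, 1000 tokens, and 200 ms p95, a union-find result of 78% recall, 1100 tokens, and 210 ms passes (8pp gain, 10% more tokens, 5% slower), while the same result at 1200 tokens fails on the token budget.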
Phase 4: Default strategy
Flip the default to `'union-find'`. Flat remains as a fallback via explicit setting. The existing fallback-on-summarization-failure path in `chatCompressionService.ts` (catches summarizer errors and falls back to flat) applies to union-find as well.
Exit criteria: Stable in production for ≥2 weeks with no regressions.
Risks, failure modes, and mitigations
- [...] `resolveDirty()`. Monitor p95 on first compression.
- [...] `sanitizePromptString` escapes `<>` and backticks at the `<context_clusters>` wrap site. Persisted summaries inherit this sanitization. Untrusted repo content (e.g. user-controlled file names or error messages) passes through the same escaping path.
- [...] `resolveDirty()` re-summarizes on next run. Persistence writes are atomic (write-to-temp, rename). No partial state reaches disk.
Eviction policy
LRU on clusters. When the cluster count exceeds `maxColdClusters`, drop the cluster whose newest member is oldest. Evicted clusters compress to a one-line tombstone in the session history.
The current cap enforcement merges the two most similar clusters (`findClosestPair` + `union`), creating progressively larger mega-clusters. After enough merges, one cluster can contain 50+ messages summarized into a single paragraph. Summary quality degrades because the summarizer compresses increasingly diverse content. LRU avoids this: clusters stay small, summaries stay coherent.
In the gemini-cli experiment (12 conversations, 120 messages each), clusters followed a heavy-tailed distribution: 1–2 clusters per conversation contained the majority of messages, with the rest being short tangents of 3–8 messages. The dominant cluster is almost always the most recent (still in the hot window or freshly graduated) and is never the eviction candidate. The candidates are the small, old, self-contained tangents, which is what LRU selects. Whether this distribution holds across real user sessions is an open question for Phase 3 telemetry.
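The LRU rule described above (drop the cluster whose newest member is oldest) can be sketched as follows. The `Cluster` shape, the `evictLru` name, and the tombstone format are assumptions for illustration; in particular, taking the first line of the summary is a stand-in for however the real implementation produces a one-line tombstone.

```typescript
// Hypothetical cluster shape for illustration.
interface Cluster {
  id: number;
  summary: string;
  memberTimestamps: number[]; // arrival time of each member message
}

interface EvictionResult {
  kept: Cluster[];
  tombstones: string[];
}

// Evict clusters beyond the cap, choosing those whose newest member is oldest.
function evictLru(clusters: Cluster[], maxColdClusters: number): EvictionResult {
  const newest = (c: Cluster): number => Math.max(...c.memberTimestamps);
  const oldestFirst = [...clusters].sort((x, y) => newest(x) - newest(y));
  const excess = Math.max(0, oldestFirst.length - maxColdClusters);
  const evicted = oldestFirst.slice(0, excess);
  const evictedIds = new Set(evicted.map((c) => c.id));
  return {
    // Surviving clusters keep their original order.
    kept: clusters.filter((c) => !evictedIds.has(c.id)),
    // Stand-in tombstone: the first line of the evicted cluster's summary.
    tombstones: evicted.map((c) => `[evicted] ${c.summary.split("\n")[0]}`),
  };
}
```

Unlike closest-pair merging, this never grows a cluster, so the summarizer always sees the same small, topically coherent inputs no matter how long the session runs.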
[...] `expand()` on mega-cluster [...]
`removeDocument()` already exists in the TF-IDF embedder for clean teardown on eviction.
Security and context hygiene
Union-find compaction plugs into gemini-cli's existing multi-layer security infrastructure rather than adding new mechanisms.
- `sanitizePromptString()` escapes code fences, HTML, control chars. Union-find: [...] `<context_clusters>` wrap site and in `clusterSummarizer.ts` before summarization.
- `agent-sanitization-utils.ts` redacts PEM, JWT, tokens, credentials; `environmentSanitization.ts` filters env vars. Union-find: [...]
- `historyHardening.ts` enforces role alternation, tool call/response pairing. Union-find: [...]
- `snippets.ts` wraps memory in `<loaded_context>`, hooks in `<hook_context>` with injection disclaimers. Union-find: [...] `<context_clusters>` (same pattern).
- `storage.ts` central `.gemini` path map; `chatRecordingService.ts` sanitizes session IDs; `fileKeychain.ts` uses `0700`/`0600` for credentials. Union-find: `.gemini/context/union-find.json`, same directory conventions.
- `toolOutputMaskingService.ts` offloads large outputs to disk; `toolDistillationService.ts` summarizes. Union-find: [...]
Known gaps (see open items):
Open items
Issues recognized but deferred to implementation PRs:
- [...] `coreTools.compression.strategy` path is a placeholder. Needs alignment with gemini-cli's actual settings schema before the Phase 2 PR.
- [...] `<context_clusters>` wrap site). Persisted summaries need sanitize-on-write, not just sanitize-on-read.
- `.gitignore` mutation. Auto-adding `.gemini/context/` to `.gitignore` assumes git, assumes the user wants mutation. Needs a more cautious plan (check, warn, suggest rather than auto-edit).
- [...] `flock` is Unix-only. Node-compatible atomic write story needed for Windows.
- [...] `.gemini/context/union-find.json`? Probably: leave it, add a cleanup command, document manual removal.
What I can contribute next
Ordered by value to the project:
- `ContextProcessor` adaptation: refactor union-find to work as a `ContextProcessor` (akin to `StateSnapshotProcessor` + async worker), integrating with the `generalistProfile` experiment. Per @joshualitt's review; he offered to handle this post-merge if preferred.
- [...] `removeDocument()`) already exists.
Open to prioritization input from maintainers.