Lossless context compression for LLMs. Zero dependencies. Zero API calls. Works everywhere JavaScript runs.
> **1.3-6.1x compression** on synthetic scenarios, **1.5x on real Claude Code sessions** (11.7M chars across 8,004 messages) — fully deterministic, no LLM needed. Largest session: 4,257 messages / 5.8M chars compressed in 651ms with zero negatives. Every compression is losslessly reversible.
## The problem
Context is the RAM of LLMs. As conversations grow, model attention spreads thin — a phenomenon known as **context rot**. Tokens spent on stale prose are tokens not spent on the task at hand.
@@ -35,7 +38,7 @@ The engine ships with a full benchmark suite that pits deterministic compression
**The deterministic engine achieves 1.3-6.1x compression with no API calls, no network latency, and no per-token cost.** It scores sentences, packs them into a budget, and strips filler. In most scenarios it compresses tighter than an LLM.
Why? LLMs try to be _helpful_. They write fuller summaries that happen to be longer. The deterministic engine is optimized purely for compression — it doesn't care about readability, just signal density.
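To make the score, pack, and strip steps concrete, here is a toy sketch. Everything in it is an assumption made for illustration: the filler word list, the `SIGNAL` pattern, and the names `stripFiller`, `score`, and `packBudget` are hypothetical and are not taken from the engine's source.

```typescript
// Hypothetical filler list; the real engine's list is not shown in this document.
const FILLER = /\b(?:basically|actually|really|very)\b,?\s*/gi;
// Hypothetical "signal" tokens: numbers, camelCase and snake_case identifiers.
const SIGNAL = /\d+|\b[a-z]+[A-Z]\w*\b|\b\w+_\w+\b/g;

// Remove filler words and collapse the whitespace they leave behind.
function stripFiller(sentence: string): string {
  return sentence.replace(FILLER, "").replace(/\s+/g, " ").trim();
}

// Toy score: the number of signal tokens in the sentence.
function score(sentence: string): number {
  return sentence.match(SIGNAL)?.length ?? 0;
}

// Greedily keep the highest-scoring sentences that fit in the character
// budget, then restore their original order.
function packBudget(text: string, budgetChars: number): string {
  const ranked = text
    .split(/(?<=[.!?])\s+/)
    .map(stripFiller)
    .filter((s) => s.length > 0)
    .map((s, index) => ({ s, index, score: score(s) }))
    .sort((a, b) => b.score - a.score || a.index - b.index);

  const kept: typeof ranked = [];
  let used = 0;
  for (const item of ranked) {
    const cost = item.s.length + (kept.length > 0 ? 1 : 0); // +1 for the joining space
    if (used + cost > budgetChars) continue;
    kept.push(item);
    used += cost;
  }
  return kept
    .sort((a, b) => a.index - b.index)
    .map((item) => item.s)
    .join(" ");
}
```

A deterministic pipeline like this has no incentive to write a fuller summary, which is the contrast with LLM output drawn above.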
**LLM summarization is opt-in for cases where semantic understanding improves summary quality** — long, prose-heavy conversations where the LLM's ability to paraphrase and merge concepts across many messages genuinely helps. The engine supports this via a pluggable `summarizer` option with a built-in fallback chain that automatically rejects LLM output when it's longer than the deterministic result.
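The fallback idea can be sketched roughly as follows. This is not the package's API: `chooseSummary`, `summarizeWithFallback`, and the `Summarizer` type are hypothetical names, and treating an equal-length or empty LLM result as rejected is an assumption.

```typescript
// Hypothetical shape for a pluggable summarizer.
type Summarizer = (text: string) => Promise<string>;

interface Choice {
  text: string;
  source: "llm" | "deterministic";
}

// Keep the LLM candidate only when it is strictly shorter than the
// deterministic baseline; otherwise fall back to the baseline.
function chooseSummary(baseline: string, candidate: string | null): Choice {
  if (candidate !== null && candidate.length > 0 && candidate.length < baseline.length) {
    return { text: candidate, source: "llm" };
  }
  return { text: baseline, source: "deterministic" };
}

// Wrap the optional summarizer so that a thrown error also falls back.
async function summarizeWithFallback(
  text: string,
  baseline: string,
  summarizer?: Summarizer,
): Promise<Choice> {
  if (!summarizer) return chooseSummary(baseline, null);
  try {
    return chooseSummary(baseline, await summarizer(text));
  } catch {
    return chooseSummary(baseline, null);
  }
}
```

Keeping the length comparison in a pure function makes the rejection rule easy to unit-test without calling a model.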
@@ -80,9 +83,13 @@ Works in Node 18+, Deno, Bun, and edge runtimes. This is an ESM-only package —
Detects near-duplicates using line-level Jaccard similarity. Useful when the same file is read across edit cycles — the content evolves slightly but remains largely the same.
The algorithm:

1. **Fingerprint bucketing**: groups candidates by their first 5 non-empty normalized lines (requires 3+ shared)
2. **Length-ratio pre-filter**: skips pairs where `min/max < 0.7`
3. **Line-level Jaccard**: `|A ∩ B| / |A ∪ B|` using multiset frequency maps of normalized lines
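The steps listed above can be sketched as below. This is an illustration under stated assumptions, not the engine's code: the normalization rule (trim and collapse whitespace), measuring the length ratio in characters, and the `threshold` parameter are all guesses, and every function name is hypothetical.

```typescript
// Assumption: "normalized" means trimmed, whitespace-collapsed, non-empty lines.
function normalizedLines(text: string): string[] {
  return text
    .split("\n")
    .map((line) => line.trim().replace(/\s+/g, " "))
    .filter((line) => line.length > 0);
}

// Step 1: fingerprints share at least `minShared` of their first 5 lines.
function sharesFingerprint(a: string, b: string, minShared = 3): boolean {
  const first = new Set(normalizedLines(a).slice(0, 5));
  const shared = new Set(normalizedLines(b).slice(0, 5).filter((l) => first.has(l)));
  return shared.size >= minShared;
}

// Step 2: skip pairs where min/max < 0.7 (measured in characters here).
function passesLengthRatio(a: string, b: string, minRatio = 0.7): boolean {
  const longer = Math.max(a.length, b.length);
  return longer === 0 || Math.min(a.length, b.length) / longer >= minRatio;
}

function countLines(lines: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of lines) counts.set(line, (counts.get(line) ?? 0) + 1);
  return counts;
}

// Step 3: multiset Jaccard. The intersection sums per-line minimum counts;
// the union is |A| + |B| minus that intersection.
function lineJaccard(a: string, b: string): number {
  const la = normalizedLines(a);
  const lb = normalizedLines(b);
  if (la.length === 0 && lb.length === 0) return 1;
  const cb = countLines(lb);
  let intersection = 0;
  countLines(la).forEach((count, line) => {
    intersection += Math.min(count, cb.get(line) ?? 0);
  });
  return intersection / (la.length + lb.length - intersection);
}

// `threshold` is a hypothetical parameter; the source does not state a value.
function isNearDuplicate(a: string, b: string, threshold: number): boolean {
  return sharesFingerprint(a, b) && passesLengthRatio(a, b) && lineJaccard(a, b) >= threshold;
}
```

Running the cheap checks first means the frequency maps are only built for pairs that already look alike.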
@@ -445,19 +451,19 @@ If the summarizer throws or returns text longer than the input, the engine falls
The classifier automatically preserves content that should never be summarized:
| Tool calls | Messages with `tool_calls` array | Yes |
| System messages | `role: 'system'` (default) | Yes |
| Duplicates | Repeated content (exact or fuzzy) | **Replaced with reference** |
| Long prose | General discussion, explanations | **Compressed** |
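As a rough illustration of the rows shown here, a classifier could be sketched like this. The `Message` shape, the `Decision` labels, the order of the checks, and the exact-match-only duplicate test are assumptions; fuzzy matching and any rows not visible in this excerpt are left out.

```typescript
// Hypothetical message shape, modeled on common chat-completion formats.
interface Message {
  role: string;
  content: string;
  tool_calls?: unknown[];
}

type Decision = "preserve" | "reference" | "compress";

// `seen` tracks content already encountered so exact repeats can be
// replaced with a reference instead of being stored again.
function classify(message: Message, seen: Set<string>): Decision {
  if (Array.isArray(message.tool_calls)) return "preserve"; // tool calls stay verbatim
  if (message.role === "system") return "preserve"; // system messages stay verbatim
  if (seen.has(message.content)) return "reference"; // exact duplicate
  seen.add(message.content);
  return "compress"; // general prose
}
```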
Code-mixed messages get split: prose is summarized, code fences stay verbatim.
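One way to picture the split is the sketch below. It assumes fences are plain triple-backtick blocks and uses hypothetical names (`splitCodeFences`, `compressMixed`); the engine's actual parser may handle more cases, such as nested or unterminated fences.

```typescript
type Segment = { kind: "prose" | "code"; text: string };

// Built at runtime so this example contains no literal fence marker.
const FENCE = "`".repeat(3);

// Split a message into alternating prose and fenced-code segments.
function splitCodeFences(content: string): Segment[] {
  const pattern = new RegExp(FENCE + "[\\s\\S]*?" + FENCE, "g");
  const segments: Segment[] = [];
  let cursor = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(content)) !== null) {
    if (match.index > cursor) {
      segments.push({ kind: "prose", text: content.slice(cursor, match.index) });
    }
    segments.push({ kind: "code", text: match[0] });
    cursor = match.index + match[0].length;
  }
  if (cursor < content.length) {
    segments.push({ kind: "prose", text: content.slice(cursor) });
  }
  return segments;
}

// Summarize only the prose segments; code segments pass through unchanged.
function compressMixed(content: string, summarize: (prose: string) => string): string {
  return splitCodeFences(content)
    .map((segment) => (segment.kind === "code" ? segment.text : summarize(segment.text)))
    .join("");
}
```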
@@ -514,11 +520,11 @@ The benchmark covers seven conversation scenarios (coding assistant, long Q&A, t
The benchmark runner includes an opt-in LLM section that compares deterministic compression against real LLM-powered summarization head-to-head. Set one or more environment variables to enable it: