@@ -46,44 +46,44 @@ function compress(
 ### Parameters

-| Parameter | Type | Description |
-| ---------- | ----------------- | ------------------ |
-| `messages` | `Message[]` | Messages to compress |
+| Parameter | Type | Description |
+| ---------- | ----------------- | ------------------------------- |
+| `messages` | `Message[]` | Messages to compress |
 | `options` | `CompressOptions` | Compression options (see below) |

 ### CompressOptions

-| Option | Type | Default | Description |
-| ------------------ | -------------------------- | --------------------- | ------------------------------------------------------------------------------------------- |
-| `preserve` | `string[]` | `['system']` | Roles to never compress |
-| `recencyWindow` | `number` | `4` | Protect the last N messages from compression |
-| `sourceVersion` | `number` | `0` | Version tag for [provenance tracking](provenance.md) |
+| Option | Type | Default | Description |
+| ------------------ | -------------------------- | --------------------- | ------------------------------------------------------------------------------------------------------------------ |
+| `preserve` | `string[]` | `['system']` | Roles to never compress |
+| `recencyWindow` | `number` | `4` | Protect the last N messages from compression |
+| `sourceVersion` | `number` | `0` | Version tag for [provenance tracking](provenance.md) |
 | `summarizer` | `Summarizer` | - | LLM-powered summarizer. When provided, `compress()` returns a `Promise`. See [LLM integration](llm-integration.md) |
-| `tokenBudget` | `number` | - | Target token count. Binary-searches `recencyWindow` to fit. See [Token budget](token-budget.md) |
-| `minRecencyWindow` | `number` | `0` | Floor for `recencyWindow` when using `tokenBudget` |
-| `dedup` | `boolean` | `true` | Replace earlier exact-duplicate messages with a compact reference. See [Deduplication](deduplication.md) |
-| `fuzzyDedup` | `boolean` | `false` | Detect near-duplicate messages using line-level similarity. See [Deduplication](deduplication.md) |
-| `fuzzyThreshold` | `number` | `0.85` | Similarity threshold for fuzzy dedup (0-1) |
-| `embedSummaryId` | `boolean` | `false` | Embed `summary_id` in compressed content for downstream reference. See [Provenance](provenance.md) |
-| `forceConverge` | `boolean` | `false` | Hard-truncate non-recency messages when binary search bottoms out. See [Token budget](token-budget.md) |
-| `tokenCounter` | `(msg: Message) => number` | `defaultTokenCounter` | Custom token counter per message. See [Token budget](token-budget.md) |
+| `tokenBudget` | `number` | - | Target token count. Binary-searches `recencyWindow` to fit. See [Token budget](token-budget.md) |
+| `minRecencyWindow` | `number` | `0` | Floor for `recencyWindow` when using `tokenBudget` |
+| `dedup` | `boolean` | `true` | Replace earlier exact-duplicate messages with a compact reference. See [Deduplication](deduplication.md) |
+| `fuzzyDedup` | `boolean` | `false` | Detect near-duplicate messages using line-level similarity. See [Deduplication](deduplication.md) |
+| `fuzzyThreshold` | `number` | `0.85` | Similarity threshold for fuzzy dedup (0-1) |
+| `embedSummaryId` | `boolean` | `false` | Embed `summary_id` in compressed content for downstream reference. See [Provenance](provenance.md) |
+| `forceConverge` | `boolean` | `false` | Hard-truncate non-recency messages when binary search bottoms out. See [Token budget](token-budget.md) |
+| `tokenCounter` | `(msg: Message) => number` | `defaultTokenCounter` | Custom token counter per message. See [Token budget](token-budget.md) |

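The interaction between `tokenBudget`, `recencyWindow`, and `minRecencyWindow` can be pictured as a binary search over the window size. The following is only a sketch under stated assumptions, not the library's implementation: `countAt` is a hypothetical callback that returns the total token count after compressing with a given window, it is assumed to be non-decreasing in the window, and `maxWindow` is an assumed upper bound (for example, the number of messages).

```ts
// Hypothetical callback: token count after compressing with `window`
// protected messages. Assumed to grow (or stay flat) as `window` grows.
type CountAt = (window: number) => number;

interface FitResult {
  recencyWindow: number; // window the search settled on
  tokenCount: number; // estimated tokens at that window
  fits: boolean; // whether tokenCount <= tokenBudget
}

function fitRecencyWindow(
  countAt: CountAt,
  tokenBudget: number,
  maxWindow: number,
  minRecencyWindow = 0,
): FitResult {
  let lo = minRecencyWindow;
  let hi = Math.max(maxWindow, minRecencyWindow);
  // Fall back to the floor if no window fits the budget.
  let best = minRecencyWindow;

  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (countAt(mid) <= tokenBudget) {
      best = mid; // mid fits, try protecting more messages
      lo = mid + 1;
    } else {
      hi = mid - 1; // over budget, protect fewer messages
    }
  }

  const tokenCount = countAt(best);
  return { recencyWindow: best, tokenCount, fits: tokenCount <= tokenBudget };
}
```

When even the floor does not fit, this sketch reports `fits: false` and leaves the window at `minRecencyWindow`; that is the situation the `forceConverge` option is described as handling.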
 ### CompressResult

-| Field | Type | Description |
-| ------------------------------------ | ----------------------- | ------------------------------------------------------------------------ |
-| `messages` | `Message[]` | Compressed message array |
-| `verbatim` | `VerbatimMap` | Original messages keyed by ID. Must be persisted atomically with `messages` |
-| `compression.original_version` | `number` | Mirrors `sourceVersion` |
-| `compression.ratio` | `number` | Character-based compression ratio. >1 means savings |
-| `compression.token_ratio` | `number` | Token-based compression ratio. >1 means savings |
-| `compression.messages_compressed` | `number` | Messages that were compressed |
-| `compression.messages_preserved` | `number` | Messages kept as-is |
-| `compression.messages_deduped` | `number \| undefined` | Exact duplicates replaced (when `dedup: true`) |
-| `compression.messages_fuzzy_deduped` | `number \| undefined` | Near-duplicates replaced (when `fuzzyDedup: true`) |
-| `fits` | `boolean \| undefined` | Whether result fits within `tokenBudget`. Present when `tokenBudget` is set |
-| `tokenCount` | `number \| undefined` | Estimated token count. Present when `tokenBudget` is set |
-| `recencyWindow` | `number \| undefined` | The `recencyWindow` the binary search settled on. Present when `tokenBudget` is set |
+| Field | Type | Description |
+| ------------------------------------ | ---------------------- | ----------------------------------------------------------------------------------- |
+| `messages` | `Message[]` | Compressed message array |
+| `verbatim` | `VerbatimMap` | Original messages keyed by ID. Must be persisted atomically with `messages` |
+| `compression.original_version` | `number` | Mirrors `sourceVersion` |
+| `compression.ratio` | `number` | Character-based compression ratio. >1 means savings |
+| `compression.token_ratio` | `number` | Token-based compression ratio. >1 means savings |
+| `compression.messages_compressed` | `number` | Messages that were compressed |
+| `compression.messages_preserved` | `number` | Messages kept as-is |
+| `compression.messages_deduped` | `number \| undefined` | Exact duplicates replaced (when `dedup: true`) |
+| `compression.messages_fuzzy_deduped` | `number \| undefined` | Near-duplicates replaced (when `fuzzyDedup: true`) |
+| `fits` | `boolean \| undefined` | Whether result fits within `tokenBudget`. Present when `tokenBudget` is set |
+| `tokenCount` | `number \| undefined` | Estimated token count. Present when `tokenBudget` is set |
+| `recencyWindow` | `number \| undefined` | The `recencyWindow` the binary search settled on. Present when `tokenBudget` is set |

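To make ">1 means savings" concrete, here is a small sketch of how a character-based ratio of this kind could be computed. It assumes the ratio is original size divided by compressed size, which matches the table's wording but is not confirmed by the source; `Message` is reduced to a `content` string and the helper names are hypothetical.

```ts
// Minimal stand-in for the library's Message type (assumption).
interface Message {
  content: string;
}

// Sum of content lengths across all messages.
function totalChars(messages: Message[]): number {
  return messages.reduce((sum, m) => sum + m.content.length, 0);
}

// Assumed definition: original characters / compressed characters.
// A value above 1 means the compressed array is smaller.
function charRatio(original: Message[], compressed: Message[]): number {
  const before = totalChars(original);
  const after = totalChars(compressed);
  if (after === 0) {
    // Avoid dividing by zero: nothing left after compression.
    return before === 0 ? 1 : Infinity;
  }
  return before / after;
}
```

Under this reading, a ratio of 2 means the compressed messages take half the characters of the originals, and a ratio of 1 means nothing was saved.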
 ### Example

@@ -121,26 +121,26 @@ function uncompress(
 ### Parameters

-| Parameter | Type | Description |
-| ---------- | ------------------ | ----------- |
-| `messages` | `Message[]` | Compressed messages to expand |
-| `store` | `StoreLookup` | `VerbatimMap` object or `(id: string) => Message \| undefined` function |
-| `options` | `UncompressOptions` | Expansion options (see below) |
+| Parameter | Type | Description |
+| ---------- | ------------------- | ----------------------------------------------------------------------- |
+| `messages` | `Message[]` | Compressed messages to expand |
+| `store` | `StoreLookup` | `VerbatimMap` object or `(id: string) => Message \| undefined` function |
+| `options` | `UncompressOptions` | Expansion options (see below) |

 ### UncompressOptions

-| Option | Type | Default | Description |
-| ----------- | --------- | ------- | ----------- |
+| Option | Type | Default | Description |
+| ----------- | --------- | ------- | --------------------------------------------------------------------------------- |
 | `recursive` | `boolean` | `false` | Recursively expand messages whose originals are also compressed (up to 10 levels) |

 ### UncompressResult

-| Field | Type | Description |
-| --------------------- | ---------- | ----------- |
-| `messages` | `Message[]` | Expanded messages |
-| `messages_expanded` | `number` | How many compressed messages were restored |
-| `messages_passthrough` | `number` | How many messages passed through unchanged |
-| `missing_ids` | `string[]` | IDs looked up but not found. Non-empty = data loss |
+| Field | Type | Description |
+| ---------------------- | ----------- | -------------------------------------------------- |
+| `messages` | `Message[]` | Expanded messages |
+| `messages_expanded` | `number` | How many compressed messages were restored |
+| `messages_passthrough` | `number` | How many messages passed through unchanged |
+| `missing_ids` | `string[]` | IDs looked up but not found. Non-empty = data loss |

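The tables above describe a store that is either a map or a lookup function, optional recursive expansion capped at 10 levels, and a `missing_ids` list. A rough sketch of how those pieces could fit together follows. It is an illustration only: the `originalId` field is a hypothetical marker for compressed messages, `VerbatimMap` is assumed to be a plain object keyed by ID, and the way unresolved messages are counted is a guess.

```ts
interface Message {
  id: string;
  content: string;
  // Hypothetical marker: ID of the verbatim original (set on compressed messages).
  originalId?: string;
}
type VerbatimMap = Record<string, Message>; // assumed shape
type StoreLookup = VerbatimMap | ((id: string) => Message | undefined);

const MAX_DEPTH = 10; // cap taken from the `recursive` option description

// Normalize either store form into a lookup function.
function toLookup(store: StoreLookup): (id: string) => Message | undefined {
  if (typeof store === 'function') return store;
  const map = store;
  return (id) =>
    Object.prototype.hasOwnProperty.call(map, id) ? map[id] : undefined;
}

function expandSketch(messages: Message[], store: StoreLookup, recursive = false) {
  const lookup = toLookup(store);
  const result = {
    messages: [] as Message[],
    messages_expanded: 0,
    messages_passthrough: 0,
    missing_ids: [] as string[],
  };
  const maxDepth = recursive ? MAX_DEPTH : 1;

  for (const msg of messages) {
    let current = msg;
    let restored = false;
    // Follow the chain of originals, one level unless `recursive` is set.
    for (let depth = 0; depth < maxDepth && current.originalId !== undefined; depth++) {
      const original = lookup(current.originalId);
      if (original === undefined) {
        result.missing_ids.push(current.originalId); // lookup failed
        break;
      }
      current = original;
      restored = true;
    }
    result.messages.push(current);
    if (restored) result.messages_expanded++;
    else result.messages_passthrough++;
  }
  return result;
}
```

A caller would typically treat a non-empty `missing_ids` as an error condition, since those originals can no longer be recovered from the store.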
 ### Example

@@ -171,7 +171,7 @@ function defaultTokenCounter(msg: Message): number;
 ### Formula

 ```ts
-Math.ceil(msg.content.length / 3.5)
+Math.ceil(msg.content.length / 3.5);
 ```

 Approximates ~3.5 characters per token. Suitable for rough estimates. For accurate budgeting, replace with a real tokenizer. See [Token budget](token-budget.md).
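
As a quick illustration of the formula, the sketch below applies it to a few messages. `Message` is reduced to a `content` string, and `estimateTokens` and `totalTokens` are hypothetical names used only for this example; the counter parameter mirrors the `(msg: Message) => number` shape of the `tokenCounter` option.

```ts
// Minimal stand-in for the library's Message type (assumption).
interface Message {
  content: string;
}

// Same arithmetic as the documented formula: ~3.5 characters per token.
function estimateTokens(msg: Message): number {
  return Math.ceil(msg.content.length / 3.5);
}

// Hypothetical helper: sum a per-message counter over a conversation.
// Passing a different counter stands in for supplying a real tokenizer.
function totalTokens(
  messages: Message[],
  counter: (msg: Message) => number = estimateTokens,
): number {
  return messages.reduce((sum, m) => sum + counter(m), 0);
}
```

For example, an 11-character message such as `hello world` is estimated at `Math.ceil(11 / 3.5)`, which is 4 tokens, and an empty message at 0.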
@@ -195,7 +195,7 @@ function createSummarizer(
 | Option | Type | Default | Description |
 | ------------------- | -------------------------- | ---------- | -------------------------------------------------------------------- |
-| `maxResponseTokens` | `number` | `300` | Hint for maximum tokens in the LLM response |
+| `maxResponseTokens` | `number` | `300` | Hint for maximum tokens in the LLM response |
 | `systemPrompt` | `string` | - | Domain-specific instructions prepended to the built-in rules |
 | `mode` | `'normal' \| 'aggressive'` | `'normal'` | `'aggressive'` produces terse bullet points at half the token budget |
 | `preserveTerms` | `string[]` | - | Domain-specific terms appended to the built-in preserve list |
@@ -209,14 +209,11 @@ The prompt always preserves: code references, file paths, function/variable name
 ```ts
 import { createSummarizer, compress } from 'context-compression-engine';

-const summarizer = createSummarizer(
-  async (prompt) => myLlm.complete(prompt),
-  {
-    maxResponseTokens: 300,
-    systemPrompt: 'This is a legal contract. Preserve all clause numbers.',
-    preserveTerms: ['clause numbers', 'party names'],
-  },
-);
+const summarizer = createSummarizer(async (prompt) => myLlm.complete(prompt), {
+  maxResponseTokens: 300,
+  systemPrompt: 'This is a legal contract. Preserve all clause numbers.',
+  preserveTerms: ['clause numbers', 'party names'],
+});

 const result = await compress(messages, { summarizer });
 ```