Commit c6a718d

docs(compression): document tokenCounter, forceConverge, embedSummaryId options
1 parent f9df318 commit c6a718d

1 file changed

Lines changed: 30 additions & 2 deletions

README.md

Original file line number | Diff line number | Diff line change
@@ -112,6 +112,9 @@ result.compression.messages_fuzzy_deduped; // near-duplicates replaced (when
112112
| `dedup` | `boolean` | `true` | Replace earlier exact-duplicate messages with a compact reference |
113113
| `fuzzyDedup` | `boolean` | `false` | Detect near-duplicate messages using line-level similarity |
114114
| `fuzzyThreshold` | `number` | `0.85` | Similarity threshold for fuzzy dedup (0–1) |
115+
| `embedSummaryId` | `boolean` | `false` | Embed `summary_id` in compressed content for downstream reference |
116+
| `forceConverge` | `boolean` | `false` | Hard-truncate non-recency messages when binary search bottoms out and budget still exceeded |
117+
| `tokenCounter` | `(msg: Message) => number` | `defaultTokenCounter` | Custom token counter per message. Default: `ceil(content.length / 3.5)` |
115118

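The default estimate in the `tokenCounter` row can be sketched as follows. This is a minimal illustration of the documented formula, not the library's source: the `Message` shape and the name `estimateTokens` are hypothetical, and treating non-string content as empty is an assumption.

```ts
// Hypothetical message shape; the library's real Message type may carry more fields.
type Message = { role: string; content: unknown };

// Sketch of the documented default: ceil(content.length / 3.5).
// Assumption: non-string content counts as an empty string.
function estimateTokens(msg: Message): number {
  const text = typeof msg.content === 'string' ? msg.content : '';
  return Math.ceil(text.length / 3.5);
}

const sample: Message = { role: 'user', content: 'Hello, world!' }; // 13 characters
const sampleEstimate = estimateTokens(sample); // ceil(13 / 3.5) = 4
```

Because the estimate depends only on character count, it can drift noticeably for code, non-English text, or long identifiers, which is the case a custom `tokenCounter` is meant to cover.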
116119
#### Summarizer fallback
117120

@@ -123,16 +126,31 @@ When a `summarizer` is provided, each message goes through a three-level fallbac
123126

124127
#### Token budget
125128

126-
Use `tokenBudget` to automatically find the least compression needed to fit a token limit. The engine binary-searches `recencyWindow` internally. Token counts are estimated at ~3.5 characters per token — a reasonable average across models, but not exact. For precise budgeting, use `tokenBudget` as an approximate guide and verify with your model's tokenizer.
129+
Use `tokenBudget` to automatically find the least compression needed to fit a token limit. The engine binary-searches `recencyWindow` internally.
130+
131+
By default, tokens are estimated at ~3.5 characters per token. For accurate budgeting, pass a `tokenCounter` that uses your model's tokenizer — the counter is used for all budget decisions, binary search iterations, force-converge deltas, and `token_ratio` stats.
127132

128133
```ts
134+
import { compress, defaultTokenCounter } from '@ultracontext/compression';
135+
129136
const result = compress(messages, {
130137
tokenBudget: 4000,
131138
minRecencyWindow: 2,
132139
});
133140

134141
result.fits; // true if result fits within budget
135-
result.tokenCount; // estimated token count
142+
result.tokenCount; // token count (via tokenCounter)
143+
144+
// Plug in a real tokenizer
145+
import { encode } from 'gpt-tokenizer';
146+
147+
const result = compress(messages, {
148+
tokenBudget: 4000,
149+
tokenCounter: (msg) => {
150+
const text = typeof msg.content === 'string' ? msg.content : '';
151+
return encode(text).length;
152+
},
153+
});
136154

137155
// With LLM summarizer for tighter fits
138156
const result = await compress(messages, {
@@ -141,6 +159,16 @@ const result = await compress(messages, {
141159
});
142160
```
143161

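To give a feel for how a binary search over `recencyWindow` might work, here is a rough sketch. It is not the engine's implementation: `compressWithWindow`, `totalTokens`, and `findRecencyWindow` are hypothetical names, and the stand-in compression step simply swaps older messages for a short placeholder. The search also assumes that a larger window never lowers the token count.

```ts
type Msg = { role: string; content: string };
type Counter = (msg: Msg) => number;

// Stand-in for real compression (assumption): keep the last `window` messages
// verbatim and replace every earlier message with a short placeholder.
function compressWithWindow(messages: Msg[], window: number): Msg[] {
  const cut = Math.max(0, messages.length - window);
  return messages.map((m, i) => (i < cut ? { ...m, content: '[compressed]' } : m));
}

function totalTokens(messages: Msg[], count: Counter): number {
  return messages.reduce((sum, m) => sum + count(m), 0);
}

// Find the largest window in [minWindow, messages.length] whose output fits
// the budget. A larger window means less compression.
function findRecencyWindow(
  messages: Msg[],
  budget: number,
  minWindow: number,
  count: Counter,
): { window: number; fits: boolean } {
  let lo = minWindow;
  let hi = Math.max(minWindow, messages.length);
  let best = minWindow;
  let fits = false;
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    const tokens = totalTokens(compressWithWindow(messages, mid), count);
    if (tokens <= budget) {
      best = mid; // fits: try keeping more messages verbatim
      fits = true;
      lo = mid + 1;
    } else {
      hi = mid - 1; // too large: shrink the window
    }
  }
  return { window: best, fits };
}
```

When even `minWindow` exceeds the budget, the sketch returns `fits: false`, which corresponds to the situation `forceConverge` is described as handling.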
162+
When `forceConverge` is enabled, the engine hard-truncates non-recency messages to 512 characters if the binary search bottoms out and the budget is still exceeded. This mirrors LCM's Level 3 `DeterministicTruncate` — no LLM involved, guaranteed convergence.
163+
164+
```ts
165+
const result = compress(messages, {
166+
tokenBudget: 4000,
167+
forceConverge: true,
168+
});
169+
// result.fits is guaranteed true (unless only system/recency messages remain)
170+
```
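The truncation step described above could look roughly like this. It is a sketch under stated assumptions, not the engine's code: `forceTruncate` is a hypothetical name, and exempting `system` messages is inferred from the comment in the example rather than confirmed by the source.

```ts
type Msg = { role: string; content: string };

const TRUNCATE_AT = 512; // character limit stated in the README

// Hard-truncate every message outside the recency window to TRUNCATE_AT
// characters. Assumption: system messages are left untouched.
function forceTruncate(messages: Msg[], recencyWindow: number): Msg[] {
  const recencyStart = Math.max(0, messages.length - recencyWindow);
  return messages.map((m, i) => {
    const isProtected = m.role === 'system' || i >= recencyStart;
    if (isProtected || m.content.length <= TRUNCATE_AT) return m;
    return { ...m, content: m.content.slice(0, TRUNCATE_AT) };
  });
}
```

Since no summarizer is called, the step is deterministic, but it can only shrink messages it is allowed to touch, which is why the budget may still be exceeded when only protected messages remain.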
171+
144172
### uncompress
145173

146174
Restore originals from the verbatim store. Always sync. Supports recursive expansion for multi-layer compression (up to 10 levels deep).
