Commit fa16341

chore: switch to AGPL-3.0 dual licensing
AGPL-3.0 for open-source use, commercial license available for proprietary projects. Contact lisa@tastehub.io for terms.
1 parent 7f2ab7a commit fa16341

13 files changed

Lines changed: 833 additions & 302 deletions

LICENSE

Lines changed: 650 additions & 136 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 4 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -2,7 +2,7 @@
22

33
[![CI](https://github.com/SimplyLiz/ContextCompressionEngine/actions/workflows/ci.yml/badge.svg)](https://github.com/SimplyLiz/ContextCompressionEngine/actions/workflows/ci.yml)
44
[![npm version](https://img.shields.io/npm/v/context-compression-engine.svg)](https://www.npmjs.com/package/context-compression-engine)
5-
[![license](https://img.shields.io/badge/license-PolyForm%20Noncommercial-blue)](LICENSE)
5+
[![license](https://img.shields.io/badge/license-AGPL--3.0-blue)](LICENSE)
66

77
Lossless context compression for LLMs. Zero dependencies. Zero API calls. Works everywhere JavaScript runs.
88

@@ -135,6 +135,7 @@ Built-in estimator: `ceil(content.length / 3.5)`. Replace with a real tokenizer
135135

136136
## License
137137

138-
[PolyForm Noncommercial 1.0.0](LICENSE) — free for personal use, open-source projects, and non-commercial purposes.
138+
This project is dual-licensed:
139139

140-
**Commercial use** requires a separate license. Contact [lisa@tastehub.io](mailto:lisa@tastehub.io) — happy to discuss collaboration and licensing options.
140+
- **Open source**: [AGPL-3.0](LICENSE). You can use, modify, and distribute this library freely, provided your project is also open-sourced under AGPL-3.0 or a compatible license.
141+
- **Commercial** — If you want to use this library in proprietary software without open-sourcing your project, a commercial license is available. Contact [lisa@tastehub.io](mailto:lisa@tastehub.io) for terms.

docs/README.md

Lines changed: 10 additions & 10 deletions
@@ -2,14 +2,14 @@
22

33
[Back to README](../README.md)
44

5-
| Page | Description |
6-
| ---- | ----------- |
7-
| [API Reference](api-reference.md) | All exports, types, options, and result fields |
5+
| Page | Description |
6+
| ----------------------------------------------- | --------------------------------------------------------------- |
7+
| [API Reference](api-reference.md) | All exports, types, options, and result fields |
88
| [Compression Pipeline](compression-pipeline.md) | How compression works: classify, dedup, merge, summarize, guard |
9-
| [Deduplication](deduplication.md) | Exact + fuzzy dedup algorithms, tuning thresholds |
10-
| [Token Budget](token-budget.md) | Budget-driven compression, binary search, custom tokenizers |
11-
| [LLM Integration](llm-integration.md) | Provider examples: Claude, OpenAI, Gemini, Grok, Ollama |
12-
| [Round-trip](round-trip.md) | Lossless compress/uncompress, VerbatimMap, atomicity |
13-
| [Provenance](provenance.md) | `_cce_original` metadata, summary_id, parent_ids |
14-
| [Preservation Rules](preservation-rules.md) | What gets preserved, classification tiers, code-aware splitting |
15-
| [Benchmarks](benchmarks.md) | Running benchmarks, LLM comparison, interpreting results |
9+
| [Deduplication](deduplication.md) | Exact + fuzzy dedup algorithms, tuning thresholds |
10+
| [Token Budget](token-budget.md) | Budget-driven compression, binary search, custom tokenizers |
11+
| [LLM Integration](llm-integration.md) | Provider examples: Claude, OpenAI, Gemini, Grok, Ollama |
12+
| [Round-trip](round-trip.md) | Lossless compress/uncompress, VerbatimMap, atomicity |
13+
| [Provenance](provenance.md) | `_cce_original` metadata, summary_id, parent_ids |
14+
| [Preservation Rules](preservation-rules.md) | What gets preserved, classification tiers, code-aware splitting |
15+
| [Benchmarks](benchmarks.md) | Running benchmarks, LLM comparison, interpreting results |

docs/api-reference.md

Lines changed: 50 additions & 53 deletions
@@ -46,44 +46,44 @@ function compress(
4646

4747
### Parameters
4848

49-
| Parameter | Type | Description |
50-
| ---------- | ----------------- | ------------------ |
51-
| `messages` | `Message[]` | Messages to compress |
49+
| Parameter | Type | Description |
50+
| ---------- | ----------------- | ------------------------------- |
51+
| `messages` | `Message[]` | Messages to compress |
5252
| `options` | `CompressOptions` | Compression options (see below) |
5353

5454
### CompressOptions
5555

56-
| Option | Type | Default | Description |
57-
| ------------------ | -------------------------- | --------------------- | ------------------------------------------------------------------------------------------- |
58-
| `preserve` | `string[]` | `['system']` | Roles to never compress |
59-
| `recencyWindow` | `number` | `4` | Protect the last N messages from compression |
60-
| `sourceVersion` | `number` | `0` | Version tag for [provenance tracking](provenance.md) |
56+
| Option | Type | Default | Description |
57+
| ------------------ | -------------------------- | --------------------- | ------------------------------------------------------------------------------------------------------------------ |
58+
| `preserve` | `string[]` | `['system']` | Roles to never compress |
59+
| `recencyWindow` | `number` | `4` | Protect the last N messages from compression |
60+
| `sourceVersion` | `number` | `0` | Version tag for [provenance tracking](provenance.md) |
6161
| `summarizer` | `Summarizer` | - | LLM-powered summarizer. When provided, `compress()` returns a `Promise`. See [LLM integration](llm-integration.md) |
62-
| `tokenBudget` | `number` | - | Target token count. Binary-searches `recencyWindow` to fit. See [Token budget](token-budget.md) |
63-
| `minRecencyWindow` | `number` | `0` | Floor for `recencyWindow` when using `tokenBudget` |
64-
| `dedup` | `boolean` | `true` | Replace earlier exact-duplicate messages with a compact reference. See [Deduplication](deduplication.md) |
65-
| `fuzzyDedup` | `boolean` | `false` | Detect near-duplicate messages using line-level similarity. See [Deduplication](deduplication.md) |
66-
| `fuzzyThreshold` | `number` | `0.85` | Similarity threshold for fuzzy dedup (0-1) |
67-
| `embedSummaryId` | `boolean` | `false` | Embed `summary_id` in compressed content for downstream reference. See [Provenance](provenance.md) |
68-
| `forceConverge` | `boolean` | `false` | Hard-truncate non-recency messages when binary search bottoms out. See [Token budget](token-budget.md) |
69-
| `tokenCounter` | `(msg: Message) => number` | `defaultTokenCounter` | Custom token counter per message. See [Token budget](token-budget.md) |
62+
| `tokenBudget` | `number` | - | Target token count. Binary-searches `recencyWindow` to fit. See [Token budget](token-budget.md) |
63+
| `minRecencyWindow` | `number` | `0` | Floor for `recencyWindow` when using `tokenBudget` |
64+
| `dedup` | `boolean` | `true` | Replace earlier exact-duplicate messages with a compact reference. See [Deduplication](deduplication.md) |
65+
| `fuzzyDedup` | `boolean` | `false` | Detect near-duplicate messages using line-level similarity. See [Deduplication](deduplication.md) |
66+
| `fuzzyThreshold` | `number` | `0.85` | Similarity threshold for fuzzy dedup (0-1) |
67+
| `embedSummaryId` | `boolean` | `false` | Embed `summary_id` in compressed content for downstream reference. See [Provenance](provenance.md) |
68+
| `forceConverge` | `boolean` | `false` | Hard-truncate non-recency messages when binary search bottoms out. See [Token budget](token-budget.md) |
69+
| `tokenCounter` | `(msg: Message) => number` | `defaultTokenCounter` | Custom token counter per message. See [Token budget](token-budget.md) |
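
The `tokenBudget` row says the library binary-searches `recencyWindow` to fit the budget. The sketch below illustrates that idea only; it is not the library's implementation. `squeeze` is a hypothetical stand-in for real compression (it simply truncates older non-system messages), and `fitToBudget` and its return shape are assumptions that mirror the `fits`, `tokenCount`, and `recencyWindow` result fields.

```ts
type Message = { role: string; content: string };

// Same rough estimate as the documented default: ~3.5 characters per token.
const countTokens = (m: Message): number => Math.ceil(m.content.length / 3.5);

// Hypothetical stand-in for compression: messages outside the recency
// window are truncated to 20 characters; system messages are left alone.
function squeeze(messages: Message[], recencyWindow: number): Message[] {
  const cut = Math.max(0, messages.length - recencyWindow);
  return messages.map((m, i) =>
    i < cut && m.role !== 'system' ? { ...m, content: m.content.slice(0, 20) } : m,
  );
}

const totalTokens = (messages: Message[]): number =>
  messages.reduce((sum, m) => sum + countTokens(m), 0);

// Find the largest recency window whose squeezed output fits the budget.
// Works because a larger window never lowers the token total here.
function fitToBudget(messages: Message[], tokenBudget: number, minRecencyWindow = 0) {
  let lo = minRecencyWindow;
  let hi = messages.length;
  let best = minRecencyWindow;
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (totalTokens(squeeze(messages, mid)) <= tokenBudget) {
      best = mid; // fits: try protecting more recent messages
      lo = mid + 1;
    } else {
      hi = mid - 1; // too big: protect fewer
    }
  }
  const out = squeeze(messages, best);
  const tokenCount = totalTokens(out);
  return { messages: out, recencyWindow: best, tokenCount, fits: tokenCount <= tokenBudget };
}
```

If even the smallest window does not fit, this sketch returns `fits: false` rather than truncating further, which is roughly the situation the `forceConverge` option is documented to address.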
7070

7171
### CompressResult
7272

73-
| Field | Type | Description |
74-
| ------------------------------------ | ----------------------- | ------------------------------------------------------------------------ |
75-
| `messages` | `Message[]` | Compressed message array |
76-
| `verbatim` | `VerbatimMap` | Original messages keyed by ID. Must be persisted atomically with `messages` |
77-
| `compression.original_version` | `number` | Mirrors `sourceVersion` |
78-
| `compression.ratio` | `number` | Character-based compression ratio. >1 means savings |
79-
| `compression.token_ratio` | `number` | Token-based compression ratio. >1 means savings |
80-
| `compression.messages_compressed` | `number` | Messages that were compressed |
81-
| `compression.messages_preserved` | `number` | Messages kept as-is |
82-
| `compression.messages_deduped` | `number \| undefined` | Exact duplicates replaced (when `dedup: true`) |
83-
| `compression.messages_fuzzy_deduped` | `number \| undefined` | Near-duplicates replaced (when `fuzzyDedup: true`) |
84-
| `fits` | `boolean \| undefined` | Whether result fits within `tokenBudget`. Present when `tokenBudget` is set |
85-
| `tokenCount` | `number \| undefined` | Estimated token count. Present when `tokenBudget` is set |
86-
| `recencyWindow` | `number \| undefined` | The `recencyWindow` the binary search settled on. Present when `tokenBudget` is set |
73+
| Field | Type | Description |
74+
| ------------------------------------ | ---------------------- | ----------------------------------------------------------------------------------- |
75+
| `messages` | `Message[]` | Compressed message array |
76+
| `verbatim` | `VerbatimMap` | Original messages keyed by ID. Must be persisted atomically with `messages` |
77+
| `compression.original_version` | `number` | Mirrors `sourceVersion` |
78+
| `compression.ratio` | `number` | Character-based compression ratio. >1 means savings |
79+
| `compression.token_ratio` | `number` | Token-based compression ratio. >1 means savings |
80+
| `compression.messages_compressed` | `number` | Messages that were compressed |
81+
| `compression.messages_preserved` | `number` | Messages kept as-is |
82+
| `compression.messages_deduped` | `number \| undefined` | Exact duplicates replaced (when `dedup: true`) |
83+
| `compression.messages_fuzzy_deduped` | `number \| undefined` | Near-duplicates replaced (when `fuzzyDedup: true`) |
84+
| `fits` | `boolean \| undefined` | Whether result fits within `tokenBudget`. Present when `tokenBudget` is set |
85+
| `tokenCount` | `number \| undefined` | Estimated token count. Present when `tokenBudget` is set |
86+
| `recencyWindow` | `number \| undefined` | The `recencyWindow` the binary search settled on. Present when `tokenBudget` is set |
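
The table states that a `compression.ratio` above 1 means savings but does not give the formula. One reading consistent with that, offered here as an assumption rather than the library's definition, is original size divided by compressed size. The helper names below are hypothetical.

```ts
type Message = { content: string };

// Sum of content lengths across a message array.
function totalChars(messages: Message[]): number {
  return messages.reduce((sum, m) => sum + m.content.length, 0);
}

// Assumed definition: original characters / compressed characters.
// A value of 2 would mean the compressed array is half the original size.
function charRatio(original: Message[], compressed: Message[]): number {
  const before = totalChars(original);
  const after = totalChars(compressed);
  if (after === 0) return before === 0 ? 1 : Infinity;
  return before / after;
}
```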
8787

8888
### Example
8989

@@ -121,26 +121,26 @@ function uncompress(
121121

122122
### Parameters
123123

124-
| Parameter | Type | Description |
125-
| ---------- | ------------------ | ----------- |
126-
| `messages` | `Message[]` | Compressed messages to expand |
127-
| `store` | `StoreLookup` | `VerbatimMap` object or `(id: string) => Message \| undefined` function |
128-
| `options` | `UncompressOptions` | Expansion options (see below) |
124+
| Parameter | Type | Description |
125+
| ---------- | ------------------- | ----------------------------------------------------------------------- |
126+
| `messages` | `Message[]` | Compressed messages to expand |
127+
| `store` | `StoreLookup` | `VerbatimMap` object or `(id: string) => Message \| undefined` function |
128+
| `options` | `UncompressOptions` | Expansion options (see below) |
129129

130130
### UncompressOptions
131131

132-
| Option | Type | Default | Description |
133-
| ----------- | --------- | ------- | ----------- |
132+
| Option | Type | Default | Description |
133+
| ----------- | --------- | ------- | --------------------------------------------------------------------------------- |
134134
| `recursive` | `boolean` | `false` | Recursively expand messages whose originals are also compressed (up to 10 levels) |
135135

136136
### UncompressResult
137137

138-
| Field | Type | Description |
139-
| --------------------- | ---------- | ----------- |
140-
| `messages` | `Message[]` | Expanded messages |
141-
| `messages_expanded` | `number` | How many compressed messages were restored |
142-
| `messages_passthrough` | `number` | How many messages passed through unchanged |
143-
| `missing_ids` | `string[]` | IDs looked up but not found. Non-empty = data loss |
138+
| Field | Type | Description |
139+
| ---------------------- | ----------- | -------------------------------------------------- |
140+
| `messages` | `Message[]` | Expanded messages |
141+
| `messages_expanded` | `number` | How many compressed messages were restored |
142+
| `messages_passthrough` | `number` | How many messages passed through unchanged |
143+
| `missing_ids` | `string[]` | IDs looked up but not found. Non-empty = data loss |
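
The parameter table says `store` may be either a `VerbatimMap` object or a lookup function, and the result table says a non-empty `missing_ids` signals data loss. The sketch below shows how both could fit together. It is illustrative only: the `originalIds` field, the `expand` function, and the choice to keep a compressed message as-is when any original is missing are assumptions, not the library's actual behavior.

```ts
type Message = { id: string; content: string; originalIds?: string[] };
type VerbatimMap = Record<string, Message>;
type StoreLookup = VerbatimMap | ((id: string) => Message | undefined);

// Normalize both accepted store forms into a single lookup function.
function toLookup(store: StoreLookup): (id: string) => Message | undefined {
  if (typeof store === 'function') return store;
  const map = store;
  return (id) => map[id];
}

function expand(messages: Message[], store: StoreLookup) {
  const lookup = toLookup(store);
  const out: Message[] = [];
  const missing_ids: string[] = [];
  let messages_expanded = 0;
  let messages_passthrough = 0;

  for (const msg of messages) {
    const ids = msg.originalIds ?? [];
    if (ids.length === 0) {
      out.push(msg); // not compressed: pass through
      messages_passthrough++;
      continue;
    }
    const originals: Message[] = [];
    for (const id of ids) {
      const found = lookup(id);
      if (found) originals.push(found);
      else missing_ids.push(id); // record the loss instead of throwing
    }
    if (originals.length === ids.length) {
      out.push(...originals);
      messages_expanded++;
    } else {
      out.push(msg); // assumption: keep the compressed form if anything is missing
      messages_passthrough++;
    }
  }
  return { messages: out, messages_expanded, messages_passthrough, missing_ids };
}
```

A caller would typically check `missing_ids.length` after expansion and treat any non-empty result as a sign that `messages` and `verbatim` were not persisted together.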
144144

145145
### Example
146146

@@ -171,7 +171,7 @@ function defaultTokenCounter(msg: Message): number;
171171
### Formula
172172

173173
```ts
174-
Math.ceil(msg.content.length / 3.5)
174+
Math.ceil(msg.content.length / 3.5);
175175
```
176176

177177
Approximates ~3.5 characters per token. Suitable for rough estimates. For accurate budgeting, replace with a real tokenizer. See [Token budget](token-budget.md).
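
For reference, the formula above can be written out as a standalone function, next to a counter with the same `(msg: Message) => number` shape that the `tokenCounter` option expects. This is a sketch: the `Message` type here is a minimal assumed shape, and `wordBasedCounter` is a hypothetical placeholder for a real tokenizer, not a recommended estimate.

```ts
type Message = { content: string };

// The documented estimate: roughly 3.5 characters per token, rounded up.
function estimateTokens(msg: Message): number {
  return Math.ceil(msg.content.length / 3.5);
}

// Hypothetical replacement with the same signature. A real tokenizer
// call would go here; counting whitespace-separated words is only a
// placeholder to show where a custom counter plugs in.
function wordBasedCounter(msg: Message): number {
  const trimmed = msg.content.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}
```

Either function could then be passed as `tokenCounter` in the compress options, since both take a message and return a number.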
@@ -195,7 +195,7 @@ function createSummarizer(
195195

196196
| Option | Type | Default | Description |
197197
| ------------------- | -------------------------- | ---------- | -------------------------------------------------------------------- |
198-
| `maxResponseTokens` | `number` | `300` | Hint for maximum tokens in the LLM response |
198+
| `maxResponseTokens` | `number` | `300` | Hint for maximum tokens in the LLM response |
199199
| `systemPrompt` | `string` | - | Domain-specific instructions prepended to the built-in rules |
200200
| `mode` | `'normal' \| 'aggressive'` | `'normal'` | `'aggressive'` produces terse bullet points at half the token budget |
201201
| `preserveTerms` | `string[]` | - | Domain-specific terms appended to the built-in preserve list |
@@ -209,14 +209,11 @@ The prompt always preserves: code references, file paths, function/variable name
209209
```ts
210210
import { createSummarizer, compress } from 'context-compression-engine';
211211

212-
const summarizer = createSummarizer(
213-
async (prompt) => myLlm.complete(prompt),
214-
{
215-
maxResponseTokens: 300,
216-
systemPrompt: 'This is a legal contract. Preserve all clause numbers.',
217-
preserveTerms: ['clause numbers', 'party names'],
218-
},
219-
);
212+
const summarizer = createSummarizer(async (prompt) => myLlm.complete(prompt), {
213+
maxResponseTokens: 300,
214+
systemPrompt: 'This is a legal contract. Preserve all clause numbers.',
215+
preserveTerms: ['clause numbers', 'party names'],
216+
});
220217

221218
const result = await compress(messages, { summarizer });
222219
```
