|
| 1 | +# Kompact |
| 2 | + |
| 3 | +[](https://github.com/npow/kompact/actions/workflows/ci.yml) |
| 4 | +[](https://pypi.org/project/kompact/) |
| 5 | +[](https://www.python.org/downloads/) |
| 6 | +[](LICENSE) |
| 7 | +[](https://github.com/astral-sh/ruff) |
| 8 | + |
| 9 | +Multi-layer context optimization proxy for LLM agents. Reduces token usage by 40-70% with zero information loss. |
| 10 | + |
| 11 | +``` |
| 12 | + ┌──────────────────────────────────────────────┐ |
| 13 | + │ Kompact Proxy (:7878) │ |
| 14 | + │ │ |
| 15 | +Agent ─>│ 1. Schema Optimizer (TF-IDF selection) │─> LLM Provider |
| 16 | + │ 2. Content Compressors (TOON, JSON, code) │ |
| 17 | + │ 3. Extractive Compress (TF-IDF sentences) │ |
| 18 | + │ 4. Observation Masker (history mgmt) │ |
| 19 | + │ 5. Cache Aligner (prefix caching) │ |
| 20 | + │ │ |
| 21 | + └──────────────────────────────────────────────┘ |
| 22 | +``` |
| 23 | + |
| 24 | +## Quick Start |
| 25 | + |
| 26 | +```bash |
| 27 | +# Install |
| 28 | +uv sync |
| 29 | + |
| 30 | +# Start proxy |
| 31 | +uv run kompact proxy --port 7878 |
| 32 | + |
| 33 | +# Point your agent at it |
| 34 | +export ANTHROPIC_BASE_URL=http://localhost:7878 |
| 35 | +claude # or any Anthropic/OpenAI-compatible agent |
| 36 | +``` |
| 37 | + |
| 38 | +## How It Works |
| 39 | + |
| 40 | +Kompact is a transparent HTTP proxy. No code changes needed — just change your base URL. It intercepts LLM API requests, applies a pipeline of transforms to compress the context, then forwards the optimized request to the provider. |
| 41 | + |
| 42 | +| Transform | Target | Savings | Cost | |
| 43 | +|-----------|--------|--------:|------| |
| 44 | +| **TOON** | JSON arrays of objects | 30-60% | Zero (string manipulation) | |
| 45 | +| **JSON Crusher** | Structured JSON data | 40-80% | Minimal (Counter stats) | |
| 46 | +| **Code Compressor** | Code in tool results | ~70% | Regex parse | |
| 47 | +| **Log Compressor** | Repetitive log output | 60-90% | Regex dedup | |
| 48 | +| **Content Compressor** | Long prose/text | 25-55% | TF-IDF scoring | |
| 49 | +| **Schema Optimizer** | Tool definitions | 50-90% | TF-IDF cosine similarity | |
| 50 | +| **Observation Masker** | Old tool outputs | ~50% | Zero (placeholder swap) | |
| 51 | +| **Cache Aligner** | System prompts | Provider cache discount | Regex substitution | |
| 52 | + |
| 53 | +The pipeline adapts automatically — short contexts get light compression, long contexts get aggressive optimization. |
| 54 | + |
| 55 | +## Configuration |
| 56 | + |
| 57 | +```bash |
| 58 | +# Disable specific transforms |
| 59 | +uv run kompact proxy --port 7878 --disable toon --disable log_compressor |
| 60 | + |
| 61 | +# Verbose mode |
| 62 | +uv run kompact proxy --port 7878 --verbose |
| 63 | + |
| 64 | +# View live dashboard |
| 65 | +open http://localhost:7878/dashboard |
| 66 | +``` |
| 67 | + |
| 68 | +## Benchmarks |
| 69 | + |
| 70 | +Evaluated on industry-standard datasets using [context-bench](https://pypi.org/project/context-bench/). Full datasets, no cherry-picking. |
| 71 | + |
| 72 | +### BFCL — Berkeley Function Calling Leaderboard (1,431 examples) |
| 73 | + |
| 74 | +Real API schemas from the [Gorilla project](https://gorilla.cs.berkeley.edu/). The standard benchmark for tool-calling compression. |
| 75 | + |
| 76 | +| System | Compression | NIAH | Recall | Effective Ratio | Cost-of-Pass | |
| 77 | +|--------|------------:|-----:|-------:|----------------:|-------------:| |
| 78 | +| No Compression | 0.0% | 97% | 96.6% | -1.5% | 931 | |
| 79 | +| JSON Minification | 32.9% | 97% | 96.5% | 31.4% | 625 | |
| 80 | +| Truncation (50%) | 49.7% | 97% | 96.6% | 48.2% | 469 | |
| 81 | +| **Kompact** | **55.3%** | **90%** | **90.9%** | **48.2%** | **443** | |
| 82 | + |
| 83 | +### Glaive Function Calling v2 (3,959 examples) |
| 84 | + |
| 85 | +Tool-calling conversations with JSON schemas. 113K-example dataset. |
| 86 | + |
| 87 | +| System | Compression | NIAH | Recall | Effective Ratio | Cost-of-Pass | |
| 88 | +|--------|------------:|-----:|-------:|----------------:|-------------:| |
| 89 | +| No Compression | 0.0% | 100% | 100.0% | 0.0% | 157 | |
| 90 | +| JSON Minification | 35.3% | 100% | 100.0% | 35.3% | 101 | |
| 91 | +| Truncation (50%) | 47.8% | 100% | 100.0% | 47.8% | 82 | |
| 92 | +| **Kompact** | **56.6%** | **100%** | **100.0%** | **56.6%** | **68** | |
| 93 | + |
| 94 | +### HotpotQA (7,405 examples) |
| 95 | + |
| 96 | +Multi-hop QA over Wikipedia paragraphs. The standard benchmark used by Headroom and LLMLingua. |
| 97 | + |
| 98 | +| System | Compression | NIAH | Recall | Effective Ratio | Cost-of-Pass | |
| 99 | +|--------|------------:|-----:|-------:|----------------:|-------------:| |
| 100 | +| No Compression | 0.0% | 97% | 97.1% | -2.5% | 1,363 | |
| 101 | +| JSON Minification | 0.0% | 97% | 97.1% | -2.5% | 1,363 | |
| 102 | +| Truncation (50%) | 49.9% | 63% | 71.4% | 13.0% | 1,004 | |
| 103 | +| **Kompact** | **17.9%** | **91%** | **93.1%** | **8.8%** | **1,183** | |
| 104 | + |
| 105 | +*12,795 total examples. No LLM calls — measures compression quality offline.* |
| 106 | + |
| 107 | +**Key metrics:** |
| 108 | +- **Compression** — `1 - output_tokens / input_tokens`. Higher = more compressed. |
| 109 | +- **NIAH** — did the answer substring survive? Binary per example, averaged. |
| 110 | +- **Effective Ratio** — retry-adjusted. NIAH miss means you pay compressed + original (wasted attempt + retry). Negative = worse than no compression. |
| 111 | +- **Cost-of-Pass** — total output tokens / examples with recall >= 0.7 ([arXiv:2504.13359](https://arxiv.org/abs/2504.13359)). Lower = better. |
| 112 | + |
| 113 | +See [`benchmarks/README.md`](benchmarks/README.md) for synthetic scenario results and full methodology. |
| 114 | + |
| 115 | +```bash |
| 116 | +# Run on real datasets (full) |
| 117 | +uv run python benchmarks/run_dataset_eval.py --dataset bfcl |
| 118 | +uv run python benchmarks/run_dataset_eval.py --dataset hotpotqa |
| 119 | +uv run python benchmarks/run_dataset_eval.py # all datasets |
| 120 | + |
| 121 | +# Run synthetic scenarios |
| 122 | +uv run python benchmarks/run_comparison.py |
| 123 | +``` |
| 124 | + |
| 125 | +## Development |
| 126 | + |
| 127 | +```bash |
| 128 | +# Install with dev deps |
| 129 | +uv sync --extra dev |
| 130 | + |
| 131 | +# Run tests |
| 132 | +uv run pytest |
| 133 | + |
| 134 | +# Lint |
| 135 | +uv run ruff check src/ tests/ |
| 136 | + |
| 137 | +# Run single transform test |
| 138 | +uv run pytest tests/test_toon.py -v |
| 139 | +``` |
| 140 | + |
| 141 | +## Architecture |
| 142 | + |
| 143 | +``` |
| 144 | +src/kompact/ |
| 145 | +├── proxy/server.py # FastAPI proxy (Anthropic + OpenAI) |
| 146 | +├── parser/messages.py # Provider format ↔ internal types |
| 147 | +├── transforms/ |
| 148 | +│ ├── pipeline.py # Orchestration + adaptive scaling |
| 149 | +│ ├── toon.py # JSON array → tabular (TOON format) |
| 150 | +│ ├── json_crusher.py # Statistical JSON compression |
| 151 | +│ ├── code_compressor.py # Code → skeleton extraction |
| 152 | +│ ├── log_compressor.py # Log deduplication |
| 153 | +│ ├── content_compressor.py # Extractive text compression (TF-IDF) |
| 154 | +│ ├── schema_optimizer.py # TF-IDF tool selection |
| 155 | +│ ├── observation_masker.py # History management |
| 156 | +│ └── cache_aligner.py # Prefix cache optimization |
| 157 | +├── cache/store.py # Compression store + artifact index |
| 158 | +├── config.py # Per-transform configuration |
| 159 | +├── types.py # Core data models |
| 160 | +└── metrics/tracker.py # Per-request metrics |
| 161 | +``` |
| 162 | + |
| 163 | +## License |
| 164 | + |
| 165 | +MIT |
0 commit comments