Commit 352a5ec

Better README
1 parent 25571b3 commit 352a5ec

1 file changed

Lines changed: 40 additions & 31 deletions

README.md

@@ -5,24 +5,25 @@
55
<br><em>Squeeze out the juice, leave the pulp behind.</em>
66
</p>
77

8-
LLM coding agents waste **80-95% of context tokens** on irrelevant tool output. Squeez extracts only the lines that matter, compressing tool output by ~86% on average.
8+
LLM coding agents waste 80-95% of context tokens on irrelevant tool output. Squeez extracts only the lines that matter, compressing tool output by ~91% while keeping 86% of the relevant information.
99

1010
[![PyPI](https://img.shields.io/pypi/v/squeez)](https://pypi.org/project/squeez/)
11-
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
11+
[![Model](https://img.shields.io/badge/HF-Squeez--2B-yellow.svg)](https://huggingface.co/KRLabsOrg/squeez-2b)
1212
[![Dataset](https://img.shields.io/badge/HF-Dataset-yellow.svg)](https://huggingface.co/datasets/KRLabsOrg/tool-output-extraction-swebench)
13+
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
1314

1415
## How it works
1516

1617
Squeez uses a fine-tuned Qwen 3.5 2B model to read tool output alongside a task description and return only the relevant lines.
1718

18-
### Example filtering test output
19+
### Example: filtering test output
1920

2021
Task: *"Find the test failure related to authentication"*
2122

2223
<table>
2324
<tr>
24-
<th>Before 45 lines, ~1,500 tokens</th>
25-
<th>After 6 lines, ~200 tokens</th>
25+
<th>Before (45 lines, ~1,500 tokens)</th>
26+
<th>After (6 lines, ~200 tokens)</th>
2627
</tr>
2728
<tr>
2829
<td>
@@ -74,7 +75,7 @@ E Expected: new token within 30m window
7475
E Got: rejection after 15m (timeout changed?)
7576
```
7677

77-
**87% compression** — only the failing test and its traceback survive. Passing tests and pytest boilerplate are dropped.
78+
**87% compression.** Only the failing test and its traceback survive.
7879

7980
</td>
8081
</tr>
@@ -119,6 +120,20 @@ $ kubectl describe pod api-server-7d4b | squeez "why is the pod failing"
119120

120121
</details>
121122

123+
## Results
124+
125+
Evaluated on 617 held-out test samples from SWE-bench, across 14 tool types:
126+
127+
| Model | Precision | Recall | F1 | Compression |
128+
|-------|-----------|--------|------|-------------|
129+
| **Squeez-2B** | **0.8043** | **0.8624** | **0.7895** | 0.9150 |
130+
| Qwen 3.5 35B A3B (zero-shot) | 0.7402 | 0.7498 | 0.7000 | 0.9177 |
131+
| Qwen 3.5 2B (untrained) | 0.4154 | 0.5299 | 0.4075 | 0.8197 |
132+
| BM25 (10%) | 0.1277 | 0.2172 | 0.1314 | 0.9036 |
133+
| Random (10%) | 0.0738 | 0.1009 | 0.0697 | 0.9067 |
134+
135+
Squeez-2B (2B params) beats the zero-shot Qwen 3.5 35B A3B MoE model on precision, recall, and F1, and scores about 6x higher than BM25 on span F1 (0.7895 vs. 0.1314). Full results in [RESULTS.md](RESULTS.md).
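
To make the table columns concrete, here is a minimal sketch of how such scores could be computed for a single sample. It assumes line-level set overlap between the kept lines and the gold lines, and defines compression as the fraction of input lines dropped. The function name `score_extraction` and these definitions are illustrative assumptions; the exact span-level evaluation is described in RESULTS.md and may differ.

```python
from dataclasses import dataclass


@dataclass
class LineScores:
    precision: float
    recall: float
    f1: float
    compression: float


def score_extraction(total_lines: int, relevant: set[int], kept: set[int]) -> LineScores:
    """Score one sample; `relevant` and `kept` hold 0-based line indices.

    Assumed definitions (not taken from the project):
      precision   = kept lines that are relevant / kept lines
      recall      = kept lines that are relevant / relevant lines
      compression = 1 - kept lines / total input lines
    """
    hits = len(relevant & kept)
    precision = hits / len(kept) if kept else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    compression = 1.0 - len(kept) / total_lines if total_lines else 0.0
    return LineScores(precision, recall, f1, compression)


# Toy sample with the sizes from the README example: 45 input lines, 6 kept.
scores = score_extraction(
    total_lines=45,
    relevant=set(range(20, 27)),  # 7 hypothetical gold lines
    kept=set(range(20, 26)),      # 6 lines returned by the filter
)
print(f"P={scores.precision:.2f} R={scores.recall:.2f} "
      f"F1={scores.f1:.2f} compression={scores.compression:.2f}")
```

With 6 of 45 lines kept, compression is 1 - 6/45 ≈ 0.87, which matches the 87% figure in the example above. If scores are averaged per sample, the mean F1 can fall below both the mean precision and the mean recall, which would be consistent with the Squeez-2B row.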
136+
122137
## Install
123138

124139
```bash
@@ -127,44 +142,38 @@ pip install squeez
127142

128143
## Quick start
129144

130-
### Just works (local inference)
131-
132-
By default, squeez downloads and runs `KRLabsOrg/squeez-qwen3.5-2b` locally:
145+
### With vLLM (recommended)
133146

134147
```bash
148+
# Start the server
149+
pip install vllm
150+
vllm serve KRLabsOrg/squeez-2b --dtype bfloat16 --max-model-len 16384
151+
152+
# Use from squeez CLI
135153
pip install squeez
154+
export SQUEEZ_SERVER_URL=http://localhost:8000/v1
155+
cat output.txt | squeez "find the bug"
136156

137-
cat output.txt | squeez "Find the failing traceback block"
138-
squeez "Fix the CSRF bug" --input-file output.txt
157+
# Or pipe directly
158+
python -m pytest tests/ -v 2>&1 | squeez "find the test failure related to authentication"
139159
```
140160

141-
### With a server (faster, recommended for production)
161+
vLLM keeps the model loaded in memory between calls and batches requests for higher throughput.
142162

143-
Serve the model with vLLM, Ollama, or any OpenAI-compatible API:
163+
### Local inference (no server)
144164

145165
```bash
146-
vllm serve KRLabsOrg/squeez-qwen3.5-2b --max-model-len 32768
147-
```
148-
149-
Then point squeez at it:
150-
151-
```bash
152-
export SQUEEZ_SERVER_URL=http://localhost:8000/v1
153-
export SQUEEZ_SERVER_MODEL=KRLabsOrg/squeez-qwen3.5-2b
166+
pip install squeez
154167

155-
squeez "Find the bug" --input-file output.txt
168+
cat output.txt | squeez "Find the failing traceback block"
169+
squeez "Fix the CSRF bug" --input-file output.txt
156170
```
157171

158-
Or via CLI flags:
172+
> **Note:** Local mode loads the model on every call. Fine for one-off use, but for repeated calls (e.g. an agent piping every tool output through squeez), use vLLM.
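
For the agent use case mentioned in the note, one way to route tool output through the CLI is a thin `subprocess` wrapper. This is a sketch under assumptions: the helper name `squeeze` and its `command` parameter are hypothetical, and it relies only on the documented CLI behavior (task as the argument, tool output on stdin). The child process inherits the environment, so `SQUEEZ_SERVER_URL` still selects the vLLM server if it is set.

```python
import subprocess
from typing import Sequence


def squeeze(task: str, tool_output: str, command: Sequence[str] = ("squeez",)) -> str:
    """Pipe `tool_output` to the CLI on stdin, passing `task` as the argument.

    `command` is a hypothetical hook so the executable can be swapped,
    for example in tests; by default it runs `squeez` from PATH.
    """
    result = subprocess.run(
        [*command, task],
        input=tool_output,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Example use inside an agent loop (requires squeez to be installed):
#   filtered = squeeze("find the test failure related to authentication", raw_pytest_output)
```

Each call still starts a new process, so the savings come from the server holding the model, not from the wrapper itself.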
159173
160-
```bash
161-
squeez "Find the bug" \
162-
--server-url http://localhost:8000/v1 \
163-
--server-model KRLabsOrg/squeez-qwen3.5-2b \
164-
--input-file output.txt
165-
```
174+
### Any OpenAI-compatible API
166175

167-
Works with any OpenAI-compatible API (Groq, Together, etc.) — just set the URL, model name, and API key:
176+
Works with Groq, Together, or any OpenAI-compatible server. Set the URL, model name, and API key:
168177

169178
```bash
170179
export SQUEEZ_SERVER_URL=https://api.groq.com/openai/v1
@@ -177,7 +186,7 @@ export SQUEEZ_API_KEY=gsk_...
177186
```python
178187
from squeez.inference.extractor import ToolOutputExtractor
179188

180-
# Default: loads KRLabsOrg/squeez-qwen3.5-2b locally
189+
# Default: loads KRLabsOrg/squeez-2b locally
181190
extractor = ToolOutputExtractor()
182191

183192
# Or connect to a server
