Commit 14cd05e

committed
Added encoder
1 parent 91f6e05 commit 14cd05e

20 files changed

Lines changed: 1526 additions & 47 deletions

README.md

Lines changed: 46 additions & 8 deletions
@@ -15,7 +15,12 @@ Squeeze verbose LLM agent tool output down to only the relevant lines.
 
 LLM coding agents waste **80-95% of context tokens** on irrelevant tool output. When an agent reads a 500-line file to find one function, or runs `git log` to find a specific commit, most of the output is noise.
 
-Squeez trains a small (2-3B) generative model to identify and extract only the lines that matter for the task at hand — compressing tool output by ~86% on average.
+Squeez trains small models to identify and extract only the lines that matter for the task at hand — compressing tool output by ~86% on average.
+
+Two approaches are available:
+
+- **Generative** (Qwen 3.5 2B + LoRA) — high-quality extraction via JSON generation
+- **Encoder** (mmBERT 307M) — fast line-level binary classification, sliding window over long outputs
 
 ## Example
 
@@ -133,12 +138,18 @@ $ git log --oneline -25 | squeez "find the commit that changed the authenticatio
 pip install squeez
 ```
 
-For local model training, use the pinned stack in [requirements-train.txt](/Users/adamkovacs/projects/squeez/requirements-train.txt):
+For generative model training (Qwen + LoRA):
 
 ```bash
 pip install -r requirements-train.txt
 ```
 
+For encoder model training (mmBERT):
+
+```bash
+pip install -r requirements-encoder.txt
+```
+
 ## Quick Start
 
 ### CLI
@@ -162,9 +173,12 @@ from squeez.inference.extractor import ToolOutputExtractor
 # Load model from config/env
 extractor = ToolOutputExtractor()
 
-# Or load model locally
+# Or load a generative model locally
 extractor = ToolOutputExtractor(model_path="./output/squeez_qwen")
 
+# Or load an encoder model (auto-detected from config.json)
+extractor = ToolOutputExtractor(model_path="./output/squeez_encoder")
+
 # Or connect to a server explicitly
 extractor = ToolOutputExtractor(base_url="http://localhost:8000/v1", model_name="squeez")
 
@@ -175,15 +189,15 @@ filtered = extractor.extract(
 print(filtered) # Only the relevant lines
 ```
 
-The model returns JSON: `{"relevant_lines": ["line1", "line2", ...]}` and the `extract()` method joins them into filtered text.
+Both model types use the same `extract()` API. The generative model returns JSON (`{"relevant_lines": [...]}`); the encoder classifies each line directly. Both return filtered text.
 
 ### Configuration
 
 Backend is resolved in order: CLI args > env vars > config file (`squeez.yaml` or `configs/default.yaml`).
 
 ```yaml
 # squeez.yaml
-backend: "transformers" # optional preference
+backend: null # auto-detect from model; or "transformers", "vllm", "encoder"
 local_model_path: "./output/squeez_qwen"
 # server_url: "https://api.groq.com/openai/v1"
 # server_model: "squeez"
@@ -235,24 +249,48 @@ python scripts/download_data.py
 
 This pulls the [SWE-bench tool output dataset](https://huggingface.co/datasets/KRLabsOrg/tool-output-extraction-swebench) (7,148 train + 436 eval samples) from HuggingFace.
 
-### 2. Train with LoRA
+### 2a. Train generative model (Qwen + LoRA)
 
 ```bash
 squeez train \
   --train-file data/train.jsonl \
-  --eval-file data/eval.jsonl
+  --eval-file data/dev.jsonl
 ```
 
 Default: Qwen 3.5 2B with LoRA (r=16, alpha=32). See `configs/default.yaml` for all hyperparameters.
 
+### 2b. Train encoder model (mmBERT)
+
+```bash
+# Prepare encoder-format data from the ChatML training data
+python scripts/prepare_encoder_data.py
+
+# Train the encoder
+python -m squeez.encoder.train \
+  --train-file data/encoder_train.jsonl \
+  --eval-file data/encoder_dev.jsonl \
+  --base-model jhu-clsp/mmBERT-base \
+  --output-dir output/squeez_encoder
+```
+
+The encoder is a 307M-parameter mmBERT with a token classification head. It classifies each line as relevant/irrelevant and uses sliding windows to handle outputs longer than the 8K context.
+
 ### 3. Evaluate
 
 ```bash
+# Generative model
 squeez eval \
   --extractor-model output/squeez_qwen \
-  --eval-file data/eval.jsonl
+  --eval-file data/test.jsonl
+
+# Encoder model
+python -m squeez.encoder.evaluate \
+  --model-path output/squeez_encoder \
+  --eval-file data/encoder_test.jsonl
 ```
 
+Both produce the same metrics format (span F1, ROUGE-L, compression ratio) for direct comparison.
+
 ## Dataset
 
 Training data: [KRLabsOrg/tool-output-extraction-swebench](https://huggingface.co/datasets/KRLabsOrg/tool-output-extraction-swebench)
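
The sliding-window behavior described in the README diff above can be sketched in a few lines. This is an illustration only: `windowed_line_scores`, its `window`/`stride` defaults, and the toy scorer are assumed names, not squeez's actual API; the real model assigns per-line relevance logits.

```python
# Sketch of the sliding-window idea behind the encoder backend: split a long
# tool output into overlapping line windows, score each line in every window
# that covers it, and keep the max score, so lines beyond the context limit
# still get a prediction.
from typing import Callable


def windowed_line_scores(
    lines: list[str],
    score_window: Callable[[list[str]], list[float]],
    window: int = 64,
    stride: int = 48,
) -> list[float]:
    """Score every line, taking the max over overlapping windows."""
    scores = [float("-inf")] * len(lines)
    start = 0
    while start < len(lines):
        chunk = lines[start : start + window]
        for offset, s in enumerate(score_window(chunk)):
            scores[start + offset] = max(scores[start + offset], s)
        if start + window >= len(lines):
            break
        start += stride
    return scores


def keep_relevant(lines: list[str], scores: list[float], threshold: float = 0.5) -> str:
    """Join only the lines whose score clears the threshold."""
    return "\n".join(l for l, s in zip(lines, scores) if s >= threshold)


# Toy scorer standing in for the model: mark lines containing "def " relevant.
toy = lambda chunk: [1.0 if "def " in l else 0.0 for l in chunk]
lines = [f"line {i}" for i in range(100)] + ["def target():"]
print(keep_relevant(lines, windowed_line_scores(lines, toy)))  # → def target():
```

With a stride smaller than the window, every line is covered by at least one window, and lines in overlap regions get the more confident of their two scores.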

configs/default.yaml

Lines changed: 10 additions & 1 deletion
@@ -1,5 +1,5 @@
 # Inference
-backend: "transformers" # optional preference: "transformers" or "vllm"
+backend: null # auto-detect from model; set to "transformers", "vllm", or "encoder"
 local_model_path: "./output/squeez_qwen"
 server_url: null # OpenAI-compatible API, e.g. "http://localhost:8000/v1"
 server_model: null # optional remote model id when using a server
@@ -23,6 +23,15 @@ lora_r: 16
 lora_alpha: 32
 lora_dropout: 0
 
+# Encoder training hyperparameters
+encoder_base_model: "jhu-clsp/mmBERT-base"
+encoder_max_length: 8192
+encoder_batch_size: 16
+encoder_learning_rate: 2.0e-5
+encoder_num_epochs: 5
+encoder_warmup_ratio: 0.1
+encoder_output_dir: "./output/squeez_encoder"
+
 # Data generation
 distillation_model: "gpt-5.4"
 max_tool_output_lines: 500

docs/api/encoder.md

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+# Encoder
+
+The encoder is a line classifier for fast, discriminative tool output extraction.
+
+## Model
+
+::: squeez.encoder.model.SqueezEncoderConfig
+
+::: squeez.encoder.model.SqueezEncoderForLineClassification
+
+## Dataset
+
+::: squeez.encoder.dataset.LineClassificationDataset
+
+## Evaluation
+
+::: squeez.encoder.evaluate.evaluate_encoder
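
The evaluation module documented above reports span F1 among its metrics. As a rough illustration of what an F1 over relevant lines computes (a hypothetical simplification; `evaluate_encoder`'s actual span matching may differ):

```python
# Hypothetical sketch of a line-level F1 between predicted and gold relevant
# lines, using multiset overlap so duplicate lines are counted correctly.
from collections import Counter


def line_f1(predicted: list[str], gold: list[str]) -> float:
    """Harmonic mean of precision and recall over relevant lines."""
    if not predicted and not gold:
        return 1.0  # both empty: perfect agreement
    overlap = sum((Counter(predicted) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


print(line_f1(["a", "b", "c"], ["b", "c", "d"]))  # 2/3 precision, 2/3 recall → ~0.667
```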

docs/getting-started/configuration.md

Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,7 @@ Squeez resolves backend configuration in order: **CLI args > env vars > config f
 Create a `squeez.yaml` in your project root, or use `configs/default.yaml`:
 
 ```yaml
-backend: "transformers" # optional preference
+backend: null # auto-detect from model; or "transformers", "vllm", "encoder"
 local_model_path: "./output/squeez_qwen"
 
 # Or remote API backend
@@ -29,7 +29,7 @@ Config files are searched in order:
 | `SQUEEZ_SERVER_URL` | OpenAI-compatible API URL |
 | `SQUEEZ_SERVER_MODEL` | Remote model ID on that server |
 | `SQUEEZ_API_KEY` | API key (also checks `OPENAI_API_KEY`) |
-| `SQUEEZ_BACKEND` | Optional backend preference: `transformers` or `vllm` |
+| `SQUEEZ_BACKEND` | Optional backend preference: `transformers`, `vllm`, or `encoder` |
 
 ```bash
 export SQUEEZ_LOCAL_MODEL=./output/squeez_qwen

docs/getting-started/installation.md

Lines changed: 21 additions & 7 deletions
@@ -22,13 +22,27 @@ pip install -e ".[dev]"
 
 This adds `pytest` and `ruff` for testing and linting.
 
+## Training dependencies
+
+For generative model training (Qwen + LoRA):
+
+```bash
+pip install -r requirements-train.txt
+```
+
+For encoder model training (mmBERT):
+
+```bash
+pip install -r requirements-encoder.txt
+```
+
 ## Dependencies
 
-Squeez requires Python 3.10+ and depends on:
+Squeez requires Python 3.10+. Base install only needs `openai` and `pyyaml`.
+
+Optional dependency groups:
 
-- `torch` — model inference and training
-- `transformers` — model loading and tokenization
-- `peft` — LoRA adapters
-- `datasets` — HuggingFace dataset loading
-- `openai` — vLLM/API backend
-- `pyyaml` — config file parsing
+- `pip install squeez[local]` — `torch`, `transformers`, `peft` for local inference
+- `pip install squeez[encoder]` — `torch`, `transformers`, `datasets` for encoder training
+- `pip install squeez[train]` — adds `trl`, `unsloth` for generative training
+- `pip install squeez[dev]` — adds `pytest`, `ruff`

docs/getting-started/quickstart.md

Lines changed: 8 additions & 6 deletions
@@ -29,9 +29,12 @@ from squeez.inference.extractor import ToolOutputExtractor
 # Load backend from config or env
 extractor = ToolOutputExtractor()
 
-# Or load model locally
+# Or load a generative model locally
 extractor = ToolOutputExtractor(model_path="./output/squeez_qwen")
 
+# Or load an encoder model (auto-detected from config.json)
+extractor = ToolOutputExtractor(model_path="./output/squeez_encoder")
+
 # Or connect to a server explicitly
 extractor = ToolOutputExtractor(base_url="http://localhost:8000/v1", model_name="squeez")
 
@@ -44,10 +47,9 @@ print(filtered) # Only the relevant lines
 
 ## How it works
 
-The model receives the task description and raw tool output, then returns a JSON object:
+Squeez supports two model types behind the same `extract()` API:
 
-```json
-{"relevant_lines": ["class CsrfViewMiddleware(MiddlewareMixin):", "    def _check_referer(self, request):", ...]}
-```
+- **Generative** (default): The model returns a JSON object `{"relevant_lines": [...]}` and `extract()` joins them into text.
+- **Encoder**: A token classifier labels each line as relevant/irrelevant. Uses sliding windows for outputs longer than the context window.
 
-The `extract()` method parses this JSON and joins the lines into filtered text.
+The backend is auto-detected from the model's `config.json`. You can also set it explicitly via `SQUEEZ_BACKEND=encoder` or `backend: "encoder"` in config.
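
The `config.json` auto-detection mentioned in the quickstart diff above could work roughly as follows. `detect_backend` is a hypothetical sketch, not squeez's actual resolution code; the architecture name it checks is the class documented in `docs/api/encoder.md`.

```python
# Sketch of config.json-based backend detection: an encoder checkpoint's
# config lists its line-classification architecture, so the presence of that
# name selects the "encoder" backend; anything else falls back to the
# generative "transformers" backend.
import json
from pathlib import Path


def detect_backend(model_path: str) -> str:
    """Return "encoder" for an encoder checkpoint, else "transformers"."""
    config = json.loads((Path(model_path) / "config.json").read_text())
    architectures = config.get("architectures", [])
    if any("LineClassification" in arch for arch in architectures):
        return "encoder"
    return "transformers"
```

An explicit `SQUEEZ_BACKEND` or `backend:` setting would simply bypass a check like this.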
