Pure-Rust runtime for the
screenpipe/pii-redactor
PII redaction model. Local, fast, no Python in the loop.
Status: v0 working end-to-end. Numerical parity vs PyTorch reference verified layer-by-layer. Tracking issue: TBD.
The screenpipe-pii-redactor
model is a 1.4 B-parameter mixture-of-experts NER tagger fine-tuned for
the kinds of text an AI agent reads off a desktop — accessibility-tree
nodes, OCR'd window chrome, computer-use traces. Upstream inference
runs in PyTorch under the opf
Python package. That package is a heavy dependency: it ships Triton kernels that
don't build on macOS and bundles a 4 GB CUDA install on Linux even when
you don't want it.
opf-rs re-implements the model forward pass directly in
candle — pure Rust, ~30 MB binary,
Metal acceleration on Apple Silicon, native CPU on Windows + Linux. Same
weights, same outputs, no Python.
- ✅ candle Transformer port (RMSNorm, RoPE w/ NTK scaling, GQA attention with sinks + bidirectional band mask, MoE w/ topk routing, NER head)
- ✅ BIOES decode + lenient span merge (matches upstream labels_to_spans)
- ✅ o200k_base tokenizer with per-token UTF-8 byte offsets
- ✅ Numerical-parity tests vs PyTorch reference (worst per-layer max|Δ| is 1.8e-4; see the table below)
- ✅ End-to-end: Redactor::redact(text) -> RedactionOutput with category labels + byte-range spans + per-span confidence
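
The BIOES decode step in the list above can be sketched as follows. This is a minimal illustration, not the upstream labels_to_spans logic: the tag strings (`B-PER`, `S-EMAIL`), the `TokenSpan` type, and the specific lenient rules (an orphan I-/E- tag opens a span, an unterminated span closes at the next boundary) are assumptions made for the example.

```rust
/// One decoded span: a category plus a half-open token range [start, end).
/// Hypothetical type, not the opf-rs API.
#[derive(Debug, PartialEq)]
pub struct TokenSpan {
    pub category: String,
    pub start: usize,
    pub end: usize,
}

/// Decode BIOES tags ("B-PER", "I-PER", "E-PER", "S-PER", "O") into spans.
/// Lenient rules (assumed): an I-/E- tag with no matching open span starts a
/// new span, and a span that never sees its E- tag is closed at the next boundary.
pub fn bioes_to_spans(tags: &[&str]) -> Vec<TokenSpan> {
    let mut spans = Vec::new();
    let mut open: Option<(String, usize)> = None; // (category, start token)

    for (i, tag) in tags.iter().enumerate() {
        // Tags without a '-' (such as "O") are treated as outside any span.
        let (prefix, cat) = tag.split_once('-').unwrap_or(("O", ""));

        // Does this tag continue the currently open span?
        let continues = matches!(prefix, "I" | "E")
            && open.as_ref().map_or(false, |(c, _)| c.as_str() == cat);

        if !continues {
            // Close whatever was open, even without an E- tag.
            if let Some((c, s)) = open.take() {
                spans.push(TokenSpan { category: c, start: s, end: i });
            }
            if prefix != "O" {
                open = Some((cat.to_string(), i));
            }
        }

        // E- and S- tags end the span at this token (inclusive).
        if matches!(prefix, "E" | "S") {
            if let Some((c, s)) = open.take() {
                spans.push(TokenSpan { category: c, start: s, end: i + 1 });
            }
        }
    }

    // A span still open at the end of the input runs to the last token.
    if let Some((c, s)) = open.take() {
        spans.push(TokenSpan { category: c, start: s, end: tags.len() });
    }
    spans
}
```

Token ranges like these would then be mapped to byte ranges through the tokenizer's per-token byte offsets.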

| Layer | max \|Δ\| vs PyTorch |
|---|---:|
| RMSNorm (attn / mlp) | 1.4e-6 / 4.8e-7 |
| RoPE (q / k) | 5.2e-6 / 1.5e-5 |
| AttentionBlock | 1.8e-4 |
| MoeBlock | 6.1e-5 |
| Full Transformer logits | 3.2e-5 |
| argmax label agreement | 100% (64/64 tokens) |

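
For readers who want to see what the two parity metrics in the table measure, here is a small sketch over flattened `f32` buffers. The function names and the row-major `[tokens x labels]` layout are assumptions for illustration; the actual parity tests in this repository may be organised differently.

```rust
/// Largest element-wise absolute difference between two equally sized tensors,
/// given as flattened slices. NaN differences are ignored by `f32::max`.
pub fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "tensors must have the same number of elements");
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f32::max)
}

/// Index of the largest value in one row of logits (first one wins on ties).
fn argmax(row: &[f32]) -> usize {
    let mut best = 0;
    for (i, v) in row.iter().enumerate() {
        if *v > row[best] {
            best = i;
        }
    }
    best
}

/// Fraction of tokens whose argmax label is identical in both logit matrices.
/// Both inputs are row-major [tokens x num_labels].
pub fn label_agreement(a: &[f32], b: &[f32], num_labels: usize) -> f32 {
    assert!(num_labels > 0 && a.len() == b.len() && a.len() % num_labels == 0);
    let tokens = a.len() / num_labels;
    let same = a
        .chunks(num_labels)
        .zip(b.chunks(num_labels))
        .filter(|(ra, rb)| argmax(ra) == argmax(rb))
        .count();
    same as f32 / tokens as f32
}
```

Label agreement is the more forgiving metric: small logit differences only matter when they flip the winning label.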
| Backend | opf-rs p50 | PyTorch reference p50 |
|---|---|---|
| Mac CPU | 73.9 ms | 70 ms |
| Mac Metal | 41.2 ms | 40 ms (MPS) |
opf-rs matches PyTorch within measurement noise on both backends, in
pure Rust, no Python. (Reference numbers from
screenpipe-pii-redactor-runtime/results/macos_m1_max_2026-05-01.md.)
Reproduce with cargo run --release --example bench_devices.
Long-form (~50 tokens, criterion): 295.5 ms via Device::best(). The same
input takes 614 ms in PyTorch on CPU, so opf-rs is about 2× faster.
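
To make the p50 figures above concrete, this is one way a median latency could be measured with only the standard library. It is a sketch, not the contents of the bench_devices example; `time_runs` and `p50` are hypothetical helper names.

```rust
use std::time::{Duration, Instant};

/// Run `f` a few times untimed (warm-up), then time `runs` further calls.
pub fn time_runs<F: FnMut()>(warmup: usize, runs: usize, mut f: F) -> Vec<Duration> {
    for _ in 0..warmup {
        f();
    }
    let mut samples = Vec::with_capacity(runs);
    for _ in 0..runs {
        let start = Instant::now();
        f();
        samples.push(start.elapsed());
    }
    samples
}

/// Median of the samples. For an even count this returns the upper of the
/// two middle values rather than their average.
pub fn p50(samples: &mut [Duration]) -> Duration {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort();
    samples[samples.len() / 2]
}
```

In use, the closure passed to `time_runs` would wrap a single `redact` call on a fixed input.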
- Quality bench against the private corpus in screenpipe-pii-bench (results not committed)
- Wire into screenpipe-redact to replace the ONNX stub (sketch below)
- Metal device path: works; Device::best() selects Metal on Apple Silicon
- Real batched forward (one padded forward across N inputs)
- Redactor::from_huggingface: auto-download the checkpoint via HF hub
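
The "one padded forward across N inputs" item in the list above needs a padding step before the model runs. A minimal sketch of that step follows; `pad_batch`, the choice of `pad_id`, and the flat row-major layout are assumptions for illustration, and the model would also have to honour the mask in attention, which is not shown here.

```rust
/// Pad token-id sequences to the length of the longest one.
/// Returns (ids, mask, max_len). `ids` and `mask` are row-major
/// [batch x max_len]; mask is 1 for real tokens and 0 for padding.
pub fn pad_batch(seqs: &[Vec<u32>], pad_id: u32) -> (Vec<u32>, Vec<u8>, usize) {
    let max_len = seqs.iter().map(Vec::len).max().unwrap_or(0);
    let mut ids = Vec::with_capacity(seqs.len() * max_len);
    let mut mask = Vec::with_capacity(seqs.len() * max_len);

    for seq in seqs {
        let pad = max_len - seq.len();
        // Real tokens first, then padding up to max_len.
        ids.extend_from_slice(seq);
        ids.extend(std::iter::repeat(pad_id).take(pad));
        mask.extend(std::iter::repeat(1u8).take(seq.len()));
        mask.extend(std::iter::repeat(0u8).take(pad));
    }
    (ids, mask, max_len)
}
```

The flat buffers can then be turned into `[batch, max_len]` tensors, and spans decoded per row using only the unmasked positions.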
The screenpipe-redact crate in the screenpipe monorepo currently has
a TODO stub at crates/screenpipe-redact/src/adapters/onnx.rs. To wire
opf-rs in, replace the stub body with an OpfError-to-RedactError
shim:
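```rust
// pseudo: crates/screenpipe-redact/src/adapters/opf.rs
use std::path::Path;

use async_trait::async_trait;
use opf::{Redactor as OpfRedactor, Device};

// Assumed import path for the screenpipe-redact trait and types;
// adjust to the actual crate layout.
use crate::{RedactError, RedactionOutput, Redactor};

/// Adapter exposing an opf-rs model through the screenpipe-redact Redactor trait.
pub struct OpfAdapter {
    inner: OpfRedactor,
}

impl OpfAdapter {
    /// Load the checkpoint from `model_dir` on the best available device.
    pub fn load(model_dir: &Path) -> Result<Self, RedactError> {
        let inner = OpfRedactor::from_dir(model_dir, Device::best())
            .map_err(|e| RedactError::Unavailable(e.to_string()))?;
        Ok(Self { inner })
    }
}

#[async_trait]
impl Redactor for OpfAdapter {
    fn name(&self) -> &str { "opf-rs" }
    fn version(&self) -> u32 { 3 } // OPF v3 fine-tune

    async fn redact_batch(&self, texts: &[String]) -> Result<Vec<RedactionOutput>, RedactError> {
        // tokio::task::block_in_place: inference is sync CPU/Metal work.
        // Note that block_in_place panics on a current-thread Tokio runtime.
        tokio::task::block_in_place(|| {
            texts.iter().map(|t| {
                // `map_err` (OpfError -> RedactError) and `map_span` are the
                // conversion helpers this shim still has to define.
                let out = self.inner.redact(t).map_err(map_err)?;
                Ok(RedactionOutput {
                    input: out.input,
                    redacted: out.redacted,
                    spans: out.spans.into_iter().map(map_span).collect(),
                })
            }).collect()
        })
    }
}
```

The integration is intentionally NOT applied yet. It should land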
alongside the quality benchmark that gates re-enabling the AI PII
worker spawn (currently disabled in v2.4.161+ due to the rfdetr_v8
incident). See apps/screenpipe-app-tauri/src-tauri/src/server_core.rs
in the screenpipe monorepo for the full rationale.
| Platform | Default | GPU? | Install needed |
|---|---|---|---|
| macOS (Apple Silicon) | Metal | ✅ via system Metal | none |
| Windows x86_64 | CPU | — (no DirectML in candle) | none |
| Linux x86_64 | CPU | — (no Vulkan in candle, CUDA excluded by design) | none |
The async reconciliation worker in screenpipe-redact is off the
capture hot path, so CPU latency is acceptable on Win/Linux. If GPU on
Win/Linux becomes a real need, the path is to add a candle Vulkan
backend or graduate to ggml; not v0 work.
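
The platform defaults in the table above amount to "Metal on Apple Silicon, CPU everywhere else". A sketch of that selection rule with a CPU fallback is shown below. The `Backend` enum and both functions are hypothetical stand-ins, not the opf-rs `Device` API; in a candle-based build the `init_metal` probe would presumably be an attempt to construct candle's Metal device.

```rust
/// Hypothetical stand-in for the runtime's device type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Metal,
    Cpu,
}

/// Compile-time default from the platform table:
/// Metal on Apple Silicon macOS, CPU on every other target.
pub fn default_backend() -> Backend {
    if cfg!(all(target_os = "macos", target_arch = "aarch64")) {
        Backend::Metal
    } else {
        Backend::Cpu
    }
}

/// Runtime selection with fallback: use Metal only when it is the platform
/// default AND the caller-supplied probe succeeds; otherwise use the CPU.
pub fn select_backend<E>(init_metal: impl FnOnce() -> Result<(), E>) -> Backend {
    match default_backend() {
        Backend::Metal if init_metal().is_ok() => Backend::Metal,
        _ => Backend::Cpu,
    }
}
```

Falling back to CPU instead of returning an error keeps the "Install needed: none" column true even on machines where GPU initialisation fails.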
```rust
use opf::{Redactor, Device};

let r = Redactor::from_huggingface(
    "screenpipe/pii-redactor",
    Device::best(), // Metal on Mac, CPU elsewhere
)?;
let out = r.redact("Welcome | Marcus Chen — Confluence")?;
for span in out.spans {
    println!("{:?} -> {}", span.label, &out.input[span.byte_range()]);
}
// Person -> Marcus Chen
```

The fine-tuned weights live at
huggingface.co/screenpipe/pii-redactor
under CC BY-NC 4.0 (commercial licensing via louis@screenpi.pe). This
crate does NOT redistribute the weights — Redactor::from_huggingface
fetches them via the HF hub on first run.
The Rust source in this repository is licensed CC BY-NC 4.0; see LICENSE and NOTICE.
For commercial licensing — production deployment, redistribution, SaaS / API embedding, custom integration — contact louis@screenpi.pe.