opf-rs

Pure-Rust runtime for the screenpipe/pii-redactor PII redaction model. Local, fast, no Python in the loop.

Status: v0 working end-to-end. Numerical parity vs PyTorch reference verified layer-by-layer. Tracking issue: TBD.

Why

The screenpipe-pii-redactor model is a 1.4 B-parameter mixture-of-experts NER tagger fine-tuned for the kinds of text an AI agent reads off a desktop — accessibility-tree nodes, OCR'd window chrome, computer-use traces. Upstream inference runs in PyTorch under the opf Python package. That's a heavy dependency, ships triton kernels that don't build on Mac, and bundles a 4-GB CUDA install on Linux even when you don't want it.

opf-rs re-implements the model forward pass directly in candle — pure Rust, ~30 MB binary, Metal acceleration on Apple Silicon, native CPU on Windows + Linux. Same weights, same outputs, no Python.

What works today

✅ candle Transformer port (RMSNorm, RoPE w/ NTK scaling, GQA attention with sinks + bidirectional band mask, MoE w/ topk routing, NER head)
✅ BIOES decode + lenient span merge (matches upstream labels_to_spans)
✅ o200k_base tokenizer with per-token UTF-8 byte offsets
✅ Numerical-parity tests vs PyTorch reference (max|Δ| ≤ 1e-4 per layer)
✅ End-to-end: Redactor::redact(text) -> RedactionOutput with category labels + byte-range spans + per-span confidence

Parity numbers (8-layer Transformer, fp32 CPU)

| Layer | max |Δ| vs PyTorch | |---|---:| | RMSNorm (attn / mlp) | 1.4e-6 / 4.8e-7 | | RoPE (q / k) | 5.2e-6 / 1.5e-5 | | AttentionBlock | 1.8e-4 | | MoeBlock | 6.1e-5 | | Full Transformer logits | 3.2e-5 | | argmax label agreement | 100% (64/64 tokens) |

Latency (M-series MacBook, release; window title, ~10 tokens)

Backend	opf-rs p50	PyTorch reference p50
Mac CPU	73.9 ms	70 ms
Mac Metal	41.2 ms	40 ms (MPS)

opf-rs matches PyTorch within measurement noise on both backends, in pure Rust, no Python. (Reference numbers from screenpipe-pii-redactor-runtime/results/macos_m1_max_2026-05-01.md.) Reproduce with cargo run --release --example bench_devices.

Long-form (~50 tokens, criterion): 295.5 ms via Device::best(). Same surface in PyTorch CPU is 614 ms — about 2× faster.

Roadmap

Quality bench against the private corpus in screenpipe-pii-bench (results not committed)
Wire into screenpipe-redact to replace the ONNX stub (sketch below)
~~Metal device path~~ — works; Device::best() selects Metal on Apple Silicon
Real batched forward (one padded forward across N inputs)
Redactor::from_huggingface — auto-download the checkpoint via HF hub

Integrating with `screenpipe-redact`

The screenpipe-redact crate in the screenpipe monorepo currently has a TODO stub at crates/screenpipe-redact/src/adapters/onnx.rs. To wire opf-rs in, replace the stub body with an OpfError-to-RedactError shim:

// pseudo: crates/screenpipe-redact/src/adapters/opf.rs
use opf::{Redactor as OpfRedactor, Device};

pub struct OpfAdapter {
    inner: OpfRedactor,
}

impl OpfAdapter {
    pub fn load(model_dir: &Path) -> Result<Self, RedactError> {
        let inner = OpfRedactor::from_dir(model_dir, Device::best())
            .map_err(|e| RedactError::Unavailable(e.to_string()))?;
        Ok(Self { inner })
    }
}

#[async_trait]
impl Redactor for OpfAdapter {
    fn name(&self) -> &str { "opf-rs" }
    fn version(&self) -> u32 { 3 }  // OPF v3 fine-tune

    async fn redact_batch(&self, texts: &[String]) -> Result<Vec<RedactionOutput>, RedactError> {
        // tokio::task::block_in_place — inference is sync CPU/Metal work.
        tokio::task::block_in_place(|| {
            texts.iter().map(|t| {
                let out = self.inner.redact(t).map_err(map_err)?;
                Ok(RedactionOutput {
                    input: out.input,
                    redacted: out.redacted,
                    spans: out.spans.into_iter().map(map_span).collect(),
                })
            }).collect()
        })
    }
}

The integration is intentionally NOT applied yet — it should land alongside the quality benchmark that gates re-enabling the AI PII worker spawn (currently disabled in v2.4.161+ due to the rfdetr_v8 incident). See apps/screenpipe-app-tauri/src-tauri/src/server_core.rs in the screenpipe monorepo for the full rationale.

Backends

Platform	Default	GPU?	Install needed
macOS (Apple Silicon)	Metal	✅ via system Metal	none
Windows x86_64	CPU	— (no DirectML in candle)	none
Linux x86_64	CPU	— (no Vulkan in candle, CUDA excluded by design)	none

The async reconciliation worker in screenpipe-redact is off the capture hot path, so CPU latency is acceptable on Win/Linux. If GPU on Win/Linux becomes a real need, the path is to add a candle Vulkan backend or graduate to ggml; not v0 work.

Usage (when ready)

use opf::{Redactor, Device};

let r = Redactor::from_huggingface(
    "screenpipe/pii-redactor",
    Device::best(), // Metal on Mac, CPU elsewhere
)?;
let out = r.redact("Welcome | Marcus Chen — Confluence")?;
for span in out.spans {
    println!("{:?} -> {}", span.label, &out.input[span.byte_range()]);
}
// Person -> Marcus Chen

Model weights

The fine-tuned weights live at huggingface.co/screenpipe/pii-redactor under CC BY-NC 4.0 (commercial licensing via louis@screenpi.pe). This crate does NOT redistribute the weights — Redactor::from_huggingface fetches them via the HF hub on first run.

License

The Rust source in this repository is licensed CC BY-NC 4.0; see LICENSE and NOTICE.

For commercial licensing — production deployment, redistribution, SaaS / API embedding, custom integration — contact louis@screenpi.pe.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
benches		benches
examples		examples
src		src
tests		tests
tools		tools
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
Cargo.toml		Cargo.toml
LICENSE		LICENSE
LICENSE.upstream-apache2.txt		LICENSE.upstream-apache2.txt
NOTICE		NOTICE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

opf-rs

Why

What works today

Parity numbers (8-layer Transformer, fp32 CPU)

Latency (M-series MacBook, release; window title, ~10 tokens)

Roadmap

Integrating with `screenpipe-redact`

Backends

Usage (when ready)

Model weights

License

About

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

opf-rs

Why

What works today

Parity numbers (8-layer Transformer, fp32 CPU)

Latency (M-series MacBook, release; window title, ~10 tokens)

Roadmap

Integrating with screenpipe-redact

Backends

Usage (when ready)

Model weights

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Uh oh!

Contributors

Uh oh!

Languages

Integrating with `screenpipe-redact`