screenpipe/opf-rs

opf-rs

Pure-Rust runtime for the screenpipe/pii-redactor PII redaction model. Local, fast, no Python in the loop.

Status: v0 working end-to-end. Numerical parity vs PyTorch reference verified layer-by-layer. Tracking issue: TBD.

Why

The screenpipe/pii-redactor model is a 1.4B-parameter mixture-of-experts NER tagger fine-tuned for the kinds of text an AI agent reads off a desktop: accessibility-tree nodes, OCR'd window chrome, computer-use traces. Upstream inference runs in PyTorch under the opf Python package. That is a heavy dependency: it ships Triton kernels that don't build on macOS, and it bundles a 4 GB CUDA install on Linux even when you don't want it.

opf-rs re-implements the model forward pass directly in candle — pure Rust, ~30 MB binary, Metal acceleration on Apple Silicon, native CPU on Windows + Linux. Same weights, same outputs, no Python.

What works today

  • ✅ candle Transformer port (RMSNorm, RoPE w/ NTK scaling, GQA attention with sinks + bidirectional band mask, MoE w/ topk routing, NER head)
  • ✅ BIOES decode + lenient span merge (matches upstream labels_to_spans)
  • ✅ o200k_base tokenizer with per-token UTF-8 byte offsets
  • ✅ Numerical-parity tests vs PyTorch reference (max|Δ| ≤ 1.8e-4 per layer, 3.2e-5 on the final logits)
  • ✅ End-to-end: Redactor::redact(text) -> RedactionOutput with category labels + byte-range spans + per-span confidence
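
To make the BIOES decode step concrete, here is a minimal sketch of turning per-token tags into token-index spans. It is not the crate's implementation and does not claim to reproduce upstream labels_to_spans exactly. The `TokenSpan` type, the `decode_bioes` name, and the specific lenient rule (an I- or E- tag with no matching open span starts a new span instead of being dropped) are assumptions made for illustration.

```rust
/// A decoded entity span over token indices; `end` is exclusive.
#[derive(Debug, PartialEq)]
pub struct TokenSpan {
    pub category: String,
    pub start: usize,
    pub end: usize,
}

/// Decode BIOES tags ("B-X", "I-X", "E-X", "S-X", "O") into spans.
/// Lenient rule (an assumption of this sketch): an I- or E- tag with no open
/// span of the same category opens one instead of being discarded.
pub fn decode_bioes(tags: &[&str]) -> Vec<TokenSpan> {
    let mut spans = Vec::new();
    let mut open: Option<(String, usize)> = None; // (category, start index)

    for (i, tag) in tags.iter().enumerate() {
        let (prefix, cat) = match tag.split_once('-') {
            Some((p, c)) => (p, c),
            None => ("O", ""), // "O" or anything without a category
        };
        // Only I-/E- tags of the same category can continue an open span.
        let continues = matches!(prefix, "I" | "E")
            && open.as_ref().map_or(false, |(c, _)| c.as_str() == cat);
        if !continues {
            if let Some((c, s)) = open.take() {
                spans.push(TokenSpan { category: c, start: s, end: i });
            }
        }
        match prefix {
            "B" | "I" => {
                if open.is_none() {
                    open = Some((cat.to_string(), i));
                }
            }
            "E" | "S" => {
                // Close the current span, or emit a single-token span.
                let (c, s) = open.take().unwrap_or((cat.to_string(), i));
                spans.push(TokenSpan { category: c, start: s, end: i + 1 });
            }
            _ => {} // outside any entity
        }
    }
    // A span still open at the end of the sequence is closed there.
    if let Some((c, s)) = open {
        spans.push(TokenSpan { category: c, start: s, end: tags.len() });
    }
    spans
}
```

With per-token byte offsets from the tokenizer, each token-index span maps to a byte range by taking the start offset of its first token and the end offset of its last.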

Parity numbers (8-layer Transformer, fp32 CPU)

| Layer | max \|Δ\| vs PyTorch |
|---|---:|
| RMSNorm (attn / mlp) | 1.4e-6 / 4.8e-7 |
| RoPE (q / k) | 5.2e-6 / 1.5e-5 |
| AttentionBlock | 1.8e-4 |
| MoeBlock | 6.1e-5 |
| Full Transformer logits | 3.2e-5 |
| argmax label agreement | 100% (64/64 tokens) |
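
For readers unfamiliar with the two metrics in this table, the sketch below shows one way they could be computed once both implementations' outputs are flattened into f32 slices. The function names and the row-major `[tokens, num_labels]` layout are assumptions for illustration; this is not the crate's test harness.

```rust
/// Largest element-wise absolute difference between two tensors of equal size,
/// given as flattened slices.
pub fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "tensors must have the same number of elements");
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f32::max)
}

/// Index of the largest value in one row of logits (first one wins on ties).
fn argmax(row: &[f32]) -> usize {
    let mut best = 0;
    for (i, v) in row.iter().enumerate() {
        if *v > row[best] {
            best = i;
        }
    }
    best
}

/// Fraction of tokens whose argmax label matches between two logit tensors,
/// both stored row-major with shape [tokens, num_labels].
pub fn argmax_agreement(a: &[f32], b: &[f32], num_labels: usize) -> f32 {
    assert_eq!(a.len(), b.len(), "tensors must have the same number of elements");
    let rows_a = a.chunks_exact(num_labels);
    let rows_b = b.chunks_exact(num_labels);
    let total = rows_a.len();
    let same = rows_a
        .zip(rows_b)
        .filter(|(x, y)| argmax(x) == argmax(y))
        .count();
    same as f32 / total as f32
}
```

Small logit differences only matter for the final output when they flip an argmax, which is why the table reports label agreement alongside max |Δ|.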

Latency (M-series MacBook, release; window title, ~10 tokens)

| Backend | opf-rs p50 | PyTorch reference p50 |
|---|---:|---:|
| Mac CPU | 73.9 ms | 70 ms |
| Mac Metal | 41.2 ms | 40 ms (MPS) |

opf-rs matches PyTorch within measurement noise on both backends, in pure Rust, no Python. (Reference numbers from screenpipe-pii-redactor-runtime/results/macos_m1_max_2026-05-01.md.) Reproduce with cargo run --release --example bench_devices.

Long-form input (~50 tokens, measured with criterion): 295.5 ms via Device::best(). The same input takes 614 ms in PyTorch on CPU, so opf-rs is about 2× faster.
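
The p50 figures above are medians over repeated runs. As a rough sketch of that kind of measurement (not the bench_devices example or the criterion harness, whose internals are not shown here), a median-latency helper could look like the following. The names `p50` and `bench_p50` and the warmup/iteration parameters are hypothetical.

```rust
use std::time::{Duration, Instant};

/// Median (p50) of the samples; for an even count this takes the lower of the
/// two middle elements.
pub fn p50(mut samples: Vec<Duration>) -> Duration {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort();
    samples[(samples.len() - 1) / 2]
}

/// Run `f` for `warmup` untimed calls, then time `iters` calls and return the
/// median latency.
pub fn bench_p50<F: FnMut()>(mut f: F, warmup: usize, iters: usize) -> Duration {
    for _ in 0..warmup {
        f();
    }
    let samples = (0..iters)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();
    p50(samples)
}
```

Warmup calls matter most on GPU backends, where the first few invocations often include one-time setup cost that would otherwise skew the samples.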

Roadmap

  • Quality bench against the private corpus in screenpipe-pii-bench (results not committed)
  • Wire into screenpipe-redact to replace the ONNX stub (sketch below)
  • Metal device path — works; Device::best() selects Metal on Apple Silicon
  • Real batched forward (one padded forward across N inputs)
  • Redactor::from_huggingface — auto-download the checkpoint via HF hub

Integrating with screenpipe-redact

The screenpipe-redact crate in the screenpipe monorepo currently has a TODO stub at crates/screenpipe-redact/src/adapters/onnx.rs. To wire opf-rs in, replace the stub with an adapter that wraps the opf-rs Redactor and maps OpfError to RedactError:

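```rust
// pseudo: crates/screenpipe-redact/src/adapters/opf.rs
// Sketch only: `RedactError`, `RedactionOutput`, and the `Redactor` trait come
// from screenpipe-redact; `map_err` and `map_span` are conversion helpers that
// are not shown here.
use std::path::Path;

use async_trait::async_trait;
use opf::{Device, Redactor as OpfRedactor};

/// Adapter exposing the opf-rs redactor through screenpipe-redact's trait.
pub struct OpfAdapter {
    inner: OpfRedactor,
}

impl OpfAdapter {
    /// Load the checkpoint from `model_dir` on the best available device.
    pub fn load(model_dir: &Path) -> Result<Self, RedactError> {
        let inner = OpfRedactor::from_dir(model_dir, Device::best())
            .map_err(|e| RedactError::Unavailable(e.to_string()))?;
        Ok(Self { inner })
    }
}

#[async_trait]
impl Redactor for OpfAdapter {
    fn name(&self) -> &str { "opf-rs" }
    fn version(&self) -> u32 { 3 }  // OPF v3 fine-tune

    async fn redact_batch(&self, texts: &[String]) -> Result<Vec<RedactionOutput>, RedactError> {
        // Inference is synchronous CPU/Metal work, so run it via block_in_place.
        // Note: block_in_place panics on a current-thread Tokio runtime; it
        // needs the multi-threaded runtime.
        tokio::task::block_in_place(|| {
            texts.iter().map(|t| {
                let out = self.inner.redact(t).map_err(map_err)?;
                Ok(RedactionOutput {
                    input: out.input,
                    redacted: out.redacted,
                    spans: out.spans.into_iter().map(map_span).collect(),
                })
            }).collect()
        })
    }
}
```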

The integration is intentionally NOT applied yet — it should land alongside the quality benchmark that gates re-enabling the AI PII worker spawn (currently disabled in v2.4.161+ due to the rfdetr_v8 incident). See apps/screenpipe-app-tauri/src-tauri/src/server_core.rs in the screenpipe monorepo for the full rationale.

Backends

| Platform | Default | GPU? | Install needed |
|---|---|---|---|
| macOS (Apple Silicon) | Metal | ✅ via system Metal | none |
| Windows x86_64 | CPU | No (no DirectML in candle) | none |
| Linux x86_64 | CPU | No (no Vulkan in candle, CUDA excluded by design) | none |

The async reconciliation worker in screenpipe-redact is off the capture hot path, so CPU latency is acceptable on Win/Linux. If GPU on Win/Linux becomes a real need, the path is to add a candle Vulkan backend or graduate to ggml; not v0 work.
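
The default-backend policy in the table can be summarized in a few lines. The sketch below only encodes that policy by OS and architecture; it is an assumption about the selection rule, not the actual body of Device::best(), which may probe for Metal at runtime instead. The `Backend` enum and both function names are hypothetical.

```rust
/// The two backends listed in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Metal,
    Cpu,
}

/// Default backend for a platform described with the same strings as
/// `std::env::consts::OS` and `std::env::consts::ARCH`.
pub fn default_backend_for(os: &str, arch: &str) -> Backend {
    match (os, arch) {
        // Apple Silicon: Metal ships with the OS, nothing to install.
        ("macos", "aarch64") => Backend::Metal,
        // Everything else, including Windows and Linux x86_64, stays on CPU.
        _ => Backend::Cpu,
    }
}

/// Default backend for the platform this binary was compiled for.
pub fn default_backend() -> Backend {
    default_backend_for(std::env::consts::OS, std::env::consts::ARCH)
}
```

Keeping the policy in a pure function of two strings makes it easy to unit-test every row of the table on a single machine.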

Usage (when ready)

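```rust
use opf::{Device, Redactor};

// Assumes the opf error type implements std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let r = Redactor::from_huggingface(
        "screenpipe/pii-redactor",
        Device::best(), // Metal on Mac, CPU elsewhere
    )?;
    let out = r.redact("Welcome | Marcus Chen — Confluence")?;
    // Borrow the spans so `out` is not partially moved inside the loop.
    for span in &out.spans {
        println!("{:?} -> {}", span.label, &out.input[span.byte_range()]);
    }
    // Person -> Marcus Chen
    Ok(())
}
```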
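
Because spans are reported as byte ranges into the input, a caller can rebuild a redacted string with plain slicing. The following is a minimal sketch under stated assumptions: `ByteSpan` is a hypothetical stand-in for the crate's span type, and the `[LABEL]` placeholder format is chosen for illustration, not taken from the crate's `redacted` output.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a detected span: a label plus a byte range into
/// the original input.
pub struct ByteSpan {
    pub label: String,
    pub range: Range<usize>,
}

/// Replace each span with `[LABEL]`. Spans must be sorted by start, must not
/// overlap, and must fall on UTF-8 character boundaries; otherwise the string
/// slicing below panics.
pub fn apply_spans(input: &str, spans: &[ByteSpan]) -> String {
    let mut out = String::with_capacity(input.len());
    let mut cursor = 0;
    for span in spans {
        // Copy the untouched text between the previous span and this one.
        out.push_str(&input[cursor..span.range.start]);
        out.push('[');
        out.push_str(&span.label);
        out.push(']');
        cursor = span.range.end;
    }
    // Copy whatever follows the last span.
    out.push_str(&input[cursor..]);
    out
}
```

Byte offsets, not character counts, are what make this slicing valid when the input contains multi-byte characters.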

Model weights

The fine-tuned weights live at huggingface.co/screenpipe/pii-redactor under CC BY-NC 4.0 (commercial licensing via louis@screenpi.pe). This crate does NOT redistribute the weights; Redactor::from_huggingface (still on the roadmap) is intended to fetch them via the HF hub on first run.

License

The Rust source in this repository is licensed CC BY-NC 4.0; see LICENSE and NOTICE.

For commercial licensing — production deployment, redistribution, SaaS / API embedding, custom integration — contact louis@screenpi.pe.
