Commit 278e9c9

release: promptpurify v0.0.1
Tiny prompt-injection firewall for LLM chat apps. 13.9 MB CPU-only ONNX model + zero-dep TypeScript SDK + docs + supply-chain hygiene. Built from scratch by SecureLayer7.

Ships:
- promptpurify model (4-layer transformer encoder, ~13.7M params, INT8 ONNX, ~5–10 ms p50 CPU inference, sliding-window inference, marker invariance, SHA256SUMS for integrity verification)
- SDK on npm — structural firewall (Unicode normalize, role-fenced messages, sink-aware policy, tripwire regex), ONNX runner, browser IIFE with zero ONNX bytes
- Public 922-row scored eval slice + upstream-corpus license manifest
- scripts/bench.mjs — public benchmark with --threshold / --behavior / --by-behavior flags
- Documentation — README + docs/ (QUICKSTART, HOW-IT-WORKS, BENCHMARKS with peer-benchmark methodology comparison, SAMPLE-DATA, REPRODUCE, HONEST-LIMITS)
- MODEL_CARD.md (HF model-index frontmatter), SECURITY.md (disclose.io safe-harbor, 90-day disclosure window), CHANGELOG.md
- CI + release workflows — cosign keyless signing, SLSA build provenance, CycloneDX SBOM, npm publish --provenance, Hugging Face Hub mirror push to Securelayer7/promptpurify
- Minimal sample app + customer-support fintech example

Headline numbers at production threshold (0.95):
- Pliny 20% held-out: 95.66% recall
- Buried-tail injection: 85.54% recall
- Gandalf: 100.00% recall
- deepset attack set: 100.00% recall
- Wikipedia long-paste: 4.67% false-positive rate

On every axis measured against ProtectAI v2, deepset, fmops, and Meta Prompt-Guard-2 at their published thresholds, promptpurify is on top or tied. Full per-cell breakdown and peer-benchmark methodology in docs/BENCHMARKS.md.
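
The commit message lists sliding-window inference and a buried-tail injection benchmark but does not show how the windowing works. The following is a minimal sketch of the general technique only, not the SDK's implementation: `WindowScorer`, `slidingWindows`, `scoreLongText`, the character-based windows, and the default sizes are all assumptions made for illustration (a real runner would most likely window over tokens).

```typescript
// Hypothetical scorer: returns an injection probability in [0, 1] for one window.
type WindowScorer = (text: string) => Promise<number>;

// Split text into overlapping windows of `size` characters, advancing by `stride`.
// Character windows are an illustrative stand-in for token windows.
function slidingWindows(text: string, size: number, stride: number): string[] {
  if (size <= 0 || stride <= 0) {
    throw new RangeError("size and stride must be positive");
  }
  if (text.length <= size) return [text];
  const windows: string[] = [];
  for (let start = 0; start < text.length; start += stride) {
    windows.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this window already reached the end
  }
  return windows;
}

// Score every window and keep the maximum, so that a single hostile window
// (for example one buried at the tail of a long paste) flags the whole input.
async function scoreLongText(
  text: string,
  score: WindowScorer,
  size = 512,
  stride = 256,
): Promise<number> {
  let max = 0;
  for (const w of slidingWindows(text, size, stride)) {
    max = Math.max(max, await score(w));
  }
  return max;
}
```

Taking the maximum rather than the mean is a deliberate choice in this sketch: averaging would dilute a short injection inside a long benign document. Keeping `stride` at or below `size` avoids gaps between windows.
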
0 parents  commit 278e9c9

55 files changed

Lines changed: 42949 additions & 0 deletions

.github/workflows/ci.yml

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
name: CI

on:
  push:
    branches: [main]
  pull_request:

jobs:
  build-test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node: [20, 22]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
          cache: npm
      - run: npm ci
      - run: npm run typecheck
      - run: npm test
      - run: npm run build

  bench:
    runs-on: ubuntu-latest
    needs: build-test
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - run: npm run build
      - name: Verify model checksum
        run: cd models/l5e && sha256sum -c SHA256SUMS
      - name: Run public benchmark
        run: node scripts/bench.mjs
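
The `Verify model checksum` step relies on `sha256sum -c SHA256SUMS`. For environments without coreutils, the same check could be approximated in TypeScript. This is a sketch under assumptions: `parseSha256Sums`, `sha256Hex`, and `verifySums` are hypothetical helper names that are not part of the SDK, and the file reader is injected so the logic stays independent of where the model files live.

```typescript
import { createHash } from "node:crypto";

// One parsed line of a SHA256SUMS file: "<64 hex chars>  <path>".
interface SumEntry {
  digest: string;
  path: string;
}

// Parse the text format written by `sha256sum`: a digest, a space, then a
// space (text mode) or "*" (binary mode), then the path. Other lines are skipped.
function parseSha256Sums(text: string): SumEntry[] {
  const entries: SumEntry[] = [];
  for (const line of text.split("\n")) {
    const m = /^([0-9a-f]{64}) [ *](.+)$/.exec(line.trimEnd());
    if (m) entries.push({ digest: m[1], path: m[2] });
  }
  return entries;
}

// Hash a byte buffer and return the lowercase hex digest.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Return the paths whose contents do not match their listed digest.
// An empty result means every listed file verified.
function verifySums(sums: string, read: (path: string) => Uint8Array): string[] {
  const failed: string[] = [];
  for (const { digest, path } of parseSha256Sums(sums)) {
    if (sha256Hex(read(path)) !== digest) failed.push(path);
  }
  return failed;
}
```

In a Node script, `read` could be something like `(p) => readFileSync(join("models/l5e", p))`, with a non-empty result treated as a hard failure before the model is loaded.
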

.github/workflows/release.yml

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
name: Release

on:
  push:
    tags:
      - 'v*'
  workflow_dispatch:
    inputs:
      tag:
        description: 'Release tag (e.g. v0.0.1)'
        required: true

permissions:
  contents: write      # GitHub Release upload
  id-token: write      # cosign keyless + npm provenance + SLSA attestation
  attestations: write  # SLSA build provenance

jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 22
          registry-url: 'https://registry.npmjs.org'
          cache: npm

      - run: npm ci
      - run: npm run typecheck
      - run: npm test
      - run: npm run build

      # ---------- Model artifact tarball ----------
      - name: Build model tarball
        run: |
          tar -czf promptpurify-model.tar.gz \
            models/l5e/model.int8.onnx \
            models/l5e/vocab.txt \
            models/l5e/l5e.json \
            models/l5e/SHA256SUMS
          sha256sum promptpurify-model.tar.gz > promptpurify-model.tar.gz.sha256

      - name: Verify model SHA256SUMS
        run: cd models/l5e && sha256sum -c SHA256SUMS

      # ---------- cosign keyless signature ----------
      - uses: sigstore/cosign-installer@v3

      - name: cosign-sign model tarball
        run: |
          cosign sign-blob --yes \
            --bundle promptpurify-model.tar.gz.cosign.bundle \
            promptpurify-model.tar.gz

      # ---------- SLSA build provenance ----------
      - uses: actions/attest-build-provenance@v1
        with:
          subject-path: |
            promptpurify-model.tar.gz
            dist/**/*

      # ---------- SBOM ----------
      - name: Generate CycloneDX SBOM
        run: npx --yes @cyclonedx/cyclonedx-npm --output-file SBOM.cdx.json --output-format JSON

      # ---------- GitHub Release ----------
      - name: Upload release artifacts
        uses: softprops/action-gh-release@v2
        with:
          files: |
            promptpurify-model.tar.gz
            promptpurify-model.tar.gz.sha256
            promptpurify-model.tar.gz.cosign.bundle
            SBOM.cdx.json
          generate_release_notes: true

      # ---------- npm publish with provenance ----------
      - name: Publish to npm with provenance
        run: npm publish --provenance --access public
        env:
          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}

      # ---------- Hugging Face Hub mirror ----------
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Push to Securelayer7/promptpurify on HF Hub
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          pip install --quiet huggingface_hub
          python - <<'PY'
          import os, shutil
          from huggingface_hub import HfApi
          repo = "Securelayer7/promptpurify"
          api = HfApi(token=os.environ["HF_TOKEN"])
          api.create_repo(repo, repo_type="model", exist_ok=True)
          # MODEL_CARD.md becomes the HF README (has the YAML frontmatter HF needs)
          shutil.copy("MODEL_CARD.md", "models/l5e/README.md")
          api.upload_folder(
              repo_id=repo,
              folder_path="models/l5e",
              path_in_repo=".",
              commit_message=f"release {os.environ.get('GITHUB_REF_NAME', 'manual')}",
              allow_patterns=["model.int8.onnx", "vocab.txt", "l5e.json", "SHA256SUMS", "README.md"],
          )
          PY

.gitignore

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
node_modules/
dist/
*.log
.DS_Store
coverage/
*.tgz
.vitest/
.env
.env.*
# EVAL-ONLY adversarial benchmark: github.com/elder-plinius (Pliny/BASI) raw
# jailbreak + system-prompt-leak payloads. AGPL-3.0 / unlicensed — NOT
# permissively licensed, so NEVER trained, NEVER shipped, NEVER redistributed.
# Pulled locally only for measuring detector recall (training/eval_plinius.mjs).
# Reproducible from training/fetch_plinius_eval.py. Only the survey table,
# license inventory, harness + PLINIUS_BENCH.md are committed — never the raw.
training/.eval_cache/
# OOD eval raw datasets — third-party, not redistributed (size/licensing)
training/.ood_cache/
# V3 real-data retrain raw pool + held-out benchmark — third-party, not
# redistributed (size/licensing). Splits derived deterministically downstream.
training/.real_cache/
# Versioned per-experiment train.jsonl dirs (V32/V33/V34/V35 derived data)
training/.real_cache_v*/
training/.real_cache_th/
# Versioned model artifact export dirs (l5e_v33, l5e_v35 — large ONNX)
models/l5e_v*/
# ONNX export temp blobs / shape-inference scratch
*.data
sym_shape_infer_temp.onnx
# fp32 distill intermediate (export_onnx.py strips it; never committed)
models/l5b/_student_fp32/
# STAGE-5 strictly-from-scratch L5c artifact: opt-in, npm-excluded
# (files:["dist"]), large ONNX — gitignored from the shipped path exactly
# like models/l5b. Reproducible from training/train_scratch.py (seed 1337).
models/l5c/
# STAGE-7 "intelligent" L5d artifact (fine-tuned Apache-2.0 distil-mBERT):
# opt-in, npm-excluded (files:["dist"]), large INT8 ONNX — gitignored from
# the shipped path exactly like models/l5b, l5c. Reproducible from
# training/train_intelligent.py + export_intelligent.py (seed 1337).
models/l5d/
# STAGE-8 OUR-OWN pretrained backbone: sampled open pretrain corpus
# (permissive third-party, not redistributed — size/licensing) + the
# resulting L5e artifact (opt-in, npm-excluded, large INT8 ONNX). Both
# gitignored from the shipped path exactly like .real_cache / models/l5d.
# Reproducible from training/pretrain.py + train_intelligent.py
# + export_intelligent.py (seed 1337).
training/.pretrain_cache/
# models/l5e/ — public release shipped: model.int8.onnx, vocab.txt,
# l5e.json, SHA256SUMS. Ignore everything else under it (training
# intermediates: _corpus/, _pretrained/, _hf_fp32_*, *.bak, etc).
models/l5e/*
!models/l5e/model.int8.onnx
!models/l5e/vocab.txt
!models/l5e/l5e.json
!models/l5e/SHA256SUMS
# isolated offline training venv (stable pinned CPU stack; never shipped)
training/.venv/
# python bytecode cache (training scripts; never shipped)
training/__pycache__/
**/__pycache__/

# Session-local — claude state, screenshots, big intermediates
.claude/
.tmp/
training/.real_cache_th/
examples/sample-app/public/hero.png

CHANGELOG.md

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
# Changelog

[Keep a Changelog](https://keepachangelog.com/en/1.1.0/) +
[SemVer](https://semver.org).

## [0.0.1]

First public release.

- promptpurify model (~14 MB INT8 ONNX, CPU inference, built from
  scratch by SecureLayer7).
- SDK on npm — structural firewall, ONNX runner, browser IIFE.
- Public eval slice + bench script.
- Documentation: README + docs/ (QUICKSTART, HOW-IT-WORKS, BENCHMARKS,
  SAMPLE-DATA, REPRODUCE, HONEST-LIMITS), MODEL_CARD, SECURITY.
- CI + release workflows: cosign keyless signing, SLSA build
  provenance, CycloneDX SBOM, npm publish --provenance, Hugging Face
  mirror.

[0.0.1]: https://github.com/securelayer7/PROMPTPurify/releases/tag/v0.0.1

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 SecureLayer7

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

MODEL_CARD.md

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
---
license: mit
language:
  - en
library_name: onnx
pipeline_tag: text-classification
tags:
  - prompt-injection
  - jailbreak
  - llm-security
  - guardrail
  - onnx
metrics:
  - recall
  - false_positive_rate
---

# promptpurify model card

**Tiny prompt-injection detector. ~14 MB. CPU. Built from scratch by
[SecureLayer7](https://securelayer7.net).**

## Intended use

Single-turn classification of untrusted text into `benign` vs
`prompt-injection`. Sits between user input (or a retrieved RAG chunk,
or a tool output) and your LLM call. Outputs a probability score; you
decide the threshold and the policy.

```ts
import { createL5eRunner } from "promptpurify/l5";
const guard = await createL5eRunner();
const score = await guard.score(userMessage);
if (score >= 0.95) return refusal();
```

Full integration patterns: [docs/QUICKSTART.md](docs/QUICKSTART.md).

## At a glance

| | |
|---|---|
| Type | ONNX transformer classifier |
| Size on disk | **~14 MB (INT8)** |
| Inference | CPU, single-digit ms |
| Runtime | `onnxruntime-node` (optional peer) |
| Network | **None.** In-process. |

## Training

Built from scratch by SecureLayer7 on curated internal corpora.

## Evaluation

Benchmarked against public datasets and OSS baselines. Comparison and
methodology: [docs/BENCHMARKS.md](docs/BENCHMARKS.md). Reproducibility:
[docs/REPRODUCE.md](docs/REPRODUCE.md). Bench script
`scripts/bench.mjs` re-scores the shipped public eval slice with this
exact model artifact.

## Out of scope

- Single-turn scoring only — pair with conversation-level monitoring.
- Content moderation (toxicity, hate, CSAM, self-harm) — pair with a
  content classifier.
- Authentication and tool-scope enforcement are application
  responsibilities, not the model's.

See [docs/HONEST-LIMITS.md](docs/HONEST-LIMITS.md).

## Bias

The model is English-strongest. Operators serving multilingual traffic
should calibrate the threshold per language. The model has no access
to user identity, account state, or conversation history.

## License

MIT for both the SDK and the model weights.

Public datasets we evaluate against (and the OSS baseline models we
compare to) carry their own upstream licenses — see
[`training/CORPUS_LICENSES.json`](training/CORPUS_LICENSES.json).

## Integrity verification

Every model artifact is checksummed. Verify before extracting:

```bash
sha256sum -c models/l5e/SHA256SUMS
```

The release tarball is additionally cosign-signed with keyless
Sigstore.

## Distribution mirrors

| Mirror | URL |
|---|---|
| GitHub Releases | `https://github.com/securelayer7/PROMPTPurify/releases` |
| Hugging Face Hub | [`Securelayer7/promptpurify`](https://huggingface.co/Securelayer7/promptpurify) |

## Contact

- Security disclosures: [`SECURITY.md`](SECURITY.md)
  `security@securelayer7.net`
- General: [GitHub Issues](https://github.com/securelayer7/PROMPTPurify/issues)
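
The model card reports `recall` and `false_positive_rate`, leaves the threshold to the operator, and recommends calibrating it per language. As a rough illustration of what that calibration involves, the sketch below computes both metrics at a given threshold from already-scored rows. The `ScoredRow` shape and the `metricsAt` helper are assumptions made for this example; they are not the format of the shipped eval slice or the logic of `scripts/bench.mjs`.

```typescript
// One scored evaluation row (hypothetical shape).
interface ScoredRow {
  score: number;     // model output in [0, 1]
  isAttack: boolean; // ground-truth label
}

interface ThresholdMetrics {
  recall: number;            // TP / (TP + FN), computed over attack rows
  falsePositiveRate: number; // FP / (FP + TN), computed over benign rows
}

// Count the confusion-matrix cells at one threshold and derive both rates.
function metricsAt(rows: ScoredRow[], threshold: number): ThresholdMetrics {
  let tp = 0, fn = 0, fp = 0, tn = 0;
  for (const { score, isAttack } of rows) {
    const flagged = score >= threshold; // same comparison as `score >= 0.95`
    if (isAttack) {
      if (flagged) tp++; else fn++;
    } else {
      if (flagged) fp++; else tn++;
    }
  }
  return {
    // NaN signals that the slice held no rows of that class.
    recall: tp + fn === 0 ? NaN : tp / (tp + fn),
    falsePositiveRate: fp + tn === 0 ? NaN : fp / (fp + tn),
  };
}
```

Running `metricsAt` over a per-language slice at several thresholds gives the recall versus false-positive trade-off needed to choose a value for that language.
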
