* fix(logs): run PII redaction over HTTP and fix Presidio provisioning
- resolve the guardrails venv via candidate paths and fail fast instead of
silently falling back to system python3 (that fallback produced the misleading
"Presidio not installed" error that broke redaction and the guardrails block in
deployed runtimes)
- install the en_core_web_lg spaCy model in setup.sh and app.Dockerfile
- route log redaction through an internal /api/guardrails/mask-batch endpoint
so Presidio always runs in the app container, including async executions that
persist inside the trigger.dev runtime
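The fail-fast venv lookup can be sketched as below. This is a minimal illustration, not the code from this commit: `resolveVenvPython` and its parameters are hypothetical, and the existence check is injected so the logic can be exercised without a real filesystem.

```typescript
// Hypothetical helper: walk an ordered list of candidate venv interpreters
// and return the first one that exists. `exists` is injected (e.g. a wrapper
// around fs.existsSync) so the lookup is testable without touching disk.
function resolveVenvPython(
  candidates: readonly string[],
  exists: (path: string) => boolean,
): string {
  for (const candidate of candidates) {
    if (exists(candidate)) return candidate;
  }
  // No silent fallback to system python3: report what was checked instead of
  // letting the wrong interpreter claim "Presidio not installed".
  throw new Error(
    `guardrails venv not found; checked: ${candidates.join(", ")}`,
  );
}
```

Throwing here turns a deployment misconfiguration into a loud, specific error at the call site rather than a misleading runtime symptom.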
* fix(guardrails): chunk + time-bound internal PII mask requests
- chunk maskPIIBatchViaHttp by count (2000) and bytes (256KB) so large
executions split across requests and never hit the contract's 100k cap
- add a 45s AbortSignal.timeout per request so a request to a slow or unreachable
app container aborts and the caller scrubs, instead of hanging the trigger.dev job
- catch maskPIIBatch failures in the route: log and return a structured 500
(a broken venv fails loudly server-side; the caller still scrubs, so nothing leaks)
- add mask-client tests (order across chunks, count split, non-2xx, empty)
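Splitting by both count and bytes can be sketched as follows. This assumes the limits are 2000 strings and 256KB of UTF-8 text per request, as stated above; `chunkStrings` is a hypothetical name, and the real maskPIIBatchViaHttp may measure size differently (for example, by serialized JSON length).

```typescript
// Limits taken from the commit message; the byte measure (UTF-8 length of
// each string) is an assumption of this sketch.
const MAX_COUNT = 2000;
const MAX_BYTES = 256 * 1024;

function chunkStrings(
  items: readonly string[],
  maxCount: number = MAX_COUNT,
  maxBytes: number = MAX_BYTES,
): string[][] {
  const encoder = new TextEncoder();
  const chunks: string[][] = [];
  let current: string[] = [];
  let currentBytes = 0;

  for (const item of items) {
    const size = encoder.encode(item).byteLength;
    // Close the current chunk if adding this item would break either limit.
    // An item larger than maxBytes still gets its own chunk instead of being dropped.
    if (current.length > 0 && (current.length >= maxCount || currentBytes + size > maxBytes)) {
      chunks.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(item);
    currentBytes += size;
  }
  if (current.length > 0) chunks.push(current);
  return chunks; // concatenating the chunks yields the input in its original order
}
```

Because chunks preserve order, the caller can concatenate per-chunk responses to map masked strings back to their original positions.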
* fix(guardrails): mint internal token per mask request
A single token (5min TTL) could expire mid-batch when a large execution
fans out into many sequential chunk requests; mint one per request instead.
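A per-request token combined with a per-request timeout might look like the following sketch. The JSON shapes (`{ texts }` in, `{ masked }` out), the Bearer header, and the injected `mintToken`/`fetchImpl` functions are all assumptions for illustration; only the mask-batch path, the 45s timeout, and the one-token-per-request rule come from the commit messages.

```typescript
// Hypothetical client loop: each chunk request gets a freshly minted token
// and its own 45s abort timeout, so neither a 5-minute token TTL nor a
// stalled request can span the whole batch.
type MintToken = () => Promise<string>;
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal: AbortSignal },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const REQUEST_TIMEOUT_MS = 45_000;

async function maskChunksSequentially(
  url: string,
  chunks: readonly string[][],
  mintToken: MintToken,
  fetchImpl: FetchLike,
): Promise<string[]> {
  const masked: string[] = [];
  for (const chunk of chunks) {
    const token = await mintToken(); // minted per request, not once per batch
    const res = await fetchImpl(url, {
      method: "POST",
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${token}` },
      body: JSON.stringify({ texts: chunk }),
      signal: AbortSignal.timeout(REQUEST_TIMEOUT_MS),
    });
    // Any non-2xx fails the whole batch; the caller then scrubs instead of leaking.
    if (!res.ok) throw new Error(`mask-batch failed with HTTP ${res.status}`);
    const data = (await res.json()) as { masked: string[] };
    masked.push(...data.masked);
  }
  return masked;
}
```

In production `fetchImpl` would simply be the global `fetch`; injecting it keeps the ordering and token behavior testable.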
* feat(guardrails): run PII via Presidio sidecars + TS recognizer registry
- replace the per-call python3 subprocess (cold spaCy load every call) with
two long-lived Presidio sidecars (analyzer + anonymizer) reached over HTTP;
the app image no longer carries Python/Presidio/venv
- add the PRESIDIO_ANALYZER_URL / PRESIDIO_ANONYMIZER_URL env vars
- move VIN out of Python into a TS recognizer (check-digit validated) behind a
CUSTOM_RECOGNIZERS registry so new custom detectors are one entry; masking is
handled uniformly by the anonymizer
- drive the guardrails block's PII type picker from the shared pii-entities
catalog (adds VIN, fixes drift) so block + Data Retention never diverge
- delete validate_pii.py, requirements.txt, setup.sh and the Dockerfile venv step
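A CUSTOM_RECOGNIZERS registry with a check-digit-validated VIN entry could be shaped like the sketch below. The entry shape and `runCustomRecognizers` are hypothetical; the VIN validation uses the standard North American check-digit scheme (position 9, weights mod 11). Results reuse Presidio's analyzer field names so the anonymizer can mask them like any other finding. Later commits replace this TS recognizer with native VIN support in the image.

```typescript
interface RecognizerResult { entity_type: string; start: number; end: number; score: number }
interface CustomRecognizer { entity: string; pattern: RegExp; validate?: (match: string) => boolean }

// VIN letter transliteration (I, O, Q never appear) and per-position weights.
const VIN_VALUES: Record<string, number> = {
  A: 1, B: 2, C: 3, D: 4, E: 5, F: 6, G: 7, H: 8,
  J: 1, K: 2, L: 3, M: 4, N: 5, P: 7, R: 9,
  S: 2, T: 3, U: 4, V: 5, W: 6, X: 7, Y: 8, Z: 9,
};
const VIN_WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2];

function isValidVin(vin: string): boolean {
  if (!/^[A-HJ-NPR-Z0-9]{17}$/.test(vin)) return false;
  let sum = 0;
  for (let i = 0; i < 17; i++) {
    const ch = vin[i];
    const value = ch >= "0" && ch <= "9" ? Number(ch) : VIN_VALUES[ch];
    sum += value * VIN_WEIGHTS[i];
  }
  const remainder = sum % 11;
  // Remainder 10 is written as "X" in the check-digit position (index 8).
  return vin[8] === (remainder === 10 ? "X" : String(remainder));
}

// Adding a new custom detector is one more entry in this array.
const CUSTOM_RECOGNIZERS: CustomRecognizer[] = [
  { entity: "VIN", pattern: /\b[A-HJ-NPR-Z0-9]{17}\b/, validate: isValidVin },
];

function runCustomRecognizers(text: string): RecognizerResult[] {
  const results: RecognizerResult[] = [];
  for (const r of CUSTOM_RECOGNIZERS) {
    const re = new RegExp(r.pattern.source, "g"); // fresh regex: no shared lastIndex
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      if (r.validate && !r.validate(m[0])) continue; // regex candidate failed validation
      results.push({ entity_type: r.entity, start: m.index, end: m.index + m[0].length, score: 1.0 });
    }
  }
  return results;
}
```

The validator keeps arbitrary 17-character alphanumeric tokens (hashes, IDs) from being masked as VINs unless their check digit is consistent.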
* fix(guardrails): bound-parallelize mask batch; refresh stale comments
- maskPIIBatch runs per-string sidecar calls with bounded concurrency (8) via
mapWithConcurrency, so a chunk of many small leaves finishes within the 45s
request timeout instead of aborting and scrubbing; input order and fail-on-error
behavior are preserved
- drop stale comments referencing the deleted Python venv / 30s subprocess timeout
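A bounded-concurrency map with ordered results and fail-on-error behavior can be sketched as below. It follows the spirit of the mapWithConcurrency helper named above, but its actual signature is not shown in this commit, so the one here is an assumption.

```typescript
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  let failed = false;

  // Each worker claims the next unprocessed index; writing results by index
  // keeps output order identical to input order regardless of finish order.
  async function worker(): Promise<void> {
    while (!failed && next < items.length) {
      const i = next++;
      try {
        results[i] = await fn(items[i], i);
      } catch (err) {
        failed = true; // stop other workers from claiming new items
        throw err;
      }
    }
  }

  // At most `limit` calls are in flight at once.
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With a limit of 8, a chunk of N short strings takes roughly N/8 round-trips of wall time instead of N, which is what keeps it inside the per-request timeout.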
* refactor(guardrails): single Presidio image, native VIN, per-rule redaction language
- collapse the analyzer/anonymizer URLs into one PRESIDIO_URL (combined image
serves /analyze + /anonymize)
- remove the TS VIN recognizer (vin.ts, recognizers.ts); VIN is now native and
multi-language in the image, and validate_pii is a thin analyze→anonymize client
- trim KR_RRN/TH_TNIN from the catalog (no Korean/Thai model in the image)
- add per-rule redaction language: PII_LANGUAGES catalog drives the contract enum,
the Data Retention rule modal, and the guardrails block dropdown; resolver +
logger thread it through to maskPIIBatch (default en), so non-English entity
rules (e.g. ES_NIF) actually fire instead of silently no-op'ing under en
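A thin analyze→anonymize client that threads the per-rule language through could look like this sketch. The `/analyze` and `/anonymize` paths and the `text`, `language`, `entities`, and `analyzer_results` fields follow Presidio's REST API; `maskText` and the injected `postJson` transport are hypothetical names used only for illustration.

```typescript
interface AnalyzerResult { entity_type: string; start: number; end: number; score: number }
// Injected transport (e.g. a fetch wrapper that POSTs JSON and parses the reply).
type PostJson = (url: string, body: unknown) => Promise<unknown>;

async function maskText(
  baseUrl: string,
  text: string,
  entities: readonly string[],
  language: string, // per-rule redaction language, e.g. "en" or "es"
  postJson: PostJson,
): Promise<string> {
  const base = baseUrl.replace(/\/+$/, "");
  // 1) Detect: the language selects which recognizers can fire, so an
  //    ES_NIF rule only matches when analysis runs under "es".
  const findings = (await postJson(`${base}/analyze`, {
    text,
    language,
    ...(entities.length > 0 ? { entities } : {}),
  })) as AnalyzerResult[];
  if (findings.length === 0) return text; // nothing detected: skip the second call
  // 2) Mask: the anonymizer rewrites exactly the spans the analyzer reported.
  const anonymized = (await postJson(`${base}/anonymize`, {
    text,
    analyzer_results: findings,
  })) as { text: string };
  return anonymized.text;
}
```

Because both endpoints live on one combined service, a single base URL is enough to reach both of them.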
* fix(guardrails): correct sidecar port (5001) + README for combined image
The combined Presidio image (docker/pii.Dockerfile) serves /analyze + /anonymize
on a single port 5001 with native VIN + multi-language recognizers. Fix the
PRESIDIO_URL default (was 5002) and rewrite the README, which still described two
stock containers and a TS VIN recognizer.
* fix(guardrails): coerce stored redaction language in the resolver
The persist-path resolver accepted any stored language string, so a stale/invalid
code (e.g. a dropped locale) would reach Presidio and scrub the log even though the
admin UI shows English. Coerce against the supported set via a shared
coercePiiLanguage helper (now reused by the data-retention route too), falling back
to en for unknown values.
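The coercion described above can be sketched as follows. The real PII_LANGUAGES catalog is not listed in this commit; the two-entry list here ("en" plus "es", suggested by the ES_NIF example) is an illustrative assumption.

```typescript
// Illustrative subset only; the actual catalog lives in the codebase.
const PII_LANGUAGES = ["en", "es"] as const;
type PiiLanguage = (typeof PII_LANGUAGES)[number];
const DEFAULT_PII_LANGUAGE: PiiLanguage = "en";

// Accepts any stored value (string, null, undefined, ...) and returns a
// supported language, falling back to English for anything unrecognized.
function coercePiiLanguage(value: unknown): PiiLanguage {
  return typeof value === "string" && (PII_LANGUAGES as readonly string[]).indexOf(value) >= 0
    ? (value as PiiLanguage)
    : DEFAULT_PII_LANGUAGE;
}
```

Sharing one helper between the persist-path resolver and the data-retention route means both agree on what the admin UI displays.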
* fix(guardrails): rename PRESIDIO_URL env var to PII_URL
Match the infra taskdef, which sets PII_URL on the app container for the
combined Presidio sidecar.
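Reading the renamed variable can be sketched as below. `resolvePiiUrl` is hypothetical; the 5001 port comes from the earlier default fix, while the localhost host and the assumption that the default carried over to PII_URL are not stated in the commits.

```typescript
// Assumed default: port 5001 from the commit history; host is a guess.
const DEFAULT_PII_URL = "http://localhost:5001";

// Only PII_URL is consulted; a leftover PRESIDIO_URL is ignored, matching
// the rename to what the infra taskdef sets.
function resolvePiiUrl(env: Record<string, string | undefined>): string {
  const raw = env.PII_URL?.trim();
  return (raw ? raw : DEFAULT_PII_URL).replace(/\/+$/, "");
}
```

At runtime this would be called as `resolvePiiUrl(process.env)`; passing the environment in keeps the lookup easy to test.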