Commit 9d2a6ef
feat(logs): redact PII from workflow logs via configurable rules (#5136)
* feat(logs): redact PII from workflow logs via configurable rules
Enterprise PII redaction for workflow execution logs, configured under
Data Retention as org-scoped rules (each rule picks entity types + which
workspaces it applies to). Reuses the guardrails Presidio engine in mask
mode at the log-persist choke point, with a check-digit-validated VIN
recognizer. Also adds per-workspace data-retention-hours overrides.
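
The commit message does not show the VIN recognizer itself. As a rough illustration of what "check-digit-validated" can mean, here is a minimal sketch of the standard North American VIN check-digit test (position 9, weighted sum modulo 11). The function name `isValidVin` and the constants are hypothetical, and the real recognizer presumably plugs into the Presidio engine rather than running as standalone TypeScript.

```typescript
// Hypothetical sketch: standard VIN check-digit validation (not the commit's code).
// Positional weights for the 17 characters; position 9 (index 8) has weight 0.
const VIN_WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]

// Letter transliteration table; I, O and Q are not allowed in a VIN.
const VIN_LETTER_VALUES: Record<string, number> = {
  A: 1, B: 2, C: 3, D: 4, E: 5, F: 6, G: 7, H: 8,
  J: 1, K: 2, L: 3, M: 4, N: 5, P: 7, R: 9,
  S: 2, T: 3, U: 4, V: 5, W: 6, X: 7, Y: 8, Z: 9,
}

function isValidVin(candidate: string): boolean {
  const vin = candidate.toUpperCase()
  // Exactly 17 characters from the permitted alphabet.
  if (!/^[A-HJ-NPR-Z0-9]{17}$/.test(vin)) return false

  let sum = 0
  for (let i = 0; i < 17; i++) {
    const ch = vin[i]
    const value = ch >= '0' && ch <= '9' ? Number(ch) : VIN_LETTER_VALUES[ch]
    sum += value * VIN_WEIGHTS[i]
  }

  // Remainder 10 is written as 'X'; otherwise it is the digit itself.
  const remainder = sum % 11
  const expected = remainder === 10 ? 'X' : String(remainder)
  return vin[8] === expected
}
```

Validating the check digit keeps arbitrary 17-character alphanumeric strings (IDs, hashes) from being masked as VINs.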
* fix(logs): widen PII entity visibleValues to string[] for strict build typecheck
* fix(logs): redact error/trigger/executionState; keep guardrails import lazy
- Extend PII redaction to span error/errorMessage/toolCalls and top-level
error/completionFailure/trigger/executionState (Bugbot: PII in execution
metadata). executionState is safe to redact — resume reads from the separate
pausedExecutions table, not the log copy.
- Lazy-import validate_pii in pii-redaction so the Python/child_process
guardrails module stays out of the static middleware/RSC graph.
- Type the org retention mutation to the contract body (optional, non-null).
* refactor(logs): drop per-workspace retention override; PII redaction stays org-scoped
- Remove the unused per-workspace data-retention-hours override (no UI; superseded
by workspace-scoped PII rules). Reverts cleanup-dispatcher to org-only retention,
drops resolveEffectiveRetentionHours, the workspace.dataRetentionSettings column +
migration, and the workspace data-retention route/contract/hooks. Fixes Bugbot's
null-as-unset finding by removing the buggy path entirely; org retention behavior
is unchanged.
- Stop re-checking isWorkspaceOnEnterprisePlan at persist time (it returns false on
transient errors, which would fail-open and leak PII). Enabled rules already imply
entitlement; redact whenever rules apply (fail-safe).
* fix(logs): redact oversized strings and executionData.environment
- Drop the per-string size cap in PII redaction: oversized strings were left
unmasked (leak). Nothing is skipped now; large payloads still fail-safe via the
total-bytes ceiling + per-chunk timeout (scrub, never leak).
- Add executionData.environment (incl. variables) to the redaction set.
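
The "scrub, never leak" behavior can be sketched as a wrapper that replaces the value with a placeholder whenever masking cannot complete. This is a simplified illustration, not the commit's code: `maskOrScrub`, the `SCRUBBED` placeholder, and the option names are hypothetical, and it applies the byte ceiling and the timeout to a single string rather than per payload and per chunk.

```typescript
// Hypothetical sketch of a fail-safe redaction wrapper (names are illustrative).
const SCRUBBED = '[REDACTED]'

interface FailSafeOptions {
  maxBytes: number // ceiling above which the value is scrubbed without masking
  timeoutMs: number // time budget for the masking call
}

async function maskOrScrub(
  text: string,
  mask: (input: string) => Promise<string>,
  options: FailSafeOptions
): Promise<string> {
  // Too large to mask within budget: scrub instead of persisting raw text.
  if (new TextEncoder().encode(text).length > options.maxBytes) return SCRUBBED

  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('mask timeout')), options.timeoutMs)
  })

  try {
    return await Promise.race([mask(text), timeout])
  } catch {
    // Timeout or masking failure: never fall back to the unmasked input.
    return SCRUBBED
  } finally {
    if (timer !== undefined) clearTimeout(timer)
  }
}
```

The key property is that no code path returns the original `text` unmasked.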
* refactor(logs): single-scope PII rules with most-specific-wins resolution
Each rule now targets one scope — all workspaces (workspaceId: null) or a single
workspace — with workspaceId unique across rules. Resolution is most-specific-wins
(a workspace's own rule overrides the all rule), not union; an empty specific rule
exempts that workspace. Matches Access Control's resolveWorkspaceGroup precedence.
UI 'Applies to' becomes a single-select; Add rule disables when all scopes are taken.
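
A minimal sketch of the most-specific-wins resolution described above. The `PiiRule` shape and the `resolveEntityTypes` helper are assumptions for illustration; only `workspaceId: null` meaning "all workspaces" comes from the commit message.

```typescript
// Hypothetical rule shape; only the workspaceId semantics are taken from the commit.
interface PiiRule {
  workspaceId: string | null // null = applies to all workspaces
  entityTypes: string[]
}

// Most-specific-wins: a workspace's own rule replaces the all-workspaces rule
// outright (no union). An empty specific rule therefore exempts the workspace.
function resolveEntityTypes(rules: PiiRule[], workspaceId: string): string[] {
  const specific = rules.find((rule) => rule.workspaceId === workspaceId)
  if (specific) return specific.entityTypes

  const fallback = rules.find((rule) => rule.workspaceId === null)
  return fallback ? fallback.entityTypes : []
}

const rules: PiiRule[] = [
  { workspaceId: null, entityTypes: ['EMAIL_ADDRESS', 'PHONE_NUMBER'] },
  { workspaceId: 'ws-a', entityTypes: ['VIN'] },
  { workspaceId: 'ws-b', entityTypes: [] },
]

resolveEntityTypes(rules, 'ws-a') // own rule only: ['VIN']
resolveEntityTypes(rules, 'ws-b') // empty specific rule: exempt, []
resolveEntityTypes(rules, 'ws-c') // unlisted: falls back to the all rule
```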
* feat(logs): default + workspace-overrides UI for PII redaction
Reshape the PII redaction settings into a 'Default (all workspaces)' block plus a
'Workspace overrides' list, making the most-specific-wins precedence explicit
(overrides replace the default; unlisted workspaces use it). Same data model
(workspaceId null = default), UI only.
* improvement(logs): clearer default/overrides PII UI
Drop the uppercase section labels and the overrides description; gate the
Workspace overrides section behind a configured default; use a single Delete
action; 'Add redaction' creates the all-workspaces default and disappears once set.
* fix(guardrails): handle stdin EPIPE in PII python spawns
Attach an 'error' listener to the child's stdin in both runPythonScript (the
batch masking hot path) and executePythonPIIDetection. A 256KB chunk can exceed
the OS pipe buffer, so if the Python process exits mid-read (OOM/kill) the EPIPE
emitted on stdin was unhandled and would crash the Node process. Funnel it into
the promise rejection so the fail-safe scrub path handles it gracefully.
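
The pattern described above can be sketched as follows. This is not the code of `runPythonScript` or `executePythonPIIDetection`; `runWithStdin` is a hypothetical helper that shows only the relevant part, an 'error' listener on the child's stdin that feeds the same rejection path as other failures.

```typescript
import { spawn } from 'node:child_process'

// Hypothetical helper: run a command, write `input` to its stdin, collect stdout.
function runWithStdin(command: string, args: string[], input: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args)
    let stdout = ''
    let settled = false

    const fail = (error: Error) => {
      if (settled) return
      settled = true
      reject(error)
    }

    child.stdout.on('data', (chunk: Buffer) => {
      stdout += chunk.toString()
    })
    child.stderr.resume() // drain stderr so the child cannot block on it

    child.on('error', fail) // spawn failures (e.g. command not found)
    // Without this listener, an EPIPE raised when the child exits mid-read is an
    // unhandled 'error' event on the stream, which terminates the Node process.
    child.stdin.on('error', fail)

    child.on('close', (code) => {
      if (settled) return
      settled = true
      if (code === 0) resolve(stdout)
      else reject(new Error(`child exited with code ${code}`))
    })

    child.stdin.end(input)
  })
}
```

Because the stdin error becomes an ordinary rejection, a caller with a fail-safe `catch` can scrub the payload instead of the whole process going down.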
* fix(logs): redact executionData.correlation
The top-level correlation field is copied from pre-redaction trigger data, so
webhook/schedule correlation values could persist unredacted. Add it to the
redaction set alongside trigger/environment.
* fix(logs): enforce unique PII rule scope server-side
The contract accepted multiple rules with the same workspaceId (or several
null all-rules); resolution is first-match, so duplicates could disagree with
the UI. Add a schema refine rejecting duplicate scopes.
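
The uniqueness check behind such a refine could look like the predicate below. It is a sketch under assumptions: `hasUniqueScopes` and `PiiRuleScope` are hypothetical names, and the actual contract schema is not shown in the commit message.

```typescript
// Hypothetical predicate suitable for use as a schema refinement.
interface PiiRuleScope {
  workspaceId: string | null // null = the all-workspaces rule
}

// Returns false as soon as two rules target the same scope, including
// two rules that both use null (two "all workspaces" rules).
function hasUniqueScopes(rules: PiiRuleScope[]): boolean {
  const seen = new Set<string | null>()
  for (const rule of rules) {
    if (seen.has(rule.workspaceId)) return false
    seen.add(rule.workspaceId)
  }
  return true
}
```

Rejecting duplicates at the contract boundary means first-match resolution can never pick a rule different from the one the UI displays.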
* fix(logs): re-hydrate data-retention form on org switch
The form hydrated once via a boolean ref, so switching the active org left stale
retention days + PII rules and saves targeted the new org with old config. Key
hydration on orgId so it re-loads per org.
1 parent 13b5d21 commit 9d2a6ef
14 files changed
Lines changed: 1364 additions & 105 deletions
File tree
- apps/sim
- app/api/organizations/[id]/data-retention
- ee/data-retention
- components
- hooks
- lib
- api/contracts
- billing
- guardrails
- logs/execution
- packages/db
Per-file diff (content not captured): 15 additions & 9 deletions