
Commit 9d2a6ef

feat(logs): redact PII from workflow logs via configurable rules (#5136)
* feat(logs): redact PII from workflow logs via configurable rules

  Enterprise PII redaction for workflow execution logs, configured under Data Retention as org-scoped rules (each rule picks entity types + which workspaces it applies to). Reuses the guardrails Presidio engine in mask mode at the log-persist choke point, with a check-digit-validated VIN recognizer. Also adds per-workspace data-retention-hours overrides.

* fix(logs): widen PII entity visibleValues to string[] for strict build typecheck

* fix(logs): redact error/trigger/executionState; keep guardrails import lazy

  - Extend PII redaction to span error/errorMessage/toolCalls and top-level error/completionFailure/trigger/executionState (Bugbot: PII in execution metadata). executionState is safe to redact: resume reads from the separate pausedExecutions table, not the log copy.
  - Lazy-import validate_pii in pii-redaction so the Python/child_process guardrails module stays out of the static middleware/RSC graph.
  - Type the org retention mutation to the contract body (optional, non-null).

* refactor(logs): drop per-workspace retention override; PII redaction stays org-scoped

  - Remove the unused per-workspace data-retention-hours override (no UI; superseded by workspace-scoped PII rules). Reverts cleanup-dispatcher to org-only retention, drops resolveEffectiveRetentionHours, the workspace.dataRetentionSettings column + migration, and the workspace data-retention route/contract/hooks. Fixes Bugbot's null-as-unset finding by removing the buggy path entirely; org retention behavior is unchanged.
  - Stop re-checking isWorkspaceOnEnterprisePlan at persist time (it returns false on transient errors, which would fail-open and leak PII). Enabled rules already imply entitlement; redact whenever rules apply (fail-safe).

* fix(logs): redact oversized strings and executionData.environment

  - Drop the per-string size cap in PII redaction: oversized strings were left unmasked (leak). Nothing is skipped now; large payloads still fail-safe via the total-bytes ceiling + per-chunk timeout (scrub, never leak).
  - Add executionData.environment (incl. variables) to the redaction set.

* refactor(logs): single-scope PII rules with most-specific-wins resolution

  Each rule now targets one scope, either all workspaces (workspaceId: null) or a single workspace, with workspaceId unique across rules. Resolution is most-specific-wins (a workspace's own rule overrides the all rule), not union; an empty specific rule exempts that workspace. Matches Access Control's resolveWorkspaceGroup precedence. UI 'Applies to' becomes a single-select; Add rule disables when all scopes are taken.

* feat(logs): default + workspace-overrides UI for PII redaction

  Reshape the PII redaction settings into a 'Default (all workspaces)' block plus a 'Workspace overrides' list, making the most-specific-wins precedence explicit (overrides replace the default; unlisted workspaces use it). Same data model (workspaceId null = default), UI only.

* improvement(logs): clearer default/overrides PII UI

  Drop the uppercase section labels and the overrides description; gate the Workspace overrides section behind a configured default; use a single Delete action; 'Add redaction' creates the all-workspaces default and disappears once set.

* fix(guardrails): handle stdin EPIPE in PII python spawns

  Attach an 'error' listener to the child's stdin in both runPythonScript (the batch masking hot path) and executePythonPIIDetection. A 256KB chunk can exceed the OS pipe buffer, so if the Python process exits mid-read (OOM/kill) the EPIPE emitted on stdin was unhandled and would crash the Node process. Funnel it into the promise rejection so the fail-safe scrub path handles it gracefully.

* fix(logs): redact executionData.correlation

  The top-level correlation field is copied from pre-redaction trigger data, so webhook/schedule correlation values could persist unredacted. Add it to the redaction set alongside trigger/environment.

* fix(logs): enforce unique PII rule scope server-side

  The contract accepted multiple rules with the same workspaceId (or several null all-rules); resolution is first-match, so duplicates could disagree with the UI. Add a schema refine rejecting duplicate scopes.

* fix(logs): re-hydrate data-retention form on org switch

  The form hydrated once via a boolean ref, so switching the active org left stale retention days + PII rules and saves targeted the new org with old config. Key hydration on orgId so it re-loads per org.
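The scope rules described in the commit message (one rule per scope, most-specific-wins, an empty specific rule exempting its workspace) can be sketched as follows. This is a minimal illustration, not the repository's implementation: the `PiiRedactionRule` shape, the `entityTypes` field name, and both function names are assumptions made for the example; only `workspaceId: null` meaning "all workspaces" comes from the commit message.

```typescript
// Hypothetical rule shape for illustration; the real type lives in the repository.
interface PiiRedactionRule {
  workspaceId: string | null // null = default rule for all workspaces
  entityTypes: string[] // assumed field name for the entity types a rule masks
}

// Returns every scope that appears more than once (null is the "all workspaces" scope).
// A server-side schema check could reject the payload when this list is non-empty.
function findDuplicateScopes(rules: PiiRedactionRule[]): Array<string | null> {
  const seen = new Set<string | null>()
  const duplicates = new Set<string | null>()
  for (const rule of rules) {
    if (seen.has(rule.workspaceId)) duplicates.add(rule.workspaceId)
    seen.add(rule.workspaceId)
  }
  return Array.from(duplicates)
}

// Most-specific-wins: a workspace's own rule replaces the default rule entirely;
// the two are never merged. An empty specific rule therefore exempts the workspace.
function resolveEntityTypes(rules: PiiRedactionRule[], workspaceId: string): string[] {
  const specific = rules.find((r) => r.workspaceId === workspaceId)
  if (specific) return specific.entityTypes
  const fallback = rules.find((r) => r.workspaceId === null)
  return fallback ? fallback.entityTypes : []
}

const exampleRules: PiiRedactionRule[] = [
  { workspaceId: null, entityTypes: ['EMAIL_ADDRESS', 'PHONE_NUMBER'] },
  { workspaceId: 'ws-finance', entityTypes: ['CREDIT_CARD'] },
  { workspaceId: 'ws-sandbox', entityTypes: [] },
]
```

With `exampleRules`, `ws-finance` resolves to its own list only, `ws-sandbox` resolves to an empty list (exempt), and any unlisted workspace falls back to the default rule.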
1 parent 13b5d21 commit 9d2a6ef

14 files changed

Lines changed: 1364 additions & 105 deletions


apps/sim/app/api/organizations/[id]/data-retention/route.ts

Lines changed: 15 additions & 9 deletions
@@ -1,37 +1,40 @@
 import { AuditAction, AuditResourceType, recordAudit } from '@sim/audit'
 import { db } from '@sim/db'
+import type { DataRetentionSettings } from '@sim/db/schema'
 import { member, organization } from '@sim/db/schema'
 import { createLogger } from '@sim/logger'
 import { and, eq } from 'drizzle-orm'
 import { type NextRequest, NextResponse } from 'next/server'
-import { updateOrganizationDataRetentionContract } from '@/lib/api/contracts/organization'
+import {
+  type OrganizationRetentionValues,
+  updateOrganizationDataRetentionContract,
+} from '@/lib/api/contracts/organization'
 import { parseRequest, validationErrorResponse } from '@/lib/api/server'
 import { getSession } from '@/lib/auth'
-import {
-  CLEANUP_CONFIG,
-  type OrganizationRetentionSettings,
-} from '@/lib/billing/cleanup-dispatcher'
+import { CLEANUP_CONFIG } from '@/lib/billing/cleanup-dispatcher'
 import { isOrganizationOnEnterprisePlan } from '@/lib/billing/core/subscription'
 import { isBillingEnabled } from '@/lib/core/config/env-flags'
 import { withRouteHandler } from '@/lib/core/utils/with-route-handler'
 
 const logger = createLogger('DataRetentionAPI')
 
-function enterpriseDefaults(): OrganizationRetentionSettings {
+function enterpriseDefaults(): OrganizationRetentionValues {
   return {
     logRetentionHours: CLEANUP_CONFIG['cleanup-logs'].defaults.enterprise,
     softDeleteRetentionHours: CLEANUP_CONFIG['cleanup-soft-deletes'].defaults.enterprise,
     taskCleanupHours: CLEANUP_CONFIG['cleanup-tasks'].defaults.enterprise,
+    piiRedaction: null,
   }
 }
 
 function normalizeConfigured(
-  settings: Partial<OrganizationRetentionSettings> | null | undefined
-): OrganizationRetentionSettings {
+  settings: DataRetentionSettings | null | undefined
+): OrganizationRetentionValues {
   return {
     logRetentionHours: settings?.logRetentionHours ?? null,
     softDeleteRetentionHours: settings?.softDeleteRetentionHours ?? null,
     taskCleanupHours: settings?.taskCleanupHours ?? null,
+    piiRedaction: settings?.piiRedaction?.rules ? { rules: settings.piiRedaction.rules } : null,
   }
 }
 
@@ -152,7 +155,7 @@ export const PUT = withRouteHandler(
       }
 
       const current = normalizeConfigured(currentOrg.dataRetentionSettings)
-      const merged: OrganizationRetentionSettings = { ...current }
+      const merged: DataRetentionSettings = { ...current }
       if (body.logRetentionHours !== undefined) {
         merged.logRetentionHours = body.logRetentionHours
       }
@@ -162,6 +165,9 @@ export const PUT = withRouteHandler(
       if (body.taskCleanupHours !== undefined) {
         merged.taskCleanupHours = body.taskCleanupHours
       }
+      if (body.piiRedaction !== undefined) {
+        merged.piiRedaction = body.piiRedaction
+      }
 
       const [updated] = await db
         .update(organization)
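The merge in the PUT handler treats an omitted field (`undefined`) as "leave unchanged" and an explicit `null` as "clear the configured value". A self-contained sketch of that pattern follows; the `RetentionValues` interface and the `mergeRetention` helper are hypothetical simplifications written for this example (the real types are `DataRetentionSettings` and `OrganizationRetentionValues`, whose full definitions are not shown in this diff).

```typescript
// Hypothetical, simplified settings shape; field names follow the diff above.
interface RetentionValues {
  logRetentionHours: number | null
  softDeleteRetentionHours: number | null
  taskCleanupHours: number | null
  piiRedaction: { rules: unknown[] } | null
}

// Copies the current settings, then overwrites only the keys present in the body.
// `undefined` (key omitted) keeps the stored value; `null` explicitly clears it.
function mergeRetention(
  current: RetentionValues,
  body: Partial<RetentionValues>
): RetentionValues {
  const merged: RetentionValues = { ...current }
  if (body.logRetentionHours !== undefined) {
    merged.logRetentionHours = body.logRetentionHours
  }
  if (body.softDeleteRetentionHours !== undefined) {
    merged.softDeleteRetentionHours = body.softDeleteRetentionHours
  }
  if (body.taskCleanupHours !== undefined) {
    merged.taskCleanupHours = body.taskCleanupHours
  }
  if (body.piiRedaction !== undefined) {
    merged.piiRedaction = body.piiRedaction
  }
  return merged
}

const stored: RetentionValues = {
  logRetentionHours: 720,
  softDeleteRetentionHours: 168,
  taskCleanupHours: null,
  piiRedaction: null,
}

// Clears log retention, sets an (empty) rule list, and leaves the other two fields alone.
const next = mergeRetention(stored, { logRetentionHours: null, piiRedaction: { rules: [] } })
```

Checking `!== undefined` rather than truthiness matters here: a truthiness check would make it impossible for a client to reset a field to `null`.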
