Commit d024ffc

therealbrad and claude authored
fix(audit): stop silent audit-capture loss — self-heal triggers on every boot (#467)
* fix(audit): keep audit worker UPDATE/DELETE on DataChangeLog grant

apply-triggers.ts revoked UPDATE/DELETE on DataChangeLog from CURRENT_USER on the premise that the connecting role always owns the table (where a REVOKE is a no-op against the owner's implicit rights). In production the runtime role is not the table owner, so the REVOKE actually stripped the audit worker's UPDATE/DELETE, silently stalling the CDC processed-cursor advance and the retention purge.

Converge on a working grant set on every run instead: GRANT INSERT/SELECT/UPDATE/DELETE to the connecting role and REVOKE UPDATE/DELETE from PUBLIC only. The append-only invariant is unchanged: the BEFORE UPDATE/DELETE enforcement triggers reject every non-cursor mutation regardless of grant state, so they remain the real guarantee.

Add an end-of-run has_table_privilege self-check that fails loudly if the connecting role lacks any of INSERT/SELECT/UPDATE/DELETE, so a future self-revoke regression breaks the deploy instead of stalling silently. This makes apply-triggers idempotent and safe to re-run on every deploy and restart.

* feat(audit): self-install audit triggers on every app boot (launch-agnostic)

`prisma db push` silently drops the audit triggers, and not every launch path re-runs apply-triggers: the docker entrypoint does, but a bare `node server.js`, `next start`, pm2, or a k8s `command:` override that runs `prisma db push` alone does NOT, causing silent audit-capture loss with no error (exactly what bit prod).

Make the trigger substrate self-healing from the app's own startup, so it holds no matter how the app is installed, updated, or launched:

- scripts/apply-triggers.ts: extract the apply logic into an importable `applyAuditTriggers()` (advisory-locked so concurrent replica boots can't deadlock on DROP/CREATE TRIGGER; injectable logger; bundle-safe SQL path resolution). CLI behavior preserved and now import-safe via a `require.main === module` guard.
- lib/audit/ensureAuditTriggers.ts: once-per-process, fail-open boot helper (DIRECT_DATABASE_URL preferred; AUDIT_TRIGGER_BOOTSTRAP_FATAL=1 to fail-closed for the worker tier; AUDIT_TRIGGER_BOOTSTRAP=off to skip).
- instrumentation.ts: call it on server boot (nodejs runtime). Next's instrumentation runs on every launcher, making this the one launch-agnostic install point.
- next.config.ts: externalize `pg`; trace prisma/audit_row_change.sql into standalone.
- package.json: add canonical `db:push` / `db:push:dev` (db push + apply-triggers) so the explicit install/update path can't run `prisma db push` alone.

Builds on the grant fix in fix/datachangelog-append-only-grant.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(audit): self-install audit triggers from the worker tier too

The boot-time self-install was wired only into Next's instrumentation hook, which runs in the web tier. Workers are separate pm2 processes that don't run that hook, so a worker-only boot (or a web pod that fails its startup asserts before the install) could leave the audit-log worker draining a DataChangeLog whose capture triggers were silently dropped by `prisma db push`.

Re-attach the substrate from the audit-log worker's own boot. It's the natural owner: the direct consumer of the capture triggers, so its correctness depends on them existing. The call sits in the standalone (require.main) boot path before the worker starts consuming, reusing the same idempotent, advisory-locked applyAuditTriggers(). Fail-open by default so a DDL hiccup can't crash-loop the worker; AUDIT_TRIGGER_BOOTSTRAP_FATAL=1 makes it refuse to start without the substrate.

Also refresh the apply-triggers.ts header to document the three re-attach points (schema sync, deploy entrypoint, and launch-agnostic app boot).

* fix(audit): only self-install triggers from the worker in single-tenant mode

The worker-tier bootstrap connected to DIRECT_DATABASE_URL ?? DATABASE_URL on boot, but in multi-tenant mode the audit-log worker has no single database of its own: its DATABASE_URL is a placeholder and it resolves a connection per tenant per Loop B cycle. So the bootstrap would connect to the placeholder, fail, and (fail-open) log an error on every worker boot while installing nothing. Tenant trigger install in multi-tenant deployments is owned by each tenant's web app via the instrumentation hook.

Gate the call on !isMultiTenantMode() (already imported and used throughout the worker): single-tenant installs self-install as intended; multi-tenant workers skip cleanly with a log line. Robust by construction: no per-deployment AUDIT_TRIGGER_BOOTSTRAP=off needed, so a future multitenant-workers deploy can't drift into the noisy/futile path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(audit): clarify boot-time audit-trigger self-install for installers

Document that prisma db push drops the audit-capture triggers but the app re-installs them on every boot, so external-database installers need no separate trigger step; document the AUDIT_TRIGGER_BOOTSTRAP and AUDIT_TRIGGER_BOOTSTRAP_FATAL knobs. Fix the DataChangeLog append-only trigger name in background-processes (tpl_dcl_no_delete/tpl_dcl_no_update).

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
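
The end-of-run `has_table_privilege` self-check described in the first commit is not part of the files shown on this page, so the following is only a minimal sketch of how such a check could look. Every name in it (`Queryable`, `buildPrivilegeCheckSql`, `missingPrivileges`, `assertTablePrivileges`) is hypothetical, and the query interface is a stand-in for whatever driver the script actually uses; only the PostgreSQL function `has_table_privilege` and the four privileges come from the commit message.

```typescript
// Hypothetical sketch: these names are illustrative, not taken from apply-triggers.ts.
type Privilege = "INSERT" | "SELECT" | "UPDATE" | "DELETE";

const REQUIRED: readonly Privilege[] = ["INSERT", "SELECT", "UPDATE", "DELETE"];

// Minimal query interface so the sketch does not depend on a specific driver.
interface Queryable {
  query(
    text: string,
    values: unknown[]
  ): Promise<{ rows: Array<Record<string, unknown>> }>;
}

// Builds one has_table_privilege() column per privilege, evaluated for the
// connecting role. $1 is the table name as text; a mixed-case table has to be
// passed with its double quotes, e.g. '"DataChangeLog"'.
function buildPrivilegeCheckSql(privileges: readonly Privilege[]): string {
  const cols = privileges.map(
    (p) => `has_table_privilege(current_user, $1::text, '${p}') AS "${p}"`
  );
  return `SELECT ${cols.join(", ")}`;
}

// Returns the privileges that the result row does not report as granted.
function missingPrivileges(
  row: Record<string, unknown>,
  privileges: readonly Privilege[] = REQUIRED
): Privilege[] {
  return privileges.filter((p) => row[p] !== true);
}

// Throws when any required privilege is absent, so a deploy fails loudly
// instead of leaving the worker stalled.
async function assertTablePrivileges(db: Queryable, table: string): Promise<void> {
  const { rows } = await db.query(buildPrivilegeCheckSql(REQUIRED), [table]);
  const missing = missingPrivileges(rows[0] ?? {});
  if (missing.length > 0) {
    throw new Error(`connecting role lacks ${missing.join("/")} on ${table}`);
  }
}
```

Keeping the row evaluation in a pure function (`missingPrivileges`) makes the failure condition easy to unit-test without a database.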
1 parent 7a075d2 commit d024ffc

8 files changed

Lines changed: 276 additions & 43 deletions


docs/docs/background-processes.md

Lines changed: 1 addition & 1 deletion
@@ -142,7 +142,7 @@ The application uses the following background processes:
 ### DataChangeLog Retention Worker
 
 - Wakes once per day and batch-deletes processed `DataChangeLog` rows (the audit change-capture substrate) older than 30 days
-- Never deletes unprocessed rows — the audit log worker must correlate them into `AuditLog` first; the append-only `datachangelog_append_only` trigger enforces this at the database level too
+- Never deletes unprocessed rows — the audit log worker must correlate them into `AuditLog` first; the append-only enforcement triggers (`tpl_dcl_no_delete` / `tpl_dcl_no_update`) enforce this at the database level too
 - Batched `LIMIT 1000` deletes to avoid lock contention with the capture path; emits one `DCL_RETENTION_PURGED` audit event per run
 - Multi-tenant aware: runs the purge against every configured tenant database per cycle and emits one audit row per (tenant, run)
 - Standalone daily loop (no BullMQ queue) — self-schedules internally rather than via the scheduler

docs/docs/external-database-deployment.md

Lines changed: 14 additions & 0 deletions
@@ -266,6 +266,20 @@ command at a connection that bypasses the pooler — `prisma db push` and the in
 setup require a real (non-pooled) session. See [Configure connection pooling](#best-practices)
 below for the matching `DIRECT_DATABASE_URL` setup the running container uses.
 
+You do **not** need a separate step to install the audit-capture database triggers.
+`prisma db push` silently drops them, but the application re-installs them
+automatically on every boot (idempotent, advisory-locked across replicas), so a plain
+`db push` followed by starting the app leaves audit capture intact. Two optional
+environment variables tune this self-install:
+
+- `AUDIT_TRIGGER_BOOTSTRAP=off` — skip the install entirely. Use this only when the
+app's database role cannot run DDL (e.g. a read-only role); you must then apply the
+triggers out-of-band.
+- `AUDIT_TRIGGER_BOOTSTRAP_FATAL=1` — abort startup if the install fails instead of
+logging and continuing. The default is fail-open so an unrelated database hiccup
+can't take the web tier down; set this to `1` on the **worker tier**, where audit
+integrity is load-bearing.
+
 Or use your existing database if it's already initialized.
 
 ### Step 4: Monitor Deployment
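
How the two environment variables documented in this hunk combine can be summarized in a small sketch. The function and type names (`resolveBootstrapMode`, `BootstrapMode`) are hypothetical and do not appear in the commit. The precedence mirrors what `ensureAuditTriggers` does in this diff: `AUDIT_TRIGGER_BOOTSTRAP=off` is checked first, and only the exact value `1` makes a failure fatal.

```typescript
// Hypothetical helper: summarizes the knob semantics, not code from the commit.
type BootstrapMode = "skip" | "fail-open" | "fail-closed";

function resolveBootstrapMode(
  env: Record<string, string | undefined>
): BootstrapMode {
  // "off" wins over everything else: no install is attempted at all.
  if (env.AUDIT_TRIGGER_BOOTSTRAP === "off") return "skip";
  // Only the literal string "1" turns an install failure into a startup abort.
  return env.AUDIT_TRIGGER_BOOTSTRAP_FATAL === "1" ? "fail-closed" : "fail-open";
}

// Usage: the web tier usually leaves both unset; the docs recommend
// AUDIT_TRIGGER_BOOTSTRAP_FATAL=1 on the worker tier.
const webTier = resolveBootstrapMode({});
const workerTier = resolveBootstrapMode({ AUDIT_TRIGGER_BOOTSTRAP_FATAL: "1" });
console.log(webTier, workerTier);
```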

testplanit/instrumentation.ts

Lines changed: 13 additions & 1 deletion
@@ -1,6 +1,9 @@
 /**
  * Next.js server instrumentation hook. Runs once when the server starts
- * (nodejs runtime only, not edge), before any request is served.
+ * (nodejs runtime only, not edge), before any request is served — and on
+ * EVERY launcher: docker entrypoint, `next start`, standalone `node server.js`,
+ * pm2, or a k8s `command:` override. That makes it the one launch-agnostic
+ * place to guarantee runtime invariants.
  *
  * Use this to fail fast on missing security-critical configuration. A
  * production deployment without ENCRYPTION_KEY would silently fall back
@@ -22,4 +25,13 @@ export async function register() {
     console.error("[startup] encryption misconfiguration:", error);
     throw error;
   }
+
+  // Re-attach the audit-trigger substrate on every boot. `prisma db push` silently drops these
+  // triggers and not every launch path runs apply-triggers, so doing it here is what keeps audit
+  // capture alive no matter how the app is installed/updated/launched. Idempotent, advisory-locked,
+  // and fail-open by default (see ensureAuditTriggers); never blocks startup unless explicitly made
+  // fatal via AUDIT_TRIGGER_BOOTSTRAP_FATAL=1.
+  const { ensureAuditTriggers } =
+    await import("~/lib/audit/ensureAuditTriggers");
+  await ensureAuditTriggers();
 }
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+/**
+ * Runtime guarantee that the audit-trigger substrate exists — applied from the application's own
+ * boot, independent of HOW the process was launched.
+ *
+ * The audit pipeline depends on per-table DB triggers that write "DataChangeLog" (capture) plus the
+ * append-only enforcement triggers and grants. Those are installed by scripts/apply-triggers.ts.
+ * The catch: `prisma db push` SILENTLY DROPS triggers, and the various launch paths don't all run
+ * apply-triggers — the docker entrypoint does, but a bare `node server.js`, `next start`, pm2, or a
+ * k8s `command:` override that runs `prisma db push` alone does not. The result is silent audit
+ * data loss with no error.
+ *
+ * Calling this from instrumentation.ts (Next's server boot hook) closes that gap: every server
+ * start re-attaches the triggers, so capture survives no matter how the app is installed, updated,
+ * or relaunched. It is idempotent and serialized across replicas by an advisory lock in
+ * applyAuditTriggers.
+ *
+ * Behavior:
+ *  - Runs at most once per process (memoized).
+ *  - Uses DIRECT_DATABASE_URL (pooler-bypass) when set, else DATABASE_URL — DDL must not go through
+ *    a transaction-pooled PgBouncer.
+ *  - Fail-open: a failure is logged loudly but does NOT block startup, so an unrelated DB hiccup
+ *    can't take the web tier down. Set AUDIT_TRIGGER_BOOTSTRAP_FATAL=1 to abort startup instead
+ *    (recommended for the worker tier, where audit integrity is load-bearing).
+ *  - Set AUDIT_TRIGGER_BOOTSTRAP=off to skip entirely (e.g. a read-only role that cannot run DDL).
+ */
+import { applyAuditTriggers } from "~/scripts/apply-triggers";
+
+let inFlight: Promise<void> | null = null;
+
+export function ensureAuditTriggers(): Promise<void> {
+  if (process.env.AUDIT_TRIGGER_BOOTSTRAP === "off") return Promise.resolve();
+  if (inFlight) return inFlight;
+
+  inFlight = (async () => {
+    try {
+      await applyAuditTriggers({ lock: true });
+      console.info("[startup] audit triggers ensured ✓");
+    } catch (error) {
+      console.error(
+        "[startup] audit trigger bootstrap failed — audit capture may be incomplete until this is resolved:",
+        error
+      );
+      if (process.env.AUDIT_TRIGGER_BOOTSTRAP_FATAL === "1") throw error;
+      // Fail-open: clear the memo so a later explicit call can retry.
+      inFlight = null;
+    }
+  })();
+
+  return inFlight;
+}
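
The helper above passes `{ lock: true }` to `applyAuditTriggers`, whose body is not shown on this page. As a rough illustration of what "serialized across replicas by an advisory lock" can mean, here is a minimal sketch built on PostgreSQL's session-level `pg_advisory_lock` / `pg_advisory_unlock` functions. The `Client` interface, the `withAdvisoryLock` name, and the lock key are all assumptions made for the example; the real implementation may differ (for instance, it could use a transaction-scoped lock instead).

```typescript
// Hypothetical sketch of advisory-lock serialization; not the code in apply-triggers.ts.

// Minimal stand-in for a single dedicated database connection. Session-level
// advisory locks belong to the connection, so lock, work, and unlock must all
// run on the same one (a pooled handle that switches connections would not do).
interface Client {
  query(text: string, values?: unknown[]): Promise<unknown>;
}

// Arbitrary illustrative key; the key used by the real script is not shown in this diff.
const AUDIT_TRIGGER_LOCK_KEY = 467001;

async function withAdvisoryLock<T>(
  client: Client,
  key: number,
  work: () => Promise<T>
): Promise<T> {
  // Blocks until no other session holds the same key, so concurrent replica
  // boots run the DROP/CREATE TRIGGER statements one at a time.
  await client.query("SELECT pg_advisory_lock($1)", [key]);
  try {
    return await work();
  } finally {
    // Always release, even when the DDL throws.
    await client.query("SELECT pg_advisory_unlock($1)", [key]);
  }
}

// Usage with a fake client that only records the statements it receives.
const recorded: string[] = [];
const fakeClient: Client = {
  query: async (text) => {
    recorded.push(text);
  },
};

const lockDemo = withAdvisoryLock(fakeClient, AUDIT_TRIGGER_LOCK_KEY, async () => {
  recorded.push("-- DDL would run here");
});
```

Releasing in `finally` matters for the fail-open path: if the install throws and the process keeps running, the lock should not stay held by that session.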

testplanit/next.config.ts

Lines changed: 8 additions & 0 deletions
@@ -131,8 +131,16 @@ const nextConfig: NextConfig = {
     "test-results-parser",
     "jspdf",
     "fflate",
+    // The instrumentation boot hook imports apply-triggers, which uses `pg` for raw DDL.
+    // Keep it external so the native driver isn't bundled into the server runtime.
+    "pg",
   ],
   outputFileTracingRoot: path.join(__dirname, "../"),
+  // The instrumentation boot hook reads prisma/audit_row_change.sql at runtime to (re)install the
+  // audit triggers. Trace it into the standalone output so it exists wherever the server runs.
+  outputFileTracingIncludes: {
+    "/**": ["./prisma/audit_row_change.sql"],
+  },
   experimental: {
     // Limit number of workers to reduce memory usage during build
     workerThreads: false,

testplanit/package.json

Lines changed: 2 additions & 0 deletions
@@ -9,6 +9,8 @@
     "generate:dev": "zenstack format && NODE_OPTIONS='--max-old-space-size=12288' zenstack generate && node scripts/fix-zenstack-symlink.js && dotenv -e .env.development -- prisma db push && dotenv -e .env.development -- tsx scripts/apply-triggers.ts",
     "triggers:apply": "tsx scripts/apply-triggers.ts",
     "triggers:apply:dev": "dotenv -e .env.development -- tsx scripts/apply-triggers.ts",
+    "db:push": "prisma db push --skip-generate && tsx scripts/apply-triggers.ts",
+    "db:push:dev": "dotenv -e .env.development -- prisma db push --skip-generate && dotenv -e .env.development -- tsx scripts/apply-triggers.ts",
     "build": "node scripts/generate-version.js && node scripts/fix-zenstack-symlink.js && pnpm build:workers && NODE_OPTIONS='--max-old-space-size=24576' next build",
     "build:workers": "node scripts/build-workers.js",
     "build:docs": "pnpm --filter docs build",
