chore(agents): enforce Phase 1 engineering rules upstream

shadowdevcode · claude · shadowdevcode · commit ade568891468 · 2026-04-04T08:31:20.000+05:30
Shift-left 7 rules from issue-009 postmortem into agent and command
files: auth caller cross-verification, parent/child write sequence
check, Gate 0 smoke test before deploy-check, empty ENV var detection,
Sentry provisioning checklist in execute-plan, file size budget at
generation time, and env var grep step post-implementation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
diff --git a/agents/backend-architect-agent.md b/agents/backend-architect-agent.md
@@ -220,12 +220,29 @@ Before finalizing the architecture, answer all of the following. Any gap must be
 9. **Telemetry Latency Isolation**: For every API route with a latency SLA (P95 target), confirm that PostHog/telemetry calls are fire-and-forget (not awaited). Awaited telemetry in hot paths violates latency contracts and creates false fallback triggers in experiment flows.
    → Exception: admin/cron routes where latency SLA doesn't apply.
 
+10. **Dashboard / Report Rehydration Path**: For every dashboard, report, or results page that is linked from navigation, email CTA, push notification, or any external URL:
+    → Specify the exact authenticated read path for first-load rehydration: which API route is called, what query it runs, and what state it returns.
+    → The mutation response path (result available immediately after POST) is not sufficient — the page must hydrate from the DB on any entry point.
+    → Client-memory-only post-mutation flows are blocked for any page reachable from an email link or deep URL.
+
+11. **Parent/Child Write Atomicity**: For every user action that writes a parent record + one or more child records in sequence:
+    → Specify the atomicity strategy explicitly: if the child write fails, define whether the parent is rolled back or transitioned to a `failed` state, and confirm error telemetry fires.
+    → Partial success (parent = `processed` / `success`, children = missing) is never an acceptable terminal state.
+    → "Log and continue" on child write failure is a blocking omission in the architecture spec.
+
+12. **Fan-Out Worker HTTP Contract**: For every fan-out architecture (master cron → N worker routes):
+    → Specify the worker HTTP status contract explicitly: "Worker must return HTTP non-2xx (e.g., 502) on any failure that the master should count as failed."
+    → Master uses HTTP status only for success/failure accounting — never inspects JSON body.
+    → JSON error payloads with HTTP 200 are insufficient as a failure signal to the master.
+
 # Added: 2026-03-19 — SMB Feature Bundling Engine
 
 # Updated: 2026-03-21 — Ozi Reorder Experiment (items 4–7)
 
 # Updated: 2026-03-28 — Nykaa Personalisation (items 8–9)
 
+# Updated: 2026-04-03 — MoneyMirror (items 10–12)
+
 ---
 
 ## Anti-Sycophancy Mandate
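The following is a minimal TypeScript sketch of the item 12 contract, for illustration only. `WorkerOutcome`, `workerStatus`, and `tallyWorkers` are hypothetical names, not code from MoneyMirror. In a Next.js route handler, the worker's status would be passed as the `status` option of the response (for example `NextResponse.json(body, { status })`). The sketch also counts a request that never produces a response (network error or timeout) as failed, modeled as `null`. That rule is an assumption beyond the text.

```typescript
// Hypothetical outcome of one worker run (e.g. one user's batch in a cron fan-out).
type WorkerOutcome = { ok: true } | { ok: false; error: string };

// Worker side: the HTTP status is the only failure signal the master reads,
// so every failure maps to a non-2xx code. A JSON `{ error }` body sent
// with 200 would be counted as a success by the master below.
function workerStatus(outcome: WorkerOutcome): number {
  return outcome.ok ? 200 : 502;
}

// Master side: account for each worker purely by status code; the body is never read.
// `null` stands for a request that never produced a response (network error or timeout).
function tallyWorkers(statuses: Array<number | null>): { succeeded: number; failed: number } {
  let succeeded = 0;
  let failed = 0;
  for (const status of statuses) {
    if (status !== null && status >= 200 && status < 300) {
      succeeded++;
    } else {
      failed++;
    }
  }
  return { succeeded, failed };
}

// Three workers report back: one succeeds, one fails with 502, one times out.
const summary = tallyWorkers([
  workerStatus({ ok: true }),
  workerStatus({ ok: false, error: "db timeout" }),
  null,
]);
console.log(summary); // prints: { succeeded: 1, failed: 2 }
```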
diff --git a/agents/backend-engineer-agent.md b/agents/backend-engineer-agent.md
@@ -154,3 +154,24 @@ Optimize for MVP speed.
 Experiment Integrity & Telemetry: Ensure cryptographic salts for A/B testing are server-only (do not use NEXT_PUBLIC). Telemetry calls (e.g., PostHog `captureServerEvent`) in user-facing API routes must be fire-and-forget (`.catch(() => {})`) instead of `await`ed to prevent external latency from corrupting SLAs and experiment data. Control group API responses must return a neutral label ("default"), never the real cohort string — the true cohort is captured server-side in PostHog only.
 
 # Added: 2026-03-28 — Nykaa Personalisation (issue-008)
+
+**Authenticated Route Caller Verification**: After adding authentication to any API route, search all client-side callers of that route path and verify each sends the required auth header. A `fetch()` call to an authenticated route without an `Authorization` header is a CRITICAL bug. A route auth fix without updating all callers is an incomplete fix — both the route and every caller must be updated in the same change.
+
+# Added: 2026-04-03 — MoneyMirror (issue-009)
+
+**File Size Budget at Generation Time**: Before writing any API route or page component expected to contain multi-phase logic, identify extraction points upfront. Route handlers must stay under 200 lines; page components must stay under 250 lines. If a file would exceed these limits, extract helpers or sub-components before writing past the limit — never write a large file and refactor later.
+
+# Added: 2026-04-03 — MoneyMirror (issue-009)
+
+**Infrastructure Provisioning is a hard deliverable** — not a README suggestion. Before execute-plan can be marked DONE, the Backend Engineer must confirm all of the following are complete:
+
+1. **Database project exists** — Neon/Supabase project created and `DATABASE_URL` is a real connection string in `.env.local` (not a placeholder).
+2. **Schema applied** — `schema.sql` has been run against the live DB. Verify by querying `information_schema.tables` — every expected table must exist.
+3. **Auth provider provisioned** — If the app uses Neon Auth, `NEON_AUTH_BASE_URL` must be obtained from the Neon console Auth section and filled in `.env.local`. OTP login must work locally before execute-plan closes.
+4. **All non-optional env vars filled** — Every variable in `.env.local.example` that is not explicitly marked `# Optional` must have a real value in `.env.local`. Empty strings (`VAR=`) are a blocking violation.
+5. **Sentry project created** — Create a Sentry project (free tier), run `npx @sentry/wizard@latest -i nextjs`, and fill `NEXT_PUBLIC_SENTRY_DSN`, `SENTRY_AUTH_TOKEN`, `SENTRY_ORG`, `SENTRY_PROJECT` in `.env.local`. This is a backend setup task, not a deploy-check task.
+6. **`npm run dev` boots clean** — The app starts without errors and the core user flow works end-to-end. Auth, DB reads/writes, and the primary feature must all function before the task is closed.
+
+Infra gaps discovered at `/deploy-check` are Backend Engineer failures. Ship infra, not just code.
+
+# Added: 2026-04-03 — Shift-left infra validation (issue-009 postmortem pattern)
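The caller rule is harder to violate if every client-side call to an authenticated route goes through one helper. The sketch below is hypothetical and makes three assumptions:

- a bearer-token scheme (apps whose auth provider uses cookies or another header would adapt `withAuth`);
- a made-up `/api/statements` path;
- the helper only builds fetch options, so its result can be passed straight to `fetch()`.

```typescript
// Request options compatible with the `init` argument of fetch().
type AuthedInit = {
  method?: string;
  headers?: Record<string, string>;
  body?: string;
};

// Hypothetical helper: attaches the Authorization header for an authenticated route.
// Throwing on a missing token surfaces the bug in the caller instead of
// sending a request that the route will reject with a 401.
function withAuth(accessToken: string, init: AuthedInit = {}): AuthedInit {
  if (!accessToken) {
    throw new Error("withAuth: no access token; caller must authenticate first");
  }
  return {
    ...init,
    headers: { ...(init.headers ?? {}), Authorization: `Bearer ${accessToken}` },
  };
}

// Usage in a "use client" component (route path and token source are hypothetical):
// await fetch("/api/statements", withAuth(token, { method: "POST", body: JSON.stringify(payload) }));
const init = withAuth("example-token", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
});
console.log(init.headers);
```

With a single helper, a review can search for `fetch(` calls to authenticated paths that bypass it, rather than inspecting headers call by call.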
diff --git a/agents/code-review-agent.md b/agents/code-review-agent.md
@@ -114,6 +114,26 @@ If found: block approval and require removal of the client-side re-fire (server-
 
 # Added: 2026-03-21 — Ozi Reorder Experiment
 
+**Authenticated Route → Caller Cross-Verification** (required for every review):
+
+For every API route confirmed to require authentication:
+
+- Search all `fetch()`, `axios`, and `useSWR` calls in client components (`"use client"` files) targeting that route path.
+- If any caller omits the `Authorization` header (or equivalent auth mechanism), flag as **CRITICAL**.
+- A route auth fix without updating all callers is an incomplete fix — both sides must be verified in the same review pass.
+
+# Added: 2026-04-03 — MoneyMirror (issue-009)
+
+**Parent/Child Write Sequence** (required for every review):
+
+For every API route that writes a parent record followed by child records:
+
+- Verify the route cannot enter a success state (`processed`, `completed`, `201`) before child writes succeed.
+- If the parent status is set to a terminal success state before the child inserts complete, flag as **CRITICAL**.
+- Verify that a child write failure either rolls back the parent or transitions it to a `failed` state — never silently logs and continues.
+
+# Added: 2026-04-03 — MoneyMirror (issue-009)
+
 ---
 
 ## 5 Performance Risks
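Below is a minimal sketch of the write sequence the Parent/Child check looks for. It rests on these assumptions:

- `StatementStore`, the status names, and `reportError` are hypothetical.
- The calls are synchronous only to keep the example self-contained; a real route would `await` each DB call in the same order.

The parent starts in a non-terminal `processing` state. It reaches `processed` only after the child insert succeeds; otherwise it moves to `failed` and error telemetry fires. Wrapping both writes in one DB transaction that rolls back together is the other acceptable strategy.

```typescript
type Status = "processing" | "processed" | "failed";

// Hypothetical persistence interface standing in for the app's DB client.
interface StatementStore {
  insertParent(id: string): void; // creates the parent row with status "processing"
  insertChildren(parentId: string, rows: string[]): void; // may throw
  setStatus(id: string, status: Status): void;
}

// The parent only reaches a success state after every child write succeeded.
function saveStatement(
  store: StatementStore,
  id: string,
  rows: string[],
  reportError: (err: unknown) => void,
): Status {
  store.insertParent(id);
  try {
    store.insertChildren(id, rows);
  } catch (err) {
    store.setStatus(id, "failed"); // never leave parent = success with children missing
    reportError(err); // error telemetry fires; "log and continue" is not enough
    return "failed"; // the route should map this to a non-2xx response
  }
  store.setStatus(id, "processed");
  return "processed";
}

// In-memory store used only to exercise the sequence.
class MemoryStore implements StatementStore {
  statuses = new Map<string, Status>();
  children = new Map<string, string[]>();
  failChildren: boolean;
  constructor(failChildren: boolean) {
    this.failChildren = failChildren;
  }
  insertParent(id: string): void {
    this.statuses.set(id, "processing");
  }
  insertChildren(parentId: string, rows: string[]): void {
    if (this.failChildren) throw new Error("child insert failed");
    this.children.set(parentId, rows);
  }
  setStatus(id: string, status: Status): void {
    this.statuses.set(id, status);
  }
}

const errors: unknown[] = [];
const failing = new MemoryStore(true);
console.log(saveStatement(failing, "stmt-1", ["txn-a"], (e) => errors.push(e))); // prints: failed
```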
diff --git a/agents/qa-agent.md b/agents/qa-agent.md
@@ -116,6 +116,26 @@ Verify:
 
 # Added: 2026-03-21 — Ozi Reorder Experiment
 
+**Env Var Key Name Cross-Check** (standalone QA dimension — required for all projects):
+
+Perform a grep-based audit to verify `.env.local.example` exactly matches the source code:
+
+```bash
+grep -rhoE 'process\.env\.[A-Z0-9_]+' src/ | sed 's/^process\.env\.//' | sort -u
+```
+
+Compare the output against every key listed in `.env.local.example`.
+
+Verify:
+
+1. Every key used in source code appears in `.env.local.example`.
+2. Every key name matches exactly — no `NEXT_PUBLIC_` prefix added or removed relative to source usage.
+3. If any key in source is absent from the example file, or any name diverges, this is a **blocking QA finding** — env var mismatches cause silent production failures that are nearly impossible to debug from error logs alone.
+
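+Note: Pay special attention to server-side telemetry keys (PostHog, Sentry). A `NEXT_PUBLIC_` prefix on a server-only key leaks it to the browser bundle; a name that diverges from what the code reads (prefix added or dropped) means that code reads `undefined`, and a key used in client code without the prefix is always `undefined` in the browser.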
+
+# Added: 2026-04-03 — MoneyMirror (issue-009)
+
 ---
 
 ## 4 Performance Testing
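For a scripted check instead of eyeballing grep output, the same audit can be sketched in TypeScript. This hypothetical helper is not part of the repo. It takes source text and the contents of `.env.local.example` as strings. It matches `process.env.KEY` dot access only, so bracket access such as `process.env["KEY"]` is missed, as it is by grep. It reports keys that are either missing outright or present only with a `NEXT_PUBLIC_` prefix added or removed.

```typescript
type Finding =
  | { key: string; problem: "missing" }
  | { key: string; problem: "prefix-mismatch"; exampleKey: string };

// Keys read in source via `process.env.KEY` (dot access only).
function envKeysInSource(source: string): Set<string> {
  const keys = new Set<string>();
  const re = /process\.env\.([A-Z0-9_]+)/g;
  let match: RegExpExecArray | null;
  while ((match = re.exec(source)) !== null) {
    keys.add(match[1]);
  }
  return keys;
}

// Keys declared in .env.local.example as `KEY=...`, skipping blanks and comment lines.
function envKeysInExample(example: string): Set<string> {
  const keys = new Set<string>();
  for (const raw of example.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq > 0) keys.add(line.slice(0, eq).trim());
  }
  return keys;
}

// Every key used in source must appear in the example file under the exact same name.
function crossCheck(source: string, example: string): Finding[] {
  const declared = Array.from(envKeysInExample(example));
  const bare = (key: string) => key.replace(/^NEXT_PUBLIC_/, "");
  const findings: Finding[] = [];
  for (const key of Array.from(envKeysInSource(source))) {
    if (declared.indexOf(key) !== -1) continue;
    // A declared key that differs only by the NEXT_PUBLIC_ prefix is a naming divergence.
    const twin = declared.find((d) => bare(d) === bare(key));
    findings.push(twin ? { key, problem: "prefix-mismatch", exampleKey: twin } : { key, problem: "missing" });
  }
  return findings;
}

const source = `
  const posthog = process.env.POSTHOG_KEY;
  const bucket = process.env.S3_BUCKET;
  const db = process.env.DATABASE_URL;
`;
const example = "# Database\nDATABASE_URL=\nNEXT_PUBLIC_POSTHOG_KEY=\n";
console.log(crossCheck(source, example));
```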
diff --git a/commands/deploy-check.md b/commands/deploy-check.md
@@ -44,6 +44,29 @@ Follow this sequence.
 
 ---
 
+## 0 Local Smoke Test (PM runs manually before triggering /deploy-check)
+
+**This gate must pass before running the command.** If any checkbox fails, fix the infra/env issue first — do not run `/deploy-check` against a broken local environment.
+
+```
+□ `npm run dev` starts without errors (port 3000 accessible)
+□ /login loads and OTP is sent successfully (Neon Auth is provisioned + NEON_AUTH_BASE_URL filled)
+□ Onboarding completes (DB write to profiles table succeeds)
+□ Core feature works end-to-end (e.g., PDF upload parses, dashboard loads with data)
+□ No 500 errors in browser console or terminal
+□ All non-optional env vars have real values in .env.local (not empty strings)
+```
+
+If any checkbox fails → diagnose and fix before proceeding. Common causes:
+
+- `NEON_AUTH_BASE_URL` empty → provision Neon Auth on the project, copy the URL
+- Missing API keys → get them from the relevant service dashboard
+- Schema not applied → run `schema.sql` in Neon/Supabase SQL editor
+
+# Added: 2026-04-03 — Shift-left infra validation; catch env/auth gaps before PR creation
+
+---
+
 ## 1 Build Verification
 
 Ensure all components build successfully.
@@ -71,12 +94,18 @@ Ensure secrets are not exposed.
 **ENV Completeness Check (blocking)**:
 
 1. Scan `apps/<project-name>/src/` for all `process.env.*` references using grep.
-2. Compare the full list against `.env.local.example`.
-3. Report any variable present in code but missing from `.env.local.example` as a **BLOCKING violation**.
-4. If any missing vars are found: stop here and require them to be added to `.env.local.example` before continuing.
+2. Compare the full list against `.env.local.example` — report any variable present in code but missing from `.env.local.example` as a **BLOCKING violation**.
+3. Read `.env.local` directly and check each variable's value. Classify each as:
+   - ✅ FILLED — has a real value
+   - ⚠️ EMPTY — present in file but value is blank (`VAR=` or `VAR=""`)
+   - ❌ MISSING — not in file at all
+4. Report EMPTY variables as a **BLOCKING violation** — a variable that exists in the file with no value is just as broken as one that's missing.
+5. Exception: variables explicitly marked `# Optional` in `.env.local.example` may be empty without blocking.
 
 # Added: 2026-04-02 — ENV completeness must be a gate, not a checklist item
 
+# Updated: 2026-04-03 — Distinguish EMPTY vs MISSING; empty values are a blocking violation
+
 ---
 
 ## 3 Infrastructure Readiness
@@ -278,9 +307,11 @@ Return output using this structure.
 
 ---
 
+Local Smoke Test (Gate 0 — PM confirmed)
+
 Build Status
 
-Environment Configuration
+Environment Configuration (FILLED / EMPTY / MISSING per var)
 
 Infrastructure Readiness
 
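Step 3 of the ENV Completeness Check can be sketched as a small TypeScript function over the two files' contents. `classifyEnv` and its types are hypothetical names. The command text does not pin down these details, so the sketch assumes:

- values are simple single-line `KEY=VALUE` entries;
- a `#` at the start of the value or after whitespace begins a trailing comment, as in `SENTRY_ORG= # Optional`;
- surrounding quotes are stripped, so `VAR=""` counts as EMPTY;
- a variable marked `# Optional` is non-blocking whether EMPTY or MISSING.

```typescript
type EnvState = "FILLED" | "EMPTY" | "MISSING";
type EnvLine = { key: string; value: string; optional: boolean };
type EnvReport = { key: string; state: EnvState; blocking: boolean };

// Parse simple KEY=VALUE lines; comment-only and blank lines are skipped.
function parseEnvLines(text: string): EnvLine[] {
  const lines: EnvLine[] = [];
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    const key = line.slice(0, eq).trim();
    let rest = line.slice(eq + 1).trim();
    const optional = /#\s*optional\b/i.test(rest);
    // Assumption: a `#` at the start or after whitespace begins a trailing comment.
    const comment = rest.search(/(^|\s)#/);
    if (comment >= 0) rest = rest.slice(0, comment).trim();
    // Strip matching surrounding quotes so `VAR=""` reads as empty.
    const value = rest.replace(/^(["'])(.*)\1$/, "$2");
    lines.push({ key, value, optional });
  }
  return lines;
}

// Classify every variable declared in .env.local.example against .env.local.
function classifyEnv(exampleText: string, localText: string): EnvReport[] {
  const local = new Map<string, string>();
  for (const { key, value } of parseEnvLines(localText)) local.set(key, value);
  return parseEnvLines(exampleText).map(({ key, optional }) => {
    const value = local.get(key);
    const state: EnvState = value === undefined ? "MISSING" : value === "" ? "EMPTY" : "FILLED";
    return { key, state, blocking: state !== "FILLED" && !optional };
  });
}

const exampleFile = ["DATABASE_URL=", "NEON_AUTH_BASE_URL=", "SENTRY_ORG= # Optional", "POSTHOG_KEY="].join("\n");
const localFile = ['DATABASE_URL="postgres://user@host/db"', "NEON_AUTH_BASE_URL=", "SENTRY_ORG="].join("\n");

for (const r of classifyEnv(exampleFile, localFile)) {
  console.log(`${r.key}: ${r.state}${r.blocking ? " (BLOCKING)" : ""}`);
}
```

With these inputs, `NEON_AUTH_BASE_URL` (EMPTY) and `POSTHOG_KEY` (MISSING) are blocking, while the optional `SENTRY_ORG` is reported as EMPTY without blocking.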
diff --git a/commands/execute-plan.md b/commands/execute-plan.md
@@ -103,6 +103,15 @@ implement service logic
 integrate database operations
 handle errors and validation
 
+**Sentry setup is a backend deliverable** — not a deploy-check task. During backend implementation:
+
+1. `npm install @sentry/nextjs`
+2. `npx @sentry/wizard@latest -i nextjs` (creates `sentry.server.config.ts` plus a client-side config whose filename depends on the SDK version, and updates `next.config.ts`)
+3. Add `NEXT_PUBLIC_SENTRY_DSN`, `SENTRY_AUTH_TOKEN`, `SENTRY_ORG`, `SENTRY_PROJECT` to `.env.local.example`
+4. Call `Sentry.captureException(e)` in the `catch` block of at least one API route error handler
+
+# Added: 2026-04-03 — Move Sentry setup to execute-plan; deploy-check is verification not first setup
+
 ---
 
 ## 3 Database Setup
@@ -129,6 +138,26 @@ core user journey works
 data flows correctly through system
 UI interactions behave correctly
 
+**Read path / write path checkpoint** (required for every page in the plan):
+
+For every page that displays data, verify BOTH paths are implemented before marking it complete:
+
+- **Write path**: mutation fires (POST/upload) → result displayed in same request cycle
+- **Read path**: page loads fresh (refresh, direct URL, email deep link) → same result hydrated from DB via authenticated GET endpoint
+
+If only the write path is implemented, the page is incomplete. Any page linked from an email CTA, push notification, or external URL that has no implemented read endpoint is a blocking gap.
+
+**Third-party library API verification** (required for every new npm integration):
+
+After wiring any npm package for the first time:
+
+1. Check the installed version in `package.json`.
+2. Verify the generated call pattern against the package's TypeScript types or exported index — not against training knowledge.
+3. Run `npm test` to confirm the integration behaves as expected.
+4. Training knowledge of library APIs is not sufficient for version-sensitive properties (e.g., `result.total` vs `result.pages?.length`).
+
+# Added: 2026-04-03 — MoneyMirror (issue-009)
+
 ---
 
 # Output Format
@@ -151,6 +180,22 @@ Known Issues
 
 ---
 
+## 5b File Size Budget Requirement
+
+The 300-line pre-commit limit must be applied **during code generation**, not discovered at commit time.
+
+**Rules**:
+
+- API route handlers: must stay under **200 lines**. If a route handles more than 2 logical phases (e.g., validate → AI call → DB write → telemetry), extract each phase into a named helper function in a separate file before writing the route past 150 lines.
+- Page components: must stay under **250 lines**. If a page includes multiple UI states (loading, upload, result), extract each state into a named sub-component before writing the page past 200 lines.
+- **Never write a large file and refactor later.** Identify extraction points upfront during task breakdown (Step 0). If a file is projected to exceed the limit, add an extraction task to the task list before writing any code.
+
+Violations discovered at deploy-check (pre-commit hook rejection) are execute-plan failures, not deploy-check tasks.
+
+# Added: 2026-04-03 — MoneyMirror (issue-009)
+
+---
+
 ## 6 Telemetry Completeness Requirement
 
 For every API route calling an external AI service, implement PostHog events in ALL branches:
@@ -309,11 +354,24 @@ Before marking execute-plan complete, verify:
    - Key design decisions
 
 2. **`.env.local.example`** lists every `process.env.*` reference in the codebase — including any variables added during peer-review or fix cycles.
+   - **Mandatory grep verification**: Run `grep -rhoE 'process\.env\.[A-Z0-9_]+' src/ | sed 's/^process\.env\.//' | sort -u` and compare against every key in `.env.local.example`. Any key in the grep output absent from `.env.local.example` is a blocking gap. Any key name that diverges (e.g., `NEXT_PUBLIC_` added or removed) is a deploy blocker. `.env.local.example` must be generated from source, never from memory.
+
+   # Added: 2026-04-03 — MoneyMirror (issue-009)
 
-If either is missing, execute-plan is **not complete**. A deploy-check README failure that originates here is an execute-plan prompt failure — flag it in the postmortem.
+3. **Infrastructure provisioning is complete** (blocking — do not mark done until all pass):
+   - [ ] Neon/Supabase project created and `DATABASE_URL` filled in `.env.local`
+   - [ ] Database schema applied (`schema.sql` run in SQL editor; all tables verified)
+   - [ ] Auth provider provisioned (e.g., Neon Auth `NEON_AUTH_BASE_URL` obtained and filled)
+   - [ ] All non-optional env vars have real values in `.env.local` — no empty strings
+   - [ ] Sentry project created; `NEXT_PUBLIC_SENTRY_DSN`, `SENTRY_AUTH_TOKEN`, `SENTRY_ORG`, `SENTRY_PROJECT` filled in `.env.local`
+   - [ ] `npm run dev` boots without errors and the core user flow works end-to-end locally
+
+If any item above is incomplete, execute-plan is **not done** — it is blocked. Infra gaps discovered at deploy-check are execute-plan failures.
 
 # Added: 2026-03-21 — Ozi Reorder Experiment
 
+# Updated: 2026-04-03 — Add infra provisioning checklist + Sentry setup as execute-plan hard deliverables (shift-left from deploy-check)
+
 ---
 
 # Rules
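The budgets in section 5b can be checked mechanically, for example from the pre-commit hook or a generation-time script. The sketch below is hypothetical and makes two assumptions:

- Next.js App Router file naming: `route.ts` for API route handlers, `page.tsx` for page components.
- "Past 150 / 200 lines" is the point to plan extraction, and "under 200 / 250 lines" is the hard limit.

```typescript
type FileKind = "route" | "page";
type Verdict = "ok" | "plan-extraction" | "over-limit";

// Soft threshold: extract before writing past it. Hard limit: the file must stay under it.
const BUDGETS: Record<FileKind, { soft: number; hard: number }> = {
  route: { soft: 150, hard: 200 },
  page: { soft: 200, hard: 250 },
};

// Assumption: App Router naming identifies route handlers and page components.
function fileKind(path: string): FileKind | null {
  if (/\/route\.(ts|js)$/.test(path)) return "route";
  if (/\/page\.(tsx|jsx)$/.test(path)) return "page";
  return null;
}

function countLines(source: string): number {
  if (source === "") return 0;
  const parts = source.split("\n");
  // A trailing newline does not start another line.
  return source.endsWith("\n") ? parts.length - 1 : parts.length;
}

// Returns null for files the budget does not cover.
function checkFileBudget(path: string, source: string): Verdict | null {
  const kind = fileKind(path);
  if (kind === null) return null;
  const lines = countLines(source);
  const { soft, hard } = BUDGETS[kind];
  if (lines >= hard) return "over-limit";
  if (lines > soft) return "plan-extraction";
  return "ok";
}

const body = "// handler line\n".repeat(170);
console.log(checkFileBudget("src/app/api/statements/route.ts", body)); // prints: plan-extraction
```

Running this during task breakdown against projected sizes, not only at commit time, matches the "identify extraction points upfront" rule.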