| 1 | +--- |
| 2 | +description: Strict governance rules for D1 database instances, Drizzle ORM usage, schema changes, and migration management in the core-github-api project. |
| 3 | +--- |
| 4 | + |
| 5 | +# D1 & Drizzle ORM Governance Rules |
| 6 | + |
| 7 | +## Table Instance Ownership |
| 8 | + |
| 9 | +Every table in this project belongs to exactly ONE D1 binding. When in doubt, check this table first: |
| 10 | + |
| 11 | +| D1 Binding | Migration Config | Owns | |
| 12 | +|-----------|-----------------|------| |
| 13 | +| `DB` | `drizzle.config.core.ts` → `migrations/core/` | All application logic tables: `system_logs`, `audit_logs`, `automation_logs`, `repos`, `prs`, `reviews`, `health_*`, `cloudflare_changelog`, `jules_*`, `discord_*`, `agent_*`, `project_*` | |
| 14 | +| `DB_WEBHOOKS` | `drizzle.config.webhooks.ts` → `migrations/webhooks/` | Raw GitHub event tables ONLY: `webhook_deliveries`, `pull_request`, `push`, `check_run`, `workflow_run`, `webhook_configs`, `searches`, `repoAnalysis`, `dailyTrends`, `trendingRepos` | |
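The ownership table can also be expressed as a lookup, which is useful for a lint step or unit test that rejects a table wired to the wrong binding. This is a minimal sketch: the `ownerOf` helper and the constant names are hypothetical and not part of the project; the table names and prefixes are copied from the table above.

```typescript
// Hypothetical helper: maps a table name to the D1 binding that owns it.
type D1Binding = "DB" | "DB_WEBHOOKS";

// Explicit table names listed for DB_WEBHOOKS in the ownership table.
const WEBHOOKS_TABLES: ReadonlySet<string> = new Set([
  "webhook_deliveries", "pull_request", "push", "check_run", "workflow_run",
  "webhook_configs", "searches", "repoAnalysis", "dailyTrends", "trendingRepos",
]);

// Explicit table names listed for DB in the ownership table.
const CORE_TABLES: ReadonlySet<string> = new Set([
  "system_logs", "audit_logs", "automation_logs", "repos", "prs", "reviews",
  "cloudflare_changelog",
]);

// Wildcard families (health_*, jules_*, ...) listed for DB.
const CORE_PREFIXES: readonly string[] = [
  "health_", "jules_", "discord_", "agent_", "project_",
];

function ownerOf(table: string): D1Binding | undefined {
  if (WEBHOOKS_TABLES.has(table)) return "DB_WEBHOOKS";
  if (CORE_TABLES.has(table)) return "DB";
  if (CORE_PREFIXES.some((prefix) => table.startsWith(prefix))) return "DB";
  return undefined; // unknown table: decide ownership before creating it
}
```

An `undefined` result is the interesting case: it means the table is not yet covered by the ownership table and the table should be updated before any schema work starts.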
| 15 | + |
| 16 | +## ORM Client Selection |
| 17 | + |
| 18 | +| Situation | Client to Use | Import | |
| 19 | +|-----------|--------------|--------| |
| 20 | +| Reading/writing any table on `DB` | `getDb(env.DB)` | `import { getDb } from '@db'` | |
| 21 | +| Reading/writing any table on `DB_WEBHOOKS` | `getWebhooksDb(env.DB_WEBHOOKS)` | `import { getWebhooksDb } from '@db'` | |
| 22 | +| ❌ FORBIDDEN | `drizzle(env.DB)` or `drizzle(env.DB_WEBHOOKS)` | Never — always pass schema via the getters | |
| 23 | + |
| 24 | +## Pre-Table-Creation Protocol (MANDATORY) |
| 25 | + |
| 26 | +Before creating any new Drizzle table: |
| 27 | + |
| 28 | +1. **Scan existing tables** — run the following and read the relevant schema files: |
| 29 | + ```bash |
| 30 | + grep -r "sqliteTable" src/backend/src/db/schemas/ --include="*.ts" -l |
| 31 | + ``` |
| 32 | +2. **Evaluate reuse** — can you add a column to an existing table? Is there a table in a different domain that serves this purpose? |
| 33 | +3. **Only if no existing table fits** — create a new one |
| 34 | +4. **Assign to correct D1 instance** — use the ownership table above to determine which Drizzle config and migration dir to use |
| 35 | +5. **Add to the correct barrel** — update the domain's `index.ts` so it is picked up by `schema.ts` |
| 36 | + |
| 37 | +## Schema File Locations |
| 38 | + |
| 39 | +``` |
| 40 | +src/backend/src/db/ |
| 41 | +├── schema.ts ← master barrel (all domains, used by DB/core) |
| 42 | +├── schemas/ |
| 43 | +│ ├── index.ts ← re-exports all domains |
| 44 | +│ ├── agents/ |
| 45 | +│ ├── app/ ← cloudflare_changelog, etc. |
| 46 | +│ ├── discord/ |
| 47 | +│ ├── github/ |
| 48 | +│ │ └── webhooks.ts ← webhook event tables (owned by DB_WEBHOOKS) |
| 49 | +│ ├── logs/ ← system_logs, audit_logs, health_* (owned by DB) |
| 50 | +│ ├── ops/ |
| 51 | +│ ├── webhooks/ |
| 52 | +│ │ └── automations.ts ← automation_logs, webhook_configs |
| 53 | +│ └── ... |
| 54 | +└── index.ts ← exports getDb() and getWebhooksDb() |
| 55 | +``` |
| 56 | + |
| 57 | +## Migration Discipline |
| 58 | + |
| 59 | +- **NEVER** edit files in `migrations/core/` or `migrations/webhooks/` directly |
| 60 | +- Generate: `pnpm run db:generate:core` or `pnpm run db:generate:webhooks` |
| 61 | +- Apply: `pnpm run migrate:remote:core` or `pnpm run migrate:remote:webhooks` |
| 62 | +- Full reset: `pnpm run db:reset` (creates fresh DB instances, archives old migrations) |
| 63 | +- Manual repair of a migration file is the ONLY exception to the no-edit rule above, and is permitted only if a migration script fails AND the user explicitly authorizes it
| 64 | + |
| 65 | +## Fresh Instance Reset Protocol |
| 66 | + |
| 67 | +When you need a clean slate (structural errors, wrong table locations, D1 corruption): |
| 68 | + |
| 69 | +```bash |
| 70 | +pnpm run db:reset |
| 71 | +# Then, after deploy completes: |
| 72 | +pnpm run db:seed:prep # prepare seed files from pre-delete export |
| 73 | +pnpm run db:seed:run # apply seeds to fresh instances |
| 74 | +``` |
| 75 | + |
| 76 | +### What `pnpm run db:reset` does: |
| 77 | +1. Reads current D1 UUIDs **dynamically from `wrangler.jsonc`** — no hardcoded constants |
| 78 | +2. Exports all row data to `scripts/db/data_exports/{timestamp}/` (SQL + JSON) before deletion |
| 79 | +3. Deletes old D1 instances via CF REST API |
| 80 | +4. Creates new fresh instances with canonical names |
| 81 | +5. Updates `wrangler.jsonc` with new UUIDs |
| 82 | +6. Archives old migration history to `migrations/_archive/` |
| 83 | +7. Chains: `db:generate:all → migrate:remote:all → deploy` |
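Step 1 above (reading UUIDs from the config rather than from constants) can be sketched as follows. This assumes the file has already been parsed into an object (JSONC comment stripping is omitted) and relies on the standard wrangler `d1_databases` entries with `binding`, `database_name`, and `database_id` keys. The helper names `readD1Ids` and `requireD1Id` are hypothetical and are not the project's actual script.

```typescript
// Relevant slice of a parsed wrangler config.
interface D1DatabaseEntry {
  binding: string;
  database_name: string;
  database_id: string;
}

interface WranglerConfig {
  d1_databases?: D1DatabaseEntry[];
}

// Hypothetical helper: builds a binding -> database_id map from the config.
function readD1Ids(config: WranglerConfig): Map<string, string> {
  const ids = new Map<string, string>();
  for (const entry of config.d1_databases ?? []) {
    ids.set(entry.binding, entry.database_id);
  }
  return ids;
}

// Hypothetical helper: fails loudly instead of falling back to a hardcoded id.
function requireD1Id(config: WranglerConfig, binding: string): string {
  const id = readD1Ids(config).get(binding);
  if (id === undefined) {
    throw new Error(`No d1_databases entry for binding "${binding}"`);
  }
  return id;
}
```

Throwing on a missing binding keeps the "no hardcoded constants" rule honest: there is no silent default to drift out of date.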
| 84 | + |
| 85 | +### Seeding After Reset: |
| 86 | + |
| 87 | +**Step 1:** `pnpm run db:seed:prep [-- --export-dir scripts/db/data_exports/TIMESTAMP]` |
| 88 | +- Reads the JSON export from the pre-delete backup |
| 89 | +- Applies truncation limits (e.g. `system_logs` → last 2000 rows, `webhook_deliveries` → last 500) |
| 90 | +- Chunks INSERT statements to stay within D1 limits: at most 100 bound params per query, and at most 90 KB per statement (a safety margin under D1's 100 KB statement limit)
| 91 | +- Writes `scripts/db/seeds/{timestamp}/DB.seed.sql` and `DB_WEBHOOKS.seed.sql` |
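The chunking described in Step 1 can be sketched as below. This is an illustration under stated assumptions, not the project's seed script: values are inlined as SQL literals, 90 KB is read as 90,000 bytes, and the names `chunkInserts`, `sqlLiteral`, and the two constants are hypothetical.

```typescript
type SqlValue = string | number | null;

const MAX_STATEMENT_BYTES = 90 * 1000; // assumed reading of the 90 KB script cap
const MAX_ROWS_PER_INSERT = 100;       // the "safe INSERT batch size" quoted in this document

const encoder = new TextEncoder();
const byteLength = (s: string): number => encoder.encode(s).length;

// Renders one value as a SQLite literal; single quotes are doubled.
function sqlLiteral(value: SqlValue): string {
  if (value === null) return "NULL";
  if (typeof value === "number") return Number.isFinite(value) ? String(value) : "NULL";
  return `'${value.replace(/'/g, "''")}'`;
}

// Splits rows into multi-row INSERT statements that respect both caps.
// A single row larger than maxBytes is still emitted alone (and would need
// separate handling); this sketch does not split individual rows.
function chunkInserts(
  table: string,
  columns: string[],
  rows: SqlValue[][],
  maxBytes: number = MAX_STATEMENT_BYTES,
  maxRows: number = MAX_ROWS_PER_INSERT,
): string[] {
  const head = `INSERT INTO "${table}" (${columns.map((c) => `"${c}"`).join(", ")}) VALUES `;
  const baseBytes = byteLength(head) + 1; // +1 for the trailing ';'
  const statements: string[] = [];
  let tuples: string[] = [];
  let bytes = baseBytes;

  for (const row of rows) {
    const tuple = `(${row.map(sqlLiteral).join(", ")})`;
    const tupleBytes = byteLength(tuple);
    const separatorBytes = tuples.length > 0 ? 2 : 0; // ", " between tuples
    const full = tuples.length >= maxRows || bytes + separatorBytes + tupleBytes > maxBytes;
    if (tuples.length > 0 && full) {
      statements.push(head + tuples.join(", ") + ";");
      tuples = [];
      bytes = baseBytes;
    }
    bytes += (tuples.length > 0 ? 2 : 0) + tupleBytes;
    tuples.push(tuple);
  }
  if (tuples.length > 0) statements.push(head + tuples.join(", ") + ";");
  return statements;
}
```

Sizes are measured in UTF-8 bytes rather than string length, since non-ASCII log messages would otherwise be undercounted.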
| 92 | + |
| 93 | +**Step 2:** `pnpm run db:seed:run [-- --seeds-dir scripts/db/seeds/TIMESTAMP]` |
| 94 | +- Tries bulk `--file` execution first (fastest) |
| 95 | +- Falls back to statement-by-statement with D1 error keyword detection |
| 96 | +- Retries on transient overload, aborts on fatal errors (column_notfound, schema mismatch) |
| 97 | +- Prints instructive fix guidance for every known D1 error type |
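The retry-or-abort behaviour in Step 2 can be sketched as a keyword classifier wrapped in a bounded retry loop. Everything here is illustrative: the keyword list is a placeholder rather than the script's real one, the `exec` callback stands in for however a statement is actually executed, and `classifyD1Error` and `runWithRetry` are hypothetical names.

```typescript
type Verdict = "retry" | "abort";

// Placeholder keywords for transient conditions (assumption, not the script's list).
const TRANSIENT_KEYWORDS: readonly string[] = ["overloaded", "timeout"];

// Anything not recognised as transient is treated as fatal, so schema
// problems such as a missing column abort immediately instead of looping.
function classifyD1Error(message: string): Verdict {
  const lower = message.toLowerCase();
  return TRANSIENT_KEYWORDS.some((k) => lower.includes(k)) ? "retry" : "abort";
}

// Runs one statement, retrying transient failures with exponential backoff.
async function runWithRetry(
  statement: string,
  exec: (sql: string) => Promise<void>,
  maxAttempts: number = 3,
  baseDelayMs: number = 200,
): Promise<void> {
  for (let attempt = 1; ; attempt++) {
    try {
      await exec(statement);
      return;
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      if (classifyD1Error(message) === "abort" || attempt >= maxAttempts) throw err;
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Defaulting unknown errors to "abort" is the conservative choice for seeding: a retry loop on a schema mismatch only delays the failure.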
| 98 | + |
| 99 | +### D1 Execution Limits (hardcoded in the scripts, which are the source of truth):
| 100 | +| Limit | Value | Source | |
| 101 | +|-------|-------|--------| |
| 102 | +| Max bound params per query | 100 | CF hard limit | |
| 103 | +| Max SQL statement length | 100 KB (scripts use 90 KB) | CF hard limit | |
| 104 | +| Max query duration | 30 seconds | CF hard limit | |
| 105 | +| Safe INSERT batch size | 100 rows | CF recommendation | |
| 106 | +| Max D1 database size | 10 GB | CF hard limit | |
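One consequence of the 100-parameter limit is worth spelling out. When values are passed as bound parameters rather than inlined as literals, each row costs one parameter per column, so the number of rows per INSERT is bounded by floor(100 / columns). Whether the project's scripts bind parameters or inline literals is not stated here; the hypothetical helper below only shows the arithmetic.

```typescript
const MAX_BOUND_PARAMS = 100; // limit quoted in the table above

// Largest number of rows that fit in one parameterized multi-row INSERT.
// Returns 0 when a single row already needs more than 100 parameters.
function maxRowsPerBoundInsert(columnCount: number): number {
  if (!Number.isInteger(columnCount) || columnCount < 1) {
    throw new RangeError("columnCount must be a positive integer");
  }
  return Math.floor(MAX_BOUND_PARAMS / columnCount);
}
```

For a 7-column table this gives 14 rows per statement, well under the 100-row batch size, so the parameter limit is the binding constraint whenever parameters are used.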
| 107 | + |
| 108 | +⚠️ **Do NOT put seed files in `migrations/` directories** — wrangler will try to apply them as migrations. |
| 109 | + |
| 110 | +## Health Monitors for D1 Staleness |
| 111 | + |
| 112 | +Three health checks run as part of `POST /api/health/run`: |
| 113 | + |
| 114 | +| Check ID | File | What it detects | |
| 115 | +|----------|------|-----------------| |
| 116 | +| `webhook_staleness` | `health/checks/webhook-staleness.ts` | `webhook_deliveries` freshness vs latest GitHub API event (fails if >24h lag or empty) | |
| 117 | +| `log_staleness` | `health/checks/log-staleness.ts` | `system_logs` freshness (fails if empty or >1 day old) | |
| 118 | +| `d1_table_scan` | `health/checks/d1-table-scan.ts` | All tables in both instances — flags empty (0 rows) or stale (>30 days) | |
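The pass/fail rule shared by these checks (fail when the table is empty or its newest row is older than a threshold) can be sketched as a pure function. This is an assumption about the shape of the logic, not the contents of the listed check files; `checkStaleness` and `StalenessResult` are hypothetical names.

```typescript
interface StalenessResult {
  status: "pass" | "fail";
  reason: string;
}

const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// latestRowAt is the newest row timestamp, or null when the table has no rows.
function checkStaleness(
  latestRowAt: Date | null,
  now: Date,
  maxAgeMs: number = DAY_MS,
): StalenessResult {
  if (latestRowAt === null) {
    return { status: "fail", reason: "table is empty" };
  }
  const ageMs = now.getTime() - latestRowAt.getTime();
  if (ageMs > maxAgeMs) {
    return { status: "fail", reason: `latest row is ${Math.floor(ageMs / HOUR_MS)}h old` };
  }
  return { status: "pass", reason: "fresh" };
}
```

The default threshold matches the 24-hour rule for `webhook_staleness` and `log_staleness`; the 30-day rule for `d1_table_scan` would pass `30 * DAY_MS` instead.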
| 119 | + |
| 120 | +To manually check D1 staleness at any time: |
| 121 | +```bash |
| 122 | +# Quick row count check |
| 123 | +wrangler d1 execute DB --remote --command "SELECT count(*) FROM system_logs;" |
| 124 | +wrangler d1 execute DB_WEBHOOKS --remote --command "SELECT count(*) FROM webhook_deliveries;" |
| 125 | + |
| 126 | +# Or trigger the full health suite |
| 127 | +curl -X POST https://core-github-api.hacolby.workers.dev/api/health/run | jq '.results[] | select(.name | test("Staleness|Table Scan"))' |
| 128 | +``` |
| 129 | + |
| 130 | +## Adding a New Table — Checklist |
| 131 | + |
| 132 | +``` |
| 133 | +[ ] Scanned existing schemas, no suitable table found |
| 134 | +[ ] Determined correct D1 instance (DB vs DB_WEBHOOKS) |
| 135 | +[ ] Created file in correct schemas/<domain>/ directory |
| 136 | +[ ] Exported from schemas/<domain>/index.ts |
| 137 | +[ ] Ran pnpm run db:generate:<core|webhooks> |
| 138 | +[ ] Reviewed generated migration SQL (do not edit it) |
| 139 | +[ ] Ran pnpm run migrate:remote:<core|webhooks> |
| 140 | +``` |