Skip to content
Merged
66 changes: 66 additions & 0 deletions .agents/skills/db-migrate/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
---
name: db-migrate
description: Author or review a Drizzle DB migration for zero-downtime safety — expand/contract phasing, backward-compatibility with the deployed app version, and writing the `-- migration-safe` acknowledgment the check:migrations lint requires. Use when adding/editing files under `packages/db/migrations/` or changing `packages/db/schema.ts`.
---

# DB Migrate Skill

You make schema changes that survive a deploy without downtime. The `check:migrations` lint (`scripts/check-migrations-safety.ts`) is the deterministic gate; you are the judgment that decides whether a flagged change is actually safe and writes the annotation that satisfies it.

## The window (why this matters)

A deploy runs the migration, then rolls out the new app image via blue/green. The two are **not atomic and cannot be** — during cutover the old task set keeps serving against the **already-migrated** schema. So:

> Every migration must be backward-compatible with the app version that is *already deployed*.

If a migration drops a column the old code still reads, renames one, or adds a `NOT NULL` the old inserts don't populate, the old code throws until traffic fully shifts — the downtime we're guarding against. You can't fix this by reordering the pipeline; the only fix is discipline.

## Expand / contract

Split every breaking change across **two deploys**:

1. **Expand** (this PR): additive, backward-compatible schema + code that tolerates *both* the old and new shape.
2. **Contract** (a later PR, after expand is fully deployed): remove the old thing, now that nothing reads it.

Never put expand and contract in the same PR. If this PR both removes the code that used a column *and* drops the column, the old code is still live during cutover — split it.

### Per-operation playbook

| You want to | Do (deploy 1 = expand) | Do (deploy 2 = contract) |
|---|---|---|
| Add a required column | `ADD COLUMN` nullable or `DEFAULT`; code writes it | backfill, then `SET NOT NULL` |
| Rename a column/table | add the new name; code dual-writes / reads new-then-old | drop the old name |
| Drop a column/table | stop all reads/writes in code; ship it | `DROP` (annotate) |
| Change a column type | add a new column of the new type; dual-write | backfill, swap reads, drop old |
| Add FK / CHECK | `ADD CONSTRAINT ... NOT VALID` | `VALIDATE CONSTRAINT` separately |
| Index an existing table | `COMMIT;` breakpoint → `SET lock_timeout = 0` → `CREATE INDEX CONCURRENTLY IF NOT EXISTS` (see `packages/db/scripts/migrate.ts`) | — |
| Backfill data | batched + idempotent `UPDATE` (keyset/`WHERE`, bounded) | — |

A `CREATE INDEX`, `ADD COLUMN`, or `ADD CONSTRAINT` against a table **created in the same migration** is always safe (no rows, no live traffic) — the lint already suppresses those.

## The judgment the lint can't do

The lint flags risky *shapes*; it cannot know whether a given drop is *safe right now*. For each flagged statement, do the work it can't:

1. **Is the dependency gone?** Grep the app for the table/column: search `apps/sim` and `packages` for the column name, the Drizzle field (camelCase), and the table object. If any live read/write remains, it is **not** safe — fix the code first.
2. **Did the expand already ship?** The removal of that read/write must be in a deploy that is *already out*, not this same PR. If it's in this PR, split: land the code change now, do the destructive migration in a follow-up after it deploys.
3. **Backfills:** confirm the `UPDATE`/`DELETE` is batched (bounded `WHERE`/keyset, not a single whole-table statement), idempotent (safe to replay — a failed migration re-runs unjournaled files from the top), and safe under concurrent writes from the still-live old app.

## Workflow

1. Edit `packages/db/schema.ts`, then `cd packages/db && bunx drizzle-kit generate` to produce the SQL.
2. Hand-edit the generated SQL where the playbook requires it: `CONCURRENTLY` + `COMMIT;` breakpoint for indexes on existing tables, `NOT VALID` for constraints, batching for backfills.
3. Run `bun run check:migrations` (or `bun run scripts/check-migrations-safety.ts main` locally).
- **Hard errors** (`add-not-null-no-default`, `rename`, `index-not-concurrent`, `constraint-not-valid`, …): rewrite into expand/contract. Do **not** try to annotate them away — the lint won't accept it.
- **Annotate tier** (`drop-table`, `drop-column`, `drop-default`, `set-not-null`, `alter-type`, `drop-index`): only after you've confirmed steps 1–3 above, add a comment on the line directly above the statement:
```sql
-- migration-safe: `secret` read removed in v0.6.1 (#1234), shipped two deploys ago
ALTER TABLE "webhook" DROP COLUMN "secret";
```
The reason must be specific and name the PR/version that removed the dependency. An empty reason fails the lint.
- **Warnings** (`data-backfill`): non-blocking, but confirm the batching/idempotency before merging.
4. Verify locally: `cd packages/db && bun run db:migrate` against a dev DB.

## Hard rule

Never annotate a destructive statement just to make the lint pass. The annotation is a claim that you verified the old code no longer depends on it. If you can't make that claim truthfully, the change belongs in a later deploy — tell the user to split it.
17 changes: 11 additions & 6 deletions .agents/skills/ship/SKILL.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
---
name: ship
description: Commit, push, and open a PR to staging in one shot
description: Commit, push, and open a PR to staging in one shot — runs the cleanup pass and, when migrations changed, the db-migrate safety review first
---

# Ship Command
Expand All @@ -16,12 +16,17 @@ When the user runs `/ship`:
- Types: `fix`, `feat`, `improvement`, `chore`
- Scope: short identifier (e.g., `undo-redo`, `api`, `ui`)
- Keep it concise
3. **Run pre-ship checks** from the repo root before staging:
3. **Run the cleanup pass** on the current changes before anything else: `/cleanup`
- This runs the six code-quality skills (effects, memo, callbacks, state, React Query, emcn) and applies fixes, so they land in this commit.
4. **Run migration safety** — only if the diff touches `packages/db/migrations/**` or `packages/db/schema.ts`:
- Run `/db-migrate` to review the migration for zero-downtime safety (expand/contract phasing, backward-compatibility with the deployed app version).
- `bun run check:migrations` must pass. Do not silence a flagged statement with a `-- migration-safe:` annotation unless `/db-migrate` confirmed the old code no longer depends on it; otherwise split the destructive change into a later deploy.
5. **Run pre-ship checks** from the repo root before staging:
- `bun run lint` to fix formatting issues
- `bun run check:api-validation:strict` to catch boundary contract failures before CI
4. **Stage and commit** the changes with the generated message
5. **Push to origin** using the current branch name
6. **Create a PR** to staging with a description in the user's voice
6. **Stage and commit** the changes with the generated message
7. **Push to origin** using the current branch name
8. **Create a PR** to staging with a description in the user's voice

## Commit Message Format

Expand Down Expand Up @@ -77,7 +82,7 @@ gh pr create --base staging --title "COMMIT_MESSAGE" --body "PR_BODY"

- Short, direct bullet points
- No unnecessary explanation
- "Tested manually" is acceptable for testing section; include lint and boundary validation results when run
- "Tested manually" is acceptable for testing section; include lint, boundary validation, and (when migrations changed) `check:migrations` results when run
- Checkboxes filled in appropriately
- No screenshots section unless UI changes

10 changes: 10 additions & 0 deletions .github/workflows/test-build.yml
Original file line number Diff line number Diff line change
Expand Up @@ -118,6 +118,16 @@ jobs:
- name: Verify realtime prune graph
run: bun run check:realtime-prune

- name: Migration safety (zero-downtime) audit
run: |
if [ "${{ github.event_name }}" = "pull_request" ]; then
BASE_REF="origin/${{ github.base_ref }}"
git fetch --depth=1 origin "${{ github.base_ref }}" 2>/dev/null || true
else
BASE_REF="HEAD~1"
fi
bun run check:migrations "$BASE_REF"

- name: Type-check realtime server
run: bunx turbo run type-check --filter=@sim/realtime

Comment thread
TheodoreSpeaks marked this conversation as resolved.
Expand Down
1 change: 1 addition & 0 deletions package.json
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@
"check:zustand-v5": "bun run scripts/check-zustand-v5-selectors.ts",
"check:react-query": "bun run scripts/check-react-query-patterns.ts --check",
"check:utils": "bun run scripts/check-utils-enforcement.ts",
"check:migrations": "bun run scripts/check-migrations-safety.ts",
"mship-contracts:generate": "bun run scripts/sync-mothership-stream-contract.ts",
"mship-contracts:check": "bun run scripts/sync-mothership-stream-contract.ts --check",
"mship-tools:generate": "bun run scripts/sync-tool-catalog.ts",
Expand Down
154 changes: 154 additions & 0 deletions scripts/check-migrations-safety.test.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,154 @@
/**
* Run with: bun test scripts/check-migrations-safety.test.ts
* (Root scripts are bun-native and not part of the turbo/vitest workspaces.)
*/
import { describe, expect, test } from 'bun:test'
import { lintSql } from './check-migrations-safety.ts'

const rules = (sql: string) => lintSql(sql).map((f) => `${f.tier}:${f.rule}`)

describe('additive / safe', () => {
test('nullable add column passes', () => {
expect(lintSql('ALTER TABLE "webhook" ADD COLUMN "provider_config" json;')).toEqual([])
})

test('NOT NULL with DEFAULT passes', () => {
expect(lintSql('ALTER TABLE "user" ADD COLUMN "flag" boolean DEFAULT false NOT NULL;')).toEqual(
[]
)
})

test('CREATE TABLE plus index and FK on that new table passes', () => {
const sql = `CREATE TABLE "kb" ("id" text PRIMARY KEY NOT NULL, "user_id" text NOT NULL);
--> statement-breakpoint
CREATE INDEX "kb_user_id_idx" ON "kb" USING btree ("user_id");
--> statement-breakpoint
ALTER TABLE "kb" ADD CONSTRAINT "kb_user_fk" FOREIGN KEY ("user_id") REFERENCES "user"("id");`
expect(lintSql(sql)).toEqual([])
})

test('CONCURRENTLY index after a COMMIT breakpoint passes', () => {
const sql = `COMMIT;
--> statement-breakpoint
SET lock_timeout = 0;
--> statement-breakpoint
CREATE INDEX CONCURRENTLY IF NOT EXISTS "idx_x" ON "embedding" ("kb_id");`
expect(lintSql(sql)).toEqual([])
})
})

describe('hard errors', () => {
test('ADD COLUMN NOT NULL without default', () => {
expect(rules('ALTER TABLE "user" ADD COLUMN "email" text NOT NULL;')).toEqual([
'error:add-not-null-no-default',
])
})

test('RENAME column', () => {
expect(rules('ALTER TABLE "marketplace" RENAME COLUMN "executions" TO "views";')).toEqual([
'error:rename',
])
})

test('CREATE INDEX on existing table without CONCURRENTLY', () => {
expect(rules('CREATE INDEX "idx_y" ON "embedding" ("kb_id");')).toEqual([
'error:index-not-concurrent',
])
})

test('CONCURRENTLY index without IF NOT EXISTS', () => {
const sql = `COMMIT;
--> statement-breakpoint
CREATE INDEX CONCURRENTLY "idx_z" ON "embedding" ("kb_id");`
expect(rules(sql)).toEqual(['error:concurrent-index-not-idempotent'])
})

test('CONCURRENTLY index without a preceding COMMIT', () => {
expect(
rules('CREATE INDEX CONCURRENTLY IF NOT EXISTS "idx_z" ON "embedding" ("kb_id");')
).toEqual(['error:concurrent-index-no-commit'])
})

test('ADD FOREIGN KEY on existing table without NOT VALID', () => {
expect(
rules(
'ALTER TABLE "session" ADD CONSTRAINT "s_fk" FOREIGN KEY ("uid") REFERENCES "user"("id");'
)
).toEqual(['error:constraint-not-valid'])
})
})

describe('annotate tier', () => {
const drop = 'ALTER TABLE "webhook" DROP COLUMN "secret";'

test('DROP COLUMN unannotated fails', () => {
expect(rules(drop)).toEqual(['error:drop-column'])
})

test('DROP COLUMN annotated passes', () => {
const sql = `-- migration-safe: secret read removed in v0.6.1 (#1234), shipped two deploys ago\n${drop}`
expect(lintSql(sql)).toEqual([])
})

test('annotation tolerates an intervening statement-breakpoint line', () => {
const sql = `ALTER TABLE "webhook" ADD COLUMN "provider_config" json;
--> statement-breakpoint
-- migration-safe: secret read removed in v0.6.1 (#1234)
${drop}`
expect(lintSql(sql)).toEqual([])
})

test('dangling annotation with empty reason fails', () => {
const sql = `-- migration-safe:\n${drop}`
const found = lintSql(sql)
expect(found).toHaveLength(1)
expect(found[0].tier).toBe('error')
expect(found[0].message).toContain('no reason')
})

test('annotation on the wrong statement does not bleed', () => {
const sql = `-- migration-safe: removing secret
ALTER TABLE "webhook" ADD COLUMN "x" json;
--> statement-breakpoint
${drop}`
expect(rules(sql)).toEqual(['error:drop-column'])
})

test('type change and DROP TABLE are annotate-tier', () => {
expect(
rules(
'ALTER TABLE "user_table_rows" ALTER COLUMN "order_key" SET DATA TYPE text COLLATE "C";'
)
).toEqual(['error:alter-type'])
expect(rules('DROP TABLE "marketplace_execution" CASCADE;')).toEqual(['error:drop-table'])
})
})

describe('warnings (non-blocking)', () => {
test('UPDATE backfill warns but does not error', () => {
const found = lintSql('UPDATE "user_table_definitions" SET "schema" = \'{}\' WHERE id = \'1\';')
expect(found.map((f) => f.tier)).toEqual(['warn'])
})

test('UPDATE without WHERE flags the whole-table note', () => {
const found = lintSql('UPDATE "user" SET "active" = true;')
expect(found[0].tier).toBe('warn')
expect(found[0].message).toContain('no WHERE')
})
})

describe('parser robustness', () => {
test('semicolon inside a string literal does not split', () => {
expect(lintSql(`ALTER TABLE "x" ADD COLUMN "y" text DEFAULT 'a;b' NOT NULL;`)).toEqual([])
})

test('dollar-quoted DO block is one statement; FK on a new table is suppressed', () => {
const sql = `CREATE TABLE "jobs" ("id" text PRIMARY KEY NOT NULL, "wid" text NOT NULL);
--> statement-breakpoint
DO $$ BEGIN
ALTER TABLE "jobs" ADD CONSTRAINT "jobs_fk" FOREIGN KEY ("wid") REFERENCES "workspace"("id");
EXCEPTION WHEN duplicate_object THEN null;
END $$;`
expect(lintSql(sql)).toEqual([])
})
})
Loading
Loading