Skip to content

Commit 31f7893

Browse files
committed
Merge origin/staging into dev
Resolve the add/add conflict in companion-pr-check.yml by keeping dev's directional-label version (requires-mothership-merge / requires-sim-merge), a strict superset of staging's has-companion version. Brings dev up to date with staging's other changes.
2 parents 9494744 + 0c226b9 commit 31f7893

61 files changed

Lines changed: 4364 additions & 119 deletions

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

.agents/skills/db-migrate/SKILL.md

Lines changed: 89 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,89 @@
1+
---
2+
name: db-migrate
3+
description: Author or review a Drizzle DB migration for zero-downtime safety — expand/contract phasing, backward-compatibility with the deployed app version, and writing the `-- migration-safe` acknowledgment the check:migrations lint requires. Use when adding/editing files under `packages/db/migrations/` or changing `packages/db/schema.ts`.
4+
---
5+
6+
# DB Migrate Skill
7+
8+
You make schema changes that survive a deploy without downtime. The `check:migrations` lint (`scripts/check-migrations-safety.ts`) is the deterministic gate; you are the judgment that decides whether a flagged change is actually safe and writes the annotation that satisfies it.
9+
10+
## The window (why this matters)
11+
12+
A deploy runs the migration, then rolls out the new app image via blue/green. The two are **not atomic and cannot be** — during cutover the old task set keeps serving against the **already-migrated** schema. So:
13+
14+
> Every migration must be backward-compatible with the app version that is *already deployed*.
15+
16+
If a migration drops a column the old code still reads, renames one, or adds a `NOT NULL` the old inserts don't populate, the old code throws until traffic fully shifts — the downtime we're guarding against. You can't fix this by reordering the pipeline; the only fix is discipline.
17+
18+
## Expand / contract
19+
20+
Split every breaking change across **two deploys**:
21+
22+
1. **Expand** (this PR): additive, backward-compatible schema + code that tolerates *both* the old and new shape.
23+
2. **Contract** (a later PR, after expand is fully deployed): remove the old thing, now that nothing reads it.
24+
25+
Never put expand and contract in the same PR. If this PR both removes the code that used a column *and* drops the column, the old code is still live during cutover — split it.
26+
27+
### Per-operation playbook
28+
29+
| You want to | Do (deploy 1 = expand) | Do (deploy 2 = contract) |
30+
|---|---|---|
31+
| Add a required column | `ADD COLUMN` nullable or `DEFAULT`; code writes it | backfill, then `SET NOT NULL` |
32+
| Rename a column/table | add the new name; code dual-writes / reads new-then-old | drop the old name |
33+
| Drop a column/table | stop all reads/writes in code; ship it | `DROP` (annotate) |
34+
| Change a column type | add a new column of the new type; dual-write | backfill, swap reads, drop old |
35+
| Add FK / CHECK | `ADD CONSTRAINT ... NOT VALID` | `VALIDATE CONSTRAINT` separately |
36+
| Index an existing table | `COMMIT;` breakpoint → `SET lock_timeout = 0``CREATE INDEX CONCURRENTLY IF NOT EXISTS` (see `packages/db/scripts/migrate.ts`) ||
37+
| Drop an index | `COMMIT;` breakpoint → `DROP INDEX CONCURRENTLY` — plain `DROP INDEX` takes ACCESS EXCLUSIVE on the table ||
38+
| Backfill data | batched + idempotent `UPDATE` (keyset/`WHERE`, bounded) ||
39+
40+
A `CREATE INDEX`, `ADD COLUMN`, or `ADD CONSTRAINT` against a table **created in the same migration** is always safe (no rows, no live traffic) — the lint already suppresses those.
41+
42+
## Tracking the contract (don't let it rot)
43+
44+
The contract half is deferred to a later deploy — and that is exactly when it gets forgotten, leaving dead columns, orphaned tables, and `NOT NULL`s that never land. Every deferred contract must become a durable, greppable TODO.
45+
46+
When an expand defers a drop, leave a **`contract-pending`** marker on the legacy column/table in `packages/db/schema.ts` — that is the file you will be editing when you finally do the drop, so the reminder lives where the work happens:
47+
48+
```ts
49+
// contract-pending(after #5035 is fully deployed): drop once permission-check.ts stops reading it
50+
workspaceId: text('workspace_id'),
51+
```
52+
53+
Format: `contract-pending(<precondition>): <what to drop> — <why it's safe once the precondition holds>`. The precondition names the PR/release that removes the last reader and **must be fully deployed** before the contract ships.
54+
55+
- **The TODO list is a grep** — always accurate, never drifts: `grep -rn "contract-pending" packages/db apps/sim`. Run it when starting migration work to see what is owed.
56+
- For anything with a real owner or schedule, also open a tracking issue and put its number in the marker.
57+
- **Close the loop in the contract PR:** the contract migration's `-- migration-safe:` annotation references the expand, and you **delete the `contract-pending` marker** in the same PR:
58+
```sql
59+
-- migration-safe: contract of #5035 — workspace_id readers removed there, deployed 2026-06-10
60+
ALTER TABLE "permission_group" DROP COLUMN "workspace_id";
61+
```
62+
- An expand merged **without** a marker for the drop it defers, or a contract merged **without** removing its marker, is a bug — flag it in review.
63+
64+
## The judgment the lint can't do
65+
66+
The lint flags risky *shapes*; it cannot know whether a given drop is *safe right now*. For each flagged statement, do the work it can't:
67+
68+
1. **Is the dependency gone?** Grep the app for the table/column: search `apps/sim` and `packages` for the column name, the Drizzle field (camelCase), and the table object. If any live read/write remains, it is **not** safe — fix the code first.
69+
2. **Did the expand already ship?** The removal of that read/write must be in a deploy that is *already out*, not this same PR. If it's in this PR, split: land the code change now, do the destructive migration in a follow-up after it deploys.
70+
3. **Backfills:** confirm the `UPDATE`/`DELETE` is batched (bounded `WHERE`/keyset, not a single whole-table statement), idempotent (safe to replay — a failed migration re-runs unjournaled files from the top), and safe under concurrent writes from the still-live old app.
71+
72+
## Workflow
73+
74+
1. Edit `packages/db/schema.ts`, then `cd packages/db && bunx drizzle-kit generate` to produce the SQL. If this is an expand that defers a drop, leave a `contract-pending` marker on the legacy column (see "Tracking the contract"). If this is the contract, delete the marker it resolves.
75+
2. Hand-edit the generated SQL where the playbook requires it: `CONCURRENTLY` + `COMMIT;` breakpoint for indexes on existing tables, `NOT VALID` for constraints, batching for backfills.
76+
3. Run `bun run check:migrations` (base defaults to `origin/staging`).
77+
- **Hard errors** (`add-not-null-no-default`, `rename`, `index-not-concurrent`, `constraint-not-valid`, …): rewrite into expand/contract. Do **not** try to annotate them away — the lint won't accept it.
78+
- **Annotate tier** (`drop-table`, `drop-column`, `drop-default`, `set-not-null`, `alter-type`, `drop-index`): only after you've confirmed steps 1–3 above, add a comment on the line directly above the statement:
79+
```sql
80+
-- migration-safe: `secret` read removed in v0.6.1 (#1234), shipped two deploys ago
81+
ALTER TABLE "webhook" DROP COLUMN "secret";
82+
```
83+
The reason must be specific and name the PR/version that removed the dependency. An empty reason fails the lint.
84+
- **Warnings** (`data-backfill`): non-blocking, but confirm the batching/idempotency before merging.
85+
4. Verify locally: `cd packages/db && bun run db:migrate` against a dev DB.
86+
87+
## Hard rule
88+
89+
Never annotate a destructive statement just to make the lint pass. The annotation is a claim that you verified the old code no longer depends on it. If you can't make that claim truthfully, the change belongs in a later deploy — tell the user to split it.

.agents/skills/ship/SKILL.md

Lines changed: 11 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
---
22
name: ship
3-
description: Commit, push, and open a PR to staging in one shot
3+
description: Commit, push, and open a PR to staging in one shot — runs the cleanup pass and, when migrations changed, the db-migrate safety review first
44
---
55

66
# Ship Command
@@ -16,12 +16,17 @@ When the user runs `/ship`:
1616
- Types: `fix`, `feat`, `improvement`, `chore`
1717
- Scope: short identifier (e.g., `undo-redo`, `api`, `ui`)
1818
- Keep it concise
19-
3. **Run pre-ship checks** from the repo root before staging:
19+
3. **Run the cleanup pass** — only if the diff modifies UI code (any `.tsx` file, or anything under `apps/sim/components/`, `apps/sim/hooks/`, or `apps/sim/stores/`): `/cleanup`
20+
- The six code-quality skills (effects, memo, callbacks, state, React Query, emcn) only apply to React code, so skip this step entirely when no UI was touched. When it runs, it applies fixes so they land in this commit.
21+
4. **Run migration safety** — only if the diff touches `packages/db/migrations/**` or `packages/db/schema.ts`:
22+
- Run `/db-migrate` to review the migration for zero-downtime safety (expand/contract phasing, backward-compatibility with the deployed app version).
23+
- `bun run check:migrations origin/staging` must pass (staging is the PR base). Do not silence a flagged statement with a `-- migration-safe:` annotation unless `/db-migrate` confirmed the old code no longer depends on it; otherwise split the destructive change into a later deploy.
24+
5. **Run pre-ship checks** from the repo root before staging:
2025
- `bun run lint` to fix formatting issues
2126
- `bun run check:api-validation:strict` to catch boundary contract failures before CI
22-
4. **Stage and commit** the changes with the generated message
23-
5. **Push to origin** using the current branch name
24-
6. **Create a PR** to staging with a description in the user's voice
27+
6. **Stage and commit** the changes with the generated message
28+
7. **Push to origin** using the current branch name
29+
8. **Create a PR** to staging with a description in the user's voice
2530

2631
## Commit Message Format
2732

@@ -77,7 +82,7 @@ gh pr create --base staging --title "COMMIT_MESSAGE" --body "PR_BODY"
7782

7883
- Short, direct bullet points
7984
- No unnecessary explanation
80-
- "Tested manually" is acceptable for testing section; include lint and boundary validation results when run
85+
- "Tested manually" is acceptable for testing section; include lint, boundary validation, and (when migrations changed) `check:migrations` results when run
8186
- Checkboxes filled in appropriately
8287
- No screenshots section unless UI changes
8388

.github/workflows/test-build.yml

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -118,6 +118,16 @@ jobs:
118118
- name: Verify realtime prune graph
119119
run: bun run check:realtime-prune
120120

121+
- name: Migration safety (zero-downtime) audit
122+
run: |
123+
if [ "${{ github.event_name }}" = "pull_request" ]; then
124+
BASE_REF="origin/${{ github.base_ref }}"
125+
git fetch --depth=1 origin "${{ github.base_ref }}" 2>/dev/null || true
126+
else
127+
BASE_REF="HEAD~1"
128+
fi
129+
bun run check:migrations "$BASE_REF"
130+
121131
- name: Type-check realtime server
122132
run: bunx turbo run type-check --filter=@sim/realtime
123133

0 commit comments

Comments
 (0)