| 1 | +# Database Migrations Guide |
| 2 | + |
| 3 | +kagent uses [golang-migrate](https://github.com/golang-migrate/migrate) with embedded SQL files and [sqlc](https://sqlc.dev/) for type-safe query generation. Migrations run **in-app at startup** — the controller applies them before accepting traffic. |
| 4 | + |
| 5 | +## Structure |
| 6 | + |
| 7 | +``` |
| 8 | +go/core/pkg/migrations/ |
| 9 | +├── migrations.go # Embeds the FS (go:embed); exports FS for downstream consumers |
| 10 | +├── runner.go # RunUp (applies pending migrations at startup) |
| 11 | +├── core/ # Core schema (tracked in schema_migrations table) |
| 12 | +│ ├── 000001_initial.up.sql / .down.sql |
| 13 | +│ ├── 000002_add_session_source.up.sql / .down.sql |
| 14 | +│ └── ... |
| 15 | +└── vector/ # pgvector schema (tracked in vector_schema_migrations table) |
| 16 | + ├── 000001_vector_support.up.sql / .down.sql |
| 17 | + └── ... |
| 18 | + |
| 19 | +go/core/internal/database/ |
| 20 | +├── queries/ # Hand-written SQL queries (source of truth) |
| 21 | +│ ├── sessions.sql |
| 22 | +│ ├── memory.sql |
| 23 | +│ └── ... |
| 24 | +├── gen/ # sqlc-generated Go code — DO NOT edit manually |
| 25 | +│ ├── db.go |
| 26 | +│ ├── models.go |
| 27 | +│ └── *.sql.go |
| 28 | +└── sqlc.yaml # sqlc configuration |
| 29 | +``` |
| 30 | + |
| 31 | +Migrations are organized into two independent tracks, `core` and `vector`. If either track fails, the runner rolls back both. The `--database-vector-enabled` flag (default `true`) controls whether the vector track runs. |
| 32 | + |
| 33 | +## sqlc Workflow |
| 34 | + |
| 35 | +When you add or change a SQL query: |
| 36 | + |
| 37 | +1. Edit (or add) a `.sql` file under `go/core/internal/database/queries/` |
| 38 | +2. Regenerate: |
| 39 | + ```bash |
| 40 | + cd go/core/internal/database && sqlc generate |
| 41 | + ``` |
| 42 | +3. Commit both the query file and the updated `gen/` files together. |
| 43 | + |
| 44 | +A CI check (`.github/workflows/sqlc-generate-check.yaml`) fails the PR if `gen/` is out of sync with the queries. Never edit `gen/` by hand. |
| 45 | + |
| 46 | +**sqlc annotations used:** |
| 47 | +- `:one` — returns a single row |
| 48 | +- `:many` — returns a slice |
| 49 | +- `:exec` — returns only error (use for INSERT/UPDATE/DELETE that don't need the result) |
| 50 | + |
| 51 | +## Writing Migrations |
| 52 | + |
| 53 | +### Backward-compatible schema changes |
| 54 | + |
| 55 | +During a rolling deploy, old pods will be reading and writing a schema that has already been upgraded. **Every migration must be backward-compatible with the previous version's code.** |
| 56 | + |
| 57 | +| Change | Old code behavior | Safe? | |
| 58 | +|--------|------------------|-------| |
| 59 | +| Add nullable column | SELECT ignores it; INSERT omits it (goes NULL) | ✅ | |
| 60 | +| Add column with `DEFAULT x` | INSERT omits it; DB fills default | ✅ | |
| 61 | +| Add NOT NULL column **without** default | Old INSERT missing the column → error | ❌ | |
| 62 | +| Add index | Invisible to application code | ✅ | |
| 63 | +| Add foreign key | Old INSERT may fail constraint | ❌ | |
| 64 | +| Drop/rename column old code references | Old SELECT/INSERT errors | ❌ | |
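| 65 | +| Change compatible type (e.g. `int` → `bigint`) | Usually fine for old reads and writes, but the `ALTER TABLE` rewrites the table under an `ACCESS EXCLUSIVE` lock and blocks queries while it runs | ⚠️ |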
| 66 | + |
| 67 | +**Expand-then-contract pattern for schema changes:** |
| 68 | +1. **Version N (Expand)**: add the new column/table (nullable or with default); old code still works |
| 69 | +2. **Version N (Deploy)**: ship new code that uses the new structure |
| 70 | +3. **Version N+1 (Contract)**: drop the old column/table once version N is fully deployed and no pods run version N-1 |
| 71 | + |
| 72 | +### Idempotency and cross-track safety |
| 73 | + |
| 74 | +All DDL statements must use `IF EXISTS` / `IF NOT EXISTS` guards: |
| 75 | + |
| 76 | +```sql |
| 77 | +-- Up |
| 78 | +CREATE TABLE IF NOT EXISTS foo (...); |
| 79 | +ALTER TABLE foo ADD COLUMN IF NOT EXISTS bar TEXT; |
| 80 | + |
| 81 | +-- Down |
| 82 | +ALTER TABLE IF EXISTS foo DROP COLUMN IF EXISTS bar; |
| 83 | +DROP TABLE IF EXISTS foo; |
| 84 | +``` |
| 85 | + |
| 86 | +Guards provide defense-in-depth for crash recovery and dirty-state cleanup, where a partially-applied migration may be re-run or rolled back. |
| 87 | + |
| 88 | +### Naming |
| 89 | + |
| 90 | +Files must follow `NNNNNN_description.up.sql` / `NNNNNN_description.down.sql` with zero-padded 6-digit sequence numbers. |
| 91 | + |
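The naming rule can be checked mechanically. The following is a minimal sketch, not part of the kagent codebase: the `validMigrationName` helper is hypothetical, and the allowed character set for the description (lowercase letters, digits, underscores) is an assumption that the guide does not state.

```go
package main

import (
	"fmt"
	"regexp"
)

// migrationName matches NNNNNN_description.up.sql / .down.sql with a
// zero-padded 6-digit sequence number. The description character set
// (lowercase letters, digits, underscores) is an assumption.
var migrationName = regexp.MustCompile(`^\d{6}_[a-z0-9_]+\.(up|down)\.sql$`)

// validMigrationName reports whether name follows the naming convention.
func validMigrationName(name string) bool {
	return migrationName.MatchString(name)
}

func main() {
	for _, name := range []string{
		"000002_add_session_source.up.sql",
		"000002_add_session_source.down.sql",
		"2_add_session_source.up.sql", // sequence number is not zero-padded
		"000003_add_index.sql",        // direction suffix is missing
	} {
		fmt.Printf("%-40s valid=%v\n", name, validMigrationName(name))
	}
}
```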
| 92 | +### Down migrations |
| 93 | + |
| 94 | +Every `.up.sql` must have a corresponding `.down.sql` that exactly reverses it. Down migrations are used for manual rollbacks and for the automatic rollback that runs when a migration fails. They must be **idempotent** because the two-track rollback logic (roll back core if vector fails) may call them more than once in failure scenarios. |
| 95 | + |
| 96 | +## Multi-Instance Safety |
| 97 | + |
| 98 | +### How the advisory lock works |
| 99 | + |
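| 100 | +The migration runner acquires a PostgreSQL **session-level** advisory lock (`pg_advisory_lock`) before applying migrations. A session-level lock is held until it is explicitly released or the connection closes, so a crashed controller cannot leave the lock stuck. |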
| 101 | + |
| 102 | +### Rolling deploy concurrency |
| 103 | + |
| 104 | +If multiple pods start simultaneously (e.g., rolling deploy with replicas > 1): |
| 105 | +1. One controller acquires the advisory lock and runs migrations. |
| 106 | +2. Others block on `pg_advisory_lock`. |
| 107 | +3. When the winner finishes and its connection closes, the next waiter acquires the lock, calls `Up()`, gets `ErrNoChange`, and exits immediately. |
| 108 | + |
| 109 | +This is safe. The only risk is if the winning controller crashes mid-migration (see Dirty state recovery below). |
| 110 | + |
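The lock-then-migrate flow described above can be sketched as follows. This is an illustration under stated assumptions, not kagent's runner: `withAdvisoryLock`, `execer`, `fakeConn`, and `runDemo` are hypothetical names, the lock key `42` is arbitrary, and `errNoChange` is a local stand-in for golang-migrate's `ErrNoChange`. The fake connection only records queries so that the sketch runs without a database; with a real `*sql.Conn`, the first statement would block until the lock is free.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
)

// execer is the subset of *sql.Conn used here. A session-level advisory lock
// belongs to one connection, so lock and unlock must use the same one.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// errNoChange is a local stand-in for migrate.ErrNoChange (assumption).
var errNoChange = errors.New("no change")

// withAdvisoryLock waits for the lock identified by key, runs up, and then
// releases the lock. A "no change" result counts as success, which is what a
// pod that lost the race observes.
func withAdvisoryLock(ctx context.Context, conn execer, key int64, up func() error) (err error) {
	if _, err = conn.ExecContext(ctx, "SELECT pg_advisory_lock($1)", key); err != nil {
		return fmt.Errorf("acquire advisory lock: %w", err)
	}
	defer func() {
		_, unlockErr := conn.ExecContext(ctx, "SELECT pg_advisory_unlock($1)", key)
		if unlockErr != nil && err == nil {
			err = fmt.Errorf("release advisory lock: %w", unlockErr)
		}
	}()
	if upErr := up(); upErr != nil && !errors.Is(upErr, errNoChange) {
		return fmt.Errorf("apply migrations: %w", upErr)
	}
	return nil
}

// fakeConn records queries instead of talking to PostgreSQL.
type fakeConn struct{ queries []string }

func (f *fakeConn) ExecContext(_ context.Context, query string, _ ...any) (sql.Result, error) {
	f.queries = append(f.queries, query)
	return nil, nil
}

// runDemo runs up under the lock against a fake connection and returns the
// statements that were issued.
func runDemo(up func() error) ([]string, error) {
	conn := &fakeConn{}
	err := withAdvisoryLock(context.Background(), conn, 42, up)
	return conn.queries, err
}

func main() {
	// Simulate a waiter whose schema is already current.
	queries, err := runDemo(func() error { return errNoChange })
	fmt.Println(queries, err)
}
```

Releasing the lock explicitly is a design choice in this sketch; closing the connection would release a session-level lock as well.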
| 111 | +### Dirty state recovery |
| 112 | + |
| 113 | +If the controller crashes mid-migration, the migration runner records the version as `dirty = true` in the tracking table. The next startup detects dirty state and calls `rollbackToVersion`, which: |
| 114 | +1. Calls `mg.Force(version - 1)` to clear the dirty flag. |
| 115 | +2. Runs the down migration to restore the previous clean state. |
| 116 | +3. Re-runs the failed up migration. |
| 117 | + |
| 118 | +**Requirement**: down migrations must be idempotent and correctly reverse their up migration. A missing or broken down migration requires manual recovery. |
| 119 | + |
| 120 | +### Rollout strategy |
| 121 | + |
| 122 | +For backward-compatible migrations a rolling update is safe: |
| 123 | + |
| 124 | +1. New pod starts → migration runner applies pending migrations (advisory lock serializes concurrent runs) |
| 125 | +2. New pod passes readiness probe → old pod terminates |
| 126 | +3. Backward-compatible schema means old pods continue operating during the window |
| 127 | + |
| 128 | +For a migration that is **not** backward-compatible, restructure it using the expand-then-contract pattern (add new column/table in version N, ship code that uses it, drop the old column in version N+1). |
| 129 | + |
| 130 | +## Static Analysis Enforcement |
| 131 | + |
| 132 | +The policies above are enforced by static analysis tests in `go/core/pkg/migrations/cross_track_test.go`. These run against the embedded SQL files — no database required. |
| 133 | + |
| 134 | +| Test | What it enforces | |
| 135 | +|------|-----------------| |
| 136 | +| `TestNoCrossTrackDDL` | No track may `ALTER TABLE` or `CREATE INDEX ON` a table owned by another track | |
| 137 | +| `TestMigrationGuards` | Up migrations must use `IF NOT EXISTS` on all `CREATE`/`ADD COLUMN`; down migrations must use `IF EXISTS` on all `DROP` statements | |
| 138 | + |
| 139 | +**Adding a new track**: add the track directory name to the `tracks` slice in each test so the new track is covered by the same checks. |
| 140 | + |
| 141 | +These tests catch policy violations at PR time without needing a running database. They complement the integration tests in `runner_test.go`, which verify the runner's rollback and concurrency behavior against a real Postgres instance. |
| 142 | + |
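To give a feel for what a guard check of this kind involves, here is a deliberately naive sketch. It is not the code in `cross_track_test.go`: `checkUp`, `checkDown`, and `unguarded` are hypothetical helpers, statements are split on `;` without a real SQL parser, and forms such as `CREATE INDEX CONCURRENTLY IF NOT EXISTS` would be flagged incorrectly.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Keywords that must be followed by IF NOT EXISTS in up migrations.
	createRe = regexp.MustCompile(`(?i)\b(?:CREATE\s+(?:UNIQUE\s+)?(?:TABLE|INDEX)|ADD\s+COLUMN)\b`)
	// Keywords that must be followed by IF EXISTS in down migrations.
	dropRe = regexp.MustCompile(`(?i)\bDROP\s+(?:TABLE|INDEX|COLUMN)\b`)

	ifNotExistsRe = regexp.MustCompile(`(?i)^\s+IF\s+NOT\s+EXISTS\b`)
	ifExistsRe    = regexp.MustCompile(`(?i)^\s+IF\s+EXISTS\b`)
)

// unguarded returns every statement in src that contains a keyword matched by
// kw which is not immediately followed by the guard.
func unguarded(src string, kw, guard *regexp.Regexp) []string {
	var bad []string
	for _, stmt := range strings.Split(src, ";") {
		for _, loc := range kw.FindAllStringIndex(stmt, -1) {
			if !guard.MatchString(stmt[loc[1]:]) {
				bad = append(bad, strings.TrimSpace(stmt))
				break
			}
		}
	}
	return bad
}

// checkUp lists up-migration statements that lack IF NOT EXISTS.
func checkUp(src string) []string { return unguarded(src, createRe, ifNotExistsRe) }

// checkDown lists down-migration statements that lack IF EXISTS.
func checkDown(src string) []string { return unguarded(src, dropRe, ifExistsRe) }

func main() {
	up := "CREATE TABLE IF NOT EXISTS foo (id INT);\nALTER TABLE foo ADD COLUMN bar TEXT;"
	down := "ALTER TABLE IF EXISTS foo DROP COLUMN IF EXISTS bar;\nDROP TABLE foo;"
	fmt.Println("unguarded up:  ", checkUp(up))
	fmt.Println("unguarded down:", checkDown(down))
}
```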
| 143 | +## Downstream Extension Model |
| 144 | + |
| 145 | +The migration layer is designed for downstream consumers to extend with their own migrations alongside OSS. The extension points are: |
| 146 | + |
| 147 | +1. **SQL files as the contract.** The migration files in `go/core/pkg/migrations/core/` and `vector/` are the stable interface. Downstream consumers sync these files into their own repos and build their own migration runners. Don't move or reorganize migration file paths without considering downstream impact. |
| 148 | + |
| 149 | +2. **`MigrationRunner` DI callback.** Downstream consumers pass a custom `MigrationRunner` to `app.Start` to take full ownership of the migration process — running OSS migrations alongside their own in whatever order they need. The signature `func(ctx context.Context, url string, vectorEnabled bool) error` is stable. |
| 150 | + |
| 151 | +3. **Vector track stays separate.** The vector track is conditionally applied and has its own tracking table. Downstream extensions should not modify vector-owned tables (enforced by `TestNoCrossTrackDDL`). |
| 152 | + |
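As an illustration of the second extension point, a downstream consumer might compose several runners behind the one stable signature. This is a sketch under assumptions: only the function signature comes from this guide, while `chain`, `demo`, the two stub runners, and the example URL are hypothetical. How the resulting runner is handed to `app.Start` is not shown here because this guide does not specify it.

```go
package main

import (
	"context"
	"fmt"
)

// MigrationRunner mirrors the signature quoted in this guide.
type MigrationRunner func(ctx context.Context, url string, vectorEnabled bool) error

// chain returns a runner that invokes each runner in order and stops at the
// first error.
func chain(runners ...MigrationRunner) MigrationRunner {
	return func(ctx context.Context, url string, vectorEnabled bool) error {
		for i, run := range runners {
			if err := run(ctx, url, vectorEnabled); err != nil {
				return fmt.Errorf("migration runner %d: %w", i, err)
			}
		}
		return nil
	}
}

// demo chains two stub runners and returns the order in which they ran.
// Real runners would apply the OSS tracks and the downstream migrations.
func demo() ([]string, error) {
	var order []string
	oss := func(ctx context.Context, url string, vectorEnabled bool) error {
		order = append(order, "oss")
		return nil
	}
	downstream := func(ctx context.Context, url string, vectorEnabled bool) error {
		order = append(order, "downstream")
		return nil
	}
	err := chain(oss, downstream)(context.Background(), "postgres://example", true)
	return order, err
}

func main() {
	order, err := demo()
	fmt.Println(order, err)
}
```

Running OSS migrations first is only one possible ordering; the callback gives the downstream consumer full control over it.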
| 153 | +### What this means for OSS development |
| 154 | + |
| 155 | +- **Migration immutability is cross-repo.** Once a migration file is merged and tagged, downstream consumers may have synced it. Modifying it breaks their tracking table state. |