fix(migrations): repair mental_models.subtype at current head (#1553) #1627
Merged
Three production deployments (issue #1553, plus confirmations from @4Lienau and @khanhduyvt0101) report `column "subtype" of relation "mental_models" does not exist` on `create_mental_model`, despite their alembic_version showing the current head `m3rg3h3ad5f6`.
Both h3c4d5e6f7g8_mental_models_v4 (which uses `CREATE TABLE IF NOT EXISTS` and is a no-op on databases that came through the reflections rename) and d5y6z7a8b9c0_backfill_mental_models_subtype were meant to ensure the column exists, but on these specific deployments neither fired successfully. This is likely a casualty of the divergent-heads reorganization that put d5y6z7a8b9c0 on a branch the affected DBs bypassed.
Add a new migration at the current head so every stuck deployment picks it up on next container start. Idempotent (`ADD COLUMN IF NOT EXISTS`), guarded by an existence check on the table, and matches the canonical v4 column set and CHECK allowlist from d5y6z7a8b9c0.
PG-only: Oracle's baseline creates mental_models with a different topology and constraint shape, so this repair does not apply there.
Summary
Closes #1553.
Three production deployments, reported by @mvessair-hive (OP), @4Lienau, and @khanhduyvt0101, hit `column "subtype" of relation "mental_models" does not exist` on `create_mental_model` even though `alembic_version` is at `m3rg3h3ad5f6` and the container logs `Database migrations completed successfully`. `mental_models` is stuck in a v3-shaped schema missing six v4 columns: `subtype`, `description`, `entity_id`, `observations`, `links`, `last_updated`.
Root cause (confirmed from git history)
The chain was retroactively edited to insert a new migration "behind" the v0.5.6 head. Databases that completed migrations on v0.5.6 advanced to a head that, at that time, did not have the column-add as an ancestor. When they later upgraded to v0.6.0/v0.6.1, the new chain claimed that migration was already applied — but it never actually ran.
The mechanism, step by step:
1. v0.5.6 (2026-04-28): the head of the migration chain was `i4j5k6l7m8n9`, with `down_revision = "8c6fa6f7230b"`. `d5y6z7a8b9c0_backfill_mental_models_subtype` did not exist in this release (it was on a feature branch, not yet merged into a tagged release).
2. Any user who deployed v0.5.6 and let migrations complete ended up with `alembic_version = i4j5k6l7m8n9`, and `mental_models` in v3 shape, without `subtype`, because:
   - `t5o6p7q8r9s0_rename_mental_models_to_observations` (a prior ancestor) renamed `reflections → mental_models`, leaving the table without `subtype`
   - `h3c4d5e6f7g8_mental_models_v4` used `CREATE TABLE IF NOT EXISTS`, a no-op on the already-existing renamed table, so the `subtype` column was never added
   - the backfill migration (`d5y6z7a8b9c0`) didn't exist yet
3. On 2026-04-29, PR #1312 (`76bcd931 fix(oracle): restore PG query semantics and clean up migration chain`) rewrote `i4j5k6l7m8n9`'s parent. The intent was to insert `d5y6z7a8b9c0` into the chain so any future migration runs would apply it. But for databases that had already advanced past `i4j5k6l7m8n9`, this insertion is invisible: alembic treats all ancestors of the current state as "applied," and per the rewritten graph, `d5y6z7a8b9c0` is now an ancestor of `i4j5k6l7m8n9`.
4. v0.6.0 (2026-05-05) and v0.6.1 (2026-05-08) shipped with the rewritten chain. When a stuck v0.5.6 user upgrades, alembic walks forward from `i4j5k6l7m8n9` and applies migrations downstream, never going back to apply `d5y6z7a8b9c0`. The DB advances to `m3rg3h3ad5f6` (the new head added in v0.6.0) but `subtype` is never added.
5. End state: `alembic_version = m3rg3h3ad5f6`, `mental_models` missing six v4 columns, `create_mental_model` 500s.
Three confirmed reports is consistent with this: v0.5.6 was the current release for one week (2026-04-28 to 2026-05-05). Any deployment that ran migrations during that window now hits the trap on upgrade to 0.6.x.
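The mechanism can be illustrated with a toy model of how an upgrade path is derived from a single stored revision. This is a simplified sketch, not Alembic's actual code: the graph is reduced to a linear chain, the direct parent links in the middle of the chain are assumptions (the PR only states that `d5y6z7a8b9c0` became an ancestor of `i4j5k6l7m8n9`), and `lineage` and `pending` are hypothetical helper names.

```python
from typing import Dict, List, Optional

# revision -> down_revision. Toy linear chain: the real graph has more
# revisions and a merge point; intermediate parent links are assumptions.
Graph = Dict[str, Optional[str]]

REWRITTEN: Graph = {
    "8c6fa6f7230b": None,            # stand-in root for all earlier history
    "d5y6z7a8b9c0": "8c6fa6f7230b",  # backfill spliced in behind the old head
    "i4j5k6l7m8n9": "d5y6z7a8b9c0",  # v0.5.6 head, parent rewritten
    "m3rg3h3ad5f6": "i4j5k6l7m8n9",  # head added in v0.6.0
}

# The fix from this PR: a net-new revision on top of the current head.
FIXED: Graph = {**REWRITTEN, "86f7a033d372": "m3rg3h3ad5f6"}


def lineage(graph: Graph, rev: Optional[str]) -> List[str]:
    """Return rev and all of its ancestors, oldest first."""
    chain: List[str] = []
    while rev is not None:
        chain.append(rev)
        rev = graph[rev]
    return chain[::-1]


def pending(graph: Graph, current: Optional[str], target: str) -> List[str]:
    """Revisions an upgrade would run: everything in target's lineage that is
    not in current's lineage. Ancestors of `current` are assumed applied,
    whether or not they ever ran against this database."""
    applied = set(lineage(graph, current))
    return [rev for rev in lineage(graph, target) if rev not in applied]


if __name__ == "__main__":
    # A database that stopped at the v0.5.6 head skips the spliced backfill.
    print(pending(REWRITTEN, "i4j5k6l7m8n9", "m3rg3h3ad5f6"))
    # With the new head in place, the stuck database picks up the repair.
    print(pending(FIXED, "m3rg3h3ad5f6", "86f7a033d372"))
```

In this model a fresh database runs every revision including `d5y6z7a8b9c0`, while a database stamped at `i4j5k6l7m8n9` never sees it, which matches the reported end state.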
Lesson: rewriting a migration's `down_revision` to insert a new migration retroactively does not apply it to databases already past that point. Net-new migrations should be added as a new head, not spliced into the middle of an existing chain.
Fix
Add a new migration at the current head (`86f7a033d372`, `down_revision = "m3rg3h3ad5f6"`) so every affected deployment picks it up on next container start. It mirrors the column-add block from `d5y6z7a8b9c0` and is fully idempotent:
- `ALTER TABLE` uses `ADD COLUMN IF NOT EXISTS`
- Guarded by `DO $$ ... IF EXISTS (information_schema.tables ...) ... END $$;` so it skips databases that predate the `mental_models` table
- `UPDATE ... SET subtype = 'structural' WHERE subtype = 'directive'` normalizes any leftover `'directive'` rows from the `o0j1k2l3m4n5` directive-only branch so the CHECK constraint add succeeds
- CHECK allowlist `('structural', 'emergent', 'pinned', 'learned')`: same value set as `d5y6z7a8b9c0` and `h3c4d5e6f7g8`
Databases that already received the columns from `d5y6z7a8b9c0` see this as a no-op. Databases stuck at `m3rg3h3ad5f6` without the columns get them on next migration run.
Scope
PG-only. Oracle's baseline (`o1a2b3c4d5e6_oracle_baseline.py`) creates `mental_models` with its own topology and `chk_mm_subtype CHECK (subtype IN ('directive', 'pinned'))`, so this PG-shaped repair does not apply there. The `run_for_dialect(pg=_pg_upgrade)` dispatcher omits the Oracle slot intentionally per the `Dialect-only migrations` guidance in CLAUDE.md.
Test plan
- `uv run pytest tests/test_migration_shape.py`: 64/64 passed, including the new file's parametrized case
- `uv run ruff check` on the new migration: clean
- On a database with `alembic_version` set to `m3rg3h3ad5f6`, confirmed `INSERT INTO mental_models (..., subtype, ...)` reproduces the exact 500 error from #1553, then ran `alembic upgrade heads`. Result: DB advances to `86f7a033d372`, all six columns present, CHECK constraint recreated, index recreated, INSERT succeeds, `subtype='structural'` accepted, `subtype='INVALID'` rejected.
- `alembic upgrade heads` on the already-fixed DB is a no-op and doesn't disturb existing rows.
- `mental_models` with rows where `subtype='directive'` (simulating databases that came through the `o0j1k2l3m4n5` branch): the migration normalizes them to `'structural'` so the CHECK constraint applies cleanly. No data loss.
CI
Two unrelated failures on the run:
- `test-python-client-oracle`: `ORA-01659: unable to allocate MINEXTENTS beyond 1 in tablespace HINDSIGHT_TS`. Infrastructure issue in the Oracle CI container's tablespace, unrelated to this PR. The migration is `run_for_dialect(pg=_pg_upgrade)` so it's a no-op on Oracle.
- `LLM acceptance (gemini/gemini-2.5-flash-lite)`: transient Gemini API flake (1m58s runtime suggests early API failure, not a real test failure). Similar flake seen on PR #1582 (feat(claude-agent-sdk): add Claude Agent SDK integration).
`test-api` (the Postgres pytest job that actually runs the migration) passed in 17m40s. All Docker builds, all four Python version matrix builds, all client SDK builds, and all doc-example test matrices passed.
Note for affected users
After the next release containing this PR, the migration runs automatically on container start and adds the missing columns + CHECK constraint + index. No manual SQL needed. Existing user-curated mental models are preserved (each row gets `subtype = 'structural'` by default).
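For anyone who wants to confirm whether a deployment is in the affected state before or after upgrading, a small check along these lines may help. It is a sketch only: `missing_v4_columns` and `LIST_COLUMNS_SQL` are hypothetical names rather than part of this PR, the query omits a `table_schema` filter on the assumption of a single schema, and running the query against the database is left to whatever PostgreSQL client is at hand.

```python
from typing import Iterable, List

# The six v4 columns this PR lists as missing on affected deployments.
V4_COLUMNS = (
    "subtype",
    "description",
    "entity_id",
    "observations",
    "links",
    "last_updated",
)

# Standard information_schema lookup. Assumption: mental_models lives in a
# single schema; add a table_schema predicate if that does not hold.
LIST_COLUMNS_SQL = (
    "SELECT column_name FROM information_schema.columns "
    "WHERE table_name = 'mental_models'"
)


def missing_v4_columns(existing: Iterable[str]) -> List[str]:
    """Return the v4 columns absent from `existing`, in canonical order."""
    present = {name.lower() for name in existing}
    return [name for name in V4_COLUMNS if name not in present]


if __name__ == "__main__":
    # Illustrative input only: a table that has none of the v4 columns.
    print(missing_v4_columns(["id", "name"]))
```

An empty result means the table already has the full v4 column set, so the new migration will be a no-op for that database.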