This page documents the manual prek hook that verifies SQLite
migrations round-trip safely from base to head and back, plus the
disable_sqlite_fkeys(op) placement convention every migration
must follow.
If you are adding or editing a migration, the convention section is the part that affects you. The rest is operational detail for people maintaining the check itself.
Files:
scripts/in_container/run_migration_round_trip.py— the test driver.scripts/ci/prek/migration_round_trip.py— the prek hook wrapper that drives the in-container script via Breeze.
The hook is registered under stages: [manual] in
airflow-core/.pre-commit-config.yaml so it does not fire on every
local commit (a full base→head→base walk takes a few minutes). To run
it manually:
prek run migration-round-trip --hook-stage manual --all-filesCI runs it automatically on any pull request whose changed files
include airflow-core/src/airflow/migrations/ — the dedicated
Migration round-trip check job in .github/workflows/ci-amd-arm.yml
is gated on the has-migrations selective-checks output.
Migrations that rebuild a parent table via op.batch_alter_table(...)
(any table whose children declare ON DELETE CASCADE) must wrap their
entire body in disable_sqlite_fkeys(op), before any DML statement
or any rebuild-mode op.batch_alter_table(...) runs.
The exact rule (verified empirically):
PRAGMA foreign_keys = on/offonly takes effect on a connection that is currently in SQLite's autocommit mode. Pure DDL statements (ALTER TABLE,CREATE TABLE,DROP TABLE) implicitly commit and leave the connection in autocommit, so a PRAGMA after them still works. DML statements (INSERT,UPDATE,DELETE) leave the connection in an open transaction, and any PRAGMA after a DML — including the one insidedisable_sqlite_fkeys— is silently a no-op.
The trap: op.batch_alter_table is sometimes pure DDL (e.g.
add_column only — alembic uses native ALTER TABLE on SQLite)
and sometimes a rebuild that internally does
CREATE _alembic_tmp / INSERT INTO ... SELECT FROM / DROP / ALTER
RENAME. The internal INSERT makes the connection no longer
autocommit, so a subsequent disable_sqlite_fkeys block is a no-op.
Because (a) it's hard for migration authors to predict at write time
whether a particular batch_alter_table will rebuild and (b) the
wrapper costs nothing to put earlier, the practical convention is
strictly stronger than the technical rule:
Putwith disable_sqlite_fkeys(op):as the outermost block of anyupgrade/downgradethat callsop.batch_alter_tableon a parent table, before any DML or DDL. Don't try to optimise it into a smaller scope.
Correct:
def upgrade():
from airflow.migrations.utils import disable_sqlite_fkeys
with disable_sqlite_fkeys(op):
with op.batch_alter_table("dag", schema=None) as batch_op:
batch_op.add_column(sa.Column("..."))
op.execute("UPDATE dag SET ... WHERE ...")
with op.batch_alter_table("dag", schema=None) as batch_op:
batch_op.alter_column("...", nullable=False)Anti-pattern:
def upgrade():
# The first batch_alter_table here is pure add_column, so it's
# native ALTER TABLE (DDL) and on its own would leave the
# connection in autocommit. But the op.execute("UPDATE ...")
# below it is DML — it leaves the connection in a non-
# autocommit state, so the wrapper's PRAGMA is a no-op and the
# alter_column rebuild runs with foreign_keys still ON.
with op.batch_alter_table("dag", schema=None) as batch_op:
batch_op.add_column(sa.Column("..."))
op.execute("UPDATE dag SET ... WHERE ...")
with disable_sqlite_fkeys(op): # ← TOO LATE: PRAGMA is a no-op
with op.batch_alter_table("dag", schema=None) as batch_op:
batch_op.alter_column("...", nullable=False)The same convention applies to downgrade().
The driver:
- Spins up a fresh SQLite database with
PRAGMA foreign_keys = ON. - Walks every migration in topological order, base → head, one
revision at a time. Before each step it idempotently restores the
seed fixture (
log_template,dag_bundle,dag,dag_version,dag_run,task_instance) to whatever the live schema can hold, so any FK chain a rebuild needs to fire (dag → dag_version → task_instance.dag_version_id RESTRICTis the load-bearing one) is in place. - Walks every migration in reverse, head → base, the same way.
Any failure is reported with the failing direction (up /
down) and the rev id, plus the underlying exception.
A single-call airflow_db.upgradedb(to_revision=tip) walks all
migrations on a single shared SQLAlchemy connection. Once any earlier
migration's correctly-placed disable_sqlite_fkeys(op) flips
foreign_keys to off, that state stays off on the connection — and
later migrations on the same connection silently inherit the protection.
A migration that omits disable_sqlite_fkeys or places it with
the wrong placement (where its own PRAGMA is a no-op) then looks fine
in single-call mode purely because of the inherited state.
Walking each migration through its own
airflow_db.upgradedb(to_revision=specific_rev) call sidesteps that:
each call dispose_orm() + configure_orm() builds a fresh engine,
setup_event_handlers fires PRAGMA foreign_keys=ON on the new
connection, and the migration starts from a clean slate.
SEED_VALUES in run_migration_round_trip.py lists hardcoded
values for every column the seed touches across all revisions in the
chain. If a future migration adds a brand-new NOT NULL-no-default
column to one of the seeded tables (dag, dag_version,
dag_run, task_instance, dag_bundle, log_template), the
round-trip aborts with a clear RuntimeError pointing at the table
that needs an addition. Add the new column + a sensible literal value
to SEED_VALUES['<table>'] and re-run.