When developing features, you may need to persist information to the metadata database. Airflow has a built-in Alembic module to handle all schema changes, so Alembic must be installed on your development machine before you continue. If you have made changes to the ORM, you need to generate a new migration file that contains the corresponding changes to the database schema. To generate a new migration file, run the following:
# starting at the root of the project
# Use breeze:
$ breeze generate-migration-file -m "add new field to db"
# Or, go to the airflow directory and use alembic directly:
$ breeze --backend postgres
$ cd airflow-core/src/airflow
$ alembic revision -m "add new field to db" --autogenerate
Generating ~/airflow-core/src/airflow/migrations/versions/a1e23c41f123_add_new_field_to_db.py
Note that migration file names are standardized by the prek hook update-migration-references, so that they sort alphabetically and indicate
the Airflow version in which they first appear (the alembic revision ID is removed). As a result you should expect to see a prek failure
on the first attempt. Just stage the modified file and commit again
(or run the hook manually before committing).
After the prek hook has processed your new migration file, its name will look like this:
1234_A_B_C_add_new_field_to_db.py
This name indicates that your migration is the 1234th migration and is expected to be released in Airflow version A.B.C.
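The naming scheme above can be checked with a few lines of Python. This is only an illustrative sketch: the helper parse_migration_filename and the MigrationName tuple are hypothetical names, not part of Airflow or of the prek hook, and the regular expression assumes the sequence number and the three version components are plain digits.

```python
import re
from typing import NamedTuple


class MigrationName(NamedTuple):
    """Parts of a standardized migration file name (hypothetical helper type)."""

    sequence: int
    airflow_version: str
    description: str


# <sequence>_<A>_<B>_<C>_<description>.py, all numeric parts assumed to be digits.
_STANDARD_NAME = re.compile(r"^(\d+)_(\d+)_(\d+)_(\d+)_(.+)\.py$")


def parse_migration_filename(filename: str) -> MigrationName:
    """Split a standardized migration file name into its parts."""
    match = _STANDARD_NAME.match(filename)
    if match is None:
        # A name that still starts with an alembic revision ID ends up here.
        raise ValueError(f"not a standardized migration file name: {filename}")
    sequence, major, minor, patch, description = match.groups()
    return MigrationName(int(sequence), f"{major}.{minor}.{patch}", description)
```

A name that has not yet been rewritten by the hook, such as a1e23c41f123_add_new_field_to_db.py, does not match the pattern and raises ValueError.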
Warning
In rare cases, you may need to manually modify the migration logic of your auto-generated migration script. If you must make manual changes to your migration script, you must ensure you're not referencing any ORM classes within your migration script. Directly referring to an ORM class definition within a migration script can lead to unexpected and / or broken downgrade pathways in the future, as described here.
When rebasing your branch onto the latest main, you may encounter conflicts in certain files. This often happens when another PR updates the Metadata Database and is merged before yours.
The affected files may include:
- docs/apache-airflow/migrations-ref.rst
- airflow/migrations/versions/1234_A_B_C_<your_migration_name>.py
  There should be another file,
  1234_A_B_C_<other_migration_name>.py, with the same 1234_A_B_C prefix.
To resolve these conflicts:
- First, resolve all conflicts except those in the files listed above. This includes conflicts in other
  .py files within the airflow/ or tests/ directories.
- Then, run the following command to automatically update the affected files:
  prek update-migration-references --all-files
- Add the updated files to the staging area and continue with the rebase.
Note
The ERD diagram (airflow_erd.svg) is no longer committed to the repository. It is
automatically generated during the documentation build by the generate_erd Sphinx extension.
The various CI migration tests are defined in .github/actions/migration_tests/action.yml. These tests ensure the
database upgrades and downgrades are still functional from the lowest supported source migration version, to the latest version,
and back down to the former. To run any of those CI tests on your machine, you can:
- Copy the relevant command (specified by the run key for the relevant CI job), and replace the environment variable references with their literal values defined in the sibling env section.
- Run the command you created in the previous step, troubleshooting errors as needed.
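If you prefer not to substitute the values by hand, the replacement step can be sketched in Python with string.Template. The function name render_ci_command, the sample command, and the DOWNGRADE_TO variable are made up for illustration and are not copied from action.yml; use the real run and env values from that file.

```python
from string import Template


def render_ci_command(run_command: str, env: dict) -> str:
    """Replace $VAR and ${VAR} references with the literal values from env."""
    # safe_substitute leaves unknown references and other shell syntax
    # (for example "$(...)") untouched instead of raising an error.
    return Template(run_command).safe_substitute(env)


# Hypothetical stand-ins for a job's "run" and "env" entries.
run_command = 'airflow db downgrade --to-version "${DOWNGRADE_TO}" --yes'
env = {"DOWNGRADE_TO": "3.0.0"}

print(render_ci_command(run_command, env))
# prints: airflow db downgrade --to-version "3.0.0" --yes
```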
Migrations that rebuild a parent table via op.batch_alter_table must wrap their entire body in
disable_sqlite_fkeys(op) before any DML or DDL opens an implicit transaction — otherwise the
wrapper's PRAGMA is a no-op and the rebuild's implicit DROP TABLE cascade-deletes child rows
(or aborts on a RESTRICT chain). The placement convention and the round-trip prek hook that
enforces it are documented in Migration round-trip regression check.
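The ordering requirement follows from a SQLite rule: PRAGMA foreign_keys is silently ignored while a transaction is open. The sketch below demonstrates that rule with the standard-library sqlite3 module only. It does not use Airflow's disable_sqlite_fkeys helper, and the function names are hypothetical.

```python
import sqlite3


def foreign_keys_on(conn: sqlite3.Connection) -> bool:
    """Return True if foreign key enforcement is currently enabled."""
    return conn.execute("PRAGMA foreign_keys").fetchone()[0] == 1


def pragma_off_takes_effect(inside_transaction: bool) -> bool:
    """Try to switch enforcement off and report whether it really went off."""
    # isolation_level=None puts sqlite3 in autocommit mode, so the only
    # transaction is the one opened explicitly below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    try:
        conn.execute("PRAGMA foreign_keys=ON")
        if inside_transaction:
            conn.execute("BEGIN")
        conn.execute("PRAGMA foreign_keys=OFF")
        turned_off = not foreign_keys_on(conn)
        if inside_transaction:
            conn.execute("ROLLBACK")
        return turned_off
    finally:
        conn.close()
```

Calling pragma_off_takes_effect(False) reports that enforcement was switched off, while pragma_off_takes_effect(True) reports that it stayed on. The second case is what happens when the wrapper is entered after a transaction has already started.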
Airflow 3.0.0 introduces a new feature that allows you to hook your application into Airflow's migration process. This feature is useful if you have a custom database schema that you want to migrate along with Airflow's schema. This guide will show you how to hook your application into Airflow's migration process.
To hook your application into Airflow's migration process, you need to subclass the BaseDBManager class from the
airflow.utils.db_manager module. This class provides methods for running Alembic migrations.
At the root of your application, run "alembic init migrations" to create a new migrations directory. In the generated env.py file, set a
version_table variable to the name of the table that stores your application's migration history, and pass it as the
version_table argument of Alembic's context.configure method in both the run_migrations_online
and run_migrations_offline functions. This ensures that your application's migrations are tracked in a separate
table from Airflow's migrations.
Next, define an include_object function in env.py that ensures that only your application's metadata is included in the application's
migrations. Pass it to context.configure in both run_migrations_online and run_migrations_offline as well.
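The two env.py settings described above might look like the following sketch. The table names, VERSION_TABLE, MY_APP_TABLES, and the configure_kwargs helper are hypothetical; only the include_object callback signature and the target_metadata, version_table, and include_object keyword arguments come from Alembic itself. Filtering by table name is one possible approach, chosen here to keep the example free of third-party imports.

```python
# Hypothetical names, for illustration only.
VERSION_TABLE = "my_app_alembic_version"
MY_APP_TABLES = frozenset({"my_app_item", "my_app_setting"})


def include_object(object_, name, type_, reflected, compare_to):
    """Alembic filter hook: only let this application's tables through."""
    if type_ == "table":
        # Airflow's own tables (and its version table) are skipped.
        return name in MY_APP_TABLES
    # Columns, indexes, and constraints are only visited for included tables.
    return True


def configure_kwargs(target_metadata):
    """Keyword arguments shared by the online and offline configure calls."""
    return {
        "target_metadata": target_metadata,
        "version_table": VERSION_TABLE,
        "include_object": include_object,
    }


# Inside env.py, both migration functions would then call, for example:
#     context.configure(connection=connection, **configure_kwargs(target_metadata))
```

Sharing one helper between the online and offline paths keeps the version table and the filter from drifting apart.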
Next, load the logging configuration from the config file without disabling existing loggers:
if config.config_file_name is not None:
    fileConfig(config.config_file_name, disable_existing_loggers=False)
Replace the content of your application's alembic.ini file with Airflow's alembic.ini copy.
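To see what the disable_existing_loggers=False argument in the fileConfig call changes, the behaviour can be reproduced with the standard library alone. The sketch below is not Airflow code: the INI content, the logger names, and the helper logger_disabled_after_fileconfig are made up for the demonstration. Note that it reconfigures the root logger of the process it runs in.

```python
import logging
import logging.config
import os
import tempfile

# Minimal logging configuration in the same INI format that alembic.ini uses.
INI = """
[loggers]
keys=root

[handlers]
keys=console

[formatters]
keys=plain

[logger_root]
level=WARNING
handlers=console

[handler_console]
class=StreamHandler
args=(sys.stderr,)
formatter=plain

[formatter_plain]
format=%(levelname)s %(name)s %(message)s
"""


def logger_disabled_after_fileconfig(disable_existing: bool) -> bool:
    """Create a logger, apply the INI config, and report if it got disabled."""
    existing = logging.getLogger(f"my_app.existing_{disable_existing}")
    with tempfile.NamedTemporaryFile("w", suffix=".ini", delete=False) as handle:
        handle.write(INI)
        path = handle.name
    try:
        logging.config.fileConfig(path, disable_existing_loggers=disable_existing)
    finally:
        os.unlink(path)
    return existing.disabled
```

With disable_existing=True, a logger that already existed and is not named in the file is disabled. With False it keeps working, which is what you want for loggers that were set up before your env.py runs.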
If the above is not clear, you might want to look at the FAB implementation of this migration.
Once this is set up, if you want Airflow to run your migrations when you run airflow db migrate, you need to
add your DBManager to the [core] external_db_managers configuration.
You can also learn how to set up your Node environment if you want to develop the Airflow UI.