✨ feat: add checksum validation#672
Conversation
6f50b23 to
89fdceb
Compare
c9d6225 to
a1496f2
Compare
beanow-at-crabnebula
left a comment
There was a problem hiding this comment.
👏 I was missing this functionality as well.
Not a maintainer, but two points that stood out.
| - `NONE`: Disable checksum validation (default). | ||
| - `LENIENT`: Warn if a migration file has changed after being applied. | ||
| - `STRICT`: Fail if a migration file has changed after being applied. |
There was a problem hiding this comment.
| - `NONE`: Disable checksum validation (default). | |
| - `LENIENT`: Warn if a migration file has changed after being applied. | |
| - `STRICT`: Fail if a migration file has changed after being applied. | |
| - `NONE`: Disable checksum validation (default). | |
| - `WARN`: Warn if a migration file has changed after being applied. | |
| - `FAIL`: Fail if a migration file has changed after being applied. |
Perhaps more self-explanatory.
Also, I would suggest WARN is a good default that strikes a balance between backwards-compatible behavior and helping people avoid this footgun.
There was a problem hiding this comment.
I can assume that it is more self-explanatory! My first intent was to describe the "checksum strictness" rather than the direct result of the checksum validation 👍
Maybe I will let maintainers judge what they want in repository
|
|
||
| func (drv *Driver) HasChecksumColumn(db *sql.DB) (bool, error) { | ||
| exists := false | ||
| err := db.QueryRow(fmt.Sprintf("SELECT 1 FROM system.columns WHERE database = '%s' AND table = '%s' AND name = 'checksum'", drv.databaseName(), drv.quotedMigrationsTableName())). |
There was a problem hiding this comment.
Keep in mind that reading system tables may require additional privileges when you're using access control.
It may be worth documenting, or considering alternative approaches.
There was a problem hiding this comment.
A quick look at how the grafana datasource does this for enumerating fields in the query builder UI.
https://github.com/grafana/clickhouse-datasource/blob/c0fb25dce3b7d9888dba574de9d73e25f23684c8/src/data/CHDatasource.ts#L561
It uses the DESCRIBE TABLE statement.
But this does not allow filtering by column name.
Instead we could SHOW COLUMNS FROM "%s"."%s" WHERE field = 'checksum'
As far as privileges goes, this requires GRANT SHOW COLUMNS on the migration table.
Rather than GRANT SELECT system.columns.
There was a problem hiding this comment.
Yes good point, I forgot this point!
I've never work with Clickhouse, I will try to test and fix that.
Thank you for the hint 🙏
b0add98 to
059b5d8
Compare
|
blocked by #679 |
84b471b to
2800ac6
Compare
- Checkout using PR head SHA instead of head_ref to work with forks - Only push changes when PR is from same repository (internal) - This should allow the sync job to run without checkout errors
5bcd93b to
41c7fec
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
|
is this ever going to get merged? |
Concept
Checksum validation feature can help developers and teams be aware of wrong use of database schema migration tool by displaying or raising error when an applied migration file's content has changed since application.
Supported modes are:
NONE: Disable checksum validation (default).LENIENT: Warn if a migration file has changed after being applied.STRICT: Fail if a migration file has changed after being applied.Why?
This feature is inspired from Liquibase features. I used Liquibase for years and I had some complains about it. I think dbmate is a simplier powerful alternative to it by removing complex abstraction layer and lightening the execution.
But one of cool features I found in Liquibase was the explicit wrong use of database schema migrations by performing a "checksum" validation, hashing the changeset content each time and comparing the resulting hash to corresponding persisted applied changeset's one.
This feature helps ensure that applied migrations remain unchanged, improving database integrity and team collaboration by explicitly raising bad practices using database schema migration tool.
Discussed in #367
What's changed
This change adds optional checksum validation for migrations.
DBMATE_CHECKSUM_MODEand CLI flag--checksum-modewith values NONE (default), LENIENT, STRICT.schema_migrations).Past migrations will have NULL checksum in TABLE, the codebase should be sufficiently resilient to ignore past migration checksum validation.
Design notes
PR notes
This is my first open-source project's pull request, as well as first Golang project. So please be strict in code review and explanatory in comments. I will do my best to fix what it have to be fixed.
I checked the "retro-compatibility" (column creation on already existing migrations table) on classic SQL Database engines and clickhouse but I couldn't check for BigQuery apparently due to emulator limitations or something I don't understand: After adding column (/field) to table (/schema), the updated metadata looks good but the next select fail saying the field "checksum" does not exist (yet?).
If someone knows more about BigQuery architecture or have an idea, please let me know.
Note
Medium Risk
Touches core migration discovery/recording and updates all DB drivers’ migrations-table DDL/DML, which could affect existing installations and schema dumps if any driver-specific edge cases exist.
Overview
Adds optional migration checksum validation controlled by new
--checksum-mode/DBMATE_CHECKSUM_MODE(NONE/LENIENT/STRICT). Dbmate now computes a canonicalized SHA-256 of each migration file (BOM stripped, CRLF normalized), stores it alongside the migration version, and on subsequent runs warns or fails if an applied migration’s contents no longer match.Extends the
schema_migrationsschema and driver interface to include achecksumcolumn/value, with automatic in-place upgrade (detect + add column) for existing databases across supported drivers (Postgres/MySQL/SQLite/ClickHouse/BigQuery) and updates schema-dump output accordingly; tests and README are updated to cover the new behavior and backward compatibility.Written by Cursor Bugbot for commit d8b8b6a. This will update automatically on new commits. Configure here.