✨ feat: add checksum validation#672

Open
noetica-cloud wants to merge 17 commits into amacneil:main from noetica-cloud:feat/checksum

Conversation


@noetica-cloud noetica-cloud commented Aug 16, 2025

Concept

The checksum validation feature helps developers and teams detect misuse of the database schema migration tool by warning or raising an error when the content of an applied migration file has changed since it was applied.

Supported modes are:

  • NONE: Disable checksum validation (default).
  • LENIENT: Warn if a migration file has changed after being applied.
  • STRICT: Fail if a migration file has changed after being applied.

Why?

This feature is inspired by Liquibase. I used Liquibase for years and had some complaints about it. I think dbmate is a simpler yet powerful alternative: it removes the complex abstraction layer and makes execution lighter.

But one of the cool features I found in Liquibase is that it makes misuse of database schema migrations explicit by performing a "checksum" validation: it hashes the changeset content on each run and compares the resulting hash with the one persisted for the corresponding applied changeset.

This feature helps ensure that applied migrations remain unchanged, improving database integrity and team collaboration by explicitly flagging bad practices when using a database schema migration tool.

Discussed in #367

What's changed

This change adds optional checksum validation for migrations.

  • New env var: DBMATE_CHECKSUM_MODE and CLI flag --checksum-mode with values NONE (default), LENIENT, STRICT.
  • dbmate adds a checksum column to the migrations table (default schema_migrations) if it does not exist yet.
  • The checksum stored is SHA-256 (hex) of the migration contents.
  • On startup, dbmate will validate applied migrations that have a stored checksum. On mismatch:
    • STRICT: migration run fails with an error.
    • LENIENT: mismatch is logged as a WARNING but migration proceeds.
    • NONE: no checks performed (default).

Past migrations will have a NULL checksum in the table; the code should be resilient enough to skip checksum validation for them.

Design notes

  • I intentionally treat each version as a single file and chose the simplest hashing approach: hashing the entire file contents.
  • This is opt-in only: users must set a checksum mode to enable the behavior; nothing changes by default except that a checksum column is added to the migrations table.

PR notes

This is my first pull request to an open-source project, as well as my first Go project. So please be strict in code review and explanatory in comments. I will do my best to fix whatever has to be fixed.

I checked backward compatibility (column creation on an already existing migrations table) on the classic SQL database engines and ClickHouse, but I couldn't check BigQuery, apparently due to emulator limitations or something I don't understand: after adding the column (field) to the table (schema), the updated metadata looks good, but the next SELECT fails, saying the field "checksum" does not exist (yet?).

If someone knows more about BigQuery architecture or has an idea, please let me know.


Note

Medium Risk
Touches core migration discovery/recording and updates all DB drivers’ migrations-table DDL/DML, which could affect existing installations and schema dumps if any driver-specific edge cases exist.

Overview
Adds optional migration checksum validation controlled by new --checksum-mode / DBMATE_CHECKSUM_MODE (NONE/LENIENT/STRICT). Dbmate now computes a canonicalized SHA-256 of each migration file (BOM stripped, CRLF normalized), stores it alongside the migration version, and on subsequent runs warns or fails if an applied migration’s contents no longer match.

Extends the schema_migrations schema and driver interface to include a checksum column/value, with automatic in-place upgrade (detect + add column) for existing databases across supported drivers (Postgres/MySQL/SQLite/ClickHouse/BigQuery) and updates schema-dump output accordingly; tests and README are updated to cover the new behavior and backward compatibility.

Written by Cursor Bugbot for commit d8b8b6a. This will update automatically on new commits.

@noetica-cloud noetica-cloud force-pushed the feat/checksum branch 2 times, most recently from 6f50b23 to 89fdceb Compare August 16, 2025 16:12
@noetica-cloud noetica-cloud marked this pull request as draft August 16, 2025 16:13
@noetica-cloud noetica-cloud force-pushed the feat/checksum branch 2 times, most recently from c9d6225 to a1496f2 Compare August 16, 2025 17:35
@noetica-cloud noetica-cloud marked this pull request as ready for review August 16, 2025 18:06

@beanow-at-crabnebula beanow-at-crabnebula left a comment


👏 I was missing this functionality as well.
Not a maintainer, but two points that stood out.

Comment thread README.md Outdated
Comment on lines +618 to +620
- `NONE`: Disable checksum validation (default).
- `LENIENT`: Warn if a migration file has changed after being applied.
- `STRICT`: Fail if a migration file has changed after being applied.

Suggested change
Before:
- `NONE`: Disable checksum validation (default).
- `LENIENT`: Warn if a migration file has changed after being applied.
- `STRICT`: Fail if a migration file has changed after being applied.
After:
- `NONE`: Disable checksum validation (default).
- `WARN`: Warn if a migration file has changed after being applied.
- `FAIL`: Fail if a migration file has changed after being applied.

Perhaps more self-explanatory.
Also, I would suggest WARN is a good default that strikes a balance between backwards-compatible behavior and helping people avoid this footgun.

Author

I can see that it is more self-explanatory! My original intent was to describe the "checksum strictness" rather than the direct result of the checksum validation 👍
Maybe I will let the maintainers decide what they want in the repository.

Comment thread pkg/driver/clickhouse/clickhouse.go Outdated

func (drv *Driver) HasChecksumColumn(db *sql.DB) (bool, error) {
exists := false
err := db.QueryRow(fmt.Sprintf("SELECT 1 FROM system.columns WHERE database = '%s' AND table = '%s' AND name = 'checksum'", drv.databaseName(), drv.quotedMigrationsTableName())).

Keep in mind that reading system tables may require additional privileges when you're using access control.
It may be worth documenting, or considering alternative approaches.


I took a quick look at how the Grafana datasource does this for enumerating fields in the query builder UI:
https://github.com/grafana/clickhouse-datasource/blob/c0fb25dce3b7d9888dba574de9d73e25f23684c8/src/data/CHDatasource.ts#L561
It uses the DESCRIBE TABLE statement.
But this does not allow filtering by column name.

Instead we could use SHOW COLUMNS FROM "%s"."%s" WHERE field = 'checksum'

As far as privileges go, this requires GRANT SHOW COLUMNS on the migration table,
rather than GRANT SELECT on system.columns.
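As a rough sketch of that suggestion, the check could be written as below. The helper names showChecksumColumnQuery and hasChecksumColumn are hypothetical, the format string is the one quoted above, and identifier escaping is deliberately left out, so this assumes trusted database and table names.

```go
package main

import (
	"database/sql"
	"fmt"
)

// showChecksumColumnQuery builds the SHOW COLUMNS statement suggested in
// the comment above. Names are interpolated as-is: no escaping is done.
func showChecksumColumnQuery(database, table string) string {
	return fmt.Sprintf(`SHOW COLUMNS FROM "%s"."%s" WHERE field = 'checksum'`, database, table)
}

// hasChecksumColumn reports whether the query returns at least one row,
// which would mean the checksum column exists. It needs an open
// ClickHouse connection, so it is not exercised in main below.
func hasChecksumColumn(db *sql.DB, database, table string) (bool, error) {
	rows, err := db.Query(showChecksumColumnQuery(database, table))
	if err != nil {
		return false, err
	}
	defer rows.Close()

	found := rows.Next()
	return found, rows.Err()
}

func main() {
	fmt.Println(showChecksumColumnQuery("default", "schema_migrations"))
}
```

Whether this statement behaves identically on every supported ClickHouse version is not verified here; it would need testing against the versions dbmate targets.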

Author

Yes, good point, I forgot about that!
I've never worked with ClickHouse; I will try to test and fix that.
Thank you for the hint 🙏

@noetica-cloud noetica-cloud force-pushed the feat/checksum branch 2 times, most recently from b0add98 to 059b5d8 Compare October 4, 2025 08:01
@noetica-cloud
Author

blocked by #679

Comment thread pkg/driver/bigquery/bigquery.go Outdated
Comment thread pkg/driver/clickhouse/clickhouse.go
Comment thread pkg/driver/clickhouse/clickhouse.go
Comment thread pkg/driver/bigquery/bigquery_test.go Outdated
Comment thread pkg/driver/bigquery/bigquery.go
Comment thread pkg/driver/mysql/mysql.go
@noetica-cloud noetica-cloud marked this pull request as draft January 30, 2026 21:07
Thomas VOISIN added 3 commits January 31, 2026 10:29
- Checkout using PR head SHA instead of head_ref to work with forks
- Only push changes when PR is from same repository (internal)
- This should allow the sync job to run without checkout errors
@noetica-cloud noetica-cloud marked this pull request as ready for review January 31, 2026 10:58
Comment thread pkg/dbmate/checksum.go
Comment thread pkg/driver/mysql/mysql.go
Comment thread pkg/driver/clickhouse/clickhouse.go Outdated
@noetica-cloud
Author

Hi @dossy @amacneil @sofuture, could someone review my contribution? 🙂


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

Comment thread pkg/driver/mysql/mysql.go
@Xynonners

is this ever going to get merged?

3 participants