Commit e3de7d7

Merge pull request #22 from Shopify/sync-upstream-2026-01-22
Sync upstream 2026-01-22
2 parents 14d6691 + e35272d commit e3de7d7

17 files changed

Lines changed: 518 additions & 41 deletions

doc/command-line-flags.md

Lines changed: 12 additions & 0 deletions
@@ -24,6 +24,11 @@ By default, `gh-ost` would like you to connect to a replica, from where it figur
 
 If, for some reason, you do not wish `gh-ost` to connect to a replica, you may connect it directly to the master and approve this via `--allow-on-master`.
 
+### allow-setup-metadata-lock-instruments
+
+`--allow-setup-metadata-lock-instruments` allows gh-ost to enable the [`metadata_locks`](https://dev.mysql.com/doc/refman/8.0/en/performance-schema-metadata-locks-table.html) table in `performance_schema`, if it is not already enabled. This is used for a safety check before cut-over.
+See also: [`skip-metadata-lock-check`](#skip-metadata-lock-check)
+
 ### approve-renamed-columns
 
 When your migration issues a column rename (`change column old_name new_name ...`) `gh-ost` analyzes the statement to try and associate the old column name with new column name. Otherwise, the new structure may also look like some column was dropped and another was added.
@@ -247,6 +252,13 @@ Defaults to an auto-determined and advertised upon startup file. Defines Unix so
 
 By default `gh-ost` verifies no foreign keys exist on the migrated table. On servers with large number of tables this check can take a long time. If you're absolutely certain no foreign keys exist (table does not reference other table nor is referenced by other tables) and wish to save the check time, provide with `--skip-foreign-key-checks`.
 
+### skip-metadata-lock-check
+
+By default `gh-ost` performs a check before the cut-over to ensure the rename session holds the exclusive metadata lock on the table. In case `performance_schema.metadata_locks` cannot be enabled on your setup, this check can be skipped with `--skip-metadata-lock-check`.
+:warning: Disabling this check involves the small chance of data loss in case a session accesses the ghost table during cut-over. See https://github.com/github/gh-ost/pull/1536 for details.
+
+See also: [`allow-setup-metadata-lock-instruments`](#allow-setup-metadata-lock-instruments)
+
 ### skip-strict-mode
 
 By default `gh-ost` enforces STRICT_ALL_TABLES sql_mode as a safety measure. In some cases this changes the behaviour of other modes (namely ERROR_FOR_DIVISION_BY_ZERO, NO_ZERO_DATE, and NO_ZERO_IN_DATE) which may lead to errors during migration. Use `--skip-strict-mode` to explicitly tell `gh-ost` not to enforce this. **Danger** This may have some unexpected disastrous side effects.

doc/hooks.md

Lines changed: 1 addition & 0 deletions
@@ -77,6 +77,7 @@ The following variables are available on all hooks:
 - `GH_OST_HOOKS_HINT_OWNER` - copy of `--hooks-hint-owner` value
 - `GH_OST_HOOKS_HINT_TOKEN` - copy of `--hooks-hint-token` value
 - `GH_OST_DRY_RUN` - whether or not the `gh-ost` run is a dry run
+- `GH_OST_REVERT` - whether or not `gh-ost` is running in revert mode
 
 The following variable are available on particular hooks:
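
A hook can branch on the `GH_OST_REVERT` variable added in this hunk. The matching change to `go/logic/hooks.go` in this commit formats the value with `%t`, so it should arrive as `true` or `false`. The following is a minimal sketch of a hook written in Go under that assumption; the helper name `isRevert` and the printed messages are hypothetical and not part of gh-ost.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// isRevert reports whether a raw GH_OST_REVERT value marks a revert run.
// Hypothetical helper: an unset or unparsable value is treated as "not a revert".
func isRevert(raw string) bool {
	revert, err := strconv.ParseBool(raw)
	return err == nil && revert
}

func main() {
	if isRevert(os.Getenv("GH_OST_REVERT")) {
		fmt.Println("gh-ost is reverting a migration")
		return
	}
	fmt.Println("gh-ost is running a regular migration")
}
```

Treating a missing variable as `false` keeps the hook usable with gh-ost builds that predate this commit and do not export `GH_OST_REVERT`.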

doc/resume.md

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@
 - The first `gh-ost` process was invoked with `--checkpoint`
 - The first `gh-ost` process had at least one successful checkpoint
 - The binlogs from the last checkpoint's binlog coordinates still exist on the replica gh-ost is inspecting (specified by `--host`)
+- The checkpoint table (name ends with `_ghk`) still exists
 
 To resume, invoke `gh-ost` again with the same arguments with the `--resume` flag.

doc/revert.md

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+# Reverting Migrations
+
+`gh-ost` can attempt to revert a previously completed migration if the following conditions are met:
+- The first `gh-ost` process was invoked with `--checkpoint`
+- The checkpoint table (name ends with `_ghk`) still exists
+- The binlogs from the time of the migration's cut-over still exist on the replica gh-ost is inspecting (specified by `--host`)
+
+To revert, find the name of the "old" table from the original migration, e.g. `_mytable_del`. Then invoke `gh-ost` with the same arguments and the flags `--revert` and `--old-table="_mytable_del"`.
+gh-ost will read the binlog coordinates of the original cut-over from the checkpoint table and bring the old table up to date. Then it performs another cut-over to complete the reversion.
+Note that the checkpoint table (name ends with `_ghk`) will not be automatically dropped unless `--ok-to-drop-table` is provided.
+
+> [!WARNING]
+> It is recommended to use `--checkpoint` with `--gtid` enabled so that checkpoint binlog coordinates store GTID sets rather than file positions. In that case, `gh-ost` can revert using a different replica than it originally attached to.
+
+### ❗ Note ❗
+Reverting is roughly equivalent to applying the "reverse" migration. _Before attempting to revert you should determine if the reverse migration is possible and does not involve any unacceptable data loss._
+
+For example: if the original migration drops a `NOT NULL` column that has no `DEFAULT`, then the reverse migration adds the column. In this case, the reverse migration is impossible if rows were added after the original cut-over, and the revert will fail.
+Another example: if the original migration modifies a `VARCHAR(32)` column to `VARCHAR(64)`, the reverse migration truncates the `VARCHAR(64)` column to `VARCHAR(32)`. If values longer than 32 characters were inserted after the cut-over, then the revert will fail.
+
+
+## Example
+The migration starts with a `gh-ost` invocation such as:
+```shell
+gh-ost \
+--chunk-size=100 \
+--host=replica1.company.com \
+--database="mydb" \
+--table="mytable" \
+--alter="drop key idx1" \
+--gtid \
+--checkpoint \
+--checkpoint-seconds=60 \
+--execute
+```
+
+In this example `gh-ost` writes a cut-over checkpoint to `_mytable_ghk` after the cut-over is successful. The original table is renamed to `_mytable_del`.
+
+If dropping the index causes problems, the migration can be reverted with:
+```shell
+# revert migration
+gh-ost \
+--chunk-size=100 \
+--host=replica1.company.com \
+--database="mydb" \
+--table="mytable" \
+--old-table="_mytable_del" \
+--gtid \
+--checkpoint \
+--checkpoint-seconds=60 \
+--revert \
+--execute
+```
+
+gh-ost then reconnects at the binlog coordinates stored in the cut-over checkpoint and applies DMLs until the old table is up-to-date.
+Note that the "reverse" migration is `ADD KEY idx1(...)` so there is no potential data loss to consider in this case.

go/base/context.go

Lines changed: 13 additions & 2 deletions
@@ -104,6 +104,8 @@ type MigrationContext struct {
 	AzureMySQL bool
 	AttemptInstantDDL bool
 	Resume bool
+	Revert bool
+	OldTableName string
 
 	// SkipPortValidation allows skipping the port validation in `ValidateConnection`
 	// This is useful when connecting to a MySQL instance where the external port
@@ -256,6 +258,7 @@ type MigrationContext struct {
 
 	BinlogSyncerMaxReconnectAttempts int
 	AllowSetupMetadataLockInstruments bool
+	SkipMetadataLockCheck bool
 	IsOpenMetadataLockInstruments bool
 
 	Log Logger
@@ -350,6 +353,10 @@ func getSafeTableName(baseName string, suffix string) string {
 // GetGhostTableName generates the name of ghost table, based on original table name
 // or a given table name
 func (this *MigrationContext) GetGhostTableName() string {
+	if this.Revert {
+		// When reverting the "ghost" table is the _del table from the original migration.
+		return this.OldTableName
+	}
 	if this.ForceTmpTableName != "" {
 		return getSafeTableName(this.ForceTmpTableName, "gho")
 	} else {
@@ -366,14 +373,18 @@ func (this *MigrationContext) GetOldTableName() string {
 		tableName = this.OriginalTableName
 	}
 
+	suffix := "del"
+	if this.Revert {
+		suffix = "rev_del"
+	}
 	if this.TimestampOldTable {
 		t := this.StartTime
 		timestamp := fmt.Sprintf("%d%02d%02d%02d%02d%02d",
 			t.Year(), t.Month(), t.Day(),
 			t.Hour(), t.Minute(), t.Second())
-		return getSafeTableName(tableName, fmt.Sprintf("%s_del", timestamp))
+		return getSafeTableName(tableName, fmt.Sprintf("%s_%s", timestamp, suffix))
 	}
-	return getSafeTableName(tableName, "del")
+	return getSafeTableName(tableName, suffix)
 }
 
 // GetChangelogTableName generates the name of changelog table, based on original table name

go/cmd/gh-ost/main.go

Lines changed: 36 additions & 3 deletions
@@ -141,7 +141,8 @@ func main() {
 	flag.Int64Var(&migrationContext.HooksStatusIntervalSec, "hooks-status-interval", 60, "how many seconds to wait between calling onStatus hook")
 
 	flag.UintVar(&migrationContext.ReplicaServerId, "replica-server-id", 99999, "server id used by gh-ost process. Default: 99999")
-	flag.BoolVar(&migrationContext.AllowSetupMetadataLockInstruments, "allow-setup-metadata-lock-instruments", false, "validate rename session hold the MDL of original table before unlock tables in cut-over phase")
+	flag.BoolVar(&migrationContext.AllowSetupMetadataLockInstruments, "allow-setup-metadata-lock-instruments", false, "Validate rename session hold the MDL of original table before unlock tables in cut-over phase")
+	flag.BoolVar(&migrationContext.SkipMetadataLockCheck, "skip-metadata-lock-check", false, "Skip metadata lock check at cut-over time. The checks require performance_schema.metadata_lock to be enabled")
 	flag.IntVar(&migrationContext.BinlogSyncerMaxReconnectAttempts, "binlogsyncer-max-reconnect-attempts", 0, "when master node fails, the maximum number of binlog synchronization attempts to reconnect. 0 is unlimited")
 
 	flag.BoolVar(&migrationContext.IncludeTriggers, "include-triggers", false, "When true, the triggers (if exist) will be created on the new table")
@@ -151,6 +152,8 @@ func main() {
 	flag.BoolVar(&migrationContext.Checkpoint, "checkpoint", false, "Enable migration checkpoints")
 	flag.Int64Var(&migrationContext.CheckpointIntervalSeconds, "checkpoint-seconds", 300, "The number of seconds between checkpoints")
 	flag.BoolVar(&migrationContext.Resume, "resume", false, "Attempt to resume migration from checkpoint")
+	flag.BoolVar(&migrationContext.Revert, "revert", false, "Attempt to revert completed migration")
+	flag.StringVar(&migrationContext.OldTableName, "old-table", "", "The name of the old table when using --revert, e.g. '_mytable_del'")
 
 	maxLoad := flag.String("max-load", "", "Comma delimited status-name=threshold. e.g: 'Threads_running=100,Threads_connected=500'. When status exceeds threshold, app throttles writes")
 	criticalLoad := flag.String("critical-load", "", "Comma delimited status-name=threshold, same format as --max-load. When status exceeds threshold, app panics and quits")
@@ -209,12 +212,35 @@ func main() {
 
 	migrationContext.SetConnectionCharset(*charset)
 
-	if migrationContext.AlterStatement == "" {
+	if migrationContext.AlterStatement == "" && !migrationContext.Revert {
 		log.Fatal("--alter must be provided and statement must not be empty")
 	}
 	parser := sql.NewParserFromAlterStatement(migrationContext.AlterStatement)
 	migrationContext.AlterStatementOptions = parser.GetAlterStatementOptions()
 
+	if migrationContext.Revert {
+		if migrationContext.Resume {
+			log.Fatal("--revert cannot be used with --resume")
+		}
+		if migrationContext.OldTableName == "" {
+			migrationContext.Log.Fatalf("--revert must be called with --old-table")
+		}
+
+		// options irrelevant to revert mode
+		if migrationContext.AlterStatement != "" {
+			log.Warning("--alter was provided with --revert, it will be ignored")
+		}
+		if migrationContext.AttemptInstantDDL {
+			log.Warning("--attempt-instant-ddl was provided with --revert, it will be ignored")
+		}
+		if migrationContext.IncludeTriggers {
+			log.Warning("--include-triggers was provided with --revert, it will be ignored")
+		}
+		if migrationContext.DiscardForeignKeys {
+			log.Warning("--discard-foreign-keys was provided with --revert, it will be ignored")
+		}
+	}
+
 	if migrationContext.DatabaseName == "" {
 		if parser.HasExplicitSchema() {
 			migrationContext.DatabaseName = parser.GetExplicitSchema()
@@ -354,7 +380,14 @@ func main() {
 	acceptSignals(migrationContext)
 
 	migrator := logic.NewMigrator(migrationContext, AppVersion)
-	if err := migrator.Migrate(); err != nil {
+	var err error
+	if migrationContext.Revert {
+		err = migrator.Revert()
+	} else {
+		err = migrator.Migrate()
+	}
+
+	if err != nil {
 		migrator.ExecOnFailureHook()
 		migrationContext.Log.Fatale(err)
 	}

go/logic/applier.go

Lines changed: 12 additions & 11 deletions
@@ -433,29 +433,24 @@ func (this *Applier) CreateCheckpointTable() error {
 	colDefs := []string{
 		"`gh_ost_chk_id` bigint auto_increment primary key",
 		"`gh_ost_chk_timestamp` bigint",
-		"`gh_ost_chk_coords` varchar(4096)",
+		"`gh_ost_chk_coords` text charset ascii",
 		"`gh_ost_chk_iteration` bigint",
 		"`gh_ost_rows_copied` bigint",
 		"`gh_ost_dml_applied` bigint",
+		"`gh_ost_is_cutover` tinyint(1) DEFAULT '0'",
 	}
 	for _, col := range this.migrationContext.UniqueKey.Columns.Columns() {
 		if col.MySQLType == "" {
 			return fmt.Errorf("CreateCheckpoinTable: column %s has no type information. applyColumnTypes must be called", sql.EscapeName(col.Name))
 		}
 		minColName := sql.TruncateColumnName(col.Name, sql.MaxColumnNameLength-4) + "_min"
 		colDef := fmt.Sprintf("%s %s", sql.EscapeName(minColName), col.MySQLType)
-		if !col.Nullable {
-			colDef += " NOT NULL"
-		}
 		colDefs = append(colDefs, colDef)
 	}
 
 	for _, col := range this.migrationContext.UniqueKey.Columns.Columns() {
 		maxColName := sql.TruncateColumnName(col.Name, sql.MaxColumnNameLength-4) + "_max"
 		colDef := fmt.Sprintf("%s %s", sql.EscapeName(maxColName), col.MySQLType)
-		if !col.Nullable {
-			colDef += " NOT NULL"
-		}
 		colDefs = append(colDefs, colDef)
 	}
 
@@ -488,10 +483,16 @@ func (this *Applier) dropTable(tableName string) error {
 	return nil
 }
 
+// StateMetadataLockInstrument checks if metadata_locks is enabled in performance_schema.
+// If not it attempts to enable metadata_locks if this is allowed.
 func (this *Applier) StateMetadataLockInstrument() error {
 	query := `select /*+ MAX_EXECUTION_TIME(300) */ ENABLED, TIMED from performance_schema.setup_instruments WHERE NAME = 'wait/lock/metadata/sql/mdl'`
 	var enabled, timed string
 	if err := this.db.QueryRow(query).Scan(&enabled, &timed); err != nil {
+		if errors.Is(err, gosql.ErrNoRows) {
+			// performance_schema may be disabled.
+			return nil
+		}
 		return this.migrationContext.Log.Errorf("query performance_schema.setup_instruments with name wait/lock/metadata/sql/mdl error: %s", err)
 	}
 	if strings.EqualFold(enabled, "YES") && strings.EqualFold(timed, "YES") {
@@ -627,7 +628,7 @@ func (this *Applier) WriteCheckpoint(chk *Checkpoint) (int64, error) {
 	if err != nil {
 		return insertId, err
 	}
-	args := sqlutils.Args(chk.LastTrxCoords.String(), chk.Iteration, chk.RowsCopied, chk.DMLApplied)
+	args := sqlutils.Args(chk.LastTrxCoords.String(), chk.Iteration, chk.RowsCopied, chk.DMLApplied, chk.IsCutover)
 	args = append(args, uniqueKeyArgs...)
 	res, err := this.db.Exec(query, args...)
 	if err != nil {
@@ -637,15 +638,15 @@ func (this *Applier) WriteCheckpoint(chk *Checkpoint) (int64, error) {
 }
 
 func (this *Applier) ReadLastCheckpoint() (*Checkpoint, error) {
-	row := this.db.QueryRow(fmt.Sprintf(`select /* gh-ost */ * from %s.%s order by gh_ost_chk_id desc limit 1`, this.migrationContext.DatabaseName, this.migrationContext.GetCheckpointTableName()))
+	row := this.db.QueryRow(fmt.Sprintf(`select /* gh-ost */ * from %s.%s order by gh_ost_chk_id desc limit 1`, sql.EscapeName(this.migrationContext.DatabaseName), sql.EscapeName(this.migrationContext.GetCheckpointTableName())))
 	chk := &Checkpoint{
 		IterationRangeMin: sql.NewColumnValues(this.migrationContext.UniqueKey.Columns.Len()),
 		IterationRangeMax: sql.NewColumnValues(this.migrationContext.UniqueKey.Columns.Len()),
 	}
 
 	var coordStr string
 	var timestamp int64
-	ptrs := []interface{}{&chk.Id, &timestamp, &coordStr, &chk.Iteration, &chk.RowsCopied, &chk.DMLApplied}
+	ptrs := []interface{}{&chk.Id, &timestamp, &coordStr, &chk.Iteration, &chk.RowsCopied, &chk.DMLApplied, &chk.IsCutover}
 	ptrs = append(ptrs, chk.IterationRangeMin.ValuesPointers...)
 	ptrs = append(ptrs, chk.IterationRangeMax.ValuesPointers...)
 	err := row.Scan(ptrs...)
@@ -1345,7 +1346,7 @@ func (this *Applier) AtomicCutOverMagicLock(sessionIdChan chan int64, tableLocke
 
 	this.migrationContext.Log.Infof("Session renameLockSessionId is %+v", *renameLockSessionId)
 	// Checking the lock is held by rename session
-	if *renameLockSessionId > 0 && this.migrationContext.IsOpenMetadataLockInstruments {
+	if *renameLockSessionId > 0 && this.migrationContext.IsOpenMetadataLockInstruments && !this.migrationContext.SkipMetadataLockCheck {
 		sleepDuration := time.Duration(10*this.migrationContext.CutOverLockTimeoutSeconds) * time.Millisecond
 		for i := 1; i <= 100; i++ {
 			err := this.ExpectMetadataLock(*renameLockSessionId)

go/logic/applier_test.go

Lines changed: 4 additions & 1 deletion
@@ -214,6 +214,7 @@ func (suite *ApplierTestSuite) SetupSuite() {
 		testmysql.WithUsername(testMysqlUser),
 		testmysql.WithPassword(testMysqlPass),
 		testcontainers.WithWaitStrategy(wait.ForExposedPort()),
+		testmysql.WithConfigFile("my.cnf.test"),
 	)
 	suite.Require().NoError(err)
 
@@ -272,7 +273,7 @@ func (suite *ApplierTestSuite) TestInitDBConnections() {
 	mysqlVersion, _ := strings.CutPrefix(testMysqlContainerImage, "mysql:")
 	suite.Require().Equal(mysqlVersion, migrationContext.ApplierMySQLVersion)
 	suite.Require().Equal(int64(28800), migrationContext.ApplierWaitTimeout)
-	suite.Require().Equal("SYSTEM", migrationContext.ApplierTimeZone)
+	suite.Require().Equal("+00:00", migrationContext.ApplierTimeZone)
 
 	suite.Require().Equal(sql.NewColumnList([]string{"id", "item_id"}), migrationContext.OriginalTableColumnsOnApplier)
 }
@@ -704,6 +705,7 @@ func (suite *ApplierTestSuite) TestWriteCheckpoint() {
 		Iteration: 2,
 		RowsCopied: 100000,
 		DMLApplied: 200000,
+		IsCutover: true,
 	}
 	id, err := applier.WriteCheckpoint(chk)
 	suite.Require().NoError(err)
@@ -718,6 +720,7 @@ func (suite *ApplierTestSuite) TestWriteCheckpoint() {
 	suite.Require().Equal(chk.IterationRangeMax.String(), gotChk.IterationRangeMax.String())
 	suite.Require().Equal(chk.RowsCopied, gotChk.RowsCopied)
 	suite.Require().Equal(chk.DMLApplied, gotChk.DMLApplied)
+	suite.Require().Equal(chk.IsCutover, gotChk.IsCutover)
 }
 
 func TestApplier(t *testing.T) {

go/logic/checkpoint.go

Lines changed: 1 addition & 0 deletions
@@ -28,4 +28,5 @@ type Checkpoint struct {
 	Iteration int64
 	RowsCopied int64
 	DMLApplied int64
+	IsCutover bool
 }

go/logic/hooks.go

Lines changed: 1 addition & 0 deletions
@@ -70,6 +70,7 @@ func (this *HooksExecutor) applyEnvironmentVariables(extraVariables ...string) [
 	env = append(env, fmt.Sprintf("GH_OST_HOOKS_HINT_OWNER=%s", this.migrationContext.HooksHintOwner))
 	env = append(env, fmt.Sprintf("GH_OST_HOOKS_HINT_TOKEN=%s", this.migrationContext.HooksHintToken))
 	env = append(env, fmt.Sprintf("GH_OST_DRY_RUN=%t", this.migrationContext.Noop))
+	env = append(env, fmt.Sprintf("GH_OST_REVERT=%t", this.migrationContext.Revert))
 
 	env = append(env, extraVariables...)
 	return env
