Skip to content

refactor: arcane migrator#2628

Draft
cabaucom376 wants to merge 9 commits into
breaking/v2.0.0from
refactor/migrator
Draft

refactor: arcane migrator#2628
cabaucom376 wants to merge 9 commits into
breaking/v2.0.0from
refactor/migrator

Conversation

@cabaucom376
Copy link
Copy Markdown
Collaborator

@cabaucom376 cabaucom376 commented May 17, 2026

Checklist

  • This PR is not opened from my fork’s main branch
  • Update the self-update logic to be migrator aware
  • Set manual migration version and message before release
  • Update documentation website with migration guide

What This PR Implements

Adds an explicit, container-forward database migration flow for Arcane. Migrations now run through a bundled one-shot migrator command instead of being hidden in normal app startup, and startup validates that the database schema is already ready.

services:
  arcane-migrator:
    image: ghcr.io/getarcaneapp/arcane:latest
    restart: "no"
    # To downgrade:
    # 1. Stop Arcane and back up the arcane-data volume.
    # 2. Set ARCANE_VERSION to the target version, for example v1.18.0.
    # 3. Comment out the `up` command and uncomment the `down` command.
    # 4. Run: docker compose run --rm arcane-migrator
    command:
      - migrate
      - up
    # command:
    #   - migrate
    #   - down
    #   - ${ARCANE_VERSION:?}
    volumes:
      - arcane-data:/app/data

  arcane:
    image: ghcr.io/getarcaneapp/arcane:${ARCANE_VERSION:-latest}
    container_name: arcane
    depends_on:
      arcane-migrator:
        condition: service_completed_successfully
    ports:
      - "3552:3552"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - arcane-data:/app/data
      - /host/path/to/projects:/app/data/projects
      - /host/path/to/builds:/builds
    environment:
      - ENCRYPTION_KEY=xxxxxxxxxxxxxxxxxxxxxx
      - JWT_SECRET=xxxxxxxxxxxxxxxxxxxxxxxxxx
      # Optional: run Arcane as a specific host UID/GID instead of the default runtime user.
      # When using the mounted Docker socket, Arcane will map the socket group automatically.
      # - PUID=1000
      # - PGID=1000
      # Timezone for cron job scheduling (e.g., America/New_York, Europe/London, UTC)
      # See: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
      - TZ=UTC
    # Use host cgroup namespace for better container self-detection
    # This allows Arcane to detect its own container ID more reliably
    cgroup: host
    healthcheck:
      test:
        [
          "CMD-SHELL",
          "curl -fsS http://localhost:3552/api/health >/dev/null || exit 1",
        ]
      interval: 10s
      timeout: 3s
      retries: 5
      start_period: 15s
    restart: unless-stopped

volumes:
  arcane-data:

Changes Made

  • Added a bundled arcane-migrator binary with explicit up, down , status, and generate-manifest commands.
  • Changed server startup to validate schema readiness instead of running migrations automatically.
  • Added generated app-version to schema-version mapping support for downgrade targets.
  • Bundled the migrator in manager and headless agent images, including GoReleaser next-image paths.
  • Updated Docker Compose examples to run migrations as one-shot services before starting Arcane or agents, with documented downgrade commands.
  • Added tests for migration status, downgrade behavior, manifest resolution/generation, and startup validation.

Testing Done

  • Development environment started: ./scripts/development/dev.sh start
  • Frontend verified at http://localhost:3000
  • Backend verified at http://localhost:3552
  • Manual testing completed (describe):
  • No linting errors (e.g., just lint all)
  • Backend tests pass: just test backend

AI Tool Used (if applicable)

AI Tool:
Assistance Level: Significant
What AI helped with: Implementation of actual migrator binary, schema validation, and tests. This was a pattern I have used in a project of mine, I just fed all design principles and implementation details for it to apply this pattern to Arcane.
I reviewed and edited all AI-generated output: yes
I ran all required tests and manually verified changes:

Additional Context

Almost certainly considered a breaking change. People will have to update their compose files with the new migrator service.

Disclaimer Greptiles Reviews use AI, make sure to check over its work.

To better help train Greptile on our codebase, if the comment is useful and valid Like the comment, if its not helpful or invalid Dislike

To have Greptile Re-Review the changes, mention greptileai.

Greptile Summary

This PR replaces the hidden, startup-time migration runner with an explicit arcane-migrator one-shot binary, adds migrate up/down/status/generate-manifest subcommands, and updates app startup to validate that migrations have already been applied. It also introduces a manualMigrationBoundary mechanism that blocks the in-app auto-updater when the upgrade would cross a release boundary requiring manual steps (v1.20.0).

  • Database layer: Initialize now validates schema readiness instead of running migrations; MigrateUp, MigrateToVersion, and GetMigrationStatus are new public entry points used by the migrator CLI.
  • Upgrade flow: runMigratorContainerInternal runs a one-shot migrator container during the in-place Docker upgrade; TriggerUpgradeViaCLIForContainer replaces TriggerUpgradeViaCLI to allow targeting a specific container ID.
  • Frontend: manualUpdateRequired / manualUpdateMessage fields are threaded through the version types and surfaced in the update-center dialog; the auto-install button is hidden and a styled amber warning panel replaces the footer when a manual update is required.

Confidence Score: 3/5

Safe to merge with the GitHub-API fallback behavior understood and accepted; the auto-upgrade blocking under network failure should be addressed before this ships to users.

The core migration refactor is well-structured and tested, but ManualUpdateRequirement incorrectly blocks auto-upgrades for users before any declared boundary whenever GitHub is temporarily unreachable. Nine new unexported functions in command.go also consistently miss the Internal suffix required by the team convention.

backend/internal/services/version_service.go (GitHub-down fallback in ManualUpdateRequirement) and backend/cli/migrate/command.go (naming convention across all unexported helpers).

Fix All in Codex Fix All in Claude Code

Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
backend/cli/migrate/command.go:32-248
**Missing "Internal" suffix on unexported functions**

The project's coding convention requires all unexported functions to have the `Internal` suffix, but nine new unexported functions in this file don't follow the rule: `newCommand`, `configureCommand`, `newStatusCommand`, `newUpCommand`, `newDownCommand`, `loadMigrationStatus`, `newGenerateManifestCommand`, `loadConfig`, and `printStatus`. This pattern appears across the whole file — the same convention is used consistently in every other package in the diff (e.g. `runMigratorContainerInternal`, `migrationBinaryPathForContainerInternal`, `getMigrationStatusInternal`).

### Issue 2 of 3
backend/internal/services/version_service.go:192-209
**GitHub API failure incorrectly blocks upgrades with "manual update required"**

When `ManualUpdateRequirement` is called with an empty `targetVersion` (as in `checkManualUpdateRequirement(ctx, "", "")`), it fetches the latest release from GitHub. If that request fails, the code conservatively returns `true` and the boundary message — blocking `CanUpgrade` and `TriggerUpgradeViaCLIForContainer` with `ErrManualUpdateRequired`. A user on any version before a migration boundary (e.g. a future `v2.0.0` boundary) who temporarily loses GitHub connectivity would be told they need to perform a manual update, even though the newest published release may not cross that boundary at all. The stale-cache path used in `GetVersionInformation` (which continues with the last known `latest`) is not applied here, making the failure mode inconsistent with the rest of the version-checking code.

### Issue 3 of 3
backend/api/handlers/system.go:513
**Fragile error-message extraction via string stripping**

`strings.TrimPrefix(err.Error(), services.ErrManualUpdateRequired.Error()+": ")` couples the handler to the exact error format string. If the error is ever wrapped with extra context upstream (e.g. `fmt.Errorf("upgrade check: %w", errManualUpdateRequired)`) the `TrimPrefix` won't match and the full raw error string — including the sentinel — is shown to users. A typed error struct carrying the message field, or a helper in the `services` package to extract it, would be more robust. The same pattern is repeated at line ~558 for `TriggerUpgrade`.

Reviews (1): Last reviewed commit: "lint" | Re-trigger Greptile

Greptile also left 3 inline comments on this PR.

Context used:

  • Rule used - What: All unexported functions must have the "Inte... (source)

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 17, 2026

🔍 Deadcode Analysis

Found 3 unreachable functions in the backend.

View details
cli/migrate/command.go:32:6: unreachable func: newCommand
internal/database/database.go:123:6: unreachable func: ValidateMigrationSchema
internal/database/database.go:414:6: unreachable func: safeUintToIntInternal

Only remove deadcode that you know is 100% no longer used.

Analysis from commit 5c54a0a

@getarcaneappbot
Copy link
Copy Markdown
Contributor

getarcaneappbot commented May 17, 2026

Container images for this PR have been built successfully!

  • Manager: ghcr.io/getarcaneapp/arcane:pr-2628
  • Agent: ghcr.io/getarcaneapp/arcane-headless:pr-2628

Built from commit 1991d17

@kmendell
Copy link
Copy Markdown
Member

You can just add this as a command to the existsing /app/arcane binary like i do wit hteh self update

@cabaucom376
Copy link
Copy Markdown
Collaborator Author

Noted, I'll do that. this is very ugly right now so don't bother looking lol.

@cabaucom376
Copy link
Copy Markdown
Collaborator Author

@greptileai

Comment on lines +32 to +248
func newCommand(out io.Writer) *cobra.Command {
cmd := &cobra.Command{
Use: "migrate",
Short: "Manage Arcane database schema migrations",
Version: config.Version,
SilenceUsage: true,
SilenceErrors: true,
}
configureCommand(cmd, out)
return cmd
}

func configureCommand(cmd *cobra.Command, out io.Writer) {
if out == nil {
out = io.Discard
}

cmd.SetOut(out)
cmd.AddCommand(newStatusCommand(out))
cmd.AddCommand(newUpCommand(out))
cmd.AddCommand(newDownCommand(out))
cmd.AddCommand(newGenerateManifestCommand(out))
}

func newStatusCommand(out io.Writer) *cobra.Command {
return &cobra.Command{
Use: "status",
Short: "Show the current Arcane database migration status",
RunE: func(cmd *cobra.Command, args []string) error {
status, err := loadMigrationStatus(cmd.Context())
if err != nil {
return err
}

knownVersions, err := database.ListAppMigrationVersions()
if err != nil {
return err
}

return printStatus(out, status, knownVersions)
},
}
}

func newUpCommand(out io.Writer) *cobra.Command {
return &cobra.Command{
Use: "up",
Short: "Apply pending Arcane database migrations",
RunE: func(cmd *cobra.Command, args []string) error {
cfg, err := loadConfig(cmd.Context())
if err != nil {
return err
}
if err := database.MigrateUp(cmd.Context(), cfg.DatabaseURL); err != nil {
return err
}
_, err = fmt.Fprintln(out, "Database migrations applied successfully")
return err
},
}
}

func newDownCommand(out io.Writer) *cobra.Command {
cmd := &cobra.Command{
Use: "down <target-version>",
Short: "Downgrade the Arcane database schema for a target Arcane version",
Args: cobra.ExactArgs(1),
RunE: func(cmd *cobra.Command, args []string) error {
targetAppVersion := strings.TrimSpace(args[0])
if targetAppVersion == "" {
return fmt.Errorf("target version is required")
}

targetMigrationVersion, err := database.ResolveAppMigrationVersion(targetAppVersion)
if err != nil {
return err
}

cfg, err := loadConfig(cmd.Context())
if err != nil {
return err
}
status, err := database.GetMigrationStatus(cmd.Context(), cfg.DatabaseURL)
if err != nil {
return err
}
if status.Dirty {
return fmt.Errorf("database schema version %d is dirty; resolve the dirty migration state before downgrading", status.CurrentVersion)
}
if targetMigrationVersion > status.LatestVersion {
return fmt.Errorf("target Arcane version %s requires schema version %d, but this Arcane binary only includes migrations through %d", targetAppVersion, targetMigrationVersion, status.LatestVersion)
}
if !status.HasVersion {
return fmt.Errorf("database has no migration version; start Arcane once to initialize the schema before downgrading")
}
if status.CurrentVersion < targetMigrationVersion {
return fmt.Errorf("database schema version %d is older than target Arcane version %s schema version %d", status.CurrentVersion, targetAppVersion, targetMigrationVersion)
}
if status.CurrentVersion == targetMigrationVersion {
_, err = fmt.Fprintf(out, "Database schema is already at version %d for Arcane %s\n", targetMigrationVersion, targetAppVersion)
return err
}

slog.Warn("Downgrading Arcane database schema",
"provider", status.Provider,
"currentVersion", status.CurrentVersion,
"targetVersion", targetMigrationVersion,
"targetAppVersion", targetAppVersion,
)
if err := database.MigrateToVersion(cmd.Context(), cfg.DatabaseURL, targetMigrationVersion); err != nil {
return err
}

_, err = fmt.Fprintf(out, "Database schema downgraded from version %d to %d for Arcane %s\n", status.CurrentVersion, targetMigrationVersion, targetAppVersion)
return err
},
}

return cmd
}

func loadMigrationStatus(ctx context.Context) (*database.MigrationStatus, error) {
cfg, err := loadConfig(ctx)
if err != nil {
return nil, err
}
return database.GetMigrationStatus(ctx, cfg.DatabaseURL)
}

func newGenerateManifestCommand(out io.Writer) *cobra.Command {
var repoRoot string
var outputPath string
var includeVersion string

cmd := &cobra.Command{
Use: "generate-manifest",
Short: "Generate the app-version to schema-version migration manifest",
RunE: func(cmd *cobra.Command, args []string) error {
versions, err := database.GenerateAppMigrationVersionsFromGit(cmd.Context(), repoRoot, includeVersion)
if err != nil {
return err
}

manifestBytes, err := database.MarshalAppMigrationVersionManifest(versions)
if err != nil {
return err
}

if outputPath == "-" {
_, err = out.Write(manifestBytes)
return err
}

if err := os.WriteFile(outputPath, manifestBytes, 0o644); err != nil {
return fmt.Errorf("failed to write migration manifest: %w", err)
}

return nil
},
}

cmd.Flags().StringVar(&repoRoot, "repo-root", ".", "Repository root containing .git and backend/resources/migrations")
cmd.Flags().StringVar(&outputPath, "output", "backend/resources/migration_versions.json", "Output manifest path, or - for stdout")
cmd.Flags().StringVar(&includeVersion, "include-version", "", "Include this app version using the working tree's current highest migration")
return cmd
}

func loadConfig(ctx context.Context) (*config.Config, error) {
_ = godotenv.Load()
cfg := config.Load()

runtimeIdentityCfg := &startup.RuntimeIdentityConfig{
PUID: cfg.PUID,
PGID: cfg.PGID,
DockerHost: cfg.DockerHost,
DockerConfig: cfg.DockerConfig,
DatabaseURL: cfg.DatabaseURL,
}
if err := startup.ApplyRequestedRuntimeIdentity(ctx, runtimeIdentityCfg); err != nil {
return nil, fmt.Errorf("apply runtime identity: %w", err)
}
cfg.DockerConfig = runtimeIdentityCfg.DockerConfig

bootstrap.SetupSlogLogger(cfg)
bootstrap.ConfigureGormLogger(cfg)
return cfg, nil
}

func printStatus(out io.Writer, status *database.MigrationStatus, knownVersions []database.AppMigrationVersion) error {
if _, err := fmt.Fprintf(out, "Provider: %s\n", status.Provider); err != nil {
return err
}
if status.HasVersion {
if _, err := fmt.Fprintf(out, "Current schema version: %d\n", status.CurrentVersion); err != nil {
return err
}
} else {
if _, err := fmt.Fprintln(out, "Current schema version: none"); err != nil {
return err
}
}
if _, err := fmt.Fprintf(out, "Latest embedded schema version: %d\n", status.LatestVersion); err != nil {
return err
}
if _, err := fmt.Fprintf(out, "Dirty: %t\n", status.Dirty); err != nil {
return err
}
if _, err := fmt.Fprintln(out, "Known app-version targets:"); err != nil {
return err
}
for _, version := range knownVersions {
if _, err := fmt.Fprintf(out, " %s -> schema %d\n", version.AppVersion, version.MigrationVersion); err != nil {
return err
}
}
return nil
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Missing "Internal" suffix on unexported functions

The project's coding convention requires all unexported functions to have the Internal suffix, but nine new unexported functions in this file don't follow the rule: newCommand, configureCommand, newStatusCommand, newUpCommand, newDownCommand, loadMigrationStatus, newGenerateManifestCommand, loadConfig, and printStatus. This pattern appears across the whole file — the same convention is used consistently in every other package in the diff (e.g. runMigratorContainerInternal, migrationBinaryPathForContainerInternal, getMigrationStatusInternal).

Rule Used: What: All unexported functions must have the "Inte... (source)

Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/cli/migrate/command.go
Line: 32-248

Comment:
**Missing "Internal" suffix on unexported functions**

The project's coding convention requires all unexported functions to have the `Internal` suffix, but nine new unexported functions in this file don't follow the rule: `newCommand`, `configureCommand`, `newStatusCommand`, `newUpCommand`, `newDownCommand`, `loadMigrationStatus`, `newGenerateManifestCommand`, `loadConfig`, and `printStatus`. This pattern appears across the whole file — the same convention is used consistently in every other package in the diff (e.g. `runMigratorContainerInternal`, `migrationBinaryPathForContainerInternal`, `getMigrationStatusInternal`).

**Rule Used:** What: All unexported functions must have the "Inte... ([source](https://app.greptile.com/review/custom-context?memory=306fc233-4d2f-4ac4-bdf7-8059588e8a43))

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Fix in Codex Fix in Claude Code

Comment on lines +192 to +209
latest, err := s.GetLatestVersion(ctx)
if err != nil {
slog.DebugContext(ctx, "Failed to determine latest version for manual update requirement", "error", err)
return true, manualMigrationBoundaryMessageInternal(nextBoundary)
}
targetVersion = latest
}

target, targetOK := s.normalizeComparableVersionInternal(targetVersion)
if !targetOK {
return false, ""
}

if boundary, crossed := s.crossedManualMigrationBoundaryInternal(current, target); crossed {
return true, manualMigrationBoundaryMessageInternal(boundary)
}

return false, ""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 GitHub API failure incorrectly blocks upgrades with "manual update required"

When ManualUpdateRequirement is called with an empty targetVersion (as in checkManualUpdateRequirement(ctx, "", "")), it fetches the latest release from GitHub. If that request fails, the code conservatively returns true and the boundary message — blocking CanUpgrade and TriggerUpgradeViaCLIForContainer with ErrManualUpdateRequired. A user on any version before a migration boundary (e.g. a future v2.0.0 boundary) who temporarily loses GitHub connectivity would be told they need to perform a manual update, even though the newest published release may not cross that boundary at all. The stale-cache path used in GetVersionInformation (which continues with the last known latest) is not applied here, making the failure mode inconsistent with the rest of the version-checking code.

Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/internal/services/version_service.go
Line: 192-209

Comment:
**GitHub API failure incorrectly blocks upgrades with "manual update required"**

When `ManualUpdateRequirement` is called with an empty `targetVersion` (as in `checkManualUpdateRequirement(ctx, "", "")`), it fetches the latest release from GitHub. If that request fails, the code conservatively returns `true` and the boundary message — blocking `CanUpgrade` and `TriggerUpgradeViaCLIForContainer` with `ErrManualUpdateRequired`. A user on any version before a migration boundary (e.g. a future `v2.0.0` boundary) who temporarily loses GitHub connectivity would be told they need to perform a manual update, even though the newest published release may not cross that boundary at all. The stale-cache path used in `GetVersionInformation` (which continues with the last known `latest`) is not applied here, making the failure mode inconsistent with the rest of the version-checking code.

How can I resolve this? If you propose a fix, please make it concise.

Fix in Codex Fix in Claude Code

Comment thread backend/api/handlers/system.go Outdated
Body: UpgradeCheckResultData{
CanUpgrade: false,
Error: false,
Message: strings.TrimPrefix(err.Error(), services.ErrManualUpdateRequired.Error()+": "),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Fragile error-message extraction via string stripping

strings.TrimPrefix(err.Error(), services.ErrManualUpdateRequired.Error()+": ") couples the handler to the exact error format string. If the error is ever wrapped with extra context upstream (e.g. fmt.Errorf("upgrade check: %w", errManualUpdateRequired)) the TrimPrefix won't match and the full raw error string — including the sentinel — is shown to users. A typed error struct carrying the message field, or a helper in the services package to extract it, would be more robust. The same pattern is repeated at line ~558 for TriggerUpgrade.

Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/api/handlers/system.go
Line: 513

Comment:
**Fragile error-message extraction via string stripping**

`strings.TrimPrefix(err.Error(), services.ErrManualUpdateRequired.Error()+": ")` couples the handler to the exact error format string. If the error is ever wrapped with extra context upstream (e.g. `fmt.Errorf("upgrade check: %w", errManualUpdateRequired)`) the `TrimPrefix` won't match and the full raw error string — including the sentinel — is shown to users. A typed error struct carrying the message field, or a helper in the `services` package to extract it, would be more robust. The same pattern is repeated at line ~558 for `TriggerUpgrade`.

How can I resolve this? If you propose a fix, please make it concise.

Fix in Codex Fix in Claude Code

@github-actions
Copy link
Copy Markdown

This pull request has merge conflicts. Please resolve the conflicts so the PR can stay up-to-date and reviewed.

@kmendell
Copy link
Copy Markdown
Member

@cabaucom376 this actually reminds me, i had something like this before, i scraped it though it used goose and sqlc though and i made an api endpoint to control migrating

@cabaucom376
Copy link
Copy Markdown
Collaborator Author

@kmendell well darn why didn't it make the cut? Also since you're going ahead and doing a 2.0 I don't see a point in my other pr right now I think I can close that and just polish this.

@cabaucom376 cabaucom376 changed the base branch from main to breaking/v2.0.0 May 25, 2026 19:35
@kmendell
Copy link
Copy Markdown
Member

@cabaucom376 i wnat to do it still, but sqlc was a major change as it generates go code based on the SQL stuff, i may do it in the future still though....

@kmendell kmendell force-pushed the breaking/v2.0.0 branch 4 times, most recently from ed9c6e5 to 618493b Compare May 31, 2026 23:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants