Context
PR #1539 locked `tale deploy` to the running CLI's version and intentionally did not lean on `tale rollback` as a recovery path. Reason: Tale's migrations are forward-only — new versions can write data the old binary cannot read, so swapping the binary back without reversing data leaves the system in a half-broken state. Keeping a command that looks like it recovers, but actually corrupts, is worse than not having it.
This issue tracks the proper redesign.
Proposed scope
1. Audit and likely remove `tale rollback`
- Today `tale rollback --version X` accepts any version with no validation (verified in tools/cli/src/commands/rollback.ts) and re-implements its own deploy path.
- Remove the command and its `setPreviousVersion` state-tracking, OR gate it behind a strict precondition (e.g. only allowed when no migrations have run since the previous version).
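If the gating option is chosen over removal, the precondition could look roughly like the sketch below. Everything in it is hypothetical: the `AppliedMigration` record, the `appliedByVersion` field, and the `checkRollback` helper are not taken from the Tale codebase. It assumes the CLI records which version applied each migration, and that versions are plain dotted numbers.

```typescript
// Hypothetical record of one applied migration; the real state shape may differ.
interface AppliedMigration {
  name: string;
  // Assumption: the CLI version that applied this migration is recorded.
  appliedByVersion: string;
}

type RollbackCheck =
  | { allowed: true }
  | { allowed: false; reason: string };

// Compares dotted numeric versions such as "1.10.0" and "1.9.0".
// Returns a negative number, zero, or a positive number.
// Pre-release suffixes (e.g. "-beta") are not handled in this sketch.
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Allows a rollback only to the recorded previous version, and only when
// no migration was applied by a version newer than that previous version.
function checkRollback(
  targetVersion: string,
  previousVersion: string | undefined,
  applied: AppliedMigration[],
): RollbackCheck {
  if (previousVersion === undefined) {
    return { allowed: false, reason: "no previous version recorded" };
  }
  if (targetVersion !== previousVersion) {
    return {
      allowed: false,
      reason: `can only roll back to ${previousVersion}, not ${targetVersion}`,
    };
  }
  const newer = applied.filter(
    (m) => compareVersions(m.appliedByVersion, previousVersion) > 0,
  );
  if (newer.length > 0) {
    const names = newer.map((m) => m.name).join(", ");
    return {
      allowed: false,
      reason: `migrations applied since ${previousVersion}: ${names}`,
    };
  }
  return { allowed: true };
}
```

The check fails closed: a missing previous version or any newer migration blocks the rollback and reports why, rather than silently swapping the binary.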
2. Backup-based recovery as the official supported path
- Automatic volume snapshot in `tale deploy` before `runPendingMigrations` (the only step that mutates data). Snapshot the `STATEFUL_SERVICES` volumes via `docker run --rm -v :/data -v :/backup alpine tar czf /backup/-.tar.gz /data`.
- Rotation policy (keep last N snapshots / N days) to bound disk usage.
- Companion `tale restore ` command — listing, restoring, integrity-checking snapshots.
- Failure behavior: snapshot failure aborts deploy by default; `--skip-backup` to override explicitly.
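A rough sketch of the snapshot invocation and the rotation policy, to make the proposal concrete. The helper names, the `SnapshotFile` shape, and the `<volume>-<timestamp>.tar.gz` archive naming are assumptions for illustration, not existing Tale code. The rotation rule shown is one possible reading of "keep last N snapshots / N days": a snapshot is kept if it is among the newest `keepLast` or younger than `maxAgeDays`.

```typescript
// Hypothetical description of one snapshot archive on disk.
interface SnapshotFile {
  name: string;
  createdAt: Date;
}

// Builds the argv for a `docker` invocation that tars one volume into the
// backup directory, mirroring the `docker run ... alpine tar czf` command
// from the proposal. The archive name pattern is an assumption.
function buildSnapshotArgs(
  volume: string,
  backupDir: string,
  timestamp: string,
): string[] {
  return [
    "run", "--rm",
    "-v", `${volume}:/data`,
    "-v", `${backupDir}:/backup`,
    "alpine",
    "tar", "czf", `/backup/${volume}-${timestamp}.tar.gz`, "/data",
  ];
}

// Returns the snapshots that rotation would delete, newest first.
// A snapshot is deleted only if it is outside the newest `keepLast`
// AND older than `maxAgeDays` relative to `now`.
function selectSnapshotsToDelete(
  snapshots: SnapshotFile[],
  keepLast: number,
  maxAgeDays: number,
  now: Date,
): SnapshotFile[] {
  const newestFirst = [...snapshots].sort(
    (a, b) => b.createdAt.getTime() - a.createdAt.getTime(),
  );
  const cutoff = now.getTime() - maxAgeDays * 24 * 60 * 60 * 1000;
  return newestFirst.filter(
    (s, index) => index >= keepLast && s.createdAt.getTime() < cutoff,
  );
}
```

Keeping both helpers pure (no process spawning, no filesystem access) would let the deploy command unit-test the argv and the rotation decision separately from actually running Docker. The caller would pass the argv to its process runner and abort the deploy on a non-zero exit unless `--skip-backup` was given.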
3. Docs
Open questions
- Per-project backup directory location and ownership (host path vs. dedicated docker volume).
- How to handle very large stateful volumes — incremental snapshots? Hooks for app-level dumps (Convex export, pg_dump) instead of raw volume tar?
- Should snapshot creation be on `tale deploy` only, or also on `tale start` when migrations run?
- Should existing migrations (`namespace-volumes`, `split-convex`) be retroactively wrapped in the snapshot flow on first run after this lands?
Out of scope (for this issue)