docs(design): address Gemini review on logical backup proposal
HIGH:
- TTL renewal made explicit. New RenewBackup RPC; producer renews
every ttl_ms/3 and aborts the dump on renewal failure (continuing
past TTL would silently corrupt the artifact). backup_max_ttl_ms
caps a single window; multi-hour dumps work via repeated renewals.
MEDIUM:
- DynamoDB scalability. Default stays per-item (the user's stated
requirement: one file per record, named <PK name>.json). Added opt-in
--dynamodb-bundle-mode jsonl for tables where inode count is the
binding constraint; structured warning emitted when an unbundled
scope exceeds 1M items.
- S3 file-vs-directory collisions. Producer detects keys like
"path/to" and "path/to/obj" coexisting and renames the shorter to
"path/to.elastickv-leaf-data" with KEYMAP.jsonl + manifest record.
- KEYMAP scaled by switching to KEYMAP.jsonl (line-streamable).
- Redis strings_ttl.json -> strings_ttl.jsonl with one TTL record per
line so TTL count is unbounded.
- max_active_backup_pins (default 4) caps concurrent BeginBackup
registrations; tracker returns ErrTooManyActiveBackups, RPC surfaces
ResourceExhausted. Prevents a misbehaving caller from holding the
compactor open across the entire MVCC retention horizon.
Manifest fields added: s3_collision_strategy, dynamodb_layout,
max_active_backup_pins. CLI flags added: --ttl-ms, --scan-page-size,
--dynamodb-bundle-mode/--dynamodb-bundle-size, --rename-collisions.
Tests added: TestS3PathFileVsDirectoryCollision,
TestBeginBackupTooManyActiveBackups, TestRenewBackupExtendsDeadline.
`PinWithDeadline` records `(id → ts, deadline)`. A single sweeper
@@ -614,6 +698,30 @@ The sweeper logs a structured warning (`backup_pin_expired`) when it
 drops a stuck registration so operators see crashed-producer cases in
 their existing log pipeline.
 
+**Bound on concurrent active backup pins.** To prevent a misbehaving
+or malicious caller from issuing unbounded `BeginBackup` requests and
+holding the compactor open across the whole MVCC retention horizon,
+the tracker carries a hard cap (default `max_active_backup_pins = 4`,
+configurable). When the cap is reached, `PinWithDeadline` returns
+`ErrTooManyActiveBackups` and the admin RPC surfaces it as
+`ResourceExhausted`. Operators raising this cap should size it
+against the GC/compaction headroom: each held pin clamps the
+compactor's `Oldest()` timestamp until released. The cap is
+intentionally small because backups are an operator action, not
+end-user traffic; if four are not enough, something is wrong with
+the orchestration layer and adding pins compounds the underlying
+problem rather than fixing it.
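The cap described above could look roughly like the following. This is a minimal sketch, not the proposal's implementation: only `PinWithDeadline`, `ErrTooManyActiveBackups`, `Oldest()` and `max_active_backup_pins` come from the text; the `PinTracker` type, its fields, `Release`, and the `PinFor` convenience wrapper are hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// ErrTooManyActiveBackups uses the error name from the proposal.
var ErrTooManyActiveBackups = errors.New("too many active backup pins")

// pin is one registration: the pinned read timestamp and its deadline.
type pin struct {
	ts       uint64
	deadline time.Time
}

// PinTracker is a hypothetical stand-in for the proposal's tracker.
type PinTracker struct {
	mu      sync.Mutex
	maxPins int // corresponds to max_active_backup_pins
	nextID  uint64
	pins    map[uint64]pin
}

func NewPinTracker(maxPins int) *PinTracker {
	return &PinTracker{maxPins: maxPins, pins: make(map[uint64]pin)}
}

// PinWithDeadline registers a pin, or refuses once the cap is reached.
func (t *PinTracker) PinWithDeadline(ts uint64, deadline time.Time) (uint64, error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if len(t.pins) >= t.maxPins {
		return 0, ErrTooManyActiveBackups
	}
	t.nextID++
	t.pins[t.nextID] = pin{ts: ts, deadline: deadline}
	return t.nextID, nil
}

// PinFor is a hypothetical helper: pin ts for ttlMs milliseconds from now.
func (t *PinTracker) PinFor(ts uint64, ttlMs int64) (uint64, error) {
	return t.PinWithDeadline(ts, time.Now().Add(time.Duration(ttlMs)*time.Millisecond))
}

// Release drops a pin and frees its slot.
func (t *PinTracker) Release(id uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.pins, id)
}

// Oldest returns the smallest pinned timestamp, the value that clamps
// the compactor. The bool is false when no pin is held.
func (t *PinTracker) Oldest() (uint64, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	found := false
	var oldest uint64
	for _, p := range t.pins {
		if !found || p.ts < oldest {
			oldest, found = p.ts, true
		}
	}
	return oldest, found
}
```

In this sketch the admin RPC layer would be responsible for mapping `ErrTooManyActiveBackups` to `ResourceExhausted`; that mapping is left out to keep the example free of gRPC dependencies.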
+
+**TTL ceiling and back-pressure.** `ttl_ms` is bounded above by
+`backup_max_ttl_ms` (default 1 h), so a single pin cannot block
+compaction beyond that window even if a buggy caller asks for it.
+For dumps that legitimately need to run longer (e.g. a 50 TiB
+warehouse), the producer renews via `RenewBackup` every `ttl_ms/3`,
+so total dump duration is bounded only by the overall MVCC retention
+budget, not by any single TTL choice. The producer surfaces a
+`pin_renewals_total` metric so operators can correlate long-running
+dumps with retention pressure.
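The arithmetic behind the ceiling and the renewal cadence can be sketched as follows. The function names are hypothetical, and the sketch assumes the server clamps an oversized `ttl_ms` to `backup_max_ttl_ms`; the text only says the value is "bounded above", so rejecting the request would be an equally valid reading.

```go
package main

// defaultBackupMaxTTLMs is the proposal's default ceiling (1 h) in milliseconds.
const defaultBackupMaxTTLMs int64 = 60 * 60 * 1000

// clampTTLMs bounds a requested TTL by the ceiling.
// Assumption: oversized requests are clamped rather than rejected.
func clampTTLMs(requestedMs, ceilingMs int64) int64 {
	if requestedMs > ceilingMs {
		return ceilingMs
	}
	return requestedMs
}

// renewIntervalMs is the renewal cadence from the text: ttl_ms / 3.
func renewIntervalMs(ttlMs int64) int64 {
	return ttlMs / 3
}

// expectedRenewals estimates how many RenewBackup calls a dump of the
// given duration issues, i.e. roughly what pin_renewals_total would
// report at the end of the dump. Returns 0 for a degenerate TTL.
func expectedRenewals(dumpMs, ttlMs int64) int64 {
	interval := renewIntervalMs(ttlMs)
	if interval <= 0 {
		return 0
	}
	return dumpMs / interval
}
```

With the CLI default of `--ttl-ms 1800000` (30 minutes), the interval works out to 10 minutes, so a 6-hour dump would issue about 36 renewals.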
+
 ### BeginBackup → EndBackup flow
 
 1. **Pick `read_ts`**: `BeginBackup` reads the lease-read timestamp
@@ -634,10 +742,16 @@ their existing log pipeline.
 the resulting `pin_token` to the producer.
 4. **Producer scans** all configured adapter scopes via
 `BackupScanner.Next(at_ts=read_ts)`.
-5. **Renew on long dumps**: the producer calls `BeginBackup` again with
-the same `read_ts` (carrying the existing token) every `ttl_ms / 3`
-to extend the deadline. A multi-hour dump never relies on a single
-30-minute pin.
+5. **Renew on long dumps**: the producer calls
+`RenewBackup(pin_token, ttl_ms)` every `ttl_ms / 3` to extend the
+deadline. The `read_ts` is preserved across renewals; only the
+deadline shifts. A multi-hour dump never relies on a single
+30-minute pin. Renewals are cheap (in-memory map update), and the
+producer's renewal goroutine logs a critical alert and aborts the
+dump if a renewal call fails, because letting the dump continue past
+the TTL would silently produce a corrupted artifact (the compactor
+would have already retired versions the in-flight scan still
+depends on).
 6. **`EndBackup(pin_token)`** releases the tracker entry. A producer
 crash before EndBackup leaves the entry to be reaped by the sweeper.
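Step 5's renew-or-abort behaviour might be structured like the sketch below. Only `RenewBackup(pin_token, ttl_ms)` and the `ttl_ms / 3` cadence are taken from the text; the `Renewer` interface, `keepPinAlive`, the `flakyRenewer` test double, and `demoRenewal` are hypothetical names, and the real RPC client signature may differ.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"time"
)

// Renewer is a hypothetical client-side view of the RenewBackup RPC.
type Renewer interface {
	RenewBackup(ctx context.Context, pinToken string, ttl time.Duration) error
}

// keepPinAlive renews the pin every ttl/3 until ctx is cancelled.
// On the first failed renewal it logs, calls abort (which should cancel
// the scan), and returns the error so the dump is not left running
// past the TTL.
func keepPinAlive(ctx context.Context, r Renewer, pinToken string, ttl time.Duration, abort context.CancelFunc) error {
	interval := ttl / 3
	if interval <= 0 {
		return errors.New("ttl too small to derive a renewal interval")
	}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return nil // dump finished or was cancelled elsewhere
		case <-ticker.C:
			if err := r.RenewBackup(ctx, pinToken, ttl); err != nil {
				log.Printf("CRITICAL: backup pin renewal failed, aborting dump: %v", err)
				abort()
				return fmt.Errorf("renew backup pin: %w", err)
			}
		}
	}
}

// flakyRenewer is a test double that fails on its failOn-th call.
type flakyRenewer struct {
	calls  int
	failOn int
}

func (f *flakyRenewer) RenewBackup(ctx context.Context, pinToken string, ttl time.Duration) error {
	f.calls++
	if f.calls == f.failOn {
		return errors.New("renewal rejected")
	}
	return nil
}

// demoRenewal drives keepPinAlive with a tiny TTL and reports how many
// renewals were attempted and whether the scan context ended up cancelled.
func demoRenewal(failOn int) (calls int, scanAborted bool, err error) {
	scanCtx, abort := context.WithCancel(context.Background())
	defer abort()
	r := &flakyRenewer{failOn: failOn}
	err = keepPinAlive(scanCtx, r, "pin-token", 30*time.Millisecond, abort)
	return r.calls, scanCtx.Err() != nil, err
}
```

A real producer would presumably run `keepPinAlive` in its own goroutine alongside the scan and share the cancellable context with `BackupScanner`, so that a failed renewal stops page fetches promptly.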
@@ -688,7 +802,12 @@ elastickv-backup dump \
 [--include-orphans] \
 [--preserve-sqs-visibility] \
 [--include-sqs-side-records] \
-[--checksums sha256]
+[--checksums sha256] \
+[--ttl-ms 1800000] \
+[--scan-page-size 1024] \
+[--dynamodb-bundle-mode per-item|jsonl] \
+[--dynamodb-bundle-size 64MiB] \
+[--rename-collisions]
 ```
 
 Internally it runs:
@@ -908,7 +1027,10 @@ Scope: out of this proposal; mentioned only to draw the boundary.
 | `TestPinWithDeadlineExpiry` | `PinWithDeadline(ts, now+100ms)` is auto-released by the sweeper after the deadline; compactor unblocked; `backup_pin_expired` log emitted |
 | `TestBeginBackupWaitsForLaggingShard` | Force shard B's `applied_index` to lag; `BeginBackup` polls until it catches up or times out with `FailedPrecondition`; no scan starts in the timeout case |
 | `TestBackupScannerPaging` | A range with > pageSize keys is returned across multiple `ScanAt` pages with no overlap, no gaps; iteration tolerates concurrent writes by completing at the pinned `read_ts` |
-| `TestS3SidecarSuffixCollision` | A user S3 object key ending in `.elastickv-meta.json` is rejected without `--rename-collisions`; with the flag, the rename is recorded in `KEYMAP` |
+| `TestS3SidecarSuffixCollision` | A user S3 object key ending in `.elastickv-meta.json` is rejected without `--rename-collisions`; with the flag, the rename is recorded in `KEYMAP.jsonl` |
+| `TestS3PathFileVsDirectoryCollision` | Bucket holds both `path/to` (object) and `path/to/obj`; producer renames the shorter key to `path/to.elastickv-leaf-data` and records it in `KEYMAP.jsonl`; restore tool reverses it via `MANIFEST.s3_collision_strategy` |
+| `TestBeginBackupTooManyActiveBackups` | Reaching `max_active_backup_pins` returns `ResourceExhausted`; releasing one pin frees a slot for the next request |
+| `TestRenewBackupExtendsDeadline` | `RenewBackup` shifts the deadline; producer's failed-renewal path aborts the dump with a critical log line rather than continuing past the TTL |
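The file-vs-directory case exercised by `TestS3PathFileVsDirectoryCollision` can be illustrated with a small detection routine. This is a sketch under assumptions: `renameCollisions` is a hypothetical name, and it holds every key in memory for simplicity, whereas a real producer would more likely work from a paged scan. Only the `.elastickv-leaf-data` suffix and the "rename the shorter key" rule come from the text.

```go
package main

// leafSuffix is the rename suffix named in the proposal.
const leafSuffix = ".elastickv-leaf-data"

// renameCollisions returns original key -> renamed key for every object
// key that is also used as a directory prefix by another key, e.g.
// "path/to" when "path/to/obj" exists. Keys without a collision are
// not included in the result.
func renameCollisions(keys []string) map[string]string {
	// Collect every directory prefix implied by a "/" in any key.
	dirs := make(map[string]struct{})
	for _, k := range keys {
		for i := 0; i < len(k); i++ {
			if k[i] == '/' {
				dirs[k[:i]] = struct{}{}
			}
		}
	}
	// A key that equals one of those prefixes cannot be both a file and
	// a directory on disk, so it is the one that gets renamed.
	renames := make(map[string]string)
	for _, k := range keys {
		if _, isDir := dirs[k]; isDir {
			renames[k] = k + leafSuffix
		}
	}
	return renames
}
```

Each entry in the returned map would correspond to one line in `KEYMAP.jsonl`. Collecting prefixes into a set, rather than comparing neighbours in sorted order, avoids missing a collision when a key such as `path/to.txt` sorts between `path/to` and `path/to/obj`.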