Skip to content

Commit e3b2936

Browse files
committed
docs(design): address Claude round-3 review on logical backup proposal
Round-2 carry-overs: - Move NewBackupScanner from *ShardedCoordinator to *ShardStore. ShardedCoordinator (kv/sharded_coordinator.go:124) has no *ShardStore field; ShardStore is the natural home for ScanAt-adjacent code. - Add a Phase-1 note that ScanAt's doc comment must be updated to reflect that BeginBackup's applied_index fence makes cross-shard ScanAt globally consistent. - Add pin_token to ListAdaptersAndScopesRequest so the scope scan is bounded to the pinned read_ts; concurrently-created scopes are not surfaced. - Add hll/ to the Redis db_0 example tree (top-level layout already had it; the per-adapter example lagged). - Add --stream-merge-strategy / --preserve-ttl / --preserve-visibility to the elastickv-restore apply CLI block. New nits from the Gemini changes: - Pick one form for DynamoDB bundle-mode flags. Two-flag form (--dynamodb-bundle-mode jsonl + --dynamodb-bundle-size 64MiB) throughout; drop the inline jsonl:size=64MiB form. - RenewBackupRequest.ttl_ms documents the same 60s-24h range constraint as BeginBackupRequest, bounded above by backup_max_ttl_ms; out-of-range returns InvalidArgument. Tests added: TestRenewBackupTTLRangeValidation, TestListAdaptersAndScopesAtPinTS.
1 parent bbd1bb8 commit e3b2936

1 file changed

Lines changed: 57 additions & 13 deletions

File tree

docs/design/2026_04_29_proposed_logical_backup.md

Lines changed: 57 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -312,9 +312,10 @@ item table emits 50 million inodes. On most modern filesystems
312312
slow for live filesystem operations.
313313

314314
For tables where the inode count is the binding constraint, the
315-
producer accepts an opt-in `--dynamodb-bundle-mode jsonl[:size=64MiB]`
316-
that emits items as `items/data-<part-id>.jsonl` instead, packed up
317-
to a configurable per-file size budget:
315+
producer accepts an opt-in `--dynamodb-bundle-mode jsonl` (paired
316+
with `--dynamodb-bundle-size 64MiB`, defaulting to that value) that
317+
emits items as `items/data-<part-id>.jsonl` instead, packed up to the
318+
configurable per-file size budget:
318319

319320
```
320321
dynamodb/orders/
@@ -435,8 +436,10 @@ redis/
435436
│ └── tags%3Apost1.json
436437
├── zsets/
437438
│ └── leaderboard.json
438-
└── streams/
439-
└── events.jsonl
439+
├── streams/
440+
│ └── events.jsonl
441+
└── hll/
442+
└── pfcount%3Auniques.bin
440443
```
441444

442445
Redis values are encoded so that `redis-cli --pipe` (or the equivalent in
@@ -611,7 +614,7 @@ the proposal does add is a thin **iterator wrapper** so a multi-million
611614
key range does not need to be materialized in one call:
612615

613616
```go
614-
// kv/backup_scan.go (new)
617+
// kv/shard_store.go (new method on *ShardStore — same file as ScanAt)
615618
type BackupScanner interface {
616619
// Next advances the iterator. Returns (nil, false, nil) at end-of-range.
617620
// The returned KVPair is owned by the caller until the next call to Next.
@@ -620,12 +623,36 @@ type BackupScanner interface {
620623
}
621624

622625
// NewBackupScanner returns a forward iterator over [start, end) at ts.
623-
// Internally calls ShardStore.ScanAt in pages of pageSize, chaining
626+
// Internally calls ScanAt in pages of pageSize, chaining
624627
// `start = lastReturnedKey + \x00` between pages so no lock is held
625628
// across pages and the compactor's MVCC retention budget is bounded.
626-
func (c *ShardedCoordinator) NewBackupScanner(start, end []byte, ts uint64, pageSize int) BackupScanner
629+
func (s *ShardStore) NewBackupScanner(start, end []byte, ts uint64, pageSize int) BackupScanner
627630
```
628631

632+
`NewBackupScanner` lives on `*ShardStore` (its natural home, sharing a
633+
file with `ScanAt`) rather than `*ShardedCoordinator`. The coordinator
634+
type today (`kv/sharded_coordinator.go:124`) has no `*ShardStore`
635+
field — only `engine`, `router`, `groups`, `clock`, and a raw
636+
`store.MVCCStore` — so placing the scanner on the coordinator would
637+
force either an awkward extra field plus `main.go` wiring or a
638+
roundabout coordinator → router → ShardStore lookup. Putting it on
639+
`*ShardStore` keeps the scan primitive cohesive with `ScanAt` and
640+
lets the admin server reach it directly via the existing
641+
`AdminServer` → `ShardStore` wiring used by `GetRaftGroups`.
642+
643+
> **Phase 1 also updates `ScanAt`'s doc comment**
644+
> (`kv/shard_store.go:101–105`) which today warns _"this method is
645+
> NOT a globally consistent snapshot; the caller must implement a
646+
> cross-shard snapshot fence"_. Under the backup contract the caller
647+
> *is* fencing — `BeginBackup` waits for `applied_index ≥ f(ts)` on
648+
> every group before any `Next` call. The new comment must say
649+
> something like: _"consistent across groups when the caller has
650+
> fenced `applied_index ≥ f(ts)` cluster-wide before calling, as
651+
> `BeginBackup` does. Without that fence, results are eventually-
652+
> consistent across groups."_ Without this comment update, code
653+
> review of the Phase 1 producer will flag the `ScanAt` use as
654+
> contradicting the warning.
655+
629656
`pageSize` is the same `ScanAt` `limit` parameter; the producer's
630657
default is 1024 and is exposed as a `--scan-page-size` CLI flag.
631658
Per-adapter encoders consume `BackupScanner` and emit one record per
@@ -660,17 +687,29 @@ message BeginBackupResponse {
660687
// dump calls this every ttl_ms/3 (producer default) so a multi-hour
661688
// scan never relies on a single TTL window. The read_ts is preserved
662689
// across renewals; only the deadline shifts.
663-
message RenewBackupRequest { bytes pin_token = 1; uint64 ttl_ms = 2; }
690+
message RenewBackupRequest {
691+
bytes pin_token = 1;
692+
// Same range constraint as BeginBackupRequest.ttl_ms:
693+
// 60s–24h, bounded above by backup_max_ttl_ms. Out-of-range values
694+
// are rejected with InvalidArgument.
695+
uint64 ttl_ms = 2;
696+
}
664697
message RenewBackupResponse { uint64 ttl_ms_effective = 1; }
665698

666699
message EndBackupRequest { bytes pin_token = 1; }
667700
message EndBackupResponse {}
668701

702+
// ListAdaptersAndScopes runs the per-adapter metadata-prefix scan at
703+
// the read_ts associated with pin_token, so the returned scope list
704+
// is a precise match for what BackupScanner will surface. A new scope
705+
// created by a concurrent client between BeginBackup and this call is
706+
// invisible at the pinned read_ts and therefore not listed.
707+
message ListAdaptersAndScopesRequest { bytes pin_token = 1; }
669708
message ListAdaptersAndScopesResponse {
670-
repeated string dynamodb_tables = 1; // from !ddb|meta|table| scan
671-
repeated string s3_buckets = 2; // from !s3|bucket|meta| scan
709+
repeated string dynamodb_tables = 1; // from !ddb|meta|table| scan at read_ts
710+
repeated string s3_buckets = 2; // from !s3|bucket|meta| scan at read_ts
672711
repeated uint32 redis_databases = 3; // {0} until multi-db lands
673-
repeated string sqs_queues = 4; // from !sqs|queue|meta| scan
712+
repeated string sqs_queues = 4; // from !sqs|queue|meta| scan at read_ts
674713
}
675714
```
676715

@@ -833,7 +872,10 @@ elastickv-restore apply \
833872
[--adapter dynamodb,s3,redis,sqs] \
834873
[--scope dynamodb=orders] \
835874
[--mode replace|merge|skip-existing] \
836-
[--rate-limit 5000ops/s]
875+
[--rate-limit 5000ops/s] \
876+
[--preserve-ttl] \
877+
[--preserve-visibility] \
878+
[--stream-merge-strategy reject|auto-id]
837879
```
838880

839881
`replace` deletes the target scope before re-importing; `merge`
@@ -1031,6 +1073,8 @@ Scope: out of this proposal; mentioned only to draw the boundary.
10311073
| `TestS3PathFileVsDirectoryCollision` | Bucket holds both `path/to` (object) and `path/to/obj`; producer renames the shorter key to `path/to.elastickv-leaf-data` and records it in `KEYMAP.jsonl`; restore tool reverses it via `MANIFEST.s3_collision_strategy` |
10321074
| `TestBeginBackupTooManyActiveBackups` | Reaching `max_active_backup_pins` returns `ResourceExhausted`; releasing one pin frees a slot for the next request |
10331075
| `TestRenewBackupExtendsDeadline` | `RenewBackup` shifts the deadline; producer's failed-renewal path aborts the dump with a critical log line rather than continuing past the TTL |
1076+
| `TestRenewBackupTTLRangeValidation` | `RenewBackup` with `ttl_ms < 60s` or `ttl_ms > backup_max_ttl_ms` returns `InvalidArgument`; in-range values succeed |
1077+
| `TestListAdaptersAndScopesAtPinTS` | A scope created (e.g. CreateTable) after `BeginBackup` is not surfaced by `ListAdaptersAndScopes(pin_token)`; pre-existing scopes are |
10341078

10351079
### P1
10361080

0 commit comments

Comments
 (0)