Commit 8f02bc1
docs(design): scaling roadmap — multi-region, route scale-out, storage tier, coordinator (#958)
## Summary
Proposed roadmap doc covering 4 scaling subsystems elastickv will need
within one growth cycle. Each subsystem ships its own `*_proposed_*.md`
milestone doc when its work queues up — this file is the shared north
star + sequencing constraint.
Subsystems:
- §3 **Routing** (1M routes target): delta-watcher streaming, B-tree
RouteEngine, batched splits
- §4 **Multi-region** (2-3 region active-active): WAN raft tuning,
per-region HLC ceiling with monotone merge, region-local catalog mirrors
- §5 **Storage** (5-10 TB/shard): SST-ingest snapshot transfer, shared
block cache + per-shard tuning, per-shard compactor, S3 snapshot offload
- §6 **Coordinator** (5-10x QPS): per-group HLC ceiling (kill
default-group SPOF), follower reads, cross-shard 2PC, partitioned lock
resolver, leader-proxy circuit breaker
Cross-cutting: every monotone-merge across new boundaries reuses M2
hotspot-split's `SetPhysicalCeiling + Observe` primitive. Capability bit
rollout follows the existing `cap_migration_v2` contract.
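
The monotone-merge idea can be illustrated with a minimal sketch. This is not elastickv's implementation: the roadmap names the primitive but does not show its signature, so the class, the method names (`set_physical_ceiling`, `observe`, `next`), and the 16-bit logical counter layout below are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed layout (hypothetical): physical milliseconds in the high bits,
# a logical counter in the low 16 bits.
LOGICAL_BITS = 16


@dataclass
class HLC:
    last: int = 0        # highest timestamp issued or observed so far
    ceiling_ms: int = 0  # highest physical millisecond this node may issue

    def set_physical_ceiling(self, ceiling_ms: int) -> None:
        # Monotone: a stale or reordered update can never lower the ceiling.
        self.ceiling_ms = max(self.ceiling_ms, ceiling_ms)

    def observe(self, ts: int) -> None:
        # Merge a timestamp seen from another group or region; never go back.
        self.last = max(self.last, ts)

    def next(self, now_ms: int) -> int:
        # Issue a timestamp strictly greater than anything issued or observed.
        candidate = max(now_ms << LOGICAL_BITS, self.last + 1)
        if (candidate >> LOGICAL_BITS) > self.ceiling_ms:
            raise RuntimeError("physical ceiling exhausted; renew before issuing")
        self.last = candidate
        return candidate
```

In this sketch, a node whose wall clock lags still issues a timestamp above any value it has observed, and a late ceiling update is ignored rather than applied. Both operations are `max` merges, which is what makes the order of arrival irrelevant.
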
Sequencing (§8): §6 M1 (per-group HLC) → §3 M1 (delta watcher) → §5
M1/M2 (SST ingest + shared cache) → §3 M2/M3 → §4 (regions) → §6 M2-M5 →
§5 M4. §6 M1 first because it removes the default-group HLC SPOF that
every later milestone otherwise inherits.
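
The sequencing above can be checked mechanically. The sketch below encodes the chain exactly as listed and assumes, for illustration only, that each arrow is a strict dependency on the previous step; the real graph in §8 may carry more edges. The helper names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Milestone order as listed in the Sequencing paragraph.
CHAIN = ["§6 M1", "§3 M1", "§5 M1/M2", "§3 M2/M3", "§4", "§6 M2-M5", "§5 M4"]


def build_graph(chain):
    # Map each step to the set of steps it depends on (its predecessor).
    return {step: {prev} for prev, step in zip(chain, chain[1:])}


def is_acyclic(graph):
    # prepare() raises CycleError when the dependency graph contains a cycle.
    try:
        TopologicalSorter(graph).prepare()
    except CycleError:
        return False
    return True


def rollout_order(graph):
    # Dependencies come before the steps that need them.
    return list(TopologicalSorter(graph).static_order())
```

Calling `is_acyclic(build_graph(CHAIN))` returns `True` for the chain as written. Adding a back edge, for example making `§6 M1` depend on `§5 M4`, makes it return `False`.
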
§10 lists 10 open questions. The apply-pipeline cost of per-group HLC,
partitioned-region migration policy, and cross-shard txn lock ordering
need design-level decisions before the first milestone PR opens.
## Self-review
1. **Data loss** — N/A doc-only; the doc itself preserves M2
hotspot-split's monotone-merge contract for every new boundary
2. **Concurrency / distributed failures** — surfaced as OQ (per-group
HLC propose cost, partitioned-region policy, cross-shard txn deadlock
prevention)
3. **Performance** — SLO targets quantified per subsystem (§3.1, §4.1,
§5.1, §6.1)
4. **Data consistency** — explicitly: every new HLC boundary uses the
same SetPhysicalCeiling primitive; capability bits gate every wire
change
5. **Test coverage** — §9 table maps each subsystem to unit/property
tests + Jepsen workloads (existing route-shuffle extended to 100k
routes; new multi-region partition workload; existing workloads with 10x
data)
## Test plan
- [x] doc reads coherently end-to-end
- [x] sequencing graph in §8 has no cycles
- [x] cross-refs to existing M2 hotspot-split design verified
- [ ] reviewer feedback on §10 open questions (these are the
design-level decisions blocking first milestone PR)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Documentation**
  * Added a comprehensive scaling roadmap detailing phased plans for
    multi-region active‑active, routing scale‑out, storage tier
    capacity/retention, coordinator/API gateway scaling, rollout sequencing,
    observability guidance, testing strategies, and open questions.
* **Tests**
  * Improved learner-promotion test to use a robust retry/wait approach
    and explicit success criteria to better handle transient catch‑up
    conditions during node promotion.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2 files changed
Lines changed: 712 additions & 3 deletions