Skip to content

Commit b7d3449

Browse files
committed
feat(backend): async read-repair batching + unconditional ForwardSet-only repair
Introduce `WithDistReadRepairBatch(interval, maxBatchSize)` option that routes quorum-read repair fan-out through an async coalescing queue (`repairQueue` in pkg/backend/dist_read_repair.go). Repairs are queued by destination peer + key, retaining only the highest-version entry per (peer, key) — last-write-wins by version, tie-broken by origin. Concurrent reads of the same hot key produce one repair, not N; each collapsed duplicate bumps the new `dist.read.repair.coalesced` metric. The background flusher dispatches per-peer batches on the configured interval or when a peer's pending count hits `maxBatchSize`, using errgroup for parallel ForwardSet calls. `Stop()` drains the queue before returning; crash exit loses queued repairs by design, with merkle anti-entropy as the convergence safety net. Drop the defensive ForwardGet probe from `repairRemoteReplica`: every repair is now a single unconditional ForwardSet. The receiver's `applySet` already version-compares and noops downgrades, making the probe pure duplication (~50 % wire-call reduction per repair, independent of whether batching is enabled). New OTel metrics: - `dist.read.repair.batched` — ForwardSet calls dispatched by the flusher - `dist.read.repair.coalesced` — duplicate (peer, key) enqueues collapsed Eight unit tests (pkg/backend/dist_read_repair_test.go) cover coalesce semantics, distinct-peer independence, parallel per-peer flush, nil-transport noop, size-threshold inline flush, stop-drain guarantee, isHigherVersion tie-break, and concurrent-enqueue race-safety. Three integration tests (tests/hypercache_distmemory_readrepair_batch_test.go) drive a 3-node RF=3 Quorum cluster end-to-end. Also: - Refactor Stop() stop-channel teardown into closeBackgroundLoops() - Fix Makefile pre-commit target: guard pyenv activation with command -v - Add golang.org/x/sync v0.20.0 (errgroup); bump shamaton/msgpack to v3.1.1 - Add cspell words: amortisation, coalescer, distmemory, errgroup, readrepair - Document batching option in docs/operations.md under Tuning — read-repair batching
1 parent 21adb3a commit b7d3449

11 files changed

Lines changed: 1178 additions & 70 deletions

CHANGELOG.md

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,34 @@ All notable changes to HyperCache are recorded here. The format follows
88

99
### Added
1010

11+
- **Async read-repair batching (Phase 4) + unconditional `ForwardSet`-only repair.** Two composing changes
12+
in the same PR that together cut the wire-call cost of read-repair under quorum reads. (1) The defensive
13+
`ForwardGet` probe in `repairRemoteReplica` is gone — every repair is now exactly one `ForwardSet`,
14+
because the receiver's `applySet` already version-compares and noops downgrades, so the probe was pure
15+
duplication. ~50% wire-call reduction per repair regardless of batching. (2) New opt-in
16+
[`backend.WithDistReadRepairBatch(interval, maxBatchSize)`](pkg/backend/dist_memory.go) option queues
17+
repairs by destination peer + key (last-write-wins by `(version, origin)`) and dispatches per-peer batches
18+
on the interval or when a peer's pending count hits `maxBatchSize`. Concurrent reads of the same hot key
19+
produce ONE repair through the queue, not N — the coalescer collapses duplicate `(peer, key)` entries
20+
and bumps the new `dist.read_repair.coalesced` counter per collapsed enqueue. Disabled by default
21+
(`interval == 0` = current synchronous behavior preserved, so `TestDistMemoryReadRepair` and
22+
`TestDistMemoryRemoveReplication` pass byte-identical). Clean shutdown drains the queue inside `Stop()`;
23+
crash exit loses queued repairs by design, with merkle anti-entropy as the convergence safety net.
24+
New [`pkg/backend/dist_read_repair.go`](pkg/backend/dist_read_repair.go) hosts the `repairQueue` type
25+
with errgroup-driven per-peer parallel `ForwardSet` dispatch. Eight unit tests in
26+
[`pkg/backend/dist_read_repair_test.go`](pkg/backend/dist_read_repair_test.go) cover the coalesce rule
27+
(same `(peer, key)` keeps the higher version, distinct peers stay independent), the size-threshold
28+
inline flush, the nil-transport noop path, the `Stop()` drain semantics, the `(version, origin)`
29+
tie-break rule, and concurrent-enqueue race-safety. Three integration tests in
30+
[`tests/hypercache_distmemory_readrepair_batch_test.go`](tests/hypercache_distmemory_readrepair_batch_test.go)
31+
drive the end-to-end shape — a 3-node RF=3 ConsistencyQuorum cluster, one node's local copy dropped,
32+
N concurrent Gets from a third node — and assert the batched flush heals the dropped node, parallel
33+
reads coalesce to ≤2 dispatches (one per remote owner) regardless of N, and `Stop()` drains queued
34+
repairs before returning. Two new OTel metrics:
35+
`dist.read_repair.batched` (per actual `ForwardSet` dispatched by the queue's flusher) and
36+
`dist.read_repair.coalesced` (per duplicate-enqueue collapsed). New "Tuning — read-repair batching"
37+
section in [`docs/operations.md`](docs/operations.md) covers the option shape, the divergence-window
38+
trade-off, the two metrics, and when to enable it (high read-amplification with stable hot keys).
1139
- **Token-refresh visibility for the OIDC source.** Closes RFC 0003 open question 6: the
1240
`WithOIDCClientCredentials` source now wraps its `oauth2.TokenSource` with a logger that emits one
1341
`"oidc token rotated"` Info line per real rotation (expiry change), staying silent on cached returns.

Makefile

Lines changed: 12 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -216,23 +216,18 @@ docs-serve: docs-build
216216
PYENV_VERSION=mkdocs mkdocs serve
217217

218218
pre-commit:
219-
@eval "$$(pyenv init -)" && \
220-
pyenv activate pre-commit && \
221-
pre-commit run -a trailing-whitespace && \
222-
pre-commit run -a end-of-file-fixer && \
223-
pre-commit run -a markdownlint && \
224-
pre-commit run -a yamllint && \
225-
pre-commit run -a cspell && \
226-
pre-commit run -a cspell
227-
228-
# check_command_exists is a helper function that checks if a command exists.
229-
define check_command_exists
230-
@which $(1) > /dev/null 2>&1 || (echo "$(1) command not found" && exit 1)
231-
endef
232-
233-
ifeq ($(call check_command_exists,$(1)),false)
234-
$(error "$(1) command not found")
235-
endif
219+
@if command -v pyenv >/dev/null 2>&1; then \
220+
eval "$$(pyenv init -)" && \
221+
pyenv activate pre-commit && \
222+
pre-commit run -a trailing-whitespace && \
223+
pre-commit run -a end-of-file-fixer && \
224+
pre-commit run -a markdownlint && \
225+
pre-commit run -a yamllint && \
226+
pre-commit run -a cspell && \
227+
pre-commit run -a cspell; \
228+
else \
229+
echo "pyenv command not found"; \
230+
fi
236231

237232
# help prints a list of available targets and their descriptions.
238233
help:

cspell.config.yaml

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -39,6 +39,7 @@ words:
3939
- Akudx
4040
- aliceonly
4141
- ALPN
42+
- amortisation
4243
- APITLS
4344
- APITLSCA
4445
- assertable
@@ -68,6 +69,7 @@ words:
6869
- clientcredentials
6970
- cmap
7071
- Cmder
72+
- coalescer
7173
- codacy
7274
- codebook
7375
- codegen
@@ -86,12 +88,14 @@ words:
8688
- derr
8789
- disambiguator
8890
- distconfig
91+
- distmemory
8992
- distroless
9093
- EDITMSG
9194
- elif
9295
- Equalf
9396
- errcheck
9497
- errchkjson
98+
- errgroup
9599
- Errorf
96100
- errp
97101
- eventbus
@@ -164,6 +168,7 @@ words:
164168
- keepalive
165169
- keepalives
166170
- keyf
171+
- keyfmt
167172
- keypair
168173
- lamport
169174
- lblll
@@ -216,6 +221,7 @@ words:
216221
- pygments
217222
- pymdownx
218223
- reaad
224+
- readrepair
219225
- recvcheck
220226
- rediscluster
221227
- Redocly

docs/operations.md

Lines changed: 38 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -123,6 +123,44 @@ otherwise mark a healthy peer suspect. `dist.heartbeat.indirect_probe.refuted` r
123123
probes are saving you from spurious flapping; rising `dist.heartbeat.indirect_probe.failure` indicates the
124124
peer is genuinely unreachable from multiple vantage points.
125125

126+
## Tuning — read-repair batching
127+
128+
Quorum reads (`ConsistencyQuorum` / `ConsistencyAll`) issue best-effort read-repair to every owner of the
129+
chosen item — one `ForwardSet` per owner per read. Under read-heavy workloads on hot keys with stale replicas,
130+
this fan-out becomes the dominant network cost on the dist transport.
131+
132+
`backend.WithDistReadRepairBatch(interval, maxBatchSize)` opts into async coalescing: repairs are queued by
133+
destination peer + key (last-write-wins by `(version, origin)`) and dispatched by a background flusher on the
134+
configured interval or when a per-peer batch hits `maxBatchSize`. Concurrent reads of the same hot key
135+
produce one repair through the queue, not N.
136+
137+
```go
138+
dm, _ := backend.NewDistMemory(ctx,
139+
backend.WithDistReadConsistency(backend.ConsistencyQuorum),
140+
backend.WithDistReadRepairBatch(100*time.Millisecond, 64),
141+
)
142+
```
143+
144+
**Trade-off.** Batched mode introduces a divergence window of up to `interval` where a stale replica stays
145+
stale. Merkle anti-entropy (`WithDistMerkleAutoSync`) is the safety net for any repair the queue drops on
146+
crash exit; clean shutdown drains the queue inside `Stop()`. Disabled by default — the existing synchronous
147+
read-repair path is preserved when the option is not set.
148+
149+
**Detection.** Two new metrics quantify the amortisation:
150+
151+
- `dist.read_repair.batched` — repairs dispatched via the queue's flusher (one bump per actual `ForwardSet`).
152+
- `dist.read_repair.coalesced` — repairs short-circuited because a same-version-or-higher entry was already
153+
queued for the same `(peer, key)`. Every concurrent same-key read past the first bumps this.
154+
155+
The aggregate `dist.read_repair` counter still bumps per repair attempt (sync dispatch or enqueue). Operators
156+
read `dist.read_repair.batched / dist.read_repair` for the fraction routed through the queue, and
157+
`dist.read_repair.coalesced` as the saved-wire-call counter.
158+
159+
**When to enable.** High read-amplification with stable hot keys; the typical pattern is a partitioned
160+
workload where a small set of keys takes the majority of reads and one of the owners drops out briefly. Not
161+
useful for write-heavy paths (which don't go through read-repair) or for workloads with a flat key
162+
distribution (the coalescer has little to collapse).
163+
126164
## Operational tasks
127165

128166
### Drain a node

go.mod

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,7 @@ require (
2020
go.opentelemetry.io/otel/trace v1.43.0
2121
golang.org/x/crypto v0.51.0
2222
golang.org/x/oauth2 v0.36.0
23+
golang.org/x/sync v0.20.0
2324
gopkg.in/yaml.v3 v3.0.1
2425
)
2526

@@ -36,6 +37,7 @@ require (
3637
github.com/mattn/go-isatty v0.0.22 // indirect
3738
github.com/philhofer/fwd v1.2.0 // indirect
3839
github.com/pmezard/go-difflib v1.0.0 // indirect
40+
github.com/shamaton/msgpack/v3 v3.1.1 // indirect
3941
github.com/tinylib/msgp v1.6.4 // indirect
4042
github.com/valyala/bytebufferpool v1.0.0 // indirect
4143
github.com/valyala/fasthttp v1.71.0 // indirect

go.sum

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -55,8 +55,8 @@ github.com/redis/go-redis/v9 v9.19.0 h1:XPVaaPSnG6RhYf7p+rmSa9zZfeVAnWsH5h3lxthO
5555
github.com/redis/go-redis/v9 v9.19.0/go.mod h1:v/M13XI1PVCDcm01VtPFOADfZtHf8YW3baQf57KlIkA=
5656
github.com/rogpeppe/go-internal v1.14.1 h1:UQB4HGPB6osV0SQTLymcB4TgvyWu6ZyliaW0tI/otEQ=
5757
github.com/rogpeppe/go-internal v1.14.1/go.mod h1:MaRKkUm5W0goXpeCfT7UZI6fk/L7L7so1lCWt35ZSgc=
58-
github.com/shamaton/msgpack/v3 v3.1.0 h1:jsk0vEAqVvvS9+fTZ5/EcQ9tz860c9pWxJ4Iwecz8gU=
59-
github.com/shamaton/msgpack/v3 v3.1.0/go.mod h1:DcQG8jrdrQCIxr3HlMYkiXdMhK+KfN2CitkyzsQV4uc=
58+
github.com/shamaton/msgpack/v3 v3.1.1 h1:1EkrTpc68/H9bziTVw9eDLHLeK2v8aAyyv60quMqIY4=
59+
github.com/shamaton/msgpack/v3 v3.1.1/go.mod h1:DcQG8jrdrQCIxr3HlMYkiXdMhK+KfN2CitkyzsQV4uc=
6060
github.com/stretchr/testify v1.11.1 h1:7s2iGBzp5EwR7/aIZr8ao5+dra3wiQyKjjFuvgVKu7U=
6161
github.com/stretchr/testify v1.11.1/go.mod h1:wZwfW3scLgRK+23gO65QZefKpKQRnfz6sD981Nm4B6U=
6262
github.com/tinylib/msgp v1.6.4 h1:mOwYbyYDLPj35mkA2BjjYejgJk9BuHxDdvRnb6v2ZcQ=
@@ -95,6 +95,8 @@ golang.org/x/net v0.54.0 h1:2zJIZAxAHV/OHCDTCOHAYehQzLfSXuf/5SoL/Dv6w/w=
9595
golang.org/x/net v0.54.0/go.mod h1:Sj4oj8jK6XmHpBZU/zWHw3BV3abl4Kvi+Ut7cQcY+cQ=
9696
golang.org/x/oauth2 v0.36.0 h1:peZ/1z27fi9hUOFCAZaHyrpWG5lwe0RJEEEeH0ThlIs=
9797
golang.org/x/oauth2 v0.36.0/go.mod h1:YDBUJMTkDnJS+A4BP4eZBjCqtokkg1hODuPjwiGPO7Q=
98+
golang.org/x/sync v0.20.0 h1:e0PTpb7pjO8GAtTs2dQ6jYa5BWYlMuX047Dco/pItO4=
99+
golang.org/x/sync v0.20.0/go.mod h1:9xrNwdLfx4jkKbNva9FpL6vEN7evnE43NNNJQ2LF3+0=
98100
golang.org/x/sys v0.44.0 h1:ildZl3J4uzeKP07r2F++Op7E9B29JRUy+a27EibtBTQ=
99101
golang.org/x/sys v0.44.0/go.mod h1:4GL1E5IUh+htKOUEOaiffhrAeqysfVGipDYzABqnCmw=
100102
golang.org/x/text v0.37.0 h1:Cqjiwd9eSg8e0QAkyCaQTNHFIIzWtidPahFWR83rTrc=

0 commit comments

Comments
 (0)