Commit 2347c33

Merge pull request #108 from hyp3rd/feat/dist-mem-cache
feat(dist): add structured logging to dist backend (Phase A.1)
2 parents fd2db48 + 4d5b5f9 commit 2347c33

35 files changed

Lines changed: 1644 additions & 137 deletions

.mdl_style.rb

Lines changed: 8 additions & 0 deletions
@@ -5,3 +5,11 @@
 rule 'MD007', :indent => 3

 rule "MD029", style => "one"
+
+# Keep-a-Changelog (https://keepachangelog.com) uses repeated `### Added`,
+# `### Fixed`, `### Security` headings under each `## [version]` heading by
+# design. MD024 with the default config flags those as duplicates.
+# allow_different_nesting permits same-text headings as long as they sit
+# under distinct parent headings — which is exactly the Keep-a-Changelog
+# shape, and still catches genuine duplicates within the same section.
+rule "MD024", :allow_different_nesting => true

CHANGELOG.md

Lines changed: 46 additions & 6 deletions
@@ -6,7 +6,48 @@ adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

 ## [Unreleased]

-## [2.0.1] — 2026-05-05
+### Added
+
+- **Structured logging on the dist backend.** New `WithDistLogger(*slog.Logger)`
+  option wires a structured logger into the dist backend's background
+  loops (heartbeat, hint replay, rebalance, merkle sync) and operational
+  error surfaces (HTTP listener bind failures, serve-goroutine exits,
+  failed migrations during rebalance, dropped hints, peer state
+  transitions). Library default is silent — `WithDistLogger` not called
+  installs a `slog.DiscardHandler` so the dist backend never writes to
+  stderr unless the caller opts in. Every record is pre-bound with
+  `component=dist_memory` and `node_id=<id>` attributes for grep/filter.
+  Phase A.1 of the production-readiness work.
+- **OpenTelemetry tracing on the dist backend.** New
+  `WithDistTracerProvider(trace.TracerProvider)` option opens spans on
+  every public `Get` / `Set` / `Remove`, with child spans
+  (`dist.replicate.set` / `dist.replicate.remove`) per peer during
+  fan-out. Span attributes include `cache.key.length`,
+  `dist.consistency`, `dist.owners.count`, `dist.acks`, `cache.hit`,
+  and `peer.id`. Cache key *values* are intentionally never recorded
+  on spans — keys can be PII (user IDs, session tokens). Library
+  default is a no-op tracer (`noop.NewTracerProvider`), so spans cost
+  nothing unless the caller opts in. New `ConsistencyLevel.String()`
+  method renders consistency levels human-readably for log/span attrs.
+  Phase A.2 of the production-readiness work.
+- **OpenTelemetry metrics on the dist backend.** New
+  `WithDistMeterProvider(metric.MeterProvider)` option registers an
+  observable instrument for every field on `DistMetrics` — counters
+  for cumulative totals (`dist.write.attempts`, `dist.forward.*`,
+  `dist.hinted.*`, `dist.merkle.syncs`, `dist.rebalance.*`, etc.),
+  gauges for current state (`dist.members.alive`,
+  `dist.tombstones.active`, `dist.hinted.bytes`, last-operation
+  latencies in nanoseconds, etc.). A single registered callback
+  observes all instruments from one `Metrics()` snapshot per
+  collection cycle, so there is no per-operation overhead beyond the
+  existing atomic counters. Names use the `dist.` prefix so a
+  Prometheus exporter renders them under a single subsystem.
+  `Stop` unregisters the callback so the SDK does not invoke it
+  against a stopped backend. Library default is a no-op meter, so
+  metrics cost nothing unless the caller opts in. Phase A.3 of the
+  production-readiness work.
+
+## [0.5.0] — 2026-05-05

 ### Security

@@ -25,7 +66,7 @@ adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
   in via the new `DistHTTPAuth.AllowAnonymousInbound` field. All other
   configurations (`Token`-only, `Token+ServerVerify`, `Token+ClientSign`,
   `ServerVerify`-only) are unaffected. Reported by the post-tag
-  security review; addressed before any v2.0.0 public announcement.
+  security review; addressed before any v0.5.0 public announcement.

 ### Added

@@ -34,7 +75,7 @@ adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - `sentinel.ErrInsecureAuthConfig` — surfaced from `NewDistMemory` when
   the auth policy would silently disable inbound enforcement.

-## [2.0.0] — 2026-05-04
+## [0.4.3] — 2026-05-04

 A modernization release. The headline themes:

@@ -86,7 +127,6 @@ RFCs that informed the design decisions live under [docs/rfcs/](docs/rfcs/).
 ### Performance

 Measurements on Apple M4 Pro, `go test -bench`, `count=5`, benchstat.
-Full release snapshot captured in [bench-v2.0.0.txt](bench-v2.0.0.txt).

 - **Per-shard atomic `Count`.** `BenchmarkConcurrentMap_Count`:
   53 → ~10 ns/op. `_CountParallel`: 1181 → ~13 ns/op. Eliminates the
@@ -186,5 +226,5 @@ Worth surfacing for contributors:
   [RFC document](docs/rfcs/0001-backend-owned-eviction.md) preserves
   the measurement and the lessons.

-[Unreleased]: https://github.com/hyp3rd/hypercache/compare/v2.0.0...HEAD
-[2.0.0]: https://github.com/hyp3rd/hypercache/releases/tag/v2.0.0
+Unreleased: <https://github.com/hyp3rd/hypercache/compare/v0.5.0...HEAD>
+Released: [0.5.0](https://github.com/hyp3rd/hypercache/releases/tag/v0.5.0)
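
The metrics entry in the CHANGELOG diff above says a single callback observes every instrument from one `Metrics()` snapshot per collection cycle. The sketch below illustrates that idea with the standard library only: `counters`, `snapshot`, `observer`, and `collect` are hypothetical names, and `observer` merely stands in for whatever observation sink a metrics SDK would hand to a registered callback. It does not use or reproduce the OpenTelemetry API.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// counters holds the hot-path atomics. Field names are illustrative and are
// not the library's DistMetrics fields.
type counters struct {
	writeAttempts atomic.Int64 // cumulative total (counter-like)
	membersAlive  atomic.Int64 // current state (gauge-like)
}

// snapshot is a plain copy taken once per collection cycle.
type snapshot struct {
	WriteAttempts int64
	MembersAlive  int64
}

// Metrics loads each atomic once and returns the copy.
func (c *counters) Metrics() snapshot {
	return snapshot{
		WriteAttempts: c.writeAttempts.Load(),
		MembersAlive:  c.membersAlive.Load(),
	}
}

// observer is a hypothetical stand-in for an SDK observation sink.
type observer func(name string, value int64)

// collect plays the role of the single registered callback: one snapshot,
// then one observation per instrument.
func collect(c *counters, observe observer) {
	s := c.Metrics()
	observe("dist.write.attempts", s.WriteAttempts)
	observe("dist.members.alive", s.MembersAlive)
}

func main() {
	var c counters
	c.writeAttempts.Add(3) // the only per-operation cost is the atomic add
	c.membersAlive.Store(2)

	got := map[string]int64{}
	collect(&c, func(name string, v int64) { got[name] = v })
	fmt.Println(got["dist.write.attempts"], got["dist.members.alive"]) // prints "3 2"
}
```

Because the callback reads a copy, the write path never waits on the collector, which matches the changelog's claim of no per-operation overhead beyond the existing atomics.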

README.md

Lines changed: 3 additions & 3 deletions
@@ -137,7 +137,7 @@ Available algorithm names you can pass to `WithEvictionAlgorithm`:

 Note: ARC is experimental and isn’t included in the default registry. If you choose to use it, register it manually or enable it explicitly in your build.

-#### Sharded eviction (default since v2.0.0)
+#### Sharded eviction (default since v0.5.0)

 The configured algorithm is wrapped by a 32-shard router (`pkg/eviction/sharded.go`) that uses the same key hash as `ConcurrentMap` — so a key's data shard and eviction shard line up. This eliminates the global mutex contention single-instance algorithms (LRU/LFU/Clock/CAWOLFU) suffer from. Total capacity is honored within ±32 (one slot of slack per shard), and items evict per-shard rather than in strict global LRU/LFU order.

@@ -263,7 +263,7 @@ Limitations / not yet implemented:
 - Compression on the wire.
 - Persistence / durability (out of scope presently).

-#### Transport hardening (since v2.0.0)
+#### Transport hardening (since v0.5.0)

 The dist HTTP server and the auto-created HTTP client share a single configuration surface — apply the same option to every node in the cluster.

@@ -347,7 +347,7 @@ Test helpers `AddPeer` and `RemovePeer` simulate join / leave events that trigge
 | Advanced versioning (HLC/vector) | Planned |
 | Client SDK (direct routing) | Planned |
 | Tracing spans | Planned |
-| Security (TLS/auth) | Done (since v2.0.0; see "Transport hardening") |
+| Security (TLS/auth) | Done (since v0.5.0; see "Transport hardening") |
 | Compression | Planned |
 | Persistence | Out of scope (current phase) |
 | Chaos / fault injection | Planned |
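
The sharded-eviction paragraph in the first README hunk says keys are routed to one of 32 shards and that total capacity is honored within one slot of slack per shard. A small sketch of why per-shard rounding produces that bound is below. Both the hash (FNV-1a) and the round-up split are assumptions chosen for illustration; the diff does not show what `pkg/eviction/sharded.go` actually does, and `shardFor` and `perShardCapacity` are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardCount matches the 32-shard figure quoted in the README.
const shardCount = 32

// shardFor maps a key to a shard index. FNV-1a is an assumption; the only
// property that matters here is that the same key always lands on the same shard.
func shardFor(key string) uint32 {
	h := fnv.New32a()
	_, _ = h.Write([]byte(key)) // hash.Hash.Write never returns an error
	return h.Sum32() % shardCount
}

// perShardCapacity splits a total capacity across shards, rounding up so the
// sum of shard capacities is never below the requested total.
func perShardCapacity(total int) int {
	return (total + shardCount - 1) / shardCount
}

func main() {
	total := 1000
	per := perShardCapacity(total)

	// Slack is the overshoot caused by rounding; it is always below shardCount.
	fmt.Println(per, per*shardCount-total) // prints "32 24"

	// Routing is deterministic, so data shard and eviction shard can line up.
	fmt.Println(shardFor("user:42") == shardFor("user:42")) // prints "true"
}
```

Under this split the overshoot is at most 31 items, which is consistent with the stated ±32 tolerance, though the library may round differently.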

__examples/observability/observability.go

Lines changed: 2 additions & 1 deletion
@@ -34,7 +34,8 @@ func main() {
 	tracer := trace.NewNoopTracerProvider().Tracer("hypercache/examples")

 	// Apply OTel tracing and metrics middleware.
-	svc = hypercache.ApplyMiddleware(svc,
+	svc = hypercache.ApplyMiddleware(
+		svc,
 		func(next hypercache.Service) hypercache.Service {
 			return middleware.NewOTelTracingMiddleware(next, tracer, middleware.WithCommonAttributes(
 				attribute.String("component", "hypercache"),

__examples/service/service.go

Lines changed: 2 additions & 1 deletion
@@ -37,7 +37,8 @@ func main() {
 	logger := log.Default()

 	// apply middleware in the same order as you want to execute them
-	svc = hypercache.ApplyMiddleware(svc,
+	svc = hypercache.ApplyMiddleware(
+		svc,
 		// middleware.YourMiddleware,
 		func(next hypercache.Service) hypercache.Service {
 			return middleware.NewLoggingMiddleware(next, logger)
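
The comment in this hunk says middleware should be listed in the order it is meant to execute. One way a variadic helper can honor that is to wrap from the last middleware to the first, so the first one listed ends up outermost. The sketch below is self-contained and hypothetical: `Service`, `Middleware`, and `applyMiddleware` are simplified stand-ins, and the diff does not show how `hypercache.ApplyMiddleware` is implemented.

```go
package main

import "fmt"

// Service is a one-method stand-in for the cache service interface.
type Service interface {
	Get(key string) string
}

// Middleware wraps a Service with extra behavior.
type Middleware func(next Service) Service

// base is the innermost service.
type base struct{}

func (base) Get(key string) string { return "value:" + key }

// tagged records its name when called, then delegates to the next service.
type tagged struct {
	name string
	next Service
	log  *[]string
}

func (t tagged) Get(key string) string {
	*t.log = append(*t.log, t.name)
	return t.next.Get(key)
}

// tag builds a Middleware that records name on every call.
func tag(name string, log *[]string) Middleware {
	return func(next Service) Service {
		return tagged{name: name, next: next, log: log}
	}
}

// applyMiddleware wraps in reverse so the first listed middleware runs first.
func applyMiddleware(svc Service, mws ...Middleware) Service {
	for i := len(mws) - 1; i >= 0; i-- {
		svc = mws[i](svc)
	}
	return svc
}

func main() {
	var order []string
	svc := applyMiddleware(base{}, tag("logging", &order), tag("tracing", &order))
	fmt.Println(svc.Get("k"), order) // prints "value:k [logging tracing]"
}
```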

cspell.config.yaml

Lines changed: 5 additions & 0 deletions
@@ -132,6 +132,8 @@ words:
   - longbridgeapp
   - maxmemory
   - memprofile
+  - metricdata
+  - metricnoop
   - Merkle
   - mfinal
   - Mgmt
@@ -146,6 +148,7 @@ words:
   - noctx
   - noinlineerr
   - nolint
+  - nolintlint
   - nonamedreturns
   - nosec
   - NOVENDOR
@@ -162,6 +165,7 @@ words:
   - Repls
   - rerr
   - sarif
+  - sdkmetric
   - sectools
   - securego
   - sess
@@ -180,6 +184,7 @@ words:
   - thelper
   - toplevel
   - tparallel
+  - tracetest
   - traefik
   - ugorji
   - unmarshals
- unmarshals

docs/rfcs/0001-backend-owned-eviction.md

Lines changed: 1 addition & 1 deletion
@@ -253,7 +253,7 @@ Per the RFC's own discipline (`Reject A and revisit if any criterion fails`):
    "slower on Get, semantically-correct LRU." Default stays legacy.
 1. **Do not pursue Option A2** (co-located locks) — the win Option A
    would have justified A2 isn't there to amortize the bigger refactor.
-1. **The "Get does not touch LRU" semantic gap is a separate concern**
+1n. **The "Get does not touch LRU" semantic gap is a separate concern**
    that could be addressed inside the legacy path (have HyperCache.Get
    call `evictionAlgorithm.Get(key)`) at similar cost to the Item-aware
    Touch — i.e., the cost is fundamental to "real LRU", not specific

go.mod

Lines changed: 5 additions & 2 deletions
@@ -13,12 +13,16 @@ require (
 	github.com/ugorji/go/codec v1.3.1
 	go.opentelemetry.io/otel v1.43.0
 	go.opentelemetry.io/otel/metric v1.43.0
+	go.opentelemetry.io/otel/sdk v1.43.0
+	go.opentelemetry.io/otel/sdk/metric v1.43.0
 	go.opentelemetry.io/otel/trace v1.43.0
 )

 require (
 	github.com/andybalholm/brotli v1.2.1 // indirect
 	github.com/davecgh/go-spew v1.1.1 // indirect
+	github.com/go-logr/logr v1.4.3 // indirect
+	github.com/go-logr/stdr v1.2.2 // indirect
 	github.com/gofiber/schema v1.7.1 // indirect
 	github.com/gofiber/utils/v2 v2.0.4 // indirect
 	github.com/google/uuid v1.6.0 // indirect
@@ -27,15 +31,14 @@ require (
 	github.com/mattn/go-isatty v0.0.22 // indirect
 	github.com/philhofer/fwd v1.2.0 // indirect
 	github.com/pmezard/go-difflib v1.0.0 // indirect
-	github.com/rogpeppe/go-internal v1.14.1 // indirect
 	github.com/tinylib/msgp v1.6.4 // indirect
 	github.com/valyala/bytebufferpool v1.0.0 // indirect
 	github.com/valyala/fasthttp v1.71.0 // indirect
+	go.opentelemetry.io/auto/sdk v1.2.1 // indirect
 	go.uber.org/atomic v1.11.0 // indirect
 	golang.org/x/crypto v0.50.0 // indirect
 	golang.org/x/net v0.53.0 // indirect
 	golang.org/x/sys v0.43.0 // indirect
 	golang.org/x/text v0.36.0 // indirect
-	gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c // indirect
 	gopkg.in/yaml.v3 v3.0.1 // indirect
 )

go.sum

Lines changed: 7 additions & 3 deletions
@@ -10,6 +10,7 @@ github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c
 github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
 github.com/fxamacker/cbor/v2 v2.9.1 h1:2rWm8B193Ll4VdjsJY28jxs70IdDsHRWgQYAI80+rMQ=
 github.com/fxamacker/cbor/v2 v2.9.1/go.mod h1:vM4b+DJCtHn+zz7h3FFp/hDAI9WNWCsZj23V5ytsSxQ=
+github.com/go-logr/logr v1.2.2/go.mod h1:jdQByPbusPIv2/zmleS9BjJVeZ6kBagPoEUsqbVz/1A=
 github.com/go-logr/logr v1.4.3 h1:CjnDlHq8ikf6E492q6eKboGOC0T8CDaOvkHCIg8idEI=
 github.com/go-logr/logr v1.4.3/go.mod h1:9T104GzyrTigFIr8wt5mBrctHMim0Nb2HLGrmQ40KvY=
 github.com/go-logr/stdr v1.2.2 h1:hSWxHoqTgW2S2qGc0LTAI563KZ5YKYRhT3MFKZMbjag=
@@ -34,11 +35,8 @@ github.com/klauspost/compress v1.18.6 h1:2jupLlAwFm95+YDR+NwD2MEfFO9d4z4Prjl1XXD
 github.com/klauspost/compress v1.18.6/go.mod h1:cwPg85FWrGar70rWktvGQj8/hthj3wpl0PGDogxkrSQ=
 github.com/klauspost/cpuid/v2 v2.2.10 h1:tBs3QSyvjDyFTq3uoc/9xFpCuOsJQFNPiAhYdw2skhE=
 github.com/klauspost/cpuid/v2 v2.2.10/go.mod h1:hqwkgyIinND0mEev00jJYCxPNVRVXFQeu1XKlok6oO0=
-github.com/kr/pretty v0.2.1/go.mod h1:ipq/a2n7PKx3OHsz4KJII5eveXtPO4qwEXGdVfWzfnI=
 github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE=
 github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk=
-github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ=
-github.com/kr/text v0.1.0/go.mod h1:4Jbv+DJW3UT/LiOwJeYQe1efqtUx/iVham/4vfdArNI=
 github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY=
 github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE=
 github.com/mattn/go-colorable v0.1.14 h1:9A9LHSqF/7dyVVX6g0U9cwm9pG3kP9gSzcuIPHPsaIE=
@@ -77,10 +75,16 @@ go.opentelemetry.io/otel v1.43.0 h1:mYIM03dnh5zfN7HautFE4ieIig9amkNANT+xcVxAj9I=
 go.opentelemetry.io/otel v1.43.0/go.mod h1:JuG+u74mvjvcm8vj8pI5XiHy1zDeoCS2LB1spIq7Ay0=
 go.opentelemetry.io/otel/metric v1.43.0 h1:d7638QeInOnuwOONPp4JAOGfbCEpYb+K6DVWvdxGzgM=
 go.opentelemetry.io/otel/metric v1.43.0/go.mod h1:RDnPtIxvqlgO8GRW18W6Z/4P462ldprJtfxHxyKd2PY=
+go.opentelemetry.io/otel/sdk v1.43.0 h1:pi5mE86i5rTeLXqoF/hhiBtUNcrAGHLKQdhg4h4V9Dg=
+go.opentelemetry.io/otel/sdk v1.43.0/go.mod h1:P+IkVU3iWukmiit/Yf9AWvpyRDlUeBaRg6Y+C58QHzg=
+go.opentelemetry.io/otel/sdk/metric v1.43.0 h1:S88dyqXjJkuBNLeMcVPRFXpRw2fuwdvfCGLEo89fDkw=
+go.opentelemetry.io/otel/sdk/metric v1.43.0/go.mod h1:C/RJtwSEJ5hzTiUz5pXF1kILHStzb9zFlIEe85bhj6A=
 go.opentelemetry.io/otel/trace v1.43.0 h1:BkNrHpup+4k4w+ZZ86CZoHHEkohws8AY+WTX09nk+3A=
 go.opentelemetry.io/otel/trace v1.43.0/go.mod h1:/QJhyVBUUswCphDVxq+8mld+AvhXZLhe+8WVFxiFff0=
 go.uber.org/atomic v1.11.0 h1:ZvwS0R+56ePWxUNi+Atn9dWONBPp/AUETXlHW0DxSjE=
 go.uber.org/atomic v1.11.0/go.mod h1:LUxbIzbOniOlMKjJjyPfpl4v+PKK2cNJn91OQbhoJI0=
+go.uber.org/goleak v1.3.0 h1:2K3zAYmnTNqV73imy9J1T3WC+gmCePx2hEGkimedGto=
+go.uber.org/goleak v1.3.0/go.mod h1:CoHD4mav9JJNrW/WLlf7HGZPjdw8EucARQHekz1X6bE=
 golang.org/x/crypto v0.50.0 h1:zO47/JPrL6vsNkINmLoo/PH1gcxpls50DNogFvB5ZGI=
 golang.org/x/crypto v0.50.0/go.mod h1:3muZ7vA7PBCE6xgPX7nkzzjiUq87kRItoJQM1Yo8S+Q=
 golang.org/x/net v0.53.0 h1:d+qAbo5L0orcWAr0a9JweQpjXF19LMXJE8Ey7hwOdUA=

pkg/backend/dist_http_server.go

Lines changed: 18 additions & 1 deletion
@@ -4,6 +4,7 @@ import (
 	"context"
 	"crypto/subtle"
 	"crypto/tls"
+	"log/slog"
 	"net"
 	"net/http"
 	"strconv"
@@ -47,6 +48,12 @@ type distHTTPServer struct {
 	// use, TLS handshake failure on accept) instead of having them
 	// silently swallowed.
 	serveErr atomic.Pointer[error]
+	// logger is the structured logger inherited from the parent
+	// DistMemory. Used to surface serve-goroutine errors that previously
+	// only landed in serveErr (LastServeError accessor) — operators
+	// running with a configured logger now see them in their log stream
+	// at the moment of failure, not just on demand.
+	logger *slog.Logger
 }

 // DistHTTPAuth configures authentication for the dist HTTP server
@@ -482,8 +489,18 @@ func (s *distHTTPServer) listen(ctx context.Context) error {
 		if serveErr != nil {
 			// Stash so operators can read it via LastServeError(); a
 			// listener that crashed silently is the worst kind of
-			// production bug.
+			// production bug. Also surface to the structured logger when
+			// configured so the failure shows up in the operator's log
+			// stream at the moment it happens, not just on demand.
 			s.serveErr.Store(&serveErr)
+
+			if s.logger != nil {
+				s.logger.Error(
+					"dist HTTP serve goroutine exited",
+					slog.String("addr", s.addr),
+					slog.Any("err", serveErr),
+				)
+			}
 		}
 	}()
