@@ -8,27 +8,95 @@ All notable changes to HyperCache are recorded here. The format follows
 
 ### Added
 
-- **Migration-source observability for the hint queue.** Hints produced by rebalance migrations are now
-  tagged at queue time and tracked in a dedicated set of counters alongside the existing aggregate
-  metrics. Five new OTel metrics: `dist.migration.queued`, `dist.migration.replayed`,
-  `dist.migration.expired`, `dist.migration.dropped`, and `dist.migration.last_age_ns` (queue residency of
-  the most-recently-replayed migration hint, a direct signal of new-primary reachability during rolling
-  deploys). Existing `dist.hinted.*` counters keep their meaning as the aggregate across both sources, so
-  operators can derive replication-only as `aggregate - migration`. Implementation reuses the proven hint
-  queue infrastructure (TTL, caps, replay, drop logic): no second queue, no second drain loop.
-  Tests in [`pkg/backend/dist_migration_hint_test.go`](pkg/backend/dist_migration_hint_test.go) cover
-  source-tag preservation through queue→replay, per-source counter increments on every terminal path
-  (replay success, expired, transport drop, global-cap drop), and the not-found keep-in-queue path.
+- **Batch operations on the client SDK.** `BatchSet`, `BatchGet`, and `BatchDelete` close the v1 SDK gap that
+  PR3's stopping conditions called out: the raw OIDC example demonstrated batch round-trips but the SDK had no
+  equivalent. Each method takes a slice and returns per-item results, so a single HTTP call can carry
+  mixed-outcome batches (some stored, some draining) without forcing the caller to either fail the whole batch
+  or parse the wire envelope by hand. Per-item `Err` is the standard `*StatusError`, so
+  `errors.Is(result.Err, client.ErrDraining)` works inside per-item handling the same way it does for
+  single-key calls. Empty input short-circuits to an empty result slice without dispatching an HTTP request.
+  Eight new test cases in [`pkg/client/batch_test.go`](pkg/client/batch_test.go) cover the happy path for each
+  verb, per-item failures, mixed found/missing in `BatchGet`, empty-input no-op, and the HTTP-level
+  failure-wraps-`ErrAllEndpointsFailed` regression guard. The OIDC example
+  ([`__examples/distributed-oidc-client/main.go`](__examples/distributed-oidc-client/main.go)) gains a final
+  `BatchSet` step demonstrating the surface, and [`docs/client-sdk.md`](docs/client-sdk.md) grows a dedicated
+  "Batch operations" section explaining the per-item granularity contract.
+- **Client SDK reference + example migration.** New [`docs/client-sdk.md`](docs/client-sdk.md) is the
+  recommended starting point for Go consumers. It covers every auth mode (bearer / Basic / OIDC client
+  credentials / custom mTLS via `WithHTTPClient`), the multi-endpoint failover policy, topology refresh
+  semantics with the 1s floor and seed fallback, the full sentinel + `*StatusError` recipe set, and the
+  production caveats (connection pooling, retry policy, OTel propagation, OIDC refresh visibility). The
+  existing hand-rolled HTTP demo at `__examples/distributed-oidc-client/` was renamed to
+  [`__examples/distributed-oidc-client-raw/`](__examples/distributed-oidc-client-raw/) (kept in-tree as the
+  "what the SDK does under the hood" reference and for non-Go consumers reading along), while
+  [`__examples/distributed-oidc-client/`](__examples/distributed-oidc-client/) is now the ~150-line SDK
+  consumer that cuts the prior 480 lines by ~70%. Top-level
+  [`__examples/README.md`](__examples/README.md) lists both with the SDK version flagged as recommended. The
+  SDK page is registered under Reference in [`mkdocs.yml`](mkdocs.yml) alongside the API reference and
+  changelog.
+- **`pkg/client`: Go SDK for hypercache-server clusters.** Closes the three operational gaps the OIDC-client
+  example surfaced: - **Multi-endpoint HA without an external LB.** `client.New([]string{...}, opts...)`
+  accepts a slice of seed endpoints. Each request picks one at random; on transport failure or any 5xx response
+  (including 503 draining) the client walks to the next. 4xx responses (auth, scope, not-found, bad-request) are
+  deterministic and do NOT trigger failover. See [RFC 0003](docs/rfcs/0003-client-sdk-and-redis-style-affordances.md)
+  for the failover policy rationale (F2 random with crypto-seeded math/rand). - **Optional topology refresh.**
+  `WithTopologyRefresh(interval)` enables a background loop that pulls `/cluster/members` and updates the
+  in-memory endpoint view, so nodes added or removed after deploy become visible without redeploying
+  consumers. The original seeds remain as a permanent fallback if the live view ever empties. - **Four auth
+  modes coexisting in one API.** `WithBearerAuth`, `WithBasicAuth`, `WithOIDCClientCredentials` (full OAuth2
+  client-credentials flow with auto-refresh), and `WithHTTPClient` (bring your own mTLS-configured client).
+  Mutually exclusive: the last applied wins. - **Stable, typed error surface.** Sentinels (`ErrNotFound`,
+  `ErrUnauthorized`, `ErrForbidden`, `ErrDraining`, `ErrBadRequest`, `ErrInternal`, `ErrAllEndpointsFailed`,
+  `ErrNoEndpoints`) compose with `errors.Is`. `*StatusError` carries the cache's canonical
+  `{ code, error, details }` envelope for callers that need finer discrimination via `errors.As`. - **Typed
+  command surface.** `Set`, `Get` (raw bytes), `GetItem` (full envelope with version/owners), `Delete`,
+  `Identity` (the `/v1/me` canary including the new capabilities field), `Endpoints` (the current view),
+  `RefreshTopology` (manual refresh for tests/operators), `Close`. - **Full test coverage** in
+  [`pkg/client/client_test.go`](pkg/client/client_test.go): happy-path round-trip, JSON-envelope decode, every
+  auth mode against httptest stubs, 5xx failover, 4xx no-failover (regression guard), exhaustive-failure
+  wrapping, every sentinel's `errors.Is` mapping, topology refresh, partition-survives-empty-refresh failsafe,
+  and constructor input validation.
+- **HTTP Basic auth as a first-class credential class (Redis-style `AUTH user pass`).** New top-level `users:`
+  block in `HYPERCACHE_AUTH_CONFIG` accepts bcrypt-hashed passwords. Each user resolves to the same
+  `Identity{ID, Scopes}` shape as every other auth mode, so all four mechanisms (static bearer → Basic → mTLS
+  → OIDC) coexist in one cluster with consistent downstream behavior. Fail-closed posture: Basic over
+  plaintext is refused by default; operators opt into dev-only plaintext via `allow_basic_without_tls: true`.
+  Implementation in [`pkg/httpauth/policy.go`](pkg/httpauth/policy.go) with bcrypt verification via
+  `golang.org/x/crypto/bcrypt`. Threat note: bcrypt-per-request is CPU-bound; rate-limiting is left to a
+  fronting LB (see [RFC 0003](docs/rfcs/0003-client-sdk-and-redis-style-affordances.md) open question 3).
+- **`/v1/me` now returns a `capabilities` field.** Stable capability strings derived 1:1 from scopes (`read` →
+  `cache.read`, etc.). Clients should prefer `capabilities` over `scopes` for forward-compatibility: if a
+  scope is later split into multiple capabilities, scope-keyed clients break but capability-keyed clients keep
+  working. OpenAPI spec ([`cmd/hypercache-server/openapi.yaml`](cmd/hypercache-server/openapi.yaml)) updated
+  to reflect the new required field; the binary's embedded spec is the contract.
+- **Tests pinning the new auth contract.** [`pkg/httpauth/policy_test.go`](pkg/httpauth/policy_test.go) verifies
+  that Basic resolves on correct credentials, rejects wrong passwords/users/malformed headers, and refuses
+  plaintext by default, and it documents the bearer-wins-over-Basic chain order via a Locals-introspection test.
+  [`pkg/httpauth/loader_test.go`](pkg/httpauth/loader_test.go) covers the YAML round-trip plus the
+  fail-loud-at-boot guards for malformed bcrypt hashes and empty usernames.
+- **Operator runbook updates.** The Auth failures section of [`docs/oncall.md`](docs/oncall.md) gains a Basic-auth
+  debugging row covering the `curl -u user:pass /v1/me` canary and the plaintext-refused failure mode.
+- **Migration-source observability for the hint queue.** Hints produced by rebalance migrations are now tagged
+  at queue time and tracked in a dedicated set of counters alongside the existing aggregate metrics. Five new
+  OTel metrics: `dist.migration.queued`, `dist.migration.replayed`, `dist.migration.expired`,
+  `dist.migration.dropped`, and `dist.migration.last_age_ns` (queue residency of the most-recently-replayed
+  migration hint, a direct signal of new-primary reachability during rolling deploys). Existing `dist.hinted.*`
+  counters keep their meaning as the aggregate across both sources, so operators can derive replication-only
+  as `aggregate - migration`. Implementation reuses the proven hint queue infrastructure (TTL, caps, replay,
+  drop logic): no second queue, no second drain loop. Tests in
+  [`pkg/backend/dist_migration_hint_test.go`](pkg/backend/dist_migration_hint_test.go) cover source-tag
+  preservation through queue→replay, per-source counter increments on every terminal path (replay success,
+  expired, transport drop, global-cap drop), and the not-found keep-in-queue path.
 - **Adaptive Merkle anti-entropy scheduling.** New
   [`backend.WithDistMerkleAdaptiveBackoff(maxFactor)`](pkg/backend/dist_memory.go) option lets the auto-sync
   loop double its sleep interval after each tick that finds zero divergence across every peer, capped at
   `maxFactor`. Any tick with at least one dirty peer snaps the factor back to 1× immediately; recovery is
-  never lazy. Disabled by default (factor=0 or 1) so existing deployments see no behavior change. Two new
-  OTel metrics expose the state: `dist.auto_sync.backoff_factor` (gauge) and `dist.auto_sync.clean_ticks`
+  never lazy. Disabled by default (factor=0 or 1) so existing deployments see no behavior change. Two new OTel
+  metrics expose the state: `dist.auto_sync.backoff_factor` (gauge) and `dist.auto_sync.clean_ticks`
   (counter). Each factor change is logged once at Info (`merkle auto-sync backoff factor changed`), with no
   per-tick log spam. Unit tests in
-  [`pkg/backend/dist_adaptive_backoff_test.go`](pkg/backend/dist_adaptive_backoff_test.go) cover the ramp,
-  the cap, the dirty-tick reset, and the disabled-by-default back-compat invariant.
+  [`pkg/backend/dist_adaptive_backoff_test.go`](pkg/backend/dist_adaptive_backoff_test.go) cover the ramp, the
+  cap, the dirty-tick reset, and the disabled-by-default back-compat invariant.
 - **Structured logging for background loops and cluster lifecycle.** HyperCache gained a
   `WithLogger(*slog.Logger)` option ([config.go](config.go)) that wires a structured logger through the
   wrapper. Previously the eviction loop, expiration loop, and HyperCache lifecycle ran fully silent —
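Reviewer note on the adaptive Merkle backoff entry: the ramp it describes (double after a clean tick, cap at `maxFactor`, snap back to 1× on any dirty tick, disabled when the factor is 0 or 1) reduces to a small pure function. The sketch below is an assumption-laden illustration rather than the code in `pkg/backend/dist_memory.go`; `nextFactor` and the 30-second base interval are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// nextFactor is a hypothetical helper that returns the sleep multiplier to
// use after a tick, following the rules stated in the changelog entry.
func nextFactor(current, maxFactor int, dirty bool) int {
	if maxFactor <= 1 {
		return 1 // 0 or 1 means the feature is disabled
	}
	if dirty {
		return 1 // any divergence resets immediately
	}
	if current < 1 {
		current = 1
	}
	next := current * 2
	if next > maxFactor {
		next = maxFactor // never exceed the configured cap
	}
	return next
}

func main() {
	base := 30 * time.Second // assumed base auto-sync interval
	factor := 1
	// Four clean ticks, one dirty tick, then one clean tick.
	ticks := []bool{false, false, false, false, true, false}
	for i, dirty := range ticks {
		factor = nextFactor(factor, 8, dirty)
		fmt.Printf("tick %d dirty=%v factor=%d sleep=%s\n",
			i, dirty, factor, time.Duration(factor)*base)
	}
}
```

With `maxFactor` set to 8, the factor sequence over these six ticks is 2, 4, 8, 8, 1, 2, which shows the ramp, the cap, and the dirty-tick reset in one run.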