Commit 24b3584

feat(client): add Go SDK for hypercache-server clusters (#124)
2 parents c8406d0 + e7b4bd4 commit 24b3584

42 files changed

Lines changed: 5511 additions & 120 deletions

.github/workflows/release.yml

Lines changed: 3 additions & 3 deletions
@@ -13,11 +13,11 @@ name: release
 
 on:
   push:
-    tags: ['v*.*.*']
+    tags: ["v*.*.*"]
   workflow_dispatch:
     inputs:
       tag:
-        description: 'Existing tag to (re)create a release for'
+        description: "Existing tag to (re)create a release for"
         required: true
 
 permissions:
@@ -70,7 +70,7 @@ jobs:
           echo "path=/tmp/release-body.md" >> "$GITHUB_OUTPUT"
 
       - name: Create GitHub Release
-        uses: softprops/action-gh-release@v2
+        uses: softprops/action-gh-release@v3
         with:
           tag_name: ${{ steps.tag.outputs.name }}
           name: ${{ steps.tag.outputs.name }}

CHANGELOG.md

Lines changed: 83 additions & 15 deletions
@@ -8,27 +8,95 @@ All notable changes to HyperCache are recorded here. The format follows
 
 ### Added
 
-- **Migration-source observability for the hint queue.** Hints produced by rebalance migrations are now
-  tagged at queue time and tracked in a dedicated set of counters alongside the existing aggregate
-  metrics. Five new OTel metrics: `dist.migration.queued`, `dist.migration.replayed`,
-  `dist.migration.expired`, `dist.migration.dropped`, and `dist.migration.last_age_ns` (queue residency of
-  the most-recently-replayed migration hint — direct signal of new-primary reachability during rolling
-  deploys). Existing `dist.hinted.*` counters keep their meaning as the aggregate across both sources, so
-  operators can derive replication-only as `aggregate - migration`. Implementation reuses the proven hint
-  queue infrastructure (TTL, caps, replay, drop logic) — no second queue, no second drain loop.
-  Tests in [`pkg/backend/dist_migration_hint_test.go`](pkg/backend/dist_migration_hint_test.go) cover
-  source-tag preservation through queue→replay, per-source counter increments on every terminal path
-  (replay success, expired, transport drop, global-cap drop), and the not-found keep-in-queue path.
+- **Batch operations on the client SDK.** `BatchSet`, `BatchGet`, `BatchDelete` close the v1 SDK gap PR3's
+  stopping conditions called out — the raw OIDC example demonstrated batch round-trips but the SDK had no
+  equivalent. Each method takes a slice and returns per-item results so a single HTTP call can carry
+  mixed-outcome batches (some stored, some draining) without forcing the caller to either fail-the-whole-batch
+  or parse the wire envelope by hand. Per-item `Err` is the standard `*StatusError`, so
+  `errors.Is(result.Err, client.ErrDraining)` works inside per-item handling the same way it does for
+  single-key calls. Empty input short-circuits to an empty result slice without dispatching an HTTP request.
+  Eight new test cases in [`pkg/client/batch_test.go`](pkg/client/batch_test.go) cover the happy path for each
+  verb, per-item failures, mixed found/missing in `BatchGet`, empty-input no-op, and the HTTP-level
+  failure-wraps-`ErrAllEndpointsFailed` regression guard. The OIDC example
+  ([`__examples/distributed-oidc-client/main.go`](__examples/distributed-oidc-client/main.go)) gains a final
+  `BatchSet` step demonstrating the surface, and [`docs/client-sdk.md`](docs/client-sdk.md) grows a dedicated
+  "Batch operations" section explaining the per-item granularity contract.
+- **Client SDK reference + example migration.** New [`docs/client-sdk.md`](docs/client-sdk.md) is the
+  recommended starting point for Go consumers — covers every auth mode (bearer / Basic / OIDC client
+  credentials / custom mTLS via `WithHTTPClient`), the multi-endpoint failover policy, topology refresh
+  semantics with the 1s floor and seed fallback, the full sentinel + `*StatusError` recipe set, and the
+  production caveats (connection pooling, retry policy, OTel propagation, OIDC refresh visibility). The
+  existing hand-rolled HTTP demo at `__examples/distributed-oidc-client/` was renamed to
+  [`__examples/distributed-oidc-client-raw/`](__examples/distributed-oidc-client-raw/) — kept in-tree as the
+  "what the SDK does under the hood" reference and for non-Go consumers reading along — while
+  [`__examples/distributed-oidc-client/`](__examples/distributed-oidc-client/) is now the ~150-line SDK
+  consumer that collapses the prior 480 lines down by ~70%. Top-level
+  [`__examples/README.md`](__examples/README.md) lists both with the SDK version flagged as recommended. The
+  SDK page is registered under Reference in [`mkdocs.yml`](mkdocs.yml) alongside the API reference and
+  changelog.
+- **`pkg/client` — Go SDK for hypercache-server clusters.** Closes the three operational gaps the OIDC-client
+  example surfaced: - **Multi-endpoint HA without an external LB.** `client.New([]string{...}, opts...)`
+  accepts a slice of seed endpoints. Each request picks one at random; on transport failure / 5xx / 503
+  (draining) the client walks to the next. 4xx (auth, scope, not-found, bad-request) are deterministic and do
+  NOT trigger failover. See [RFC 0003](docs/rfcs/0003-client-sdk-and-redis-style-affordances.md) for the
+  failover policy rationale (F2 random with crypto-seeded math/rand). - **Optional topology refresh.**
+  `WithTopologyRefresh(interval)` enables a background loop that pulls `/cluster/members` and updates the
+  in-memory endpoint view, so nodes added or removed after deploy become visible without redeploying
+  consumers. The original seeds remain as a permanent fallback when the live view ever empties. - **Four auth
+  modes coexisting in one API.** `WithBearerAuth`, `WithBasicAuth`, `WithOIDCClientCredentials` (full OAuth2
+  client-credentials flow with auto-refresh), and `WithHTTPClient` (bring your own mTLS-configured client).
+  Mutually exclusive: the last applied wins. - **Stable, typed error surface.** Sentinels (`ErrNotFound`,
+  `ErrUnauthorized`, `ErrForbidden`, `ErrDraining`, `ErrBadRequest`, `ErrInternal`, `ErrAllEndpointsFailed`,
+  `ErrNoEndpoints`) compose with `errors.Is`. `*StatusError` carries the cache's canonical
+  `{ code, error, details }` envelope for callers that need finer discrimination via `errors.As`. - **Typed
+  command surface.** `Set`, `Get` (raw bytes), `GetItem` (full envelope with version/owners), `Delete`,
+  `Identity` (the `/v1/me` canary including the new capabilities field), `Endpoints` (the current view),
+  `RefreshTopology` (manual refresh for tests/operators), `Close`. - **Full test coverage** in
+  [`pkg/client/client_test.go`](pkg/client/client_test.go): happy-path round-trip, JSON-envelope decode, every
+  auth mode against httptest stubs, 5xx failover, 4xx no-failover (regression guard), exhaustive-failure
+  wrapping, every sentinel's `errors.Is` mapping, topology refresh, partition-survives-empty-refresh failsafe,
+  and constructor input validation.
+- **HTTP Basic auth as a first-class credential class (Redis-style `AUTH user pass`).** New top-level `users:`
+  block in `HYPERCACHE_AUTH_CONFIG` accepts bcrypt-hashed passwords. Each user resolves to the same
+  `Identity{ID, Scopes}` shape as every other auth mode, so all four mechanisms (static bearer → Basic → mTLS
+  → OIDC) coexist in one cluster with consistent downstream behavior. Fail-closed posture: Basic over
+  plaintext is refused by default; operators opt into dev-only plaintext via `allow_basic_without_tls: true`.
+  Implementation in [`pkg/httpauth/policy.go`](pkg/httpauth/policy.go) with bcrypt verification via
+  `golang.org/x/crypto/bcrypt`. Threat note: bcrypt-per-request is CPU-bound; rate-limiting is left to a
+  fronting LB (see [RFC 0003](docs/rfcs/0003-client-sdk-and-redis-style-affordances.md) open question 3).
+- **`/v1/me` now returns a `capabilities` field.** Stable capability strings derived 1:1 from scopes (`read`
+  `cache.read`, etc.). Clients should prefer `capabilities` over `scopes` for forward-compatibility: if a
+  scope is later split into multiple capabilities, scope-keyed clients break but capability-keyed clients keep
+  working. OpenAPI spec ([`cmd/hypercache-server/openapi.yaml`](cmd/hypercache-server/openapi.yaml)) updated
+  to reflect the new required field; the binary's embedded spec is the contract.
+- **Tests pinning the new auth contract.** [`pkg/httpauth/policy_test.go`](pkg/httpauth/policy_test.go) covers
+  Basic resolves on correct credentials, rejects on wrong passwords/users/malformed headers, refuses plaintext
+  by default, and documents the bearer-wins-over-Basic chain order via a Locals-introspection test.
+  [`pkg/httpauth/loader_test.go`](pkg/httpauth/loader_test.go) covers the YAML round-trip plus the
+  fail-loud-at-boot guards for malformed bcrypt hashes and empty usernames.
+- **Operator runbook updates.** [`docs/oncall.md`](docs/oncall.md) Auth failures section gains a Basic-auth
+  debugging row covering the `curl -u user:pass /v1/me` canary and the plaintext-refused failure mode.
+- **Migration-source observability for the hint queue.** Hints produced by rebalance migrations are now tagged
+  at queue time and tracked in a dedicated set of counters alongside the existing aggregate metrics. Five new
+  OTel metrics: `dist.migration.queued`, `dist.migration.replayed`, `dist.migration.expired`,
+  `dist.migration.dropped`, and `dist.migration.last_age_ns` (queue residency of the most-recently-replayed
+  migration hint — direct signal of new-primary reachability during rolling deploys). Existing `dist.hinted.*`
+  counters keep their meaning as the aggregate across both sources, so operators can derive replication-only
+  as `aggregate - migration`. Implementation reuses the proven hint queue infrastructure (TTL, caps, replay,
+  drop logic) — no second queue, no second drain loop. Tests in
+  [`pkg/backend/dist_migration_hint_test.go`](pkg/backend/dist_migration_hint_test.go) cover source-tag
+  preservation through queue→replay, per-source counter increments on every terminal path (replay success,
+  expired, transport drop, global-cap drop), and the not-found keep-in-queue path.
 - **Adaptive Merkle anti-entropy scheduling.** New
   [`backend.WithDistMerkleAdaptiveBackoff(maxFactor)`](pkg/backend/dist_memory.go) option lets the auto-sync
   loop double its sleep interval after each tick that finds zero divergence across every peer, capped at
   `maxFactor`. Any tick with at least one dirty peer snaps the factor back to 1× immediately — recovery is
-  never lazy. Disabled by default (factor=0 or 1) so existing deployments see no behavior change. Two new
-  OTel metrics expose the state: `dist.auto_sync.backoff_factor` (gauge) and `dist.auto_sync.clean_ticks`
+  never lazy. Disabled by default (factor=0 or 1) so existing deployments see no behavior change. Two new OTel
+  metrics expose the state: `dist.auto_sync.backoff_factor` (gauge) and `dist.auto_sync.clean_ticks`
   (counter). Each factor change is logged once at Info (`merkle auto-sync backoff factor changed`) — no
   per-tick log spam. Unit tests in
-  [`pkg/backend/dist_adaptive_backoff_test.go`](pkg/backend/dist_adaptive_backoff_test.go) cover the ramp,
-  the cap, the dirty-tick reset, and the disabled-by-default back-compat invariant.
+  [`pkg/backend/dist_adaptive_backoff_test.go`](pkg/backend/dist_adaptive_backoff_test.go) cover the ramp, the
+  cap, the dirty-tick reset, and the disabled-by-default back-compat invariant.
 - **Structured logging for background loops and cluster lifecycle.** HyperCache gained a
   `WithLogger(*slog.Logger)` option ([config.go](config.go)) that wires a structured logger through the
   wrapper. Previously the eviction loop, expiration loop, and HyperCache lifecycle ran fully silent —
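
Reviewer note: the adaptive backoff rule in the "Adaptive Merkle anti-entropy scheduling" entry (double after a clean tick, cap at `maxFactor`, reset to 1× on any dirty tick, disabled for 0 or 1) can be sketched as below. This is only an illustration of the rule as the changelog words it. `nextFactor`, the 30-second base interval, and the tick sequence are assumptions made for the example and are not taken from `pkg/backend`.

```go
package main

import (
	"fmt"
	"time"
)

// nextFactor returns the sleep multiplier for the next auto-sync tick.
// Hypothetical helper: name and signature are illustrative only.
func nextFactor(current, maxFactor int, dirty bool) int {
	if maxFactor <= 1 {
		return 1 // 0 or 1 means the feature is disabled: always the base interval
	}
	if dirty || current < 1 {
		return 1 // any dirty peer snaps the factor back to 1x immediately
	}
	next := current * 2 // a fully clean tick doubles the interval
	if next > maxFactor {
		next = maxFactor // never exceed the configured cap
	}
	return next
}

func main() {
	base := 30 * time.Second // assumed base interval, for illustration
	factor := 1
	// true marks a tick where at least one peer diverged.
	ticks := []bool{false, false, false, false, true, false}
	for i, dirty := range ticks {
		factor = nextFactor(factor, 8, dirty)
		fmt.Printf("tick %d dirty=%v factor=%dx sleep=%s\n",
			i, dirty, factor, time.Duration(factor)*base)
	}
}
```

With a cap of 8, the factor in this run goes 2, 4, 8, 8, 1, 2: it ramps on clean ticks, holds at the cap, and drops straight back to 1 on the dirty tick.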

__examples/README.md

Lines changed: 4 additions & 0 deletions
@@ -23,3 +23,7 @@ All the code in this directory is for demonstration purposes only.
 1. [`Size`](./size/size.go) - An example of using the HyperCache package to store a list of items and limit the cache based on size.
 
 1. [`Observability (OpenTelemetry)`](./observability/otel.go) - Demonstrates wrapping the service with tracing and metrics middleware using OpenTelemetry.
+
+1. [`Distributed OIDC client (SDK)`](./distributed-oidc-client/) - **Recommended**: ~150-line consumer using [`pkg/client`](../pkg/client/) for OIDC client-credentials auth, multi-endpoint failover, topology refresh, and typed errors. The path most Go integrators should follow. See [`docs/client-sdk.md`](../docs/client-sdk.md) for the full SDK reference.
+
+1. [`Distributed OIDC client (raw HTTP)`](./distributed-oidc-client-raw/) - The hand-crafted version of the above against `net/http` — kept in the tree as a reference for what the SDK does internally and for environments that can't depend on `pkg/client` (non-Go consumers reading along, code-review reference, etc.).
