Skip to content

Commit e617645

Browse files
committed
feat(cluster): add cluster-wide key browser (GET /v1/cache/keys)
Introduce a new operator-debug endpoint that fans out across every alive peer, deduplicates replicas, sorts, and returns a paged key set. Pattern matching supports two modes via a classifier in buildKeyMatcher: patterns without glob metacharacters use strings.HasPrefix (prefix mode); patterns containing `*`, `?`, or `[` use path.Match (glob mode). Invalid globs are rejected at construction and surface as 400 BAD_REQUEST. Hard caps bound worst-case memory and response size: - `max` (default 10000, ceiling 50000): deduplicated result set cap - `limit` (default 100, ceiling 500): page size; cursor is offset-based Per-peer fan-out failures are best-effort — failed peer IDs land in `partial_nodes`, consistent with read-repair/hint-replay contracts. Returns 501 when the backend is not DistMemory. Route registered before /v1/cache/:key to prevent Fiber's trie router from shadowing it. Core changes: - pkg/backend/dist_keys.go: ListKeys fan-out via listKeysAccumulator (mutex-guarded dedup map) + localMatchingKeys for self-peer shard scan - DistTransport interface extended: ListKeys(ctx, nodeID, pattern string) - InProcessTransport, DistHTTPTransport, and chaosTransport implementations - /internal/keys extended with optional `q` param (backward compatible) - collectShardKeys accepts a matcher; non-matching keys skip the limit - HyperCache.ClusterKeys added as the public entry point Tests: 12-case unit table for buildKeyMatcher, HTTP smoke tests for paged walk and 400 surfaces, five integration tests covering cluster-wide dedup at RF=3 (50 seeds → 50 keys, not 150), prefix/glob filters, and max-cap truncation. OpenAPI spec and drift-detector updated.
1 parent 0168f93 commit e617645

19 files changed

Lines changed: 1111 additions & 16 deletions

CHANGELOG.md

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,39 @@ All notable changes to HyperCache are recorded here. The format follows
88

99
### Added
1010

11+
- **Cluster-wide key browser (`GET /v1/cache/keys`).** New v1 client-API endpoint that fans out across
12+
every alive peer, dedupes replicas, sorts, and returns a paged slice — designed for the operator-debug
13+
workflow of "browse / refine a search" rather than as a primary data-access path. The `q` parameter
14+
switches between two modes via a small classifier:
15+
patterns containing any of `*`, `?`, `[` go through Go's `path.Match` (platform-agnostic glob —
16+
`filepath.Match`'s OS-specific separator semantics are wrong for arbitrary string keys);
17+
everything else is treated as a literal prefix via `strings.HasPrefix`. Two hard caps bound the
18+
worst case: `max` (default 10000, ceiling 50000) for the full deduplicated result set held in memory,
19+
and `limit` (default 100, ceiling 500) for the page size — `cursor` paging is offset-based against the
20+
sorted set so successive pages are stable across requests. Per-peer fan-out failures are best-effort:
21+
the failed peer ID lands in `partial_nodes` rather than failing the whole call, mirroring the
22+
read-repair and hint-replay contracts elsewhere in the cluster. Returns 501 when the underlying
23+
backend isn't `DistMemory` (this endpoint requires a cluster). The new method
24+
[`(*DistMemory).ListKeys`](pkg/backend/dist_keys.go) drives the fan-out via `errgroup` with a
25+
`listKeysAccumulator` merge struct keyed by a single mutex; the self-peer slice walks local shards
26+
directly (no HTTP self-hop). The `DistTransport` interface grows a new method
27+
`ListKeys(ctx, nodeID, pattern)` with implementations in `InProcessTransport` (direct shard scan),
28+
`DistHTTPTransport` (extends the existing `/internal/keys` path with an optional `q` query param —
29+
backward compatible; cursor semantics unchanged), and `chaosTransport` (pass-through with the same
30+
drop/latency injection hooks as the other verbs). Unit tests in
31+
[`pkg/backend/dist_keys_test.go`](pkg/backend/dist_keys_test.go) pin the
32+
prefix-vs-glob classifier across twelve table cases and the malformed-glob → `path.ErrBadPattern`
33+
surface; HTTP smoke tests in [`cmd/hypercache-server/handlers_test.go`](cmd/hypercache-server/handlers_test.go)
34+
drive seed → paged walk → assert union and no cross-page duplicates, plus 400 surfaces for invalid
35+
cursor and malformed glob; five integration tests in
36+
[`tests/hypercache_distmemory_listkeys_test.go`](tests/hypercache_distmemory_listkeys_test.go)
37+
cover cluster-wide dedup at RF=3 across 5 nodes (50 unique seeds → 50 keys, not 150 = 50 × RF=3),
38+
prefix vs glob filters, and `max`-triggered truncation. Route registration order matters in Fiber's
39+
trie router: `/v1/cache/keys` must come before `/v1/cache/:key`, otherwise the parameterized handler
40+
shadows it with `key="keys"`. OpenAPI spec entry (`ListKeysResponse` schema + operation) added to
41+
[`cmd/hypercache-server/openapi.yaml`](cmd/hypercache-server/openapi.yaml); the drift-detector test
42+
in [`cmd/hypercache-server/openapi_test.go`](cmd/hypercache-server/openapi_test.go) catches future
43+
spec / route mismatches.
1144
- **Async read-repair batching (Phase 4) + unconditional `ForwardSet`-only repair.** Two composing changes
1245
in the same PR that together cut the wire-call cost of read-repair under quorum reads. (1) The defensive
1346
`ForwardGet` probe in `repairRemoteReplica` is gone — every repair is now exactly one `ForwardSet`,

cmd/hypercache-server/handlers_test.go

Lines changed: 137 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,8 @@ import (
66
"io"
77
"net/http"
88
"net/http/httptest"
9+
"net/url"
10+
"strconv"
911
"strings"
1012
"testing"
1113

@@ -47,6 +49,10 @@ func newTestServer(t *testing.T) *fiber.App {
4749
app := fiber.New()
4850
nodeCtx := &nodeContext{hc: hc, nodeID: "test-node"}
4951

52+
// Match production ordering: literal /v1/cache/keys before the
53+
// parameterized /v1/cache/:key so the router picks handleListKeys
54+
// for the literal path.
55+
app.Get("/v1/cache/keys", func(c fiber.Ctx) error { return handleListKeys(c, nodeCtx) })
5056
app.Get("/v1/cache/:key", func(c fiber.Ctx) error { return handleGet(c, nodeCtx) })
5157
app.Head("/v1/cache/:key", func(c fiber.Ctx) error { return handleHead(c, nodeCtx) })
5258
app.Put("/v1/cache/:key", func(c fiber.Ctx) error { return handlePut(c, nodeCtx) })
@@ -352,3 +358,134 @@ func TestHandleBatchDelete_BasicFlow(t *testing.T) {
352358
t.Fatalf("batch-get post-delete should report found:false; got %s", got.body)
353359
}
354360
}
361+
362+
// seedListKeysFixture seeds `count` keys prefixed `first-NN` plus a
363+
// `second-1` decoy via PUT. Returns the test app so the caller can
364+
// drive the list-keys endpoint against the populated cache. Kept
365+
// here rather than in newTestServer so the test bodies stay
366+
// declarative.
367+
func seedListKeysFixture(t *testing.T, count int) *fiber.App {
368+
t.Helper()
369+
370+
app := newTestServer(t)
371+
372+
for i := range count {
373+
n := strconv.Itoa(i + 1)
374+
key := "first-" + strings.Repeat("0", 2-len(n)) + n
375+
376+
put := doRequest(t, app, http.MethodPut, "/v1/cache/"+key, "v", nil)
377+
if put.status != http.StatusOK {
378+
t.Fatalf("seed put %s: %d", key, put.status)
379+
}
380+
}
381+
382+
put := doRequest(t, app, http.MethodPut, "/v1/cache/second-1", "v", nil)
383+
if put.status != http.StatusOK {
384+
t.Fatalf("seed second put: %d", put.status)
385+
}
386+
387+
return app
388+
}
389+
390+
// fetchListKeysPage drives one /v1/cache/keys request and decodes
391+
// the response, failing the test on transport or parse errors.
392+
// Extracted so the test body can focus on the cursor walk.
393+
func fetchListKeysPage(t *testing.T, app *fiber.App, cursor string) listKeysResponse {
394+
t.Helper()
395+
396+
target := "/v1/cache/keys?q=first-&limit=10"
397+
if cursor != "" {
398+
target += "&cursor=" + cursor
399+
}
400+
401+
got := doRequest(t, app, http.MethodGet, target, "", nil)
402+
if got.status != http.StatusOK {
403+
t.Fatalf("cursor=%q: status %d body=%s", cursor, got.status, got.body)
404+
}
405+
406+
var resp listKeysResponse
407+
408+
err := json.Unmarshal([]byte(got.body), &resp)
409+
if err != nil {
410+
t.Fatalf("cursor=%q decode: %v", cursor, err)
411+
}
412+
413+
return resp
414+
}
415+
416+
// TestHandleListKeys_PrefixAndPaging drives the v1 list-keys
417+
// endpoint end-to-end: seed via PUT, filter by prefix, walk the
418+
// cursor across multiple pages, assert the union matches the seed
419+
// set and no key appears twice.
420+
func TestHandleListKeys_PrefixAndPaging(t *testing.T) {
421+
t.Parallel()
422+
423+
const seedCount = 25
424+
425+
app := seedListKeysFixture(t, seedCount)
426+
427+
collected := make(map[string]struct{}, seedCount)
428+
cursor := ""
429+
430+
for range 10 {
431+
resp := fetchListKeysPage(t, app, cursor)
432+
433+
if resp.TotalMatched != seedCount {
434+
t.Fatalf("total_matched=%d, want %d", resp.TotalMatched, seedCount)
435+
}
436+
437+
for _, k := range resp.Keys {
438+
if !strings.HasPrefix(k, "first-") {
439+
t.Fatalf("non-prefix key in result: %s", k)
440+
}
441+
442+
if _, dup := collected[k]; dup {
443+
t.Fatalf("duplicate key across pages: %s", k)
444+
}
445+
446+
collected[k] = struct{}{}
447+
}
448+
449+
if resp.NextCursor == "" {
450+
break
451+
}
452+
453+
cursor = resp.NextCursor
454+
}
455+
456+
if len(collected) != seedCount {
457+
t.Fatalf("collected %d keys across pages, want %d", len(collected), seedCount)
458+
}
459+
}
460+
461+
// TestHandleListKeys_InvalidCursor pins that a malformed cursor
462+
// surfaces 400, not 500 — the cursor field is operator-controlled
463+
// and must be validated at the boundary.
464+
func TestHandleListKeys_InvalidCursor(t *testing.T) {
465+
t.Parallel()
466+
467+
app := newTestServer(t)
468+
469+
got := doRequest(t, app, http.MethodGet, "/v1/cache/keys?cursor=not-a-number", "", nil)
470+
if got.status != http.StatusBadRequest {
471+
t.Fatalf("expected 400 for malformed cursor, got %d body=%s", got.status, got.body)
472+
}
473+
}
474+
475+
// TestHandleListKeys_InvalidGlob surfaces malformed glob patterns
476+
// as 400, matching the same validate-at-boundary contract as
477+
// cursor.
478+
func TestHandleListKeys_InvalidGlob(t *testing.T) {
479+
t.Parallel()
480+
481+
app := newTestServer(t)
482+
483+
// "[unclosed" is a malformed character class — URL-encoded so
484+
// the literal `[` is preserved through fiber's query parser.
485+
target := "/v1/cache/keys?q=" + url.QueryEscape("[unclosed")
486+
487+
got := doRequest(t, app, http.MethodGet, target, "", nil)
488+
if got.status != http.StatusBadRequest {
489+
t.Fatalf("expected 400 for malformed glob, got %d body=%s", got.status, got.body)
490+
}
491+
}

cmd/hypercache-server/main.go

Lines changed: 153 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -459,6 +459,12 @@ func registerClientRoutes(app *fiber.App, policy httpauth.Policy, nodeCtx *nodeC
459459
return c.Send(openapiSpec)
460460
})
461461

462+
// /v1/cache/keys must be registered BEFORE the parameterized
463+
// /v1/cache/:key — Fiber matches in registration order and the
464+
// literal-path route would otherwise be shadowed by the
465+
// param-bound handler (handleGet would be invoked with
466+
// `key="keys"` and return 404).
467+
app.Get("/v1/cache/keys", read, func(c fiber.Ctx) error { return handleListKeys(c, nodeCtx) })
462468
app.Put("/v1/cache/:key", write, func(c fiber.Ctx) error { return handlePut(c, nodeCtx) })
463469
app.Get("/v1/cache/:key", read, func(c fiber.Ctx) error { return handleGet(c, nodeCtx) })
464470
app.Head("/v1/cache/:key", read, func(c fiber.Ctx) error { return handleHead(c, nodeCtx) })
@@ -646,6 +652,33 @@ type ownersResponse struct {
646652
Node string `json:"node"`
647653
}
648654

655+
// listKeysResponse is the body of GET /v1/cache/keys — operator-
656+
// facing key browser. `NextCursor` is empty on the last page;
657+
// `TotalMatched` is the full deduplicated matched set (capped by
658+
// `max`). `Truncated` reports that the cluster-wide cap was hit
659+
// and the operator should refine the pattern. `PartialNodes`
660+
// lists peers whose fan-out failed; their keys may be missing.
661+
type listKeysResponse struct {
662+
Keys []string `json:"keys"`
663+
NextCursor string `json:"next_cursor"`
664+
TotalMatched int `json:"total_matched"`
665+
Truncated bool `json:"truncated"`
666+
Node string `json:"node"`
667+
PartialNodes []string `json:"partial_nodes,omitempty"`
668+
}
669+
670+
// list-keys query-parameter bounds. Defaults match the operator
671+
// "browse / refine" workflow; the hard caps bound the worst-case
672+
// memory and response size — operators needing a larger sweep
673+
// script against the per-node /internal/keys path with their own
674+
// paging instead of lifting these.
675+
const (
676+
listKeysDefaultLimit = 100
677+
listKeysMaxLimit = 500
678+
listKeysDefaultMax = 10000
679+
listKeysHardMax = 50000
680+
)
681+
649682
// handlePut implements PUT /v1/cache/:key.
650683
// Body is the raw value (any content type). Optional ?ttl=<dur>
651684
// applies a relative expiration; empty/absent means no expiration.
@@ -1289,6 +1322,126 @@ func handleOwners(c fiber.Ctx, nodeCtx *nodeContext) error {
12891322
})
12901323
}
12911324

1325+
// listKeysParams is the parsed-and-validated form of the
1326+
// /v1/cache/keys query string. Returned as a struct so
1327+
// parseListKeysQuery stays under the function-result-limit and
1328+
// the call site reads fields by name rather than position.
1329+
type listKeysParams struct {
1330+
Pattern string
1331+
Cursor int
1332+
Limit int
1333+
MaxResults int
1334+
}
1335+
1336+
// parseBoundedPositiveInt reads a query parameter as a positive int
1337+
// with a default fallback and a hard ceiling. Empty value → default.
1338+
// Out-of-range or non-numeric → caller-visible error (must surface
1339+
// as 400 BAD_REQUEST).
1340+
func parseBoundedPositiveInt(c fiber.Ctx, name string, def, hardMax int) (int, error) {
1341+
v := c.Query(name)
1342+
if v == "" {
1343+
return def, nil
1344+
}
1345+
1346+
n, err := strconv.Atoi(v)
1347+
if err != nil || n <= 0 {
1348+
return 0, ewrap.New("invalid " + name + ": must be a positive integer")
1349+
}
1350+
1351+
if n > hardMax {
1352+
n = hardMax
1353+
}
1354+
1355+
return n, nil
1356+
}
1357+
1358+
// parseListKeysQuery extracts and validates the query parameters
1359+
// for GET /v1/cache/keys. Defaults and hard caps are applied here
1360+
// so handleListKeys keeps a single response-shape concern.
1361+
func parseListKeysQuery(c fiber.Ctx) (listKeysParams, error) {
1362+
out := listKeysParams{Pattern: c.Query("q")}
1363+
1364+
if cursorStr := c.Query("cursor"); cursorStr != "" {
1365+
n, err := strconv.Atoi(cursorStr)
1366+
if err != nil || n < 0 {
1367+
return listKeysParams{}, ewrap.New("invalid cursor: must be a non-negative integer")
1368+
}
1369+
1370+
out.Cursor = n
1371+
}
1372+
1373+
limit, err := parseBoundedPositiveInt(c, "limit", listKeysDefaultLimit, listKeysMaxLimit)
1374+
if err != nil {
1375+
return listKeysParams{}, err
1376+
}
1377+
1378+
out.Limit = limit
1379+
1380+
maxResults, err := parseBoundedPositiveInt(c, "max", listKeysDefaultMax, listKeysHardMax)
1381+
if err != nil {
1382+
return listKeysParams{}, err
1383+
}
1384+
1385+
out.MaxResults = maxResults
1386+
1387+
return out, nil
1388+
}
1389+
1390+
// handleListKeys implements GET /v1/cache/keys — operator-facing
1391+
// cluster-wide key browser. Fans out across every alive peer,
1392+
// merges + dedupes + sorts the result, then slices the page via
1393+
// cursor/limit. The full deduplicated set is held in memory for
1394+
// one request (bounded by `max`); paging re-fans out — fine for
1395+
// the operator-debug workflow this endpoint serves.
1396+
//
1397+
// Returns 501 when the underlying backend isn't a DistMemory
1398+
// (in-memory / Redis): the surface only makes sense in cluster
1399+
// mode and surfacing that explicitly is friendlier than a
1400+
// silently empty page.
1401+
func handleListKeys(c fiber.Ctx, nodeCtx *nodeContext) error {
1402+
params, err := parseListKeysQuery(c)
1403+
if err != nil {
1404+
return jsonErr(c, fiber.StatusBadRequest, codeBadRequest, err.Error())
1405+
}
1406+
1407+
res, err := nodeCtx.hc.ClusterKeys(c.Context(), params.Pattern, params.MaxResults)
1408+
if err != nil {
1409+
return jsonErr(c, fiber.StatusBadRequest, codeBadRequest, err.Error())
1410+
}
1411+
1412+
if res == nil {
1413+
return jsonErr(
1414+
c,
1415+
fiber.StatusNotImplemented,
1416+
codeInternal,
1417+
"list-keys requires a distributed backend",
1418+
)
1419+
}
1420+
1421+
total := len(res.Keys)
1422+
1423+
// Cursor past the end is a valid terminal state (last page +
1424+
// 1): respond with an empty page rather than 400. Mirrors how
1425+
// SQL OFFSET past the row count returns an empty result set.
1426+
start := min(params.Cursor, total)
1427+
end := min(start+params.Limit, total)
1428+
page := res.Keys[start:end]
1429+
1430+
nextCursor := ""
1431+
if end < total {
1432+
nextCursor = strconv.Itoa(end)
1433+
}
1434+
1435+
return c.JSON(listKeysResponse{
1436+
Keys: page,
1437+
NextCursor: nextCursor,
1438+
TotalMatched: total,
1439+
Truncated: res.Truncated,
1440+
Node: nodeCtx.nodeID,
1441+
PartialNodes: res.PartialNodes,
1442+
})
1443+
}
1444+
12921445
// meResponse is the body of GET /v1/me — the resolved caller identity
12931446
// after auth middleware ran. Mirrors httpauth.Identity but written as
12941447
// a wire type so the JSON tags are owned by the API surface, not the

0 commit comments

Comments
 (0)