Go: Add LATENCY HISTORY / LATENCY LATEST / LATENCY RESET commands#5958
Go: Add LATENCY HISTORY / LATENCY LATEST / LATENCY RESET commands#5958jagathprabhu wants to merge 1 commit into
Conversation
Adds the three Valkey latency-monitoring commands to the Go client surface
(both standalone Client and ClusterClient) and exposes them through the
ServerManagementCommands and ServerManagementClusterCommands interfaces.
- Standalone (glide_client.go): LatencyHistory, LatencyLatest, LatencyReset,
LatencyResetWithEvents. The standalone handlers transparently merge the
per-node Map response that the core returns when the deployment has
replicas (LATENCY HISTORY/LATEST use AllNodes routing + Special policy in
glide-core), so users keep getting a flat slice.
- Cluster (glide_cluster_client.go): the same four methods plus *WithOptions
variants that accept options.RouteOption. Read commands return
models.ClusterValue so callers can distinguish single-node and per-node
responses; LATENCY RESET totals are summed across nodes by the core
(Aggregate(Sum)). When opts.Route is nil the *WithOptions variants
delegate to the no-options method so behavior matches.
- Models (models/response_types.go): new LatencyHistoryEntry and
LatencyLatestEntry types. Valkey 8.1+ sum/count fields are intentionally
not surfaced to keep the public surface stable across server versions;
consumers needing them can fall back to CustomCommand.
- Response handlers (response_handlers.go): typed parsers
(parseLatencyHistoryEntries / parseLatencyLatestEntries) plus merge
helpers (mergeLatencyHistoryEntries / mergeLatencyLatestEntries) used by
the standalone client. Merge order is deterministic: nodes are processed
in sorted-address order, history is stable-sorted by ascending timestamp,
and latest entries dedupe by event name (most-recent timestamp wins,
ties break on larger LatestMs, MaxMs is the cross-node maximum).
- Tests:
* 26 unit tests in go/latency_commands_test.go covering the typed
parsers (happy path, empty/nil, wrong types, short tuples, 8.1+
trailing fields) and the merge helpers (cross-node max,
timestamp-tie break, sorted-by-event-name, error propagation).
* 12 integration tests in go/integTest/latency_commands_test.go for
standalone and cluster (including *WithOptions, RandomRoute,
nil-route fallback, and context cancellation). triggerLatencySpike*
helpers temporarily lower latency-monitor-threshold and use
t.Cleanup so failures cannot leak server state.
* Runnable Example* functions in go/latency_commands_examples_test.go
for every new public API on both clients.
- CHANGELOG.md: Pending 2.4 entry under valkey-io#5957.
Closes valkey-io#5957 (LATENCY commands subset)
Signed-off-by: jagath prabhu <jagath94@gmail.com>
93a26f4 to
38d975e
Compare
| events []string, | ||
| opts options.RouteOption, | ||
| ) (int64, error) { | ||
| response, err := client.executeCommandWithRoute(ctx, C.LatencyReset, events, opts.Route) |
There was a problem hiding this comment.
go cluster commands with route option has nil guard. Check other commands with route and follow the same pattern
| // The number of event time series that were reset (summed across nodes). | ||
| // | ||
| // [valkey.io]: https://valkey.io/commands/latency-reset/ | ||
| func (client *ClusterClient) LatencyResetWithOptions(ctx context.Context, opts options.RouteOption) (int64, error) { |
There was a problem hiding this comment.
go cluster commands with route option has nil guard. Check other commands with route and follow the same pattern
| if err != nil { | ||
| return nil, err | ||
| } | ||
| return parseLatencyHistoryEntries(data) |
There was a problem hiding this comment.
This looks like is not sorted and in the command doc mentioned that it is sorted
| * JAVA: Add cluster management commands (CLUSTER MEET, CLUSTER FORGET, CLUSTER REPLICATE, CLUSTER REPLICAS, CLUSTER COUNT-FAILURE-REPORTS, CLUSTER FAILOVER, CLUSTER SETSLOT, CLUSTER BUMPEPOCH, CLUSTER SET-CONFIG-EPOCH, CLUSTER FLUSHSLOTS, CLUSTER RESET, READONLY, READWRITE, ASKING, CLUSTER SAVECONFIG, CLUSTER GETKEYSINSLOT) ([#5503](https://github.com/valkey-io/valkey-glide/pull/5503)) | ||
| * Go: Client-Side Caching Support ([#5721](https://github.com/valkey-io/valkey-glide/pull/5721)) | ||
| * Go: Add `PeriodicChecks` configuration to `AdvancedClusterClientConfiguration` — supports `PeriodicChecksEnabled`, `PeriodicChecksDisabled`, and `PeriodicChecksManualInterval` modes for controlling cluster topology check intervals ([#5842](https://github.com/valkey-io/valkey-glide/issues/5842)) | ||
| * Go: Add `LATENCY HISTORY`, `LATENCY LATEST`, and `LATENCY RESET` commands to the standalone `Client` and `ClusterClient`. Cluster variants expose `*WithOptions` for explicit `RouteOption` and return `models.ClusterValue[...]` for the read commands; the standalone client transparently merges per-node responses when the deployment has replicas. ([#5957](https://github.com/valkey-io/valkey-glide/issues/5957)) |
There was a problem hiding this comment.
Update your branch there is a new section Pending 2.5. Add this entry there
| _, err := client.LatencyHistory(ctx, "command") | ||
| require.Error(t, err) | ||
| assert.True(t, | ||
| strings.Contains(err.Error(), "context canceled") || |
There was a problem hiding this comment.
Is the check correct? There is a spelling error
Double-check the error message to check here
| LatestMs int64 | ||
| // MaxMs is the all-time maximum latency for this event in milliseconds. "All-time" means since | ||
| // the server was started or `LATENCY RESET` was last executed for this event. In a standalone | ||
| // deployment with replicas the standalone client merges per-node responses; in that case MaxMs |
There was a problem hiding this comment.
What is the real benefit on returning the maxms cross nodes? Would that loose important per-node information?
| // Nodes are processed in sorted-address order so the merge is deterministic; entries | ||
| // are then stable-sorted by ascending timestamp so equal-timestamp samples preserve | ||
| // their per-node insertion order (lexicographically smaller addresses appear first). | ||
| func mergeLatencyHistoryEntries(perNode map[string]any) ([]models.LatencyHistoryEntry, error) { |
There was a problem hiding this comment.
LATENCY HISTORY has ResponsePolicy::Special + RouteBy::AllNodes in glide-core. On a standalone+replicas deployment, the core sends the command to every node — primary and all replicas — and returns a Map<address, []entries>.
My concerns are
- The merge concatenates all per-node arrays and sorts by timestamp. The problem is that primary and replica spike histories are independent time series. A "command" spike on a replica is not the same event as one on the primary — replicas process replication traffic and read commands, not the same workload. Merging them as if they're one time series can be misleading.
- After the merge, node identity is gone. If two nodes both recorded a spike at timestamp T, the user sees two entries with the same timestamp and no way to know which node produced which.
It should be ok for in the standalone case only return information from the primary. If I'm not mistaken is the case for other commands in the code base
Summary
Implements the
LATENCY HISTORY,LATENCY LATEST, andLATENCY RESETcommands for the Go client (both standaloneClientandClusterClient), exposed throughinterfaces.ServerManagementCommands/interfaces.ServerManagementClusterCommands. This is the first slice of the work tracked in #5957.Issue link
This Pull Request is linked to issue: Implement server management commands (... LATENCY ...) for the Go client
Part of #5957
Features / Behaviour Changes
New API methods (standalone
Client):LatencyHistory(ctx, event)— returns[]models.LatencyHistoryEntryLatencyLatest(ctx)— returns[]models.LatencyLatestEntryLatencyReset(ctx)— returns the number of event series resetLatencyResetWithEvents(ctx, events)— reset only the supplied eventsNew API methods (
ClusterClient):LatencyHistory/LatencyHistoryWithOptions(opts options.RouteOption)— returnsmodels.ClusterValue[[]models.LatencyHistoryEntry]LatencyLatest/LatencyLatestWithOptions(opts options.RouteOption)— returnsmodels.ClusterValue[[]models.LatencyLatestEntry]LatencyReset/LatencyResetWithOptions(opts options.RouteOption)— returnsint64LatencyResetWithEvents/LatencyResetWithEventsAndOptions(events []string, opts options.RouteOption)— returnsint64New models (
go/v2/models):LatencyHistoryEntry { Timestamp, LatencyMs int64 }LatencyLatestEntry { EventName string; Timestamp, LatestMs, MaxMs int64 }Routing & response semantics:
LATENCY HISTORYandLATENCY LATESTuseAllNodesrouting with theSpecialresponse policy inglide-core. The cluster client surfaces this as a multi-valueClusterValuekeyed by node address.Map<address, response>. The new handlers transparently merge those into a single flat slice — the standalone surface stays a flat[]models.LatencyHistoryEntry/[]models.LatencyLatestEntry.LATENCY RESETusesAggregate(Sum)so the cluster client returns the total number of series reset across all targeted nodes.*WithOptionsvariants fall back to the no-options method whenopts.Route == nil, so default behavior is identical regardless of which entry point you use.Return values:
LATENCY HISTORY: array of[timestamp, latency_ms]pairs (parsed intoLatencyHistoryEntry).LATENCY LATEST: array of[event, timestamp, latest_ms, max_ms, ...](parsed intoLatencyLatestEntry). Trailing fields added by Valkey 8.1+ (sum / count) are intentionally not surfaced to keep the public type stable across server versions; consumers needing them can fall back toCustomCommand.LATENCY RESET: integer count of event series reset.Implementation
Files Modified:
go/glide_client.go(+98 lines) — adds the four standalone methods. They useexecuteCommandand the newhandleLatencyHistoryResponse/handleLatencyLatestResponsehandlers, which accept both single-nodeArrayand multi-nodeMapshapes from the core and merge the latter for a flat user-facing slice.go/glide_cluster_client.go(+~210 lines) — adds the eight cluster methods. The read commands returnClusterValueand pick between the multi-node and single-node handlers based onresponse_type(no options) orRoute.IsMultiNode()(with options); reset commands returnint64aggregated by the core.go/response_handlers.go— addsparseLatencyHistoryEntries,parseLatencyLatestEntries,mergeLatencyHistoryEntries,mergeLatencyLatestEntries, plus fourhandleLatency*handlers (two for standalone, two cluster-multi-node). Merge order is deterministic: nodes are processed in sorted-address order; history is stable-sorted by ascending timestamp; latest entries dedupe by event name (most-recent timestamp wins, ties break on largerLatestMs,MaxMsis the cross-node maximum). All handlers usedefer C.free_command_response(response)exactly once.go/interfaces/server_management_commands.go— adds the four standalone method signatures.go/interfaces/server_management_cluster_commands.go— adds the eight cluster method signatures.go/models/response_types.go— addsLatencyHistoryEntryandLatencyLatestEntry.go/latency_commands_test.go(new, 26 unit tests) — typed parsers (happy / empty / nil / wrong types / short tuples / 8.1+ trailing fields) and merge helpers (cross-node max, timestamp-tie break, sorted-by-event-name, error propagation).go/integTest/latency_commands_test.go(new, 12 integration tests) — standalone and cluster, including*WithOptions,RandomRoute, nil-route fallback, and context cancellation. ThetriggerLatencySpike*helpers temporarily lowerlatency-monitor-thresholdand uset.Cleanupso a failing test cannot leak server state.go/latency_commands_examples_test.go(new) — runnableExampleClient_*/ExampleClusterClient_*for every new public API pergo/AGENTS.md.CHANGELOG.md— Pending 2.4 entry.Notes vs. the wider issue (#5957):
The issue groups SAVE / BGSAVE / BGREWRITEAOF / SHUTDOWN / REPLICAOF / DEBUG / LATENCY / MEMORY / MONITOR / FAILOVER / PSYNC together. To keep this PR small and reviewable per
CLAUDE.md, only the LATENCY commands ship here; SHUTDOWN/MONITOR/PSYNC/DEBUG and the rest will be follow-up PRs against the same issue.Limitations
LATENCY HISTORYandLATENCY LATESTonly return data when the server haslatency-monitor-threshold > 0. The integration tests temporarily set it viaCONFIG SET; the user-facing godoc calls this out explicitly.sum/countfields onLATENCY LATESTentries are not surfaced onLatencyLatestEntry. Consumers needing them can fall back toCustomCommand([]string{"LATENCY", "LATEST"}).LATENCY LATESTaggregation: when the core returns per-node entries the standalone client merges them.MaxMsis reported as the cross-node maximum (which is the relevant cross-node invariant for monitoring); the winning(Timestamp, LatestMs)is from the most recent node. Documented on both the method and the model.Testing
Unit tests (
go/latency_commands_test.go, 26 cases — all PASS):Integration tests (
go/integTest/latency_commands_test.go, 12 cases): run against a Valkey 8.x cluster + standalone viamake integ-test test-filter='TestLatency.*'— all PASS, covering both clients, routing variants, nil-route fallback, and context cancellation.Static analysis:
go vet ./...,gofumpt -d,golines -m 127 --dry-run, andstaticcheckare all clean for the touched files.Pre-PR review: five parallel review subagents (performance, code quality / style, test coverage, documentation clarity, security & edge cases) were run per
CLAUDE.md. All actionable findings were addressed before commit:*WithOptionsnil-route fallback so default routing always matches the no-options method.t.Cleanupforlatency-monitor-thresholdrestoration intriggerLatencySpike*so test failures cannot leak server state.mergedslice inmergeLatencyHistoryEntries.latency-monitor-thresholdrequirement, thicker*WithOptionsgodoc, accurate tie-break semantics, comment placement aboveparseLatencyHistoryEntries, 8.1+sum/countnote onLatencyLatestEntry.RandomRouteintegration assertion (replaced the tautologicalIsSingleValue() || IsMultiValue()).Checklist
Before submitting the PR make sure the following are checked:
gofumpt,golines,staticcheck,go vet).main.References
glide-core/src/request_type.rs(LatencyHistory,LatencyLatest,LatencyReset).