Skip to content

Go: Add LATENCY HISTORY / LATENCY LATEST / LATENCY RESET commands#5958

Open
jagathprabhu wants to merge 1 commit into
valkey-io:mainfrom
jagathprabhu:glide-go-latency
Open

Go: Add LATENCY HISTORY / LATENCY LATEST / LATENCY RESET commands#5958
jagathprabhu wants to merge 1 commit into
valkey-io:mainfrom
jagathprabhu:glide-go-latency

Conversation

@jagathprabhu
Copy link
Copy Markdown

Summary

Implements the LATENCY HISTORY, LATENCY LATEST, and LATENCY RESET commands for the Go client (both standalone Client and ClusterClient), exposed through interfaces.ServerManagementCommands / interfaces.ServerManagementClusterCommands. This is the first slice of the work tracked in #5957.

Issue link

This Pull Request is linked to issue: Implement server management commands (... LATENCY ...) for the Go client
Part of #5957

Features / Behaviour Changes

New API methods (standalone Client):

  • LatencyHistory(ctx, event) — returns []models.LatencyHistoryEntry
  • LatencyLatest(ctx) — returns []models.LatencyLatestEntry
  • LatencyReset(ctx) — returns the number of event series reset
  • LatencyResetWithEvents(ctx, events) — reset only the supplied events

New API methods (ClusterClient):

  • LatencyHistory / LatencyHistoryWithOptions(opts options.RouteOption) — returns models.ClusterValue[[]models.LatencyHistoryEntry]
  • LatencyLatest / LatencyLatestWithOptions(opts options.RouteOption) — returns models.ClusterValue[[]models.LatencyLatestEntry]
  • LatencyReset / LatencyResetWithOptions(opts options.RouteOption) — returns int64
  • LatencyResetWithEvents / LatencyResetWithEventsAndOptions(events []string, opts options.RouteOption) — returns int64

New models (go/v2/models):

  • LatencyHistoryEntry { Timestamp, LatencyMs int64 }
  • LatencyLatestEntry { EventName string; Timestamp, LatestMs, MaxMs int64 }

Routing & response semantics:

  • LATENCY HISTORY and LATENCY LATEST use AllNodes routing with the Special response policy in glide-core. The cluster client surfaces this as a multi-value ClusterValue keyed by node address.
  • For the standalone client, when the deployment has replicas the core still returns a Map<address, response>. The new handlers transparently merge those into a single flat slice — the standalone surface stays a flat []models.LatencyHistoryEntry / []models.LatencyLatestEntry.
  • LATENCY RESET uses Aggregate(Sum) so the cluster client returns the total number of series reset across all targeted nodes.
  • The *WithOptions variants fall back to the no-options method when opts.Route == nil, so default behavior is identical regardless of which entry point you use.

Return values:

  • LATENCY HISTORY: array of [timestamp, latency_ms] pairs (parsed into LatencyHistoryEntry).
  • LATENCY LATEST: array of [event, timestamp, latest_ms, max_ms, ...] (parsed into LatencyLatestEntry). Trailing fields added by Valkey 8.1+ (sum / count) are intentionally not surfaced to keep the public type stable across server versions; consumers needing them can fall back to CustomCommand.
  • LATENCY RESET: integer count of event series reset.

Implementation

Files Modified:

  1. go/glide_client.go (+98 lines) — adds the four standalone methods. They use executeCommand and the new handleLatencyHistoryResponse / handleLatencyLatestResponse handlers, which accept both single-node Array and multi-node Map shapes from the core and merge the latter for a flat user-facing slice.
  2. go/glide_cluster_client.go (+~210 lines) — adds the eight cluster methods. The read commands return ClusterValue and pick between the multi-node and single-node handlers based on response_type (no options) or Route.IsMultiNode() (with options); reset commands return int64 aggregated by the core.
  3. go/response_handlers.go — adds parseLatencyHistoryEntries, parseLatencyLatestEntries, mergeLatencyHistoryEntries, mergeLatencyLatestEntries, plus four handleLatency* handlers (two for standalone, two cluster-multi-node). Merge order is deterministic: nodes are processed in sorted-address order; history is stable-sorted by ascending timestamp; latest entries dedupe by event name (most-recent timestamp wins, ties break on larger LatestMs, MaxMs is the cross-node maximum). All handlers use defer C.free_command_response(response) exactly once.
  4. go/interfaces/server_management_commands.go — adds the four standalone method signatures.
  5. go/interfaces/server_management_cluster_commands.go — adds the eight cluster method signatures.
  6. go/models/response_types.go — adds LatencyHistoryEntry and LatencyLatestEntry.
  7. go/latency_commands_test.go (new, 26 unit tests) — typed parsers (happy / empty / nil / wrong types / short tuples / 8.1+ trailing fields) and merge helpers (cross-node max, timestamp-tie break, sorted-by-event-name, error propagation).
  8. go/integTest/latency_commands_test.go (new, 12 integration tests) — standalone and cluster, including *WithOptions, RandomRoute, nil-route fallback, and context cancellation. The triggerLatencySpike* helpers temporarily lower latency-monitor-threshold and use t.Cleanup so a failing test cannot leak server state.
  9. go/latency_commands_examples_test.go (new) — runnable ExampleClient_* / ExampleClusterClient_* for every new public API per go/AGENTS.md.
  10. CHANGELOG.md — Pending 2.4 entry.

Notes vs. the wider issue (#5957):
The issue groups SAVE / BGSAVE / BGREWRITEAOF / SHUTDOWN / REPLICAOF / DEBUG / LATENCY / MEMORY / MONITOR / FAILOVER / PSYNC together. To keep this PR small and reviewable per CLAUDE.md, only the LATENCY commands ship here; SHUTDOWN/MONITOR/PSYNC/DEBUG and the rest will be follow-up PRs against the same issue.

Limitations

  • LATENCY HISTORY and LATENCY LATEST only return data when the server has latency-monitor-threshold > 0. The integration tests temporarily set it via CONFIG SET; the user-facing godoc calls this out explicitly.
  • The Valkey 8.1+ sum / count fields on LATENCY LATEST entries are not surfaced on LatencyLatestEntry. Consumers needing them can fall back to CustomCommand([]string{"LATENCY", "LATEST"}).
  • Standalone-with-replicas LATENCY LATEST aggregation: when the core returns per-node entries the standalone client merges them. MaxMs is reported as the cross-node maximum (which is the relevant cross-node invariant for monitoring); the winning (Timestamp, LatestMs) is from the most recent node. Documented on both the method and the model.

Testing

Unit tests (go/latency_commands_test.go, 26 cases — all PASS):

=== RUN   TestParseLatencyHistoryEntries_Typed                   PASS
=== RUN   TestParseLatencyHistoryEntries_Empty                   PASS
=== RUN   TestParseLatencyHistoryEntries_Nil                     PASS
=== RUN   TestParseLatencyHistoryEntries_RejectsWrongTopType     PASS
=== RUN   TestParseLatencyHistoryEntries_RejectsWrongInnerType   PASS
=== RUN   TestParseLatencyHistoryEntries_RejectsShortPair        PASS
=== RUN   TestParseLatencyHistoryEntries_RejectsNonIntTimestamp  PASS
=== RUN   TestParseLatencyHistoryEntries_RejectsNonIntLatency    PASS
=== RUN   TestParseLatencyLatestEntries_Typed                    PASS
=== RUN   TestParseLatencyLatestEntries_IgnoresTrailingFields    PASS
=== RUN   TestParseLatencyLatestEntries_Empty                    PASS
=== RUN   TestParseLatencyLatestEntries_Nil                      PASS
=== RUN   TestParseLatencyLatestEntries_RejectsShortEntry        PASS
=== RUN   TestParseLatencyLatestEntries_RejectsWrongEventNameType PASS
=== RUN   TestParseLatencyLatestEntries_RejectsWrongNumericType  PASS (3 sub)
=== RUN   TestLatencyResponseTypes_ZeroValues                    PASS
=== RUN   TestMergeLatencyHistoryEntries_EmptyMap                PASS
=== RUN   TestMergeLatencyHistoryEntries_MultiNodeSortedByTimestamp PASS
=== RUN   TestMergeLatencyHistoryEntries_StableOnTimestampTies   PASS
=== RUN   TestMergeLatencyHistoryEntries_PropagatesParseError    PASS
=== RUN   TestMergeLatencyLatestEntries_MostRecentTimestampWins  PASS
=== RUN   TestMergeLatencyLatestEntries_MaxMsIsCrossNodeMax      PASS
=== RUN   TestMergeLatencyLatestEntries_TimestampTieBreaksOnLatestMs PASS
=== RUN   TestMergeLatencyLatestEntries_EmptyMap                 PASS
=== RUN   TestMergeLatencyLatestEntries_SortedByEventName        PASS

Integration tests (go/integTest/latency_commands_test.go, 12 cases): run against a Valkey 8.x cluster + standalone via make integ-test test-filter='TestLatency.*' — all PASS, covering both clients, routing variants, nil-route fallback, and context cancellation.

Static analysis: go vet ./..., gofumpt -d, golines -m 127 --dry-run, and staticcheck are all clean for the touched files.

Pre-PR review: five parallel review subagents (performance, code quality / style, test coverage, documentation clarity, security & edge cases) were run per CLAUDE.md. All actionable findings were addressed before commit:

  • *WithOptions nil-route fallback so default routing always matches the no-options method.
  • t.Cleanup for latency-monitor-threshold restoration in triggerLatencySpike* so test failures cannot leak server state.
  • Pre-sized merged slice in mergeLatencyHistoryEntries.
  • Doc fixes: standalone-with-replicas merge behavior, latency-monitor-threshold requirement, thicker *WithOptions godoc, accurate tie-break semantics, comment placement above parseLatencyHistoryEntries, 8.1+ sum/count note on LatencyLatestEntry.
  • Tightened the cluster RandomRoute integration assertion (replaced the tautological IsSingleValue() || IsMultiValue()).
  • Added merge-helper unit tests and runnable examples.

Checklist

Before submitting the PR make sure the following are checked:

References

Adds the three Valkey latency-monitoring commands to the Go client surface
(both standalone Client and ClusterClient) and exposes them through the
ServerManagementCommands and ServerManagementClusterCommands interfaces.

- Standalone (glide_client.go): LatencyHistory, LatencyLatest, LatencyReset,
  LatencyResetWithEvents. The standalone handlers transparently merge the
  per-node Map response that the core returns when the deployment has
  replicas (LATENCY HISTORY/LATEST use AllNodes routing + Special policy in
  glide-core), so users keep getting a flat slice.

- Cluster (glide_cluster_client.go): the same four methods plus *WithOptions
  variants that accept options.RouteOption. Read commands return
  models.ClusterValue so callers can distinguish single-node and per-node
  responses; LATENCY RESET totals are summed across nodes by the core
  (Aggregate(Sum)). When opts.Route is nil the *WithOptions variants
  delegate to the no-options method so behavior matches.

- Models (models/response_types.go): new LatencyHistoryEntry and
  LatencyLatestEntry types. Valkey 8.1+ sum/count fields are intentionally
  not surfaced to keep the public surface stable across server versions;
  consumers needing them can fall back to CustomCommand.

- Response handlers (response_handlers.go): typed parsers
  (parseLatencyHistoryEntries / parseLatencyLatestEntries) plus merge
  helpers (mergeLatencyHistoryEntries / mergeLatencyLatestEntries) used by
  the standalone client. Merge order is deterministic: nodes are processed
  in sorted-address order, history is stable-sorted by ascending timestamp,
  and latest entries dedupe by event name (most-recent timestamp wins,
  ties break on larger LatestMs, MaxMs is the cross-node maximum).

- Tests:
  * 26 unit tests in go/latency_commands_test.go covering the typed
    parsers (happy path, empty/nil, wrong types, short tuples, 8.1+
    trailing fields) and the merge helpers (cross-node max,
    timestamp-tie break, sorted-by-event-name, error propagation).
  * 12 integration tests in go/integTest/latency_commands_test.go for
    standalone and cluster (including *WithOptions, RandomRoute,
    nil-route fallback, and context cancellation). triggerLatencySpike*
    helpers temporarily lower latency-monitor-threshold and use
    t.Cleanup so failures cannot leak server state.
  * Runnable Example* functions in go/latency_commands_examples_test.go
    for every new public API on both clients.

- CHANGELOG.md: Pending 2.4 entry under valkey-io#5957.

Closes valkey-io#5957 (LATENCY commands subset)

Signed-off-by: jagath prabhu <jagath94@gmail.com>
Copy link
Copy Markdown
Collaborator

@jamesx-improving jamesx-improving left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

events []string,
opts options.RouteOption,
) (int64, error) {
response, err := client.executeCommandWithRoute(ctx, C.LatencyReset, events, opts.Route)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go cluster commands with route option has nil guard. Check other commands with route and follow the same pattern

// The number of event time series that were reset (summed across nodes).
//
// [valkey.io]: https://valkey.io/commands/latency-reset/
func (client *ClusterClient) LatencyResetWithOptions(ctx context.Context, opts options.RouteOption) (int64, error) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go cluster commands with route option has nil guard. Check other commands with route and follow the same pattern

Comment thread go/response_handlers.go
if err != nil {
return nil, err
}
return parseLatencyHistoryEntries(data)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like is not sorted and in the command doc mentioned that it is sorted

Comment thread CHANGELOG.md
* JAVA: Add cluster management commands (CLUSTER MEET, CLUSTER FORGET, CLUSTER REPLICATE, CLUSTER REPLICAS, CLUSTER COUNT-FAILURE-REPORTS, CLUSTER FAILOVER, CLUSTER SETSLOT, CLUSTER BUMPEPOCH, CLUSTER SET-CONFIG-EPOCH, CLUSTER FLUSHSLOTS, CLUSTER RESET, READONLY, READWRITE, ASKING, CLUSTER SAVECONFIG, CLUSTER GETKEYSINSLOT) ([#5503](https://github.com/valkey-io/valkey-glide/pull/5503))
* Go: Client-Side Caching Support ([#5721](https://github.com/valkey-io/valkey-glide/pull/5721))
* Go: Add `PeriodicChecks` configuration to `AdvancedClusterClientConfiguration` — supports `PeriodicChecksEnabled`, `PeriodicChecksDisabled`, and `PeriodicChecksManualInterval` modes for controlling cluster topology check intervals ([#5842](https://github.com/valkey-io/valkey-glide/issues/5842))
* Go: Add `LATENCY HISTORY`, `LATENCY LATEST`, and `LATENCY RESET` commands to the standalone `Client` and `ClusterClient`. Cluster variants expose `*WithOptions` for explicit `RouteOption` and return `models.ClusterValue[...]` for the read commands; the standalone client transparently merges per-node responses when the deployment has replicas. ([#5957](https://github.com/valkey-io/valkey-glide/issues/5957))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update your branch there is a new section Pending 2.5. Add this entry there

_, err := client.LatencyHistory(ctx, "command")
require.Error(t, err)
assert.True(t,
strings.Contains(err.Error(), "context canceled") ||
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the check correct? There is a spelling error

Double-check the error message to check here

LatestMs int64
// MaxMs is the all-time maximum latency for this event in milliseconds. "All-time" means since
// the server was started or `LATENCY RESET` was last executed for this event. In a standalone
// deployment with replicas the standalone client merges per-node responses; in that case MaxMs
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the real benefit on returning the maxms cross nodes? Would that loose important per-node information?

Comment thread go/response_handlers.go
// Nodes are processed in sorted-address order so the merge is deterministic; entries
// are then stable-sorted by ascending timestamp so equal-timestamp samples preserve
// their per-node insertion order (lexicographically smaller addresses appear first).
func mergeLatencyHistoryEntries(perNode map[string]any) ([]models.LatencyHistoryEntry, error) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LATENCY HISTORY has ResponsePolicy::Special + RouteBy::AllNodes in glide-core. On a standalone+replicas deployment, the core sends the command to every node — primary and all replicas — and returns a Map<address, []entries>.
My concerns are

  • The merge concatenates all per-node arrays and sorts by timestamp. The problem is that primary and replica spike histories are independent time series. A "command" spike on a replica is not the same event as one on the primary — replicas process replication traffic and read commands, not the same workload. Merging them as if they're one time series can be misleading.
  • After the merge, node identity is gone. If two nodes both recorded a spike at timestamp T, the user sees two entries with the same timestamp and no way to know which node produced which.

It should be ok for in the standalone case only return information from the primary. If I'm not mistaken is the case for other commands in the code base

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants