Description
Track Sentinel and cluster failover events and correlate them with slowlog/COMMANDLOG spikes for post-incident analysis.
Problem
During failovers, latency spikes and command failures are common but the cause isn't always obvious from metrics alone. Being able to see "a failover happened at 03:12, and here's the slowlog spike that preceded/followed it" would make post-incident analysis significantly easier.
Proposed Scope
- Detect failover events by monitoring `INFO replication` role changes and `CLUSTER INFO` state transitions
- Persist failover events with timestamps, old/new primary, and trigger reason where available
- Correlate failover timestamps with existing slowlog and anomaly detection data
- Surface in the UI timeline alongside existing anomaly events
- Add `failover.started` and `failover.completed` webhook event types
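
As a rough illustration of the detection step, the sketch below compares two consecutive snapshots of `INFO replication` (or `CLUSTER INFO`) text and reports when a field such as `role` or `cluster_state` changes. The function names `parse_info` and `detect_transition` are hypothetical and not part of any existing codebase. How a transition maps onto `failover.started` versus `failover.completed` is left open here.

```python
from typing import Dict, Optional, Tuple


def parse_info(raw: str) -> Dict[str, str]:
    """Parse the 'key:value' text returned by INFO / CLUSTER INFO into a dict."""
    fields: Dict[str, str] = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Replication".
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def detect_transition(
    prev_raw: str, curr_raw: str, field: str
) -> Optional[Tuple[str, str]]:
    """Return (old, new) if `field` changed between two snapshots, else None.

    Hypothetical helper: `field` would be "role" for INFO replication
    or "cluster_state" for CLUSTER INFO.
    """
    old = parse_info(prev_raw).get(field)
    new = parse_info(curr_raw).get(field)
    if old is None or new is None or old == new:
        return None
    return (old, new)


# Example: a replica observed as promoted between two polls.
prev_snapshot = "# Replication\r\nrole:slave\r\nmaster_host:10.0.0.1\r\n"
curr_snapshot = "# Replication\r\nrole:master\r\nconnected_slaves:1\r\n"
role_change = detect_transition(prev_snapshot, curr_snapshot, "role")
```

Polling-based detection like this can only timestamp a failover to within one poll interval, which matters when choosing the correlation window later.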
Prior Art / Context
Requested by community — correlating failovers with slowlog spikes is a common post-incident debugging need for teams running Sentinel or Cluster topologies.
Related
- Existing cluster topology visualization
- Existing per-slot heatmaps and migration tracking
- Anomaly detection correlator (could add a `FAILOVER` pattern)
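
One possible shape for such a pattern, sketched under assumptions: given a failover timestamp and a list of slowlog entries, split the entries that fall inside a window around the failover into "before" and "after" groups. `SlowlogEntry`, `correlate_failover`, and the 60-second default window are all hypothetical. The entry fields mirror what `SLOWLOG GET` reports (a Unix timestamp in seconds and an execution time in microseconds).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SlowlogEntry:
    timestamp: int    # Unix seconds
    duration_us: int  # execution time in microseconds
    command: str


def correlate_failover(
    failover_ts: int, entries: List[SlowlogEntry], window_s: int = 60
) -> Dict[str, List[SlowlogEntry]]:
    """Group slowlog entries within +/- window_s seconds of a failover."""
    before = [
        e for e in entries
        if failover_ts - window_s <= e.timestamp < failover_ts
    ]
    after = [
        e for e in entries
        if failover_ts <= e.timestamp <= failover_ts + window_s
    ]
    return {"before": before, "after": after}


# Example with made-up entries around a failover at t=1000.
sample_entries = [
    SlowlogEntry(940, 15000, "KEYS *"),
    SlowlogEntry(990, 22000, "SMEMBERS big:set"),
    SlowlogEntry(1010, 48000, "GET user:1"),
    SlowlogEntry(1100, 12000, "GET user:2"),  # outside the window
]
groups = correlate_failover(1000, sample_entries)
```

Separating the two groups keeps the "preceded/followed" distinction from the problem statement visible in the output.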