Commit ccb5f42

Adding doc for ServerMaintenanceEvent (#1894)
Documentation for the feature introduced here: #1876
1 parent ef392e9 commit ccb5f42

2 files changed

Lines changed: 68 additions & 0 deletions

docs/ServerMaintenanceEvent.md

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
# Introducing ServerMaintenanceEvents

StackExchange.Redis now automatically subscribes to notifications about upcoming maintenance from supported Redis providers. The ServerMaintenanceEvent on the ConnectionMultiplexer raises events in response to notifications about server maintenance, and application code can subscribe to the event to handle connection drops more gracefully during these maintenance operations.

If you are a Redis vendor and want to integrate support for ServerMaintenanceEvents into StackExchange.Redis, we recommend opening an issue so we can discuss the details.

## Types of events

Azure Cache for Redis currently sends the following notifications:
* `NodeMaintenanceScheduled`: Indicates that a maintenance event is scheduled. This can arrive 10-15 minutes in advance.
* `NodeMaintenanceStarting`: Fired ~20s before maintenance begins.
* `NodeMaintenanceStart`: Fired when maintenance is imminent (<5s).
* `NodeMaintenanceFailoverComplete`: Indicates that a replica has been promoted to primary.
* `NodeMaintenanceEnded`: Indicates that the node maintenance operation is over.

## Sample code

The library will automatically subscribe to the pub/sub channel to receive notifications from the server, if one exists. For Azure Redis caches, this is the `AzureRedisEvents` channel. To plug in your maintenance handling logic, attach an event handler to the `ServerMaintenanceEvent` event on your `ConnectionMultiplexer`. For example:

```
// Requires: using StackExchange.Redis.Maintenance;
multiplexer.ServerMaintenanceEvent += (object sender, ServerMaintenanceEvent e) =>
{
    if (e is AzureMaintenanceEvent azureEvent && azureEvent.NotificationType == AzureNotificationType.NodeMaintenanceStart)
    {
        // Take whatever action is appropriate for your application to handle the maintenance operation gracefully.
        // This might mean writing a log entry, redirecting traffic away from the impacted Redis server, or
        // something entirely different.
    }
};
```
You can see the schema for the `AzureMaintenanceEvent` class [here](https://github.com/StackExchange/StackExchange.Redis/blob/main/src/StackExchange.Redis/Maintenance/AzureMaintenanceEvent.cs). Note that the library automatically sets the `ReceivedTimeUtc` timestamp when the event is received. If your logs show a `ReceivedTimeUtc` that is later than `StartTimeUtc`, the notification was delivered late, which may indicate that your connections are under high load.
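
The timestamp comparison described above can be captured in a small helper. This is only a sketch: `MaintenanceTiming` and `LateBy` are hypothetical names that are not part of StackExchange.Redis, and the helper assumes you pass in the event's `ReceivedTimeUtc` and its (possibly missing) `StartTimeUtc`.

```csharp
using System;

// Hypothetical helper, not part of StackExchange.Redis.
public static class MaintenanceTiming
{
    // Returns how long after the announced start time a notification was received,
    // or null when there is no start time or the notification arrived in time.
    public static TimeSpan? LateBy(DateTime receivedTimeUtc, DateTime? startTimeUtc)
    {
        if (startTimeUtc is null)
        {
            return null; // some notifications carry no StartTimeUtc
        }

        TimeSpan delay = receivedTimeUtc - startTimeUtc.Value;
        return delay > TimeSpan.Zero ? delay : (TimeSpan?)null;
    }
}
```

Inside a handler you might call `MaintenanceTiming.LateBy(e.ReceivedTimeUtc, e.StartTimeUtc)` and log a warning whenever the result is not null.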

## Walking through a sample maintenance event

1. App is connected to Redis and everything is working fine.
2. Current Time: [16:21:39] -> `NodeMaintenanceScheduled` event is raised, with a `StartTimeUtc` of 16:35:57 (about 14 minutes from current time).
    * Note: the start time for this event is an approximation, because we will start getting ready for the update proactively and the node may become unavailable up to 3 minutes sooner. We recommend listening for `NodeMaintenanceStarting` and `NodeMaintenanceStart` for the highest level of accuracy (these are only likely to differ by a few seconds at most).
3. Current Time: [16:34:26] -> `NodeMaintenanceStarting` message is received, and `StartTimeUtc` is 16:34:46, about 20 seconds from the current time.
4. Current Time: [16:34:46] -> `NodeMaintenanceStart` message is received, so we know the node maintenance is about to happen. We break the circuit and stop sending new operations to the Redis connection. (Note: the appropriate action for your application may be different.) StackExchange.Redis will automatically refresh its view of the overall server topology.
5. Current Time: [16:34:47] -> The connection is closed by the Redis server.
6. Current Time: [16:34:56] -> `NodeMaintenanceFailoverComplete` message is received. This tells us that the replica node has promoted itself to primary, so the other node can go offline for maintenance.
7. Current Time: [16:34:56] -> The connection to the Redis server is restored. It is safe to send commands to the connection again.
8. Current Time: [16:37:48] -> `NodeMaintenanceEnded` message is received, with a `StartTimeUtc` of 16:37:48. Nothing to do here if you are talking to the load balancer endpoint (port 6380 or 6379). For clustered servers, you can resume sending readonly workloads to the replica(s).
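
The "break the circuit" step in the walkthrough above could be sketched as a simple gate that application code checks before sending commands. Everything here is an assumption for illustration: `MaintenanceGate` is a hypothetical class, not part of StackExchange.Redis, and it takes the notification type as a string (for example from `azureEvent.NotificationType.ToString()`) so that the block stays self-contained.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of StackExchange.Redis: a thread-safe flag that
// application code checks before sending commands to Redis.
public sealed class MaintenanceGate
{
    private int _paused; // 0 = commands allowed, 1 = paused

    public bool IsPaused => Volatile.Read(ref _paused) == 1;

    // Feed this the notification type name, e.g. "NodeMaintenanceStart".
    public void OnNotification(string notificationType)
    {
        switch (notificationType)
        {
            case "NodeMaintenanceStart":
                // Maintenance is imminent: stop sending new operations.
                Volatile.Write(ref _paused, 1);
                break;
            case "NodeMaintenanceFailoverComplete":
            case "NodeMaintenanceEnded":
                // Failover finished or maintenance is over: resume.
                Volatile.Write(ref _paused, 0);
                break;
            // Scheduled/Starting notifications leave the gate unchanged in this sketch.
        }
    }
}

// Possible wiring, assuming `multiplexer` is a connected ConnectionMultiplexer and
// `gate` is a shared MaintenanceGate instance:
//
//   multiplexer.ServerMaintenanceEvent += (sender, e) =>
//   {
//       if (e is AzureMaintenanceEvent azureEvent)
//           gate.OnNotification(azureEvent.NotificationType.ToString());
//   };
```

A real application would probably also reopen the gate after a timeout, so that a missed notification cannot leave it paused indefinitely.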

## Azure Cache for Redis Maintenance Event details

#### NodeMaintenanceScheduled event

`NodeMaintenanceScheduled` events are raised for maintenance scheduled by Azure, up to 15 minutes in advance. This event will not get fired for user-initiated reboots.

#### NodeMaintenanceStarting event

`NodeMaintenanceStarting` events are raised ~20 seconds ahead of upcoming maintenance. This means that one of the primary or replica nodes will be going down for maintenance.

It's important to understand that this does *not* mean downtime if you are using a Standard/Premium SKU cache. If the replica is targeted for maintenance, disruptions should be minimal. If the primary node is the one going down for maintenance, a failover will occur, which will close existing connections going through the load balancer port (6380/6379) or directly to the node (15000/15001). You may want to pause sending write commands until the replica node has assumed the primary role and the failover is complete.

#### NodeMaintenanceStart event

`NodeMaintenanceStart` events are raised when maintenance is imminent (within seconds). These messages do not include a `StartTimeUtc` because they are fired immediately before maintenance occurs.

#### NodeMaintenanceFailoverComplete event

`NodeMaintenanceFailoverComplete` events are raised when a replica has promoted itself to primary. These events do not include a `StartTimeUtc` because the action has already occurred.

#### NodeMaintenanceEnded event

`NodeMaintenanceEnded` events are raised to indicate that the maintenance operation has completed and that the replica is once again available. You do *NOT* need to wait for this event to use the load balancer endpoint, as it is available throughout. However, we included this for logging purposes and for customers who use the replica endpoint in clusters for read workloads.

docs/index.md

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@ Documentation
 - [Transactions](Transactions) - how atomic transactions work in redis
 - [Events](Events) - the events available for logging / information purposes
 - [Pub/Sub Message Order](PubSubOrder) - advice on sequential and concurrent processing
+- [ServerMaintenanceEvent](ServerMaintenanceEvent) - how to listen and prepare for hosted server maintenance (e.g. Azure Cache for Redis)
 - [Streams](Streams) - how to use the Stream data type
 - [Where are `KEYS` / `SCAN` / `FLUSH*`?](KeysScan) - how to use server-based commands
 - [Profiling](Profiling) - profiling interfaces, as well as how to profile in an `async` world
