This update addresses some comments I've received on the design
document. Most importantly, I add a new table for rate limiters which
allows us to combine several callers under 1 rate limiter in a
straightforward way.
-[Pool Member / Multi-Host Considerations](#pool-member-multi-host-considerations)
 <!--toc:end-->
 
 ## Overview
 
 We have had several customer incidents in the past that have been attributed to
-“overloading” xapi. This effectively means that a client is making requests at
-a rate that xapi cannot handle. This can result in very bad response times (“we
+“overloading” Xapi. This effectively means that a client is making requests at
+a rate that Xapi cannot handle. This can result in very bad response times (“we
 tried to shut down 20 VMs and this took 2 hours!”) and general system
 instability and unavailability.
 
@@ -32,7 +39,7 @@ either misconfigured or make improper use of the API, hammering the pool and
 breaking use of the good guys. For example, a dodgy monitoring service may
 lock out the control software, or slow down VM lifecycle operations.
 
-Part of the problem is that xapi and xenopsd are not very good at handling
+Part of the problem is that Xapi and xenopsd are not very good at handling
 load, in particular in a pool where the coordinator is often a bottleneck. A
 lot of work has already been done make the Toolstack cope better under load.
 This is important and a lot more can be done in this space.
@@ -44,7 +51,7 @@ improvements, as a complimentary approach.
 
 Last year, thread prioritisation was tried. This could be revisited, but we
 also need an approach that allows us to pose hard constraints on clients. For
-example, as an admin, I want to configure xapi to give my control panel
+example, as an admin, I want to configure Xapi to give my control panel
 unlimited access, but explicitly limit how much Monitoring App X can do.
 
 The proposal here is to do a simpler kind of per-client rate limiting at the
@@ -91,60 +98,156 @@ potentially made on multiple connections.
 ### Client classification
 In order to let pool administrators know who they should be rate limiting, we
 will also introduce a **Caller** datamodel class which tracks all requests made
-to xapi.
+to Xapi.
 
 Callers will be a high-level way of tracking clients. We allow callers to be
 identified by a number of different parameters: AD user, IP address,
-originator, user agent. When an unknown caller makes a request to xapi, we
+originator, user agent. When an unknown caller makes a request to Xapi, we
 record their data in a new row. The pool administrator will be able to merge
 related callers together and assign them labels.
 
 The caller classification allows wildcards for any field, though we require
 that at least one field be specified. This lets us, for example, combine all
-accesses from the xapi python API by specifying the user-agent .
+accesses from the Xapi python API by specifying the user-agent .
 
-In order to assist with rate limiting, we can store statistics about callers:
-- Last request timestamp
+A rate limiter can be associated with any number of callers, and the parameters
+of the rate limiter can either be derived from the usage patterns of the
+callers or selected from a number of preset profiles.
+
+### Statistics
+In order to assist with rate limiting, we can store statistics about callers.
+We identify two kinds of statistics: volatile and stable. Volatile statistics
+change over time without any input, e.g. sliding windows. These will be stored
+in RRDs. By contrast, stable statistics only vary at most once per request, and
+so are safe to store in the main database.
+
+**Volatile statistics:**
 - Tokens used over the last (5 minutes/hour/day).
-- Most common API requests.
+- Most common requests over the last (5 minutes/hour/day).
 
-A rate limiter can then be attached to a particular caller, and the parameters
-of the rate limiter can either be derived from the usage patterns of the caller
-or selected from a number of preset profiles.
+**Stable statistics:**
+- Last request timestamp
 
 ## API design
-We propose two new datamodels: **Caller** and **Rate_limit**.
+
+### Caller Datamodel
+We propose two new datamodel tables: **Caller**, which stores the data associated with each caller, and **Rate limit**, which identifies one or more callers with a rate limiter.
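
For readers of this change, the wildcard-based caller classification described in the "Client classification" section might be sketched roughly as follows. This is an illustrative Python sketch only, not code from this commit or from Xapi: the names `Request`, `CallerPattern`, `matches`, and `classify` are hypothetical, a `None` field is assumed to stand for a wildcard, and exact string equality is assumed for non-wildcard fields.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# The four identifying parameters named in the design document.
FIELDS = ("ad_user", "ip_address", "originator", "user_agent")


@dataclass(frozen=True)
class Request:
    """Identifying data of one incoming request (hypothetical shape)."""
    ad_user: str
    ip_address: str
    originator: str
    user_agent: str


@dataclass(frozen=True)
class CallerPattern:
    """A caller row; a field left as None is treated as a wildcard (assumption)."""
    label: str
    ad_user: Optional[str] = None
    ip_address: Optional[str] = None
    originator: Optional[str] = None
    user_agent: Optional[str] = None

    def __post_init__(self) -> None:
        # The document requires that at least one field be specified.
        if all(getattr(self, f) is None for f in FIELDS):
            raise ValueError("at least one field must be specified")

    def matches(self, request: Request) -> bool:
        # Every non-wildcard field must equal the request's value.
        return all(
            getattr(self, f) is None or getattr(self, f) == getattr(request, f)
            for f in FIELDS
        )


def classify(request: Request, patterns: Iterable[CallerPattern]) -> Optional[str]:
    """Return the label of the first matching caller, or None if unknown."""
    for pattern in patterns:
        if pattern.matches(request):
            return pattern.label
    return None
```

Under these assumptions, a single pattern such as `CallerPattern(label="python-api", user_agent="example-python-client")` would group every request carrying that (made-up) user agent, regardless of user or IP address. A request for which `classify` returns `None` corresponds to the "unknown caller" case, where the document says a new row is recorded.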