`docs/content/en/blog/news/read-after-write-consistency.md`
… it is not. Websocket can be disconnected (actually happens on purpose sometimes)

## The problem(s) we try to solve
Let's consider an operator with the following requirements:
- we have a custom resource `PodPrefix` whose spec contains only one field: `podNamePrefix`,
- the goal of the operator is to create a pod whose name consists of the prefix followed by a random suffix
- it should never run two pods at once; if the `podNamePrefix` changes, it should delete

…

So can we have stronger guarantees regarding caches? It turns out we can now...

## Achieving read-cache-after-write consistency

When we send an update request to the Kubernetes API (this also applies to create and the various patch requests),
the response contains the up-to-date resource, with the resource version that is the most recent at that point.
The idea of the implementation is to cache this response in an additional cache on top of the Informer's cache.
We call this cache `TemporaryResourceCache` (TRC); besides caching such responses, it also plays a role in
event filtering, as we will see later.

Note that the challenge in the past was deciding when to evict such a response from the TRC. We eventually
receive an event in the informer, and the informer cache is then updated with the up-to-date resource.
However, it was not possible to reliably tell whether the resource in an event was the result of an update
made before or after ours. The reason is that the Kubernetes documentation stated that
`metadata.resourceVersion` should be treated as an opaque string and compared only for equality.
With optimistic locking we were able to work around this issue; see [this blog post](primary-cache-for-next-recon.md).

{{% alert color=success %}}
This has changed in the Kubernetes guidelines: if we are able to parse the `resourceVersion` as an integer,
we can compare resource versions numerically. See the related [KEP](https://github.com/michaelasp/enhancements/tree/master/keps/sig-api-machinery/5504-comparable-resource-version).
{{% /alert %}}
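Below is a minimal sketch of such a comparison, written in Python for brevity. It is not code from the SDK or from any Kubernetes client library: the helper name `compare_resource_versions` is hypothetical. Falling back to equality-only when a version is not a plain integer is our assumption of a safe default.

```python
from typing import Optional


def compare_resource_versions(a: str, b: str) -> Optional[int]:
    """Compare two resourceVersion strings (hypothetical helper).

    Returns -1, 0 or 1 if both parse as plain decimal integers.
    Returns None otherwise: the caller may then only rely on a == b.
    """
    # Accept ASCII digits only; int() alone would also accept "+5", " 5" or "1_000".
    if not (a.isascii() and a.isdigit() and b.isascii() and b.isdigit()):
        return None
    x, y = int(a), int(b)
    # (x > y) - (x < y) yields 1, 0 or -1.
    return (x > y) - (x < y)
```

With this helper, `"100"` compares as newer than `"99"`. A plain string comparison would order those two the other way round, which is why equality was the only safe operation before.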

From this point on, the algorithm is simple:

1. After updating a Kubernetes resource, cache the response in the TRC, on top of the Informer's cache.
2. When the informer propagates an event, check whether its resource version is equal to or larger than
   the one in the TRC; if it is, evict the resource from the TRC.
3. When the controller reads a resource, it first checks the TRC, and only then the Informer's cache.
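The three steps can be sketched as follows. This is a simplified, single-threaded illustration in Python, not the SDK's `TemporaryResourceCache`. `Resource`, the `(namespace, name)` key and the method names are all hypothetical. The informer cache is modeled as a plain dict, and resource versions are assumed to parse as integers.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical key identifying a resource: (namespace, name).
Key = Tuple[str, str]


@dataclass
class Resource:
    namespace: str
    name: str
    resource_version: str
    spec: dict = field(default_factory=dict)

    @property
    def key(self) -> Key:
        return (self.namespace, self.name)


def resource_version_of(resource: Resource) -> int:
    # Assumes the resourceVersion parses as an integer.
    return int(resource.resource_version)


class TemporaryResourceCache:
    """Sketch of a cache layered on top of the informer's cache."""

    def __init__(self, informer_cache: Dict[Key, Resource]) -> None:
        self._informer_cache = informer_cache  # stands in for the real informer cache
        self._entries: Dict[Key, Resource] = {}

    def put_after_update(self, updated: Resource) -> None:
        # Step 1: cache the response the API server returned for our update.
        self._entries[updated.key] = updated

    def on_informer_event(self, resource: Resource) -> None:
        # The real informer updates its own cache; simulated here with the dict.
        self._informer_cache[resource.key] = resource
        # Step 2: evict our entry once the event is at least as new as our update.
        cached = self._entries.get(resource.key)
        if cached is not None and resource_version_of(resource) >= resource_version_of(cached):
            del self._entries[resource.key]

    def contains(self, key: Key) -> bool:
        return key in self._entries

    def get(self, key: Key) -> Optional[Resource]:
        # Step 3: look in the TRC first, then fall back to the informer cache.
        cached = self._entries.get(key)
        if cached is not None:
            return cached
        return self._informer_cache.get(key)
```

In this sketch, reads never go back to a version older than our own update. A stale event leaves the TRC entry in place, so `get` keeps returning the response to our update until the informer has caught up.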

```mermaid
sequenceDiagram
    box rgba(50,108,229,0.1)
        participant K8S as ⎈ Kubernetes API Server
    end
    box rgba(232,135,58,0.1)
        participant R as Reconciler
    end
    box rgba(58,175,169,0.1)
        participant I as Informer
        participant IC as Informer Cache
        participant TRC as Temporary Resource Cache
    end

    R->>K8S: 1. Update resource
    K8S-->>R: Updated resource (with new resourceVersion)
    R->>TRC: 2. Cache updated resource in TRC

    K8S-)I: 3. Watch event (resource updated)
    I->>TRC: On event: event resourceVersion ≥ TRC version?
    alt Yes: event is up-to-date
        I-->>TRC: Evict resource from TRC
    else No: stale event
        Note over TRC: TRC entry retained
    end

    R->>TRC: 4. Read resource from cache
    alt Resource found in TRC
        TRC-->>R: Return cached resource
    else Not in TRC
        R->>IC: Read from Informer Cache
        IC-->>R: Return resource
    end
```

## Filtering events for our own updates

When we update a resource, the informer eventually propagates an event that triggers a reconciliation.
However, this is usually not desired: since at that point we already know the up-to-date resource,
we only want to be notified if the resource changes after our change.
Therefore, in addition to caching the resource, we also filter out events that contain a resource
version older than or equal to the resource version of our cached resource.
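A minimal sketch of this filter follows, again in Python and under the same assumptions: integer resource versions and a hypothetical `(namespace, name)` key. The class and method names are illustrative only. This version deliberately ignores events that arrive while an update request is still in flight.

```python
from typing import Dict, Tuple

# Hypothetical key identifying a resource: (namespace, name).
Key = Tuple[str, str]


class OwnUpdateEventFilter:
    """Sketch: filter out events that only reflect our own (or an older) state."""

    def __init__(self) -> None:
        # Resource version from the API server's response to our latest update.
        self._own_versions: Dict[Key, int] = {}

    def record_own_update(self, key: Key, resource_version: str) -> None:
        self._own_versions[key] = int(resource_version)

    def should_reconcile(self, key: Key, event_resource_version: str) -> bool:
        own = self._own_versions.get(key)
        if own is None:
            return True  # no update of ours to compare with: propagate
        event_version = int(event_resource_version)
        if event_version < own:
            return False  # older than our update: filter out and keep waiting
        # The informer caught up with (or moved past) our update: forget our entry.
        del self._own_versions[key]
        # Same version means the event is our own update: filter it out.
        return event_version > own
```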

Note that the implementation of this is relatively complex: while the update request is in flight, we need to
record all the events received in the meantime, and only once the update request completes can we decide
whether any of them should be propagated further.
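One way to handle events that arrive while the request is in flight is to buffer them and decide once the response, and with it our resource version, is known. The sketch below tracks a single resource. The class, its methods and the `propagate` callback (standing in for triggering a reconciliation) are hypothetical. Error handling, such as a failed update request, is omitted.

```python
import threading
from typing import Callable, List, Optional


class InFlightUpdateTracker:
    """Sketch for a single resource: buffer events while our update is in flight."""

    def __init__(self, propagate: Callable[[int], None]) -> None:
        self._propagate = propagate  # hypothetical: triggers a reconciliation
        self._lock = threading.Lock()
        self._in_flight = False
        self._buffered: List[int] = []
        self._own_version: Optional[int] = None

    def start_update(self) -> None:
        with self._lock:
            self._in_flight = True
            self._buffered = []

    def on_event(self, resource_version: int) -> None:
        with self._lock:
            if self._in_flight:
                # Our resource version is not known yet, so we cannot decide now.
                self._buffered.append(resource_version)
                return
            if self._own_version is not None and resource_version <= self._own_version:
                return  # reflects our own update (or an older state): filter out
            self._own_version = None
        self._propagate(resource_version)

    def finish_update(self, own_resource_version: int) -> None:
        with self._lock:
            self._in_flight = False
            self._own_version = own_resource_version
            newer = [v for v in self._buffered if v > own_resource_version]
            self._buffered = []
        # Propagate only if something changed after our update.
        if newer:
            self._propagate(max(newer))
```

This sketch propagates only the newest buffered event. A single reconciliation reads the latest state anyway, so replaying every intermediate event would add no information.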

Despite this complexity, filtering events this way significantly reduces the number of reconciliations,
making the whole process much more efficient.

## Additional considerations and alternatives
## Conclusion
## Notes
TODO:
- alternatives => deferring reconciliation, this is optimized for throughput