Commit 1e764a0

wip
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
1 parent 5aa4dc8 commit 1e764a0

2 files changed

+79
-3
lines changed

docs/content/en/blog/news/primary-cache-for-next-recon.md

Lines changed: 3 additions & 2 deletions
@@ -10,9 +10,10 @@ author: >-
1010
Read-cache-after-write consistency feature replaces this functionality. (since version 5.3.0)
1111

1212
> It provides this functionality also for secondary resources and optimistic locking
13-
is not required anymore. See [details here](./../../docs/documentation/reconciler.md#read-cache-after-write-consistency-and-event-filtering).
14-
{{% /alert %}}
13+
is not required anymore. See the [docs](./../../docs/documentation/reconciler.md#read-cache-after-write-consistency-and-event-filtering) and
14+
related [blog post](read-after-write-consistency.md) for details.
1515

16+
{{% /alert %}}
1617

1718
We recently released v5.1 of Java Operator SDK (JOSDK). One of the highlights of this release is related to a topic of
1819
so-called

docs/content/en/blog/news/read-after-write-consistency.md

Lines changed: 76 additions & 1 deletion
@@ -89,7 +89,7 @@ it is not. Websocket can be disconnected (actually happens on purpose sometimes)
8989

9090
## The problem(s) we try to solve
9191

92-
Let's consider the following operator:
92+
Let's consider an operator with the following requirements:
9393
- we have a custom resource `PodPrefix` where the spec contains only one field: `podNamePrefix`,
9494
- the goal of the operator is to create a pod whose name has the prefix and a random sequence as a suffix
9595
- it should never run two pods at once, if the `podNamePrefix` changes it should delete
@@ -155,13 +155,88 @@ So can we have stronger guarantees regarding caches? It turns out we can now...
155155

156156
## Achieving read-cache-after-write consistency
157157

158+
When we send an update request to the Kubernetes API (the same applies to create and patch requests), the response
159+
contains the up-to-date resource, with the resource version that is the most recent at that point.
160+
The idea of the implementation is to keep this response in a cache on top of the Informer's cache.
161+
We call this cache `TemporaryResourceCache` (TRC). Besides caching such responses, it also plays a role in event filtering,
162+
as we will see later.
158163

164+
In the past, the challenge was deciding when to evict this response from the TRC. Eventually, the informer
165+
receives an event and its cache is populated with an up-to-date resource.
166+
However, it was not possible to tell reliably whether the resource in an event was the result
167+
of an update made before or after our own update. The reason is that the Kubernetes documentation stated that
168+
`metadata.resourceVersion` should be treated as a string and compared only for equality.
169+
With optimistic locking we were able to work around this issue; see [this blog post](primary-cache-for-next-recon.md).
159170

171+
{{% alert color=success %}}
172+
The Kubernetes guidelines have since changed: if we are able to parse the `resourceVersion` as an integer,
173+
we can use numerical comparison. See the related [KEP](https://github.com/michaelasp/enhancements/tree/master/keps/sig-api-machinery/5504-comparable-resource-version).
174+
{{% /alert %}}
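
To illustrate why numerical comparison matters, here is a minimal sketch of a comparison helper. `ResourceVersions` and its `compare` method are hypothetical names for this example, not part of JOSDK or of any Kubernetes client. The sketch assumes that a version which cannot be parsed as a non-negative integer is reported as "not comparable" rather than guessed at.

```java
import java.math.BigInteger;
import java.util.OptionalInt;

/** Hypothetical helper for illustration only; not an API of JOSDK or Kubernetes. */
final class ResourceVersions {

  private ResourceVersions() {}

  /**
   * Compares two resourceVersion strings numerically.
   * Returns a negative, zero, or positive value, or an empty result when
   * either string is not a plain decimal integer (assumption of this sketch).
   */
  static OptionalInt compare(String left, String right) {
    if (!isDecimal(left) || !isDecimal(right)) {
      return OptionalInt.empty();
    }
    // BigInteger avoids overflow for very large version numbers.
    return OptionalInt.of(new BigInteger(left).compareTo(new BigInteger(right)));
  }

  private static boolean isDecimal(String value) {
    return value != null
        && !value.isEmpty()
        && value.chars().allMatch(c -> c >= '0' && c <= '9');
  }
}
```

A plain string comparison would order `"9"` after `"10"`, which is why the values have to be parsed before they are compared.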
175+
176+
From this point the idea of the algorithm is very simple:
160177

178+
1. After updating a Kubernetes resource, cache the response in the TRC on top of the Informer's cache.
179+
2. When the informer propagates an event, check whether its resource version is the same as or larger
180+
than the one in the TRC; if so, evict the resource from the TRC.
181+
3. When the controller reads a resource from the cache, it first checks the TRC, then the Informer's cache.
182+
183+
184+
```mermaid
185+
sequenceDiagram
186+
box rgba(50,108,229,0.1)
187+
participant K8S as ⎈ Kubernetes API Server
188+
end
189+
box rgba(232,135,58,0.1)
190+
participant R as Reconciler
191+
end
192+
box rgba(58,175,169,0.1)
193+
participant I as Informer
194+
participant IC as Informer Cache
195+
participant TRC as Temporary Resource Cache
196+
end
197+
198+
R->>K8S: 1. Update resource
199+
K8S-->>R: Updated resource (with new resourceVersion)
200+
R->>TRC: 2. Cache updated resource in TRC
201+
202+
K8S-)I: 3. Watch event (resource updated)
203+
I->>TRC: On event: event resourceVersion ≥ TRC version?
204+
alt Yes: event is up-to-date
205+
I-->>TRC: Evict resource from TRC
206+
else No: stale event
207+
Note over TRC: TRC entry retained
208+
end
209+
210+
R->>TRC: 4. Read resource from cache
211+
alt Resource found in TRC
212+
TRC-->>R: Return cached resource
213+
else Not in TRC
214+
R->>IC: Read from Informer Cache
215+
IC-->>R: Return resource
216+
end
217+
```
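
The three steps can also be sketched in code. This is a simplified illustration under stated assumptions, not the actual `TemporaryResourceCache` from JOSDK: the class `TemporaryCacheSketch`, its `Resource` record, and all method names are hypothetical, resource versions are assumed to be numeric, and the check in `onUpdateResponse` against the informer cache is an assumption of this sketch.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative sketch only; the real implementation in JOSDK is more involved. */
final class TemporaryCacheSketch {

  /** Hypothetical minimal resource: an identifier and a numeric resourceVersion. */
  record Resource(String id, String resourceVersion) {}

  private final Map<String, Resource> informerCache = new HashMap<>();
  private final Map<String, Resource> temporaryCache = new HashMap<>();

  /** Step 1: remember the response of our own update. */
  synchronized void onUpdateResponse(Resource updated) {
    Resource known = informerCache.get(updated.id());
    // Assumption: skip caching if the informer already holds the same or a newer version.
    if (known == null || version(known).compareTo(version(updated)) < 0) {
      temporaryCache.put(updated.id(), updated);
    }
  }

  /** Step 2: on an informer event, evict the temporary entry if the event is not older. */
  synchronized void onInformerEvent(Resource fromEvent) {
    informerCache.put(fromEvent.id(), fromEvent);
    // Returning null from the remapping function removes the entry.
    temporaryCache.computeIfPresent(
        fromEvent.id(),
        (id, cached) -> version(fromEvent).compareTo(version(cached)) >= 0 ? null : cached);
  }

  /** Step 3: read from the temporary cache first, then from the informer cache. */
  synchronized Optional<Resource> get(String id) {
    Resource temporary = temporaryCache.get(id);
    return Optional.ofNullable(temporary != null ? temporary : informerCache.get(id));
  }

  private static BigInteger version(Resource resource) {
    return new BigInteger(resource.resourceVersion());
  }
}
```

With this structure, a read that happens after our update but before the corresponding watch event still returns the resource we just wrote, while a stale event leaves the temporary entry in place.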
161218

162219
## Filtering events for our own updates
163220

221+
When we update a resource, the informer will eventually propagate an event that triggers a reconciliation.
222+
However, this is usually not desired. Since at that point we already know the up-to-date
223+
resource, we would like to be notified only if the resource changes after our own change.
224+
Therefore, in addition to caching the resource, we also filter out events that contain a resource
225+
version that is older than or the same as that of our cached resource.
226+
227+
Note that the implementation of this is relatively complex: while the update is in progress, we want to record all the
228+
events received in the meantime, and decide whether to propagate any of them only once the update request is complete.
229+
230+
However, this way we significantly reduce the number of reconciliations, thus making the whole process much more efficient.
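
The record-then-decide idea might look roughly like the sketch below, for a single resource. It is not the JOSDK implementation: `OwnUpdateEventFilter` and its methods are hypothetical names, and resource versions are assumed to be already parsed into `long` values.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch for a single resource; names and structure are assumptions. */
final class OwnUpdateEventFilter {

  private final List<Long> recordedDuringUpdate = new ArrayList<>();
  private boolean updateInFlight = false;
  private long lastOwnVersion = -1; // -1 means no own update has completed yet

  /** Called right before the update request is sent. */
  synchronized void updateStarted() {
    updateInFlight = true;
  }

  /**
   * Called for every informer event. Returns true if the event should trigger
   * a reconciliation immediately. While an update is in flight the event is
   * only recorded, because we do not yet know our own resulting version.
   */
  synchronized boolean shouldPropagate(long eventVersion) {
    if (updateInFlight) {
      recordedDuringUpdate.add(eventVersion);
      return false;
    }
    return eventVersion > lastOwnVersion;
  }

  /**
   * Called when the update response arrives. Returns the recorded event
   * versions that are newer than our own update and so still need propagating.
   */
  synchronized List<Long> updateFinished(long ownVersion) {
    updateInFlight = false;
    lastOwnVersion = ownVersion;
    List<Long> toPropagate = new ArrayList<>();
    for (long recorded : recordedDuringUpdate) {
      if (recorded > ownVersion) {
        toPropagate.add(recorded);
      }
    }
    recordedDuringUpdate.clear();
    return toPropagate;
  }
}
```

Events recorded during the update that turn out to be older than or equal to our own version are dropped, which is where the reduction in reconciliations comes from.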
231+
232+
## Additional considerations and alternatives
233+
234+
## Conclusion
235+
236+
## Notes
237+
164238

165239
TODO:
240+
- alternatives => deferring reconciliation, this is optimized for throughput
166241
- filter events
167242
- reschedule
