fix(apiserver): resolve VD GET cache-miss via live RD re-read (aligns with RD GET)#172
Conversation
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughAdds a cache-miss fallback to ChangesCache-miss fallback for VolumeDefinitions.Get
Estimated code review effort: 3 (Moderate) | ~20 minutes Sequence Diagram(s)sequenceDiagram
participant Caller
participant VolumeDefinitionsGet as volumeDefinitions.Get
participant CachedClient
participant APIReader
Caller->>VolumeDefinitionsGet: Get(volumeNumber)
VolumeDefinitionsGet->>CachedClient: fetch RD (cached)
CachedClient-->>VolumeDefinitionsGet: NotFound / stale RD
alt apiReader configured and NotFound
VolumeDefinitionsGet->>APIReader: live re-read RD
APIReader-->>VolumeDefinitionsGet: fresh RD
VolumeDefinitionsGet->>VolumeDefinitionsGet: vdByNumber(freshRD)
else no apiReader or other error
VolumeDefinitionsGet->>VolumeDefinitionsGet: return error immediately
end
VolumeDefinitionsGet-->>Caller: VolumeDefinition or ErrNotFound
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request implements cache-miss fallback logic across various Kubernetes store implementations (nodes, resourceGroups, resources, snapshots, storagePools, and volumeDefinitions) to prevent spurious NotFound errors caused by multi-replica apiserver cache lag. When a resource is not found in the informer cache, the store now falls back to a direct live read using the apiReader. A comprehensive set of unit tests has been added to verify this fallback behavior. I have no feedback to provide.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
c3f3d0f to
acf3160
Compare
… with RD GET)
Volume-definitions are embedded in the RD CRD spec, so after the
v0.1.15 RD-404 fix the SAME committed RD answered differently by
path on a multi-replica apiserver: `GET /v1/resource-definitions/{rd}`
resolved through the RD store's uncached fallback while
`GET .../volume-definitions/{vn}` read only the lagging informer cache
and 404'd — observed live on the cozystack e2e stand
(`GET .../volume-definitions/0 -> 404`, REST cache-retry budget
exhausted). volumeDefinitions.Get now does one live re-read via the
existing fetchRDLive on a cache miss OR a stale cached RD revision,
mirroring resourceDefinitions.Get.
Deliberately NOT extended to the other stores: raw store Gets are
cache-only by design — REST existence probes rely on a fast cached
NotFound, and get*WithCacheRetry absorbs cross-replica lag at the REST
layer while preserving read-your-writes for subsequent cached Lists.
A store-level fallback short-circuits that convergence wait: trying it
across all stores made the CI snapshot-pagination test regress (the
create flow's read-back resolved uncached while the cached List still
under-reported). NewWithAPIReader's doc now pins this boundary.
Regression tests (L1, fail-on-bug proven — both FAIL on the pre-fix
tree): the stale-RD embedded-VD mode, the RD-cache-miss mode, and
nil-reader NotFound preservation.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
acf3160 to
29e71d5
Compare
What
Volume-definitions are embedded in the RD CRD spec, so after the v0.1.15 RD-404 fix the same committed RD answered differently by path on a multi-replica apiserver:
GET /v1/resource-definitions/{rd}resolved through the RD store's uncached fallback, whileGET .../volume-definitions/{vn}read only the lagging informer cache and 404'd — observed live on the cozystack e2e stand (GET .../volume-definitions/0 -> 404with the REST cache-retry budget exhausted).volumeDefinitions.Getnow does one live re-read via the existingfetchRDLiveon a cache miss OR a stale cached RD revision, mirroringresourceDefinitions.Get.Why the scope is VD-only (down from the first revision)
The first revision extended the uncached fallback to all stores (resources/nodes/resource-groups/storage-pools/snapshots). CI caught a real regression: the snapshot-pagination test failed (
page2: got 1 entries, want 2) because the store-level fallback short-circuits the REST layer'sget*WithCacheRetryconvergence wait — the create flow's read-back resolved uncached while the cached List still under-reported. The architecture is deliberately two-tier:get*WithCacheRetry, which POLLS the cached read so subsequent cached Lists stay coherent.The VD case is the exception because it is a consistency alignment with the already-released RD behaviour (same underlying CRD object), and its 404 was observed live with the REST retry budget exhausted.
NewWithAPIReader's doc comment now pins this boundary for future readers.Testing
L1 regression tests (fail-on-bug proven: both FAIL on the pre-fix tree, PASS with the fix): the stale-RD embedded-VD mode, the RD-cache-miss mode, and nil-reader NotFound preservation. Full
pkg/store/k8sgreen, lint 0 issues; the two integration tests that failed on the first revision's CI run (TestGroupJ/CSIListSnapshotsPagination,TestGroupKWFSpawnAndDependentReAutoplace) pass 3/3 locally with this revision.Note on tiers: store-layer robustness change, not a CLI-verb change — the cache-lag race cannot be reproduced deterministically on the single-replica L6/L7 stand, so L1 internal tests are the appropriate tier (same as the v0.1.15 RD-404 fix).
Summary by CodeRabbit
Bug Fixes
Tests
Documentation