Commit 9fcac35
fix(querier): wait for store-gateway ACTIVE in querier ring view in store-gateway limits integration tests

TestQuerierWithStoreGatewayDataBytesLimits intermittently fails with
HTTP 500 instead of the expected 422 (#7606, arm64 CI). The decoded
(gzipped) 500 response body from the failing run is the querier-local
ring error:

    expanding series: failed to get store-gateway replication set owning
    the block <ULID>: at least 1 healthy replica required, could only
    find 0 - unhealthy instances: 172.18.0.8:9095

i.e. the ring lookup failed before any store-gateway RPC was made. The
store-gateway registers in the ring as JOINING (already owning tokens)
and switches to ACTIVE only after its initial blocks sync, while the
querier's BlocksRead ring operation only selects ACTIVE instances and
its consul watch is rate-limited (1 rps by default). So the existing
waits (ring tokens registered, blocks loaded on the store-gateway) can
all pass while the querier's view of the store-gateway ring still says
JOINING, and the first query 500s.
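
The race described above can be modeled with a minimal, standard-library-only sketch. This is not the ring implementation; `Instance`, `InstanceState`, and `selectForRead` are hypothetical names chosen for illustration, and the only behavior assumed from the commit message is that the read operation accepts ACTIVE instances only.

```go
package main

import (
	"errors"
	"fmt"
)

// InstanceState is a simplified stand-in for the ring states named above.
type InstanceState int

const (
	Joining InstanceState = iota
	Active
)

// Instance is a hypothetical ring entry as seen by one reader.
type Instance struct {
	Addr   string
	State  InstanceState
	Tokens []uint32
}

var errNoHealthyReplica = errors.New("at least 1 healthy replica required, could only find 0")

// selectForRead mimics a read operation that accepts only ACTIVE instances,
// regardless of whether the instance already owns tokens.
func selectForRead(view []Instance) ([]Instance, error) {
	var healthy []Instance
	for _, inst := range view {
		if inst.State == Active {
			healthy = append(healthy, inst)
		}
	}
	if len(healthy) == 0 {
		return nil, errNoHealthyReplica
	}
	return healthy, nil
}

func main() {
	// Stale view: tokens are registered, but the state is still JOINING.
	view := []Instance{{Addr: "172.18.0.8:9095", State: Joining, Tokens: []uint32{1, 2, 3}}}
	_, err := selectForRead(view)
	fmt.Println("stale view:", err)

	// Once the view catches up to ACTIVE, the same lookup succeeds.
	view[0].State = Active
	got, err := selectForRead(view)
	fmt.Println("fresh view:", len(got), err)
}
```

In this model, a wait on "tokens registered" passes for the stale view even though the lookup still fails, which is the gap the existing waits leave open.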

The hypothesis originally filed on the issue - that the bytes-limit
error loses its 422/ResourceExhausted coding in the vendored Thanos
refetch ("series size exceeded expected size; refetching") path - was
falsified during investigation: those log lines belong to an earlier,
passing test in the same CI job; the failing query never reached
store-gateway limiter code at all; and all 10 vendored limiter
consumption sites (including the refetch recursion) re-code the error
as ResourceExhausted, which the querier maps to a 422 LimitError
(#5286).

Fix the race in the tests by waiting until the querier sees the
store-gateway ACTIVE in its store-gateway ring view before querying
(same idiom as backward_compatibility_test.go, #5975). Apply the same
wait to the sibling TestQuerierWithBlocksStorageLimits, which has the
identical vulnerable shape (every query expected to hit a 422 limit
against a freshly started store-gateway). Same root cause as #7605,
which is fixed separately for
TestQuerierWithBlocksStorageOnMissingBlocksFromStorage in a
non-overlapping PR.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Sandy Chen <Yuxuan.Chen@morganstanley.com>

1 parent 74185ef commit 9fcac35
2 files changed
Lines changed: 17 additions & 0 deletions
| File (name not captured) | Hunk | Lines added (new-file line numbers) |
|---|---|---|
| 1 | 1 | 60 |
| 2 | 1 | 477-484 |
| 2 | 2 | 582-589 |