cosmos: add probe-gated endpoint failback integration test #4628
NaluTripician wants to merge 2 commits
Adds an in-memory-emulator integration test exercising the real account-level connectivity-probe-gated failback path end to end (PR Azure#4604 / issue Azure#4622): a marked regional endpoint is failed back only after a real connectivity probe succeeds, and a failed probe resets the cooldown so it is not immediately re-probed.

The existing unit tests only cover this state machine with injected fake probe closures; this test drives the driver's real probe closure against the emulator (connection-blocked vs. healthy via region-scoped fault injection).

Failback is driven deterministically through new doc-hidden, internal-feature-gated test hooks on CosmosDriver (run_endpoint_probe_once_for_testing, mark_region_endpoint_unavailable_for_testing, is_endpoint_host_marked_unavailable_for_testing) plus a short endpoint_unavailability_ttl, so it never waits on the 60s probe loop.

No production behavior changes; all hooks compile out unless the test / __internal_in_memory_emulator feature is enabled.

Fixes Azure#4622

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR adds an in-memory-emulator integration test that exercises the real connectivity-probe-gated account endpoint failback path end to end, closing the follow-up issue #4622 from the probe-gating work in #4604. Previously, the probe-gated failback state machine was only covered by unit tests using injected fake probe closures; this test drives the driver's actual probe closure against the emulator under simulated regional outage and recovery.
To make the test deterministic (instead of waiting on the 60-second background probe loop), the PR adds three doc-hidden, internal-feature-gated *_for_testing hooks on CosmosDriver and a supporting hook on LocationStateStore. The only non-test production change is a behavior-identical restructuring of the probe-closure construction so a clone can be retained for the test hook, wrapped in a small Debug-implementing newtype because CosmosDriver derives Debug.
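The Debug-implementing newtype mentioned above can be illustrated with a minimal sketch. The names `DebugProbe`, `Driver`, `run_probe_once`, and the `Fn(&str) -> bool` probe signature are hypothetical and chosen only for illustration; the driver's actual types and signatures are not shown in this PR description.

```rust
use std::fmt;
use std::sync::Arc;

/// Hypothetical probe signature: takes an endpoint host and reports
/// whether it is reachable. The real driver's signature may differ.
type ProbeFn = dyn Fn(&str) -> bool + Send + Sync;

/// Newtype around the shared closure. Closures do not implement `Debug`,
/// so a struct holding one directly cannot `#[derive(Debug)]`.
#[derive(Clone)]
struct DebugProbe(Arc<ProbeFn>);

impl fmt::Debug for DebugProbe {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print a placeholder instead of the closure itself.
        f.write_str("DebugProbe(<closure>)")
    }
}

/// Stand-in for a driver type that derives `Debug` (assumption).
#[derive(Debug)]
struct Driver {
    name: String,
    /// Clone of the probe closure, retained so a test hook can call it.
    probe: DebugProbe,
}

impl Driver {
    fn new(name: &str, probe: Arc<ProbeFn>) -> Self {
        Self {
            name: name.to_owned(),
            probe: DebugProbe(probe),
        }
    }

    /// Hypothetical hook: runs the retained probe once against `host`.
    fn run_probe_once(&self, host: &str) -> bool {
        (self.probe.0)(host)
    }
}
```

Because the closure sits behind an `Arc`, retaining a clone for the hook is cheap and should not change what the background loop does with its own handle.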
Changes:
- New `endpoint_probe_failback.rs` integration test covering the three behaviors from #4622 (stays out of rotation while blocked, fails back only after a successful probe, cooldown reset on a failed probe).
- New internal `*_for_testing` hooks (`run_endpoint_probe_once_for_testing`, `mark_region_endpoint_unavailable_for_testing`, `is_endpoint_host_marked_unavailable_for_testing`) following the existing convention.
- Behavior-identical refactor of the endpoint probe-closure construction in `CosmosDriver::new` to retain a clone for tests.
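As a rough sketch of how a doc-hidden, feature-gated hook of this kind compiles out of production builds: the `cfg` expression below is the one quoted in the PR description, while the `Driver` struct, its `unavailable_hosts` field, and both method names are hypothetical stand-ins rather than the driver's real API.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the driver; only tracks unavailable hosts.
#[derive(Debug, Default)]
struct Driver {
    unavailable_hosts: HashSet<String>,
}

impl Driver {
    /// Always compiled: reports whether a host is still in rotation.
    fn is_routable(&self, host: &str) -> bool {
        !self.unavailable_hosts.contains(host)
    }

    /// Test-only hook: hidden from rustdoc and removed entirely unless
    /// the crate is built for tests or with the internal emulator feature.
    #[doc(hidden)]
    #[cfg(any(test, feature = "__internal_in_memory_emulator"))]
    pub fn mark_host_unavailable_for_testing(&mut self, host: &str) {
        self.unavailable_hosts.insert(host.to_owned());
    }
}
```

With this gating, a regular build contains no symbol for the hook at all, which is one way to keep "no production behavior changes" true by construction.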
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `tests/in_memory_emulator_tests/mod.rs` | Registers the new `endpoint_probe_failback` module, gated on `fault_injection` (consistent with sibling fault-injection tests). |
| `tests/in_memory_emulator_tests/endpoint_probe_failback.rs` | New integration test driving the real probe through outage/recovery phases. |
| `src/driver/routing/location_state_store.rs` | Adds a `pub(crate)` test hook to seed an endpoint as unavailable using the live snapshot's endpoint object. |
| `src/driver/cosmos_driver.rs` | Adds Debug newtype + stored probe-fn clone, three doc-hidden test hooks, and a behavior-identical probe-closure restructuring. |
| `CHANGELOG.md` | Adds an "Other Changes" entry documenting the test and the real-emulator follow-up note. |
…s-probe-gated-failback-test
# Conflicts:
#	sdk/cosmos/azure_data_cosmos_driver/CHANGELOG.md
Summary
Follow-up to #4604 (review thread r3431461745), closing #4622. PR #4604 made Cosmos account-level endpoint failback probe-gated: a marked-unavailable endpoint rejoins the routing rotation only after a background connectivity probe confirms it is reachable (the old time-based auto-clear was removed). That state machine is covered by unit tests using injected fake probe closures, but nothing exercised the real probe path end to end.
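The probe-gated failback rule described above can be sketched as a small state machine driven by an injected probe closure, similar in spirit to the fake-probe unit tests the description mentions. All names here (`EndpointState`, `Endpoint`, `probe_once`, the `ttl` field) are hypothetical, and the sketch assumes the cooldown is a simple fixed duration; the driver's real state store is more involved.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-endpoint state; not the driver's actual types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum EndpointState {
    Available,
    /// Out of rotation; eligible for a probe once `probe_after` has passed.
    Unavailable { probe_after: Instant },
}

struct Endpoint {
    state: EndpointState,
    /// Cooldown before a marked endpoint may be probed (assumed fixed).
    ttl: Duration,
}

impl Endpoint {
    fn new(ttl: Duration) -> Self {
        Self { state: EndpointState::Available, ttl }
    }

    fn mark_unavailable(&mut self, now: Instant) {
        self.state = EndpointState::Unavailable { probe_after: now + self.ttl };
    }

    fn is_available(&self) -> bool {
        self.state == EndpointState::Available
    }

    /// Runs one probe pass. Returns `true` if the probe was actually invoked.
    /// There is no time-based auto-clear: only a successful probe fails back.
    fn probe_once(&mut self, now: Instant, probe: impl Fn() -> bool) -> bool {
        match self.state {
            EndpointState::Unavailable { probe_after } if now >= probe_after => {
                if probe() {
                    // Probe succeeded: the endpoint rejoins the rotation.
                    self.state = EndpointState::Available;
                } else {
                    // Probe failed: reset the cooldown so the endpoint
                    // is not immediately re-probed.
                    self.state =
                        EndpointState::Unavailable { probe_after: now + self.ttl };
                }
                true
            }
            // Available, or still cooling down: nothing to probe.
            _ => false,
        }
    }
}
```

Passing `now` explicitly keeps the sketch deterministic, which mirrors the motivation for driving a single probe pass from the test instead of waiting on the background loop.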
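This PR adds an in-memory-emulator integration test that drives the driver's real connectivity-probe closure against the emulator and asserts all three behaviors the issue calls out: the endpoint stays out of rotation while blocked, fails back only after a successful probe, and has its cooldown reset on a failed probe.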
How
- `tests/in_memory_emulator_tests/endpoint_probe_failback.rs` builds a two-region in-memory emulator, blocks one region with a region-scoped `ConnectionError` fault (which also blocks the probe's `GET /probe`), and toggles the fault to simulate outage and recovery.
- Failback is driven deterministically through new doc-hidden test hooks on `CosmosDriver`: `run_endpoint_probe_once_for_testing`, `mark_region_endpoint_unavailable_for_testing`, and `is_endpoint_host_marked_unavailable_for_testing`, matching the existing `*_for_testing` convention (gated on `any(test, feature = "__internal_in_memory_emulator")`). A short `endpoint_unavailability_ttl` makes the endpoint probe-eligible quickly.
- … `regional_gateway_unreachable.rs`, so this test isolates the new failback behavior and exercises the real probe.

Notes
- No production behavior changes: all hooks compile out unless the `test` / `__internal_in_memory_emulator` feature is enabled; the only non-test change is a behavior-identical restructuring of the probe-closure construction so a clone can be retained for the hook.
- `cargo fmt`, `cargo clippy` (default and `--all-features`), and `cargo test --all-features` for `azure_data_cosmos_driver` all pass.

Fixes #4622