feat(cosmos): enforce maximum fan-out limit in query pipeline#4615
Adds a configurable maximum fan-out limit for cross-partition queries. When a query would fan out to more physical partitions than the limit, plan_operation returns a CLIENT_QUERY_FAN_OUT_LIMIT_EXCEEDED error (HTTP 400 / sub-status 20307). The default cap is 100 and can be raised via FeedOptions::with_max_fan_out / QueryOptions::with_max_fan_out.

Fixes #4453

Co-authored-by: analogrelay <7574+analogrelay@users.noreply.github.com>

- Rename effective_max_fan_out to fan_out_limit in planner.rs
- Fix incorrect doc comment reference in feed.rs

Co-authored-by: analogrelay <7574+analogrelay@users.noreply.github.com>

Group continuation token and max_fan_out into a PlanOptions struct so plan_operation can accept Option<PlanOptions> as its final parameter. This avoids signature churn when adding new planning options in future.

- Add PlanOptions struct (driver/plan_options.rs), exported from the driver crate
- Change CosmosDriver::plan_operation to accept Option<PlanOptions> with unwrap_or_default() at the start
- Update all SDK callers (container_client, cosmos_client, database_client) to construct PlanOptions from FeedOptions
- Fix private intra-doc link on plan_operation (was referencing the private planner::DEFAULT_MAX_FAN_OUT constant, breaking cargo doc)
- Update CHANGELOG entries to reference PR #4615 (not issue #4453)

Co-authored-by: analogrelay <7574+analogrelay@users.noreply.github.com>
Co-authored-by: Ashley Stanton-Nurse <github@analogrelay.net>
Pull request overview
This PR adds a configurable maximum fan-out cap for cross-partition queries so that queries targeting “too many” physical partitions fail fast (HTTP 400 / sub-status 20307) instead of silently triggering expensive scatter-gather execution. It wires the cap from the public azure_data_cosmos surface down into the azure_data_cosmos_driver planner, and introduces a new sub-status/status code to represent this client-side policy violation.
Changes:
- Added `FeedOptions::max_fan_out` + `with_max_fan_out` and a `QueryOptions::with_max_fan_out` shortcut to configure the cap from the public SDK.
- Introduced `azure_data_cosmos_driver::PlanOptions` and updated `CosmosDriver::plan_operation` to accept `Option<PlanOptions>`, using `PlanOptions::max_fan_out` to enforce the cap during planning.
- Added new error codes (`SubStatusCode`/`CosmosStatus` 20307) plus unit/integration test updates for the new planner parameter.
| File | Description |
|---|---|
| sdk/cosmos/azure_data_cosmos/src/options/feed.rs | Adds max_fan_out to feed/query options and builder conveniences. |
| sdk/cosmos/azure_data_cosmos/src/clients/container_client.rs | Propagates public max_fan_out + continuation into driver planning via PlanOptions. |
| sdk/cosmos/azure_data_cosmos/CHANGELOG.md | Documents the new public query fan-out cap option. |
| sdk/cosmos/azure_data_cosmos_driver/src/lib.rs | Re-exports PlanOptions at the driver crate root. |
| sdk/cosmos/azure_data_cosmos_driver/src/error/cosmos_status.rs | Adds new 20307 sub-status + 400/20307 CosmosStatus constant and name mapping. |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/plan_options.rs | Introduces the new PlanOptions struct for plan_operation. |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/mod.rs | Wires plan_options module and re-exports PlanOptions. |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/dataflow/planner.rs | Enforces default/custom fan-out limit and adds planner tests. |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/dataflow/integration_tests/query_resume.rs | Updates integration tests for the new build_sequential_drain signature. |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/cosmos_driver.rs | Updates plan_operation to use PlanOptions and pass max_fan_out into the planner. |
| sdk/cosmos/azure_data_cosmos_driver/CHANGELOG.md | Documents new planning options + new status codes (needs categorization tweak). |
Copilot's findings
- Files reviewed: 11/11 changed files
- Comments generated: 7
    let fan_out_limit = max_fan_out.unwrap_or(DEFAULT_MAX_FAN_OUT);
    if request_nodes.len() > fan_out_limit {
        return Err(crate::error::CosmosError::builder()
            .with_status(crate::error::CosmosStatus::CLIENT_QUERY_FAN_OUT_LIMIT_EXCEEDED)
            .with_message(format!(
                "cross-partition query would fan out to {} physical partitions, \
                 which exceeds the maximum of {}; use \
                 QueryOptions::with_max_fan_out() to raise the limit if this \
                 level of fan-out is intentional",
                request_nodes.len(),
                fan_out_limit,
            ))
            .build());
    }
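The check in this excerpt can be exercised in isolation. The following is only a minimal sketch: `FanOutLimitExceeded` and `check_fan_out` are hypothetical stand-ins introduced for illustration (the real code builds a `CosmosError`), and the constant's value is taken from the PR's stated default of 100.

```rust
/// Stand-in for the crate-internal constant; the PR states the default cap is 100.
const DEFAULT_MAX_FAN_OUT: usize = 100;

/// Hypothetical, simplified error type (not the driver's `CosmosError`).
#[derive(Debug, PartialEq)]
pub struct FanOutLimitExceeded {
    pub requested: usize,
    pub limit: usize,
}

/// Hypothetical helper mirroring the planner check: succeeds when the number
/// of targeted physical partitions does not exceed the effective cap.
pub fn check_fan_out(
    partition_count: usize,
    max_fan_out: Option<usize>,
) -> Result<(), FanOutLimitExceeded> {
    // Fall back to the default when the caller did not configure a cap.
    let fan_out_limit = max_fan_out.unwrap_or(DEFAULT_MAX_FAN_OUT);
    if partition_count > fan_out_limit {
        return Err(FanOutLimitExceeded {
            requested: partition_count,
            limit: fan_out_limit,
        });
    }
    Ok(())
}
```

Because the comparison is a strict `>`, a query that targets exactly as many partitions as the cap is still allowed; only the next partition beyond it is rejected.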
Maybe, but continuation tokens aren't authenticated or secured, so my concern would be this could open someone up to crafting a token with huge fan-out. I think we should leave this as-is. If there is a continuation token and resuming it with the current topology would put it over the max fan-out limit, I think we should fail.
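The position in this comment could be sketched roughly as follows. `PlanRequest` and `within_limit` are hypothetical names invented for illustration, not types from the driver; the point is only that the presence of a continuation token does not relax the check against the current topology.

```rust
/// Hypothetical planner input; not a type from the driver.
pub struct PlanRequest<'a> {
    /// Physical partitions the query would target under the current topology.
    pub current_partition_count: usize,
    /// Opaque, unauthenticated continuation token, if the query is resuming.
    pub continuation: Option<&'a str>,
    /// Effective cap after applying defaults.
    pub fan_out_limit: usize,
}

/// Returns true when planning may proceed.
pub fn within_limit(req: &PlanRequest<'_>) -> bool {
    // The token is caller-supplied and unauthenticated, so it is deliberately
    // ignored here: nothing it contains is trusted to bypass the cap.
    let _ = req.continuation;
    req.current_partition_count <= req.fan_out_limit
}
```

Under this sketch a fresh query and a resumed one are judged identically, so a crafted token cannot be used to force a larger fan-out.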
    /// Users who truly need more can increase the limit via
    /// [`FeedOptions::max_fan_out`](crate::options::FeedOptions::max_fan_out) /
    /// `QueryOptions::with_max_fan_out`.
@copilot Just remove the reference to the max_fan_out option from this doc comment. This is the constant, it's crate-internal.
    /// The query would require contacting more physical partitions than
    /// the configured maximum. Raise the limit via
    /// `FeedOptions::max_fan_out` / `QueryOptions::with_max_fan_out` if
    /// this level of fan-out is intentional.
    /// Group both continuation-token resumption and the fan-out cap into a single
    /// struct so the signature of `plan_operation` can grow without forcing a
    /// change at every call site. Callers that need no special behavior can pass
    /// `None`; `plan_operation` treats `None` as `PlanOptions::default()`.
    #[derive(Default)]
@copilot Derive Clone and Debug as recommended, but don't make the suggested doc comment updates, they're too verbose.
Done in commit 9c4e9f8 — derived Clone and Debug on PlanOptions.
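For readers unfamiliar with the pattern discussed in this thread, here is a rough sketch of an options struct passed as `Option<...>` and defaulted with `unwrap_or_default()`. The field names and types are assumptions (the struct body is not shown in the excerpt), and `effective_options` is a hypothetical stand-in for the first step of `plan_operation`.

```rust
/// Hypothetical mirror of the PR's `PlanOptions`; fields are assumed, not
/// copied from the driver.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct PlanOptions {
    /// Continuation token to resume from, if any (assumed to be a string).
    pub continuation: Option<String>,
    /// Caller-configured fan-out cap; `None` means "use the default".
    pub max_fan_out: Option<usize>,
}

/// Stand-in for the start of `plan_operation`: `None` is treated as
/// `PlanOptions::default()`.
pub fn effective_options(options: Option<PlanOptions>) -> PlanOptions {
    options.unwrap_or_default()
}
```

A caller that only cares about one field can write `PlanOptions { max_fan_out: Some(250), ..Default::default() }`, so adding a field later does not force every call site to change.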
### Features Added

- Added `PlanOptions` struct to `CosmosDriver::plan_operation`, grouping continuation-token and fan-out cap into a single parameter, making the API extensible without future call-site churn. ([#4615](https://github.com/Azure/azure-sdk-for-rust/pull/4615))
@copilot Agreed here, put the change to the plan_operation signature in the driver CHANGELOG's breaking changes section. It's NOT breaking to SDK users though.
Done in commit 9c4e9f8 — moved the plan_operation signature change to the Breaking Changes section.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
- Derive Clone and Debug on PlanOptions
- Move plan_operation signature change to breaking changes section
- Remove FeedOptions reference from internal constant doc

Co-authored-by: analogrelay <7574+analogrelay@users.noreply.github.com>
Cross-partition queries targeting a large number of physical partitions are a common source of performance problems. This adds a configurable cap: if a query would fan out to more physical partitions than the limit, `plan_operation` returns an error (HTTP 400 / sub-status 20307) rather than silently executing an expensive scatter-gather. The default limit is 100.

New error code

- `SubStatusCode::CLIENT_QUERY_FAN_OUT_LIMIT_EXCEEDED` (20307) and `CosmosStatus::CLIENT_QUERY_FAN_OUT_LIMIT_EXCEEDED` (HTTP 400 / 20307)

Driver API (`azure_data_cosmos_driver`)

- `PlanOptions` struct groups the continuation token and fan-out cap into a single parameter; `CosmosDriver::plan_operation` now accepts `Option<PlanOptions>` as its final argument (`None` applies all defaults). This follows the same extensible pattern used elsewhere in the SDK, avoiding call-site churn when new planning options are added.
- The limit-exceeded error message names `QueryOptions::with_max_fan_out()` so users know how to raise the cap

Public API (`azure_data_cosmos`)

- `FeedOptions::max_fan_out: Option<usize>` + `FeedOptions::with_max_fan_out(n)` builder method
- `QueryOptions::with_max_fan_out(n)` convenience shortcut

The limit only applies to fresh query plans; resumed continuation tokens are not re-checked.
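The public options listed in this description might look roughly like the following. This is a sketch under assumptions: the real `FeedOptions` and `QueryOptions` have other fields and may be related differently; here `QueryOptions` is assumed to hold a `FeedOptions` in a hypothetical `feed` field so that the shortcut can delegate to it.

```rust
/// Simplified stand-in for the SDK's `FeedOptions`; only the new field is shown.
#[derive(Clone, Debug, Default)]
pub struct FeedOptions {
    /// `None` leaves the driver's default cap in effect.
    pub max_fan_out: Option<usize>,
}

impl FeedOptions {
    /// Builder-style setter for the fan-out cap.
    pub fn with_max_fan_out(mut self, n: usize) -> Self {
        self.max_fan_out = Some(n);
        self
    }
}

/// Simplified stand-in for `QueryOptions`; the `feed` field is an assumption.
#[derive(Clone, Debug, Default)]
pub struct QueryOptions {
    pub feed: FeedOptions,
}

impl QueryOptions {
    /// Convenience shortcut that forwards to the nested feed options.
    pub fn with_max_fan_out(mut self, n: usize) -> Self {
        self.feed = self.feed.with_max_fan_out(n);
        self
    }
}
```

With these stand-ins, `QueryOptions::default().with_max_fan_out(500)` raises the cap for one query while every other query keeps the default.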