query-avoid-cross-partition: Python SDK partition_key parameter example #131

@jaydestro

Rubric Gap Analysis — query-avoid-cross-partition enhancement: Python SDK partition_key parameter example

| Field | Value |
| --- | --- |
| Type | Rule Enhancement |
| Target Rule | query-avoid-cross-partition |
| Severity | HIGH |
| Source | SCOPE Python Serverless RAG V-B, Query Optimization Criterion #1 (Cross-partition query minimization), Phase 5 |
| Labels | enhancement, SCOPE, agent-kit, rule:query |

Summary

The query-avoid-cross-partition rule has C# and Java/Spring Data examples but no Python examples. In SCOPE V-B, agents equipped with the Agent Kit (AK) understand the concept of partition-scoped queries: they write correct WHERE clauses filtering on the partition key (PK) and document "no cross-partition fan-out" in READMEs. They nevertheless call query_items() with enable_cross_partition_query=True instead of the Python SDK's partition_key= parameter, which causes cross-partition fan-out despite the stated intent to avoid it. Adding a Python section with the correct partition_key= parameter pattern would close this gap.

Rubric Gap Analysis

Criterion: Query Optimization #1 — Cross-partition query minimization (Critical, 2x weight)
Hit rate: 3/13 V-B runs (23%) use enable_cross_partition_query=True where partition_key= should be used. This is a regression from V-A's 2/39 (5%).
Affected profiles: P02 (Claude Code CLI + Opus 4.6 + AK, 1/5 runs) and P03 (VSCode + Opus 4.6 + AK, 1/3 runs) — 2 AK profiles, meeting the multi-profile filing threshold.
Control (P01): 0/5 runs have this issue — the control profile naturally avoids cross-partition by passing partition_key= correctly in 4/5 runs.

Evidence

Existing Rule Coverage

The rule currently provides C# and Java/Spring Data examples only:

// Correct (single-partition query):
var iterator = container.GetItemQueryIterator<Order>(
    query,
    requestOptions: new QueryRequestOptions
    {
        PartitionKey = new PartitionKey(customerId)  // Single partition!
    });

No Python section exists in the rule.

Missing Anti-Pattern / Finding

Pattern 1 — enable_cross_partition_query=True with PK in WHERE (P02R01):

The agent designs PK as /corpus, writes a WHERE clause filtering on c.corpus = @corpus, documents "no cross-partition fan-out" in the README — then uses enable_cross_partition_query=True anyway:

# ❌ Anti-pattern: WHERE clause has PK filter but SDK call forces cross-partition
results = list(container.query_items(
    query="SELECT TOP @topK ... FROM c WHERE c.corpus = @corpus ORDER BY VectorDistance(...)",
    parameters=parameters,
    enable_cross_partition_query=True  # Forces fan-out despite PK filter in WHERE
))

Pattern 2 — HPK configured but not used in query_items (P03R03):

The agent designs HPK /corpusId, /documentId, correctly uses partition_key=[corpus_id, document_id] for read_item(), but uses enable_cross_partition_query=True for all queries:

# ✅ Agent uses partition_key correctly for point reads
item = self._container.read_item(item=chunk_id, partition_key=[corpus_id, document_id])

# ❌ But uses cross-partition for queries despite having HPK
items = list(self._container.query_items(
    query=query, parameters=parameters,
    enable_cross_partition_query=True  # Should use partition_key=[corpus_id]
))
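
Both anti-patterns above share one call-site signature: `query_items()` receives `enable_cross_partition_query=True` and no `partition_key`. As a rough sketch of how a review script might flag that signature, the snippet below walks a module with the standard-library `ast` module. The function name `find_forced_fanout_calls` and the sample source are hypothetical; the check only sees literal `True` keyword arguments and would miss values passed through variables or `**kwargs`.

```python
import ast


def find_forced_fanout_calls(source: str) -> list[int]:
    """Return line numbers of query_items() calls that pass
    enable_cross_partition_query=True without a partition_key argument."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Only attribute calls such as container.query_items(...) are considered.
        if not (isinstance(func, ast.Attribute) and func.attr == "query_items"):
            continue
        keywords = {kw.arg: kw.value for kw in node.keywords if kw.arg}
        cross = keywords.get("enable_cross_partition_query")
        forces_fanout = isinstance(cross, ast.Constant) and cross.value is True
        if forces_fanout and "partition_key" not in keywords:
            flagged.append(node.lineno)
    return sorted(flagged)


# Hypothetical sample: the first call matches the anti-pattern, the second does not.
SAMPLE = '''
items = list(container.query_items(
    query=query, parameters=parameters,
    enable_cross_partition_query=True,
))
scoped = list(container.query_items(
    query=query, parameters=parameters,
    partition_key=[corpus_id],
))
'''

print(find_forced_fanout_calls(SAMPLE))  # [2]
```

A check like this cannot tell whether fan-out was intended, so a flagged line is a prompt for review rather than a definite defect.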

Correct Pattern

# ✅ Correct: Pass partition_key parameter to scope query to single partition
items = list(container.query_items(
    query="SELECT TOP @topK ... FROM c WHERE c.tenantId = @tenantId ORDER BY VectorDistance(...)",
    parameters=[
        {"name": "@topK", "value": top_k},
        {"name": "@tenantId", "value": tenant_id},
    ],
    partition_key=tenant_id,  # Routes to single partition — no fan-out
))
# ✅ Correct: HPK partial prefix to scope to Level 1
items = list(container.query_items(
    query="SELECT TOP @topK ... FROM c WHERE c.corpusId = @corpusId ORDER BY VectorDistance(...)",
    parameters=[
        {"name": "@topK", "value": top_k},
        {"name": "@corpusId", "value": corpus_id},
    ],
    partition_key=[corpus_id],  # Partial HPK prefix — scopes to corpusId sub-tree
))
# ⚠️ When cross-partition is truly unavoidable (e.g., global search across all tenants)
items = list(container.query_items(
    query="SELECT TOP @topK ... ORDER BY VectorDistance(...)",
    parameters=[{"name": "@topK", "value": top_k}],
    enable_cross_partition_query=True,
    max_item_count=100,  # Caps items per page; does not reduce the number of partitions queried
))
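
One way to make the three cases above hard to mix up is to route every call through a small helper that refuses to fan out unless the caller opts in explicitly. The following is a minimal sketch under stated assumptions: `query_kwargs` and `allow_fan_out` are hypothetical names, not part of the azure-cosmos SDK. Only the `partition_key` and `enable_cross_partition_query` keyword names come from the examples above. The sketch treats `None` as "no partition key supplied".

```python
from typing import Any, Optional


def query_kwargs(partition_key: Optional[Any] = None, *, allow_fan_out: bool = False) -> dict:
    """Build the routing keyword arguments for container.query_items().

    Prefers partition_key. enable_cross_partition_query=True is only
    produced when the caller explicitly sets allow_fan_out=True.
    """
    if partition_key is not None:
        # Single value for a simple PK, or a list for an HPK (full or prefix).
        return {"partition_key": partition_key}
    if allow_fan_out:
        return {"enable_cross_partition_query": True}
    raise ValueError(
        "pass partition_key, or set allow_fan_out=True for a deliberate cross-partition query"
    )


print(query_kwargs("tenant-1"))          # {'partition_key': 'tenant-1'}
print(query_kwargs(["corpus-1"]))        # {'partition_key': ['corpus-1']}
print(query_kwargs(allow_fan_out=True))  # {'enable_cross_partition_query': True}
```

At the call site this would be used as `container.query_items(query=..., parameters=..., **query_kwargs(tenant_id))`, so a forgotten partition key raises an error instead of silently becoming a fan-out query.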

Rule Contradiction Scan

| Rule | Interaction |
| --- | --- |
| query-point-reads | Complementary: read_item() uses partition_key= correctly; this rule should show the same pattern for query_items() |
| partition-hierarchical | Complementary: HPK partial prefix usage in partition_key=[level1] should be cross-referenced |
| partition-query-patterns | Aligned: PK design for query alignment; this rule covers the SDK call-site pattern |

No contradictions found.
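
The partition-hierarchical interaction above depends on passing a leading prefix of the HPK levels as the `partition_key` list. As an illustration only, the hypothetical helper `hpk_prefix` below builds that list from whichever levels the caller knows, and rejects gaps, since a prefix has to be supplied left to right. The level values shown are made up.

```python
from typing import Optional


def hpk_prefix(*levels: Optional[str]) -> list[str]:
    """Return the contiguous leading HPK levels as a list.

    Trailing unknown levels are passed as None. A known level that
    follows an unknown one is rejected because it is not a prefix.
    """
    prefix: list[str] = []
    seen_gap = False
    for value in levels:
        if value is None:
            seen_gap = True
        elif seen_gap:
            raise ValueError("HPK levels must be supplied left to right without gaps")
        else:
            prefix.append(value)
    if not prefix:
        raise ValueError("at least the first partition key level is required")
    return prefix


print(hpk_prefix("corpus-1", None))      # ['corpus-1']
print(hpk_prefix("corpus-1", "doc-42"))  # ['corpus-1', 'doc-42']
```

The returned list can be passed as `partition_key=hpk_prefix(corpus_id, document_id)` for both the Level 1 prefix case and the fully specified case.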

Documentation Cross-reference

Recommended cosmosdb-agent-kit Fix

Add a Python section to query-avoid-cross-partition.md after the existing Java/Spring Data section:

### Python SDK — pass `partition_key` parameter to `query_items()`

The Python SDK's `container.query_items()` accepts a `partition_key` parameter that routes the query to a single logical partition. Without it, the SDK defaults to cross-partition fan-out even if the WHERE clause filters on the partition key — the WHERE clause alone does NOT route the query.

**Incorrect — WHERE has PK filter but SDK fans out:**

```python
# ❌ enable_cross_partition_query=True forces fan-out
items = list(container.query_items(
    query="SELECT ... FROM c WHERE c.tenantId = @tenantId ...",
    parameters=[{"name": "@tenantId", "value": tenant_id}],
    enable_cross_partition_query=True,  # Fans out to ALL partitions
))
```

**Correct — explicit `partition_key` parameter:**

```python
# ✅ partition_key routes to single partition
items = list(container.query_items(
    query="SELECT ... FROM c WHERE c.tenantId = @tenantId ...",
    parameters=[{"name": "@tenantId", "value": tenant_id}],
    partition_key=tenant_id,
))
```

For hierarchical partition keys, use a list for partial prefix scoping:

```python
# ✅ HPK Level 1 prefix: scopes to corpusId sub-tree
items = list(container.query_items(
    query="SELECT ... FROM c WHERE c.corpusId = @corpusId ...",
    parameters=[{"name": "@corpusId", "value": corpus_id}],
    partition_key=[corpus_id],  # Partial prefix
))
```
