Rubric Gap Analysis — New rule needed: query-point-reads — No guidance on ReadItem over queries or ReadMany over OR/IN #48

@jaydestro


| Field | Value |
| --- | --- |
| Type | New Rule |
| Proposed Rule | query-point-reads (or fold into query-avoid-cross-partition enhancement) |
| Category | query-* |
| Severity | HIGH |
| Source | SCOPE Rubric Criteria — Query Optimization, Criterion 1 (Cross-Partition Avoidance), checks 2 and 3 |
| Labels | enhancement, SCOPE, agent-kit, rule:query |

Summary

No existing rule guides agents to use point reads (ReadItem / readItem / read_item) instead of queries when both the document id and partition key are known, or to use ReadMany instead of OR/IN clauses spanning multiple partition key values. These are distinct optimizations from partition key inclusion in WHERE clauses (which query-avoid-cross-partition already covers). Point reads bypass the query engine entirely, costing 1 RU for a 1 KB document versus 2.5+ RU for an equivalent SELECT * FROM c WHERE c.id = @id query.

Scoping Note

This could be implemented as either:

  • A standalone rule query-point-reads — cleaner separation of concerns
  • An enhancement to query-avoid-cross-partition — since all three patterns (partition key in WHERE, ReadItem over queries, ReadMany over OR) serve the same goal of minimizing unnecessary fan-out and RU waste

The existing query-avoid-cross-partition rule covers: partition key in WHERE clauses, Spring Data @Query routing, parallel cross-partition options. It does not cover ReadItem or ReadMany. There is no overlap or contradiction — these are complementary optimizations.

Evidence

Incorrect Pattern — Query when point read suffices

// ❌ Uses query engine when both id and partition key are known
public async Task<Order> GetOrder(string orderId, string customerId)
{
    var query = new QueryDefinition("SELECT * FROM c WHERE c.id = @id")
        .WithParameter("@id", orderId);
    
    var iterator = container.GetItemQueryIterator<Order>(query,
        requestOptions: new QueryRequestOptions
        {
            PartitionKey = new PartitionKey(customerId)
        });
    
    var response = await iterator.ReadNextAsync();
    return response.FirstOrDefault();
    // Cost: ~2.5 RU (query engine overhead) for a 1 KB document
}

# ❌ Query instead of point read
def get_player(self, player_id: str, game_id: str):
    query = "SELECT * FROM c WHERE c.id = @id"
    items = list(self.container.query_items(
        query=query,
        parameters=[{"name": "@id", "value": player_id}],
        partition_key=game_id
    ))
    return items[0] if items else None
    # Unnecessary query engine invocation

Correct Pattern — Point read

// ✅ Point read — bypasses query engine entirely
public async Task<Order> GetOrder(string orderId, string customerId)
{
    var response = await container.ReadItemAsync<Order>(
        orderId,
        new PartitionKey(customerId));
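    return response.Resource;
    // Cost: 1 RU for a 1 KB document (fixed, no query engine)
    // Note: unlike the query version, which returns null for a missing item,
    // ReadItemAsync throws CosmosException (StatusCode NotFound).
}

# ✅ Point read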
def get_player(self, player_id: str, game_id: str):
    return self.container.read_item(item=player_id, partition_key=game_id)
    # 1 RU — no query engine overhead
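
One behavioral difference is worth handling when swapping the query for a point read: the query version above returns None for a missing item, while read_item raises a not-found (404) error. A minimal sketch that preserves the return-None behavior follows. The helper name read_or_none is hypothetical, and it inspects the exception's status_code attribute only to keep the snippet free of SDK imports; in real code, catching azure.cosmos.exceptions.CosmosResourceNotFoundError is the more direct choice.

```python
from typing import Any, Optional


def read_or_none(container: Any, item_id: str, partition_key: str) -> Optional[dict]:
    """Point read that returns None instead of raising when the item is missing.

    `container` is assumed to expose read_item(item=..., partition_key=...),
    as the Python SDK's container client does. Illustrative helper only.
    """
    try:
        return container.read_item(item=item_id, partition_key=partition_key)
    except Exception as exc:
        # Assumption: not-found errors carry status_code == 404.
        # Any other error is re-raised unchanged.
        if getattr(exc, "status_code", None) == 404:
            return None
        raise
```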

Incorrect Pattern — OR/IN across partition keys

// ❌ IN clause over ids with no partition key filter: cross-partition fan-out
var query = new QueryDefinition(
    "SELECT * FROM c WHERE c.id IN (@id1, @id2, @id3)")
    .WithParameter("@id1", "order-1")
    .WithParameter("@id2", "order-2")
    .WithParameter("@id3", "order-3");
// Fans out to ALL partitions to find 3 documents

Correct Pattern — ReadMany

// ✅ ReadMany — targeted reads, no fan-out
var items = new List<(string id, PartitionKey partitionKey)>
{
    ("order-1", new PartitionKey("customer-a")),
    ("order-2", new PartitionKey("customer-b")),
    ("order-3", new PartitionKey("customer-a"))
};

var response = await container.ReadManyItemsAsync<Order>(items);
// Targets only the relevant partitions, so cost does not grow with total partition count
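
For comparison in Python, the same idea (address each document by its id and partition key rather than querying) can be approximated with one point read per pair. This is a sketch, not the SDK's ReadMany: the helper read_known_items is hypothetical, it issues sequential requests instead of batching them, and it assumes a container object exposing read_item(item=..., partition_key=...) and that every requested item exists.

```python
from typing import Any, Iterable, List, Tuple


def read_known_items(container: Any, keys: Iterable[Tuple[str, str]]) -> List[dict]:
    """Fetch documents by (id, partition key) pairs using point reads only."""
    results = []
    for item_id, partition_key in keys:
        # Each call is routed by its partition key; no query text, no fan-out.
        results.append(container.read_item(item=item_id, partition_key=partition_key))
    return results


# Hypothetical usage, mirroring the C# example above:
# orders = read_known_items(container, [
#     ("order-1", "customer-a"),
#     ("order-2", "customer-b"),
#     ("order-3", "customer-a"),
# ])
```

With 1 KB documents, this costs roughly one point read per requested item, independent of how many partitions the container has.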

Impact

  • RU waste: Query engine overhead adds 1.5-3x RU cost per single-document lookup vs. point read
  • Latency: Point reads skip query parsing, optimization, and execution pipeline
  • Fan-out: OR/IN across partition keys scales with total partition count, not document count
  • Prevalence: Extremely common. Agents default to SELECT * FROM c WHERE c.id = @id because it "works"
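
To put the RU difference in concrete terms, the sketch below multiplies the per-lookup figures quoted in this issue (1 RU for a point read, about 2.5 RU for the equivalent query, both for a 1 KB document) by a lookup rate. The rate of 1,000 lookups per second is an illustrative assumption, not a measurement.

```python
POINT_READ_RU = 1.0  # per 1 KB document, figure quoted in this issue
QUERY_RU = 2.5       # approximate cost of the equivalent query, same source


def ru_per_second(lookups_per_second: float, ru_per_lookup: float) -> float:
    """Throughput consumed by single-document lookups at a given rate."""
    return lookups_per_second * ru_per_lookup


rate = 1_000  # hypothetical workload: lookups per second
query_cost = ru_per_second(rate, QUERY_RU)       # 2500.0 RU/s
point_cost = ru_per_second(rate, POINT_READ_RU)  # 1000.0 RU/s
savings = query_cost - point_cost                # 1500.0 RU/s
```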

Recommended New Rule

Create query-point-reads.md:

Use point reads (ReadItem) when both document id and partition key are known. Use ReadMany for multiple known documents across partitions.

A point read costs 1 RU for a 1 KB document and bypasses the query engine entirely. A SELECT * FROM c WHERE c.id = @id query against the same document costs ~2.5 RU — the query engine parses, optimizes, and executes even though the result is a single document.

When fetching multiple documents by known ids across different partition keys, use ReadMany instead of OR/IN clauses. ReadMany targets only the relevant partitions; an OR/IN query that is not scoped to a single partition key fans out to all partitions.

Use queries only when: you need filtering, sorting, projection, or aggregation that point reads cannot provide.
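
As a rough illustration of how such a rule might be checked mechanically, the sketch below flags query text that is a bare id-equality lookup and therefore a candidate for a point read. The function is_point_read_candidate and its regular expression are hypothetical and not part of any SDK or of the existing rules. It recognizes only the simplest query shape, and the caller still has to confirm that the partition key is known.

```python
import re

# Matches only: SELECT * FROM <alias> WHERE <alias>.id = @param
_POINT_READ_CANDIDATE = re.compile(
    r"^\s*SELECT\s+\*\s+FROM\s+(\w+)\s+WHERE\s+\1\.id\s*=\s*@\w+\s*;?\s*$",
    re.IGNORECASE,
)


def is_point_read_candidate(query_text: str) -> bool:
    """True if the query is a bare id-equality lookup with no other clauses."""
    return _POINT_READ_CANDIDATE.match(query_text) is not None
```

Queries with extra filters, projections, sorting, or aggregation do not match, which is consistent with the guidance to keep using queries in those cases.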
