#49 — Rubric Gap Analysis — New rule needed: query-olap-detection — No warning against analytical queries on transactional containers
| Field |
Value |
| Type |
New Rule |
| Proposed Rule |
query-olap-detection |
| Category |
query-* |
| Severity |
HIGH |
| Source |
SCOPE Rubric Criteria — Query Optimization, Criterion 10 (OLAP Detection) |
| Labels |
enhancement, SCOPE, agent-kit, rule:query |
Summary
No existing rule warns against running large aggregations (COUNT, SUM, AVG over unbounded data), GROUP BY across all partitions, or full-container scans for reporting purposes against transactional Cosmos DB containers. The SQL-like query syntax makes Cosmos DB appear suitable for analytical workloads, but these queries consume massive RU, trigger sustained 429 throttling that impacts transactional traffic, and can time out against the 5-second execution limit.
Rubric Gap Analysis
During rubric criteria review for query optimization, this was identified as a platform constraint with no corresponding rule. The existing monitoring-ru-consumption rule covers tracking RU costs but does not warn about the fundamental mismatch between analytical query patterns and transactional container design. No query-* rule mentions Synapse Link, analytical store, or the HTAP separation pattern.
Evidence
Incorrect Pattern — Analytical queries on transactional container
// ❌ Unbounded aggregation across all partitions
var query = "SELECT c.region, COUNT(1) as orderCount, SUM(c.total) as revenue " +
"FROM c WHERE c.orderDate >= '2025-01-01' GROUP BY c.region";
var iterator = container.GetItemQueryIterator<dynamic>(query);
// Fans out to ALL partitions, reads ALL matching documents
// At 10M orders: potentially 50,000+ RU per execution
// Blocks transactional traffic with sustained high RU consumption
# ❌ Dashboard refreshing aggregations against transactional store
def get_dashboard_metrics(self):
queries = [
"SELECT VALUE COUNT(1) FROM c", # Full scan
"SELECT c.status, COUNT(1) FROM c GROUP BY c.status", # Unbounded GROUP BY
"SELECT VALUE AVG(c.responseTime) FROM c WHERE c.type = 'request'" # Cross-partition AVG
]
# Each query scans the entire container
# Running these every 30 seconds for a dashboard = sustained throttling
// ❌ Reporting query running against operational container
@Query("SELECT c.category, SUM(c.quantity) as totalSold, AVG(c.price) as avgPrice " +
"FROM c WHERE c.type = 'sale' GROUP BY c.category")
List<CategoryReport> getCategorySalesReport();
// Full cross-partition scan + aggregation — hundreds of thousands of RU
// Competes with real-time order processing for the same throughput budget
Correct Pattern — Synapse Link / analytical store / materialized views
// ✅ Enable analytical store on the container
var containerProperties = new ContainerProperties
{
Id = "orders",
PartitionKeyPath = "/customerId",
AnalyticalStoreTimeToLiveInSeconds = -1 // Enable analytical store
};
// ✅ Run aggregations via Synapse Link (no RU consumed on transactional store)
// In Synapse SQL or Spark:
// SELECT region, COUNT(*) as orderCount, SUM(total) as revenue
// FROM cosmos_db.orders WHERE orderDate >= '2025-01-01' GROUP BY region
// ✅ Pre-compute via Change Feed materialized views
// Real-time aggregation maintained by Change Feed processor
public class SalesAggregate
{
public string Id { get; set; } // "category-electronics"
public string PartitionKey { get; set; } // "aggregates"
public string Category { get; set; }
public long TotalSold { get; set; }
public decimal AveragePrice { get; set; }
public DateTime LastUpdated { get; set; }
}
// Dashboard reads pre-computed aggregates: 1 RU per read
// Instead of recalculating from millions of source documents
// ✅ Single-partition aggregation is acceptable when scoped and bounded
var query = new QueryDefinition(
"SELECT VALUE COUNT(1) FROM c WHERE c.customerId = @cid AND c.status = 'pending'")
.WithParameter("@cid", customerId);
var iterator = container.GetItemQueryIterator<int>(query,
requestOptions: new QueryRequestOptions
{
PartitionKey = new PartitionKey(customerId) // Scoped to ONE partition
});
// Single-partition, bounded data — acceptable cost
Impact
- Throughput starvation: Analytical queries consume the entire RU budget, throttling transactional operations (point reads, writes) with 429 responses
- Cascading failures: Sustained throttling from reporting queries propagates to all application operations sharing the container
- Timeout failures: Large aggregations exceed the 5-second query execution timeout, producing partial or failed results
- Cost amplification: Running these queries on a schedule (dashboard refresh) multiplies the damage
Recommended New Rule
Create query-olap-detection.md:
Do not run large aggregations, unbounded GROUP BY, or full-container scans against transactional Cosmos DB containers.
Cosmos DB's transactional store is optimized for OLTP: point reads, targeted queries within a partition, and bounded result sets. Analytical patterns — COUNT/SUM/AVG across all partitions, GROUP BY over unbounded data, or full-container scans — consume massive RU, trigger 429 throttling that impacts transactional operations, and can time out.
For analytical workloads, use:
- Azure Synapse Link + Analytical Store: Zero-ETL, no RU impact on transactional store
- Change Feed materialized views: Pre-compute aggregates incrementally
- Dedicated reporting containers: Populated via Change Feed with query-optimized partition keys
Single-partition aggregations scoped to a known partition key with bounded data are acceptable — the concern is unbounded cross-partition scans.
References
#49 — Rubric Gap Analysis — New rule needed:
query-olap-detection— No warning against analytical queries on transactional containersquery-olap-detectionquery-*enhancement,SCOPE,agent-kit,rule:querySummary
No existing rule warns against running large aggregations (COUNT, SUM, AVG over unbounded data), GROUP BY across all partitions, or full-container scans for reporting purposes against transactional Cosmos DB containers. The SQL-like query syntax makes Cosmos DB appear suitable for analytical workloads, but these queries consume massive RU, trigger sustained 429 throttling that impacts transactional traffic, and can time out against the 5-second execution limit.
Rubric Gap Analysis
During rubric criteria review for query optimization, this was identified as a platform constraint with no corresponding rule. The existing
monitoring-ru-consumptionrule covers tracking RU costs but does not warn about the fundamental mismatch between analytical query patterns and transactional container design. Noquery-*rule mentions Synapse Link, analytical store, or the HTAP separation pattern.Evidence
Incorrect Pattern — Analytical queries on transactional container
Correct Pattern — Synapse Link / analytical store / materialized views
Impact
Recommended New Rule
Create
query-olap-detection.md:References
query-optimization.md, Criterion 10 (OLAP Detection)