
Rubric Gap Analysis — partition-synthetic-keys enhancement: Add non-reconstructable synthetic key anti-pattern #53

@jaydestro


Type: Rule Enhancement
Target Rule: partition-synthetic-keys
Severity: MEDIUM
Source: SCOPE Rubric Criteria, Partition Key Design, Criterion 5, check 1
Labels: enhancement, SCOPE, agent-kit, rule:partition

Summary

The partition-synthetic-keys rule demonstrates correct synthetic key construction using deterministic, caller-known components (DeviceId + Timestamp, TenantId:UserId, CustomerId + OrderDate). However, it does not warn against the anti-pattern of building synthetic keys from non-deterministic or caller-unknown components, such as random suffixes, server-side timestamps at millisecond granularity, or internal sequence numbers. When callers cannot reconstruct the key, point reads (which need both the id and the exact partition key value) become impossible, and every lookup turns into a cross-partition query. That defeats the purpose of the synthetic key.

Rubric Gap Analysis

During rubric criteria review, this was identified as a missing anti-pattern. All existing examples in the rule are correct (deterministic keys), but the rule never states the underlying principle or shows what goes wrong when it is violated. An agent trying to maximize write distribution may add non-deterministic components to a synthetic key and, in doing so, make targeted reads impossible without a cross-partition query or a separate lookup that records the generated key.

Evidence

Existing Rule Coverage

All three examples use reconstructable components:

  • $"{DeviceId}_{Timestamp:yyyy-MM}" — caller knows device ID and month
  • $"{TenantId}:{UserId}" — caller knows both IDs
  • $"{CustomerId}_{OrderDate:yyyy}" — caller knows customer and year
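A minimal C# sketch of what "caller knows" means for these three shapes, assuming hypothetical helper names (DeviceMonthKey, TenantUserKey, CustomerYearKey) that are not part of the rule: each builder takes only values a caller has at read time, so calling it again with the read-side parameters yields the same key the writer produced. FormattableString.Invariant is used so the date formatting does not depend on the caller's culture.

```csharp
using System;

// Hypothetical builders for the three existing key shapes. Every input is a
// value the caller already has at read time.
static string DeviceMonthKey(string deviceId, DateTime timestamp) =>
    FormattableString.Invariant($"{deviceId}_{timestamp:yyyy-MM}");

static string TenantUserKey(string tenantId, string userId) =>
    $"{tenantId}:{userId}";

static string CustomerYearKey(string customerId, DateTime orderDate) =>
    FormattableString.Invariant($"{customerId}_{orderDate:yyyy}");

// Writer has the full timestamp; reader only knows "device123, January 2026".
var written = DeviceMonthKey("device123", new DateTime(2026, 1, 15, 10, 30, 45, DateTimeKind.Utc));
var rebuilt = DeviceMonthKey("device123", new DateTime(2026, 1, 1));

Console.WriteLine(written);             // device123_2026-01
Console.WriteLine(written == rebuilt);  // True
Console.WriteLine(TenantUserKey("tenantA", "user42"));                 // tenantA:user42
Console.WriteLine(CustomerYearKey("cust7", new DateTime(2025, 6, 3))); // cust7_2025
```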

No example shows the failure mode of non-reconstructable keys.

Missing Anti-Pattern

using System;

// ❌ Anti-pattern: random component in synthetic key
public class Event
{
    public string Id { get; set; }
    public string DeviceId { get; set; }
    public DateTime Timestamp { get; set; }

    // Random shard suffix: the caller cannot reconstruct it at read time.
    // As a computed property it is also re-rolled on every access, so even
    // two reads of PartitionKey on the same object disagree.
    public string PartitionKey => $"{DeviceId}_{Guid.NewGuid().ToString()[..4]}";
}

// Write succeeds, and distribution looks great.
// But how do you READ this document?
// You don't know which random suffix was used!
// Only option: query on deviceId across every partition (cross-partition scan)

var query = "SELECT * FROM c WHERE c.deviceId = @deviceId";
// Fan-out to ALL partitions: the synthetic key backfired
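To put a number on the fan-out: the anti-pattern takes the first four characters of Guid.ToString(), whose default "D" format starts with lowercase hex digits, so each device has 16^4 = 65,536 possible keys. The C# sketch below (CandidateKeys is a hypothetical helper used only for counting) shows that the written key is always one of those candidates and that nothing in the read request narrows it further. In practice a reader would not enumerate them; it would issue the cross-partition deviceId query instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Mirrors the anti-pattern: device ID plus the first 4 characters of a new GUID.
// Guid.ToString() defaults to the "D" format, which starts with lowercase hex.
static string RandomSuffixKey(string deviceId) =>
    $"{deviceId}_{Guid.NewGuid().ToString()[..4]}";

// Hypothetical helper listing every key the writer could have produced.
// 4 hex digits => 16^4 = 65,536 candidates per device.
static IEnumerable<string> CandidateKeys(string deviceId) =>
    Enumerable.Range(0, 65536).Select(i => $"{deviceId}_{i:x4}");

var written = RandomSuffixKey("device123");
var candidates = CandidateKeys("device123").ToList();

// The reader knows deviceId, but that only narrows the key to this set.
Console.WriteLine(candidates.Count);             // 65536
Console.WriteLine(candidates.Contains(written)); // True
```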

// ❌ Anti-pattern: server-side millisecond timestamp in key
public class Telemetry
{
    public string Id { get; set; }
    public string DeviceId { get; set; }

    // Exact write time is not known to callers, only "around that time".
    // ("fff" = milliseconds; DateTime.UtcNow in a getter also changes on every access.)
    public string PartitionKey => $"{DeviceId}_{DateTime.UtcNow:yyyyMMddHHmmssfff}";
}

// Callers know the device and approximate time, but not the exact millisecond
// Cannot construct the exact partition key for a point read
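A small C# sketch of why an approximate time is not enough, assuming a hypothetical MillisecondKey helper that formats the key at millisecond precision ("fff") with the invariant culture: the writer's key and the caller's best guess differ, and a caller who only knows "some time in that minute" faces 60,000 distinct candidate keys.

```csharp
using System;

// Hypothetical key builder at millisecond precision ("fff"), invariant culture.
static string MillisecondKey(string deviceId, DateTime at) =>
    FormattableString.Invariant($"{deviceId}_{at:yyyyMMddHHmmssfff}");

// Assumed scenario: the server stamped the document at 10:30:45.123 UTC,
// but the caller only knows "around 10:30".
var writtenAt = new DateTime(2026, 1, 15, 10, 30, 45, 123, DateTimeKind.Utc);
var callerGuess = new DateTime(2026, 1, 15, 10, 30, 0, DateTimeKind.Utc);

var writtenKey = MillisecondKey("device123", writtenAt);
var guessedKey = MillisecondKey("device123", callerGuess);

// Every millisecond in the caller's one-minute window is a distinct key.
long candidatesInWindow = (long)TimeSpan.FromMinutes(1).TotalMilliseconds;

Console.WriteLine(writtenKey);               // device123_20260115103045123
Console.WriteLine(guessedKey == writtenKey); // False
Console.WriteLine(candidatesInWindow);       // 60000
```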

Correct Principle

// ✅ Synthetic key components must be known to callers at read time
public class Telemetry
{
    public string Id { get; set; }
    public string DeviceId { get; set; }
    public DateTime Timestamp { get; set; }
    
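    // Month granularity: the caller always knows which month to target.
    // Invariant culture keeps "yyyy-MM" identical on every client (some cultures
    // default to non-Gregorian calendars, which would change the year).
    public string PartitionKey => FormattableString.Invariant($"{DeviceId}_{Timestamp:yyyy-MM}");
}

// Read path: "Get device X data for January 2026"
// Caller can reconstruct "device123_2026-01" → single-partition query
// (and a point read, if the document id is also known)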
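Because month-granular keys are computable, a read that spans several months still targets a small, known set of partitions. A minimal C# sketch, assuming a hypothetical KeysForRange helper: the caller derives one exact key per calendar month in the requested range and can issue one single-partition query per key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same key shape as the Telemetry example: device ID plus year-month.
static string MonthKey(string deviceId, DateTime anyDayInMonth) =>
    FormattableString.Invariant($"{deviceId}_{anyDayInMonth:yyyy-MM}");

// Hypothetical read-side helper: one exact key per calendar month in [from, to].
static IEnumerable<string> KeysForRange(string deviceId, DateTime from, DateTime to)
{
    var last = new DateTime(to.Year, to.Month, 1);
    for (var month = new DateTime(from.Year, from.Month, 1); month <= last; month = month.AddMonths(1))
    {
        yield return MonthKey(deviceId, month);
    }
}

// "Get device123 data from November 2025 through January 2026"
var keys = KeysForRange("device123", new DateTime(2025, 11, 1), new DateTime(2026, 1, 31)).ToList();

Console.WriteLine(string.Join(", ", keys));
// device123_2025-11, device123_2025-12, device123_2026-01
```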

Recommended Enhancement

Add to partition-synthetic-keys after the existing examples:

Every component of a synthetic key must be reconstructable by the caller at read time. If the caller cannot determine the exact partition key value when reading, the synthetic key forces cross-partition scans for all lookups.

Never include in a synthetic key:

  • Random values (GUIDs, random shard suffixes)
  • Server-side timestamps at granularity finer than the caller can target
  • Internal sequence numbers not exposed to the API consumer
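When the goal is write distribution, a suffix can still be reconstructable if it is derived from a value the caller knows, along the lines of the "pre-calculated suffix" strategy in the Azure Cosmos DB guidance on synthetic partition keys. A C# sketch, assuming a hypothetical BucketedKey scheme that hashes the event ID into a fixed number of buckets. It uses a hand-written FNV-1a hash because string.GetHashCode() is randomized per process on .NET Core and would break reconstruction across machines.

```csharp
using System;
using System.Linq;
using System.Text;

// FNV-1a (32-bit) over UTF-8 bytes. Unlike string.GetHashCode(), which .NET
// randomizes per process, this gives the same value on every machine and run.
static uint Fnv1a(string value)
{
    uint hash = 2166136261;
    foreach (byte b in Encoding.UTF8.GetBytes(value))
    {
        hash ^= b;
        hash = unchecked(hash * 16777619u);
    }
    return hash;
}

// Hypothetical scheme: spread one device's writes over a fixed number of
// buckets, choosing the bucket from a value the caller knows (the event ID).
// bucketCount must never change after data is written, or every key changes.
static string BucketedKey(string deviceId, string eventId, int bucketCount) =>
    $"{deviceId}_{Fnv1a(eventId) % (uint)bucketCount}";

const int Buckets = 16;

// Point read: the caller has deviceId and eventId, so it rebuilds the exact key.
var written = BucketedKey("device123", "evt-0001", Buckets);
var rebuilt = BucketedKey("device123", "evt-0001", Buckets);
Console.WriteLine(written == rebuilt); // True

// Device-wide read: a bounded, known set of 16 keys.
var allDeviceKeys = Enumerable.Range(0, Buckets).Select(i => $"device123_{i}").ToList();
Console.WriteLine(allDeviceKeys.Contains(written)); // True
```

The trade-off in this sketch: point reads by event ID stay single-partition, while "all events for a device" becomes a fan-out over a fixed, known set of keys rather than a query across the whole container.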

Test: For every query that reads from this container, can the caller compute the exact partition key value from the query parameters alone? If not, the synthetic key is not reconstructable.
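The test can be expressed as a check over sample documents. A minimal C# sketch with hypothetical scheme names: each key scheme is applied once to the document's real write-time values and once to only what a caller has at read time (here, the device ID and the month), and a scheme passes only if both keys match for every sample. The month scheme passes; a millisecond scheme fails.

```csharp
using System;
using System.Linq;

// Two hypothetical key schemes, both computed from the document at write time.
static string MonthScheme(string deviceId, DateTime ts) =>
    FormattableString.Invariant($"{deviceId}_{ts:yyyy-MM}");
static string MillisecondScheme(string deviceId, DateTime ts) =>
    FormattableString.Invariant($"{deviceId}_{ts:yyyyMMddHHmmssfff}");

// All a caller has at read time: the device ID and the month it wants
// ("Get device X data for January 2026"), nothing finer.
static DateTime CallerKnownMonth(DateTime ts) =>
    new DateTime(ts.Year, ts.Month, 1, 0, 0, 0, DateTimeKind.Utc);

// The test: for every sample, does the key rebuilt from caller-known values
// equal the key actually stored with the document?
static bool IsReconstructable(Func<string, DateTime, string> scheme, DateTime[] samples) =>
    samples.All(ts => scheme("device123", ts) == scheme("device123", CallerKnownMonth(ts)));

var samples = new[]
{
    new DateTime(2026, 1, 15, 10, 30, 45, 123, DateTimeKind.Utc),
    new DateTime(2025, 12, 31, 23, 59, 59, 999, DateTimeKind.Utc),
};

Console.WriteLine(IsReconstructable(MonthScheme, samples));       // True
Console.WriteLine(IsReconstructable(MillisecondScheme, samples)); // False
```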
