pattern-pydantic-model-dump-mode-json: Propose new rule for Python + azure-cosmos #114

@jaydestro

SCOPE-PY-008: Propose New Rule pattern-pydantic-model-dump-mode-json for Python + azure-cosmos

Repository: AzureCosmosDB/cosmosdb-agent-kit
Labels: SCOPE, enhancement, rule:sdk, rule:pattern
Affected Rule: NEW rule (proposed) — category pattern- or sdk-
Severity: CRITICAL


Summary

The Python azure-cosmos SDK serializes request bodies with stdlib json.dumps(data, separators=(",", ":")) and no custom encoder. Pydantic v2's default model_dump() returns Python-native datetime / UUID / Decimal / date / time values, which raise TypeError: Object of type X is not JSON serializable when passed to create_item / upsert_item / replace_item. Every runtime-verified vulnerable run fails on its first write endpoint call (POST score). Across 25 SCOPE V-B Python Gaming Leaderboard L5 runs, a static scan finds the anti-pattern in 11/25 = 44% of runs, with the AK-only profile at 40% and the Control profile at 80%. No existing cosmosdb-best-practices rule covers this Python-specific pattern.

Scope of this proposal: non-Enum non-JSON-primitive Pydantic fields (datetime, UUID, Decimal, date, time). Enum serialization is already covered by sdk-serialization-enums; mode="json" is compatible with that rule's class X(str, Enum) guidance and does not conflict.

SDK source (from GitHub): azure/cosmos/_synchronized_request.py — function _request_body_from_data:

if isinstance(data, (dict, list, tuple)):
    json_dumped = json.dumps(data, separators=(",", ":"))
    return json_dumped
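The failure can be reproduced without the SDK. The sketch below mirrors only the `json.dumps` call from the excerpt above; the `dump_like_sdk` name and the document fields are illustrative assumptions, not SDK code.

```python
import json
from datetime import datetime, timezone

# Hypothetical document body; field names are illustrative only.
body = {"id": "score-1", "createdAt": datetime(2024, 1, 2, tzinfo=timezone.utc)}


def dump_like_sdk(data):
    """Mirror the excerpt: stdlib json.dumps with no default= or cls=."""
    return json.dumps(data, separators=(",", ":"))


try:
    dump_like_sdk(body)
    error = None
except TypeError as exc:
    error = str(exc)

print(error)  # Object of type datetime is not JSON serializable

# Pre-stringifying the value lets the same call succeed.
body["createdAt"] = body["createdAt"].isoformat()
encoded = dump_like_sdk(body)
print(encoded)
```

The second half shows why runs that stored timestamps as `str` did not hit the error: the encoder only ever saw JSON primitives.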

Observed Behavior

Refined 25-run static scan (pattern model_dump(...) piped to create_item / upsert_item / replace_item, AND a BaseModel with a : datetime-typed field, AND no mode="json" anywhere in the file):

| Profile | Config | Vulnerable runs | Rate |
|---|---|---|---|
| P1 | Control (Sonnet 4.5) | 4/5 | 80% |
| P2 | AK + Azure MCP | 4/5 | 80% |
| P3 | AK only | 2/5 | 40% |
| P4 | MS Learn MCP | 0/5 | 0% |
| P5 | No extensions | 1/5 | 20% |
| Total | | 11/25 | 44% |
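The scan criteria can be approximated with a few regular expressions. This is a rough sketch of the heuristic as described, not the actual SCOPE scanner; the `looks_vulnerable` function and the pattern names are hypothetical, and a text match like this cannot prove that the `model_dump` result is the value passed to the write call.

```python
import re

# Hypothetical patterns approximating the three scan conditions.
MODEL_DUMP = re.compile(r"\.model_dump\s*\(")
WRITE_CALL = re.compile(r"\b(create_item|upsert_item|replace_item)\s*\(")
DATETIME_FIELD = re.compile(r"^\s*\w+\s*:\s*datetime\b", re.MULTILINE)
JSON_MODE = re.compile(r"mode\s*=\s*[\"']json[\"']")


def looks_vulnerable(source: str) -> bool:
    """Flag a file that dumps a datetime-bearing model for a write without mode="json"."""
    return (
        "BaseModel" in source
        and bool(MODEL_DUMP.search(source))
        and bool(WRITE_CALL.search(source))
        and bool(DATETIME_FIELD.search(source))
        and not JSON_MODE.search(source)
    )


VULNERABLE = '''
class ScoreDocument(BaseModel):
    id: str
    created_at: datetime = Field(alias="createdAt")

async def create(self, doc):
    payload = doc.model_dump(by_alias=True, exclude_none=True)
    return await self._container.create_item(body=payload)
'''
SAFE = VULNERABLE.replace("exclude_none=True", 'exclude_none=True, mode="json"')

print(looks_vulnerable(VULNERABLE), looks_vulnerable(SAFE))
```

Because the check is file-wide, a single `mode="json"` anywhere clears the file, which matches the "anywhere in the file" wording of the third condition.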

Runtime-verified failures (Phase 4): P01 R04, P02 R01, P03 R04 all fail with the TypeError traceback at azure/cosmos/_synchronized_request.py:66 on the first score submission. P04 R04 succeeds because the author used mode="json". P04 R02 and P05 R04 succeed only because their timestamp fields are typed str (pre-stringified via .isoformat() in classmethods) — an accidental dodge, not a deliberate defense.

Anti-pattern (versionb/profile03/run04/workspace/workspace/app/repositories/score_repository.py):

class ScoreDocument(BaseModel):
    id: str
    player_id: str = Field(alias="playerId")
    created_at: datetime = Field(alias="createdAt")   # ← datetime field

async def create(self, doc: ScoreDocument) -> dict:
    payload = doc.model_dump(by_alias=True, exclude_none=True)   # ← no mode="json"
    return await self._container.create_item(body=payload)        # 💥 TypeError at runtime

Expected Behavior

When an agent writes Pydantic payloads containing non-JSON-primitive typed fields to azure-cosmos, it should always pass mode="json" to model_dump(), letting Pydantic convert datetime → ISO-8601 str, UUID → canonical hyphenated str, Decimal → str, and date/time → ISO-8601 str before the SDK sees the dict.
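A minimal sketch of the difference between the two dump modes, assuming Pydantic v2 is installed. The `ScoreDoc` model and its field names here are illustrative and are not taken from any of the scanned runs.

```python
import json
from datetime import date, datetime, timezone
from decimal import Decimal
from uuid import UUID

from pydantic import BaseModel


class ScoreDoc(BaseModel):
    # Hypothetical model covering the field types named above.
    id: str
    player_uuid: UUID
    score: Decimal
    played_on: date
    submitted_at: datetime


doc = ScoreDoc(
    id="score-1",
    player_uuid=UUID("12345678-1234-5678-1234-567812345678"),
    score=Decimal("19.99"),
    played_on=date(2024, 1, 2),
    submitted_at=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)

native = doc.model_dump()                # keeps UUID / Decimal / date / datetime objects
json_safe = doc.model_dump(mode="json")  # converts them to strings

# The same encoder call the SDK uses now succeeds.
body = json.dumps(json_safe, separators=(",", ":"))
print(body)
```

Note that `Decimal` comes back as a string rather than a JSON number, so range queries on such a field compare strings unless the model stores the value as a numeric type.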

Proposed Fix

Add a new rule file at skills/cosmosdb-best-practices/rules/pattern-pydantic-model-dump-mode-json.md:

---
title: Always pass mode="json" when dumping Pydantic models for Cosmos DB writes
impact: CRITICAL
impactDescription: prevents TypeError on first write when Pydantic models contain datetime, UUID, Decimal, date, or time fields
tags: [sdk, python, serialization, pydantic, datetime, bug-prevention]
---

## Always pass `mode="json"` when dumping Pydantic models for Cosmos DB writes

When serializing a Pydantic v2 model for `azure.cosmos` `create_item`,
`upsert_item`, or `replace_item`, always call `.model_dump(..., mode="json")`.

### Why

The Python `azure-cosmos` SDK serializes request bodies with
`json.dumps(data, separators=(",", ":"))` and no custom encoder
(`azure/cosmos/_synchronized_request.py`, `_request_body_from_data`). Any
`datetime`, `UUID`, `Decimal`, `date`, or `time` value in `data` raises
`TypeError: Object of type X is not JSON serializable` at runtime on the
first write.

Pydantic's default `model_dump()` returns native Python objects. `mode="json"`
converts them to JSON-safe primitives before the SDK sees the dict.

### Incorrect

```python
from datetime import datetime
from pydantic import BaseModel, Field

class ScoreDoc(BaseModel):
    id: str
    submitted_at: datetime = Field(alias="submittedAt")

# `container` is assumed to be an azure.cosmos.aio ContainerProxy.
async def save_score(container, doc: ScoreDoc) -> dict:
    # ❌ raises TypeError: Object of type datetime is not JSON serializable
    return await container.create_item(body=doc.model_dump(by_alias=True))
```

### Correct

```python
async def save_score(container, doc: ScoreDoc) -> dict:
    # ✅ datetime → ISO-8601 string before azure-cosmos sees it
    return await container.create_item(body=doc.model_dump(by_alias=True, mode="json"))
```

### Applies to

- `datetime`, `date`, `time`: most common trigger
- `UUID`: second most common
- `Decimal`: financial/precision fields
- Any field with a Pydantic custom serializer returning non-primitive objects

For Enum fields, see the related rule `sdk-serialization-enums`
(inherit from `str`, `int`, or use a Pydantic serializer). `mode="json"`
is compatible with that rule and covers Enum correctly as well.


## Evidence

- **SDK source:** [`_synchronized_request.py` L60-70](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/cosmos/azure-cosmos/azure/cosmos/_synchronized_request.py) — `json.dumps` with no `default=` or `cls=`
- **Phase 4a proof-of-concept:** [`versionb/_scratch/`](../../versionb/_scratch/README.md) — `poc_vuln.py` (reproduces TypeError), `poc_safe.py` (succeeds with `mode="json"`), `run_poc.py` (runner, exits 0)

```text
[vuln] create_item -> ✓ Object of type datetime is not JSON serializable
[safe] create_item -> ✓ created id=poc-safe-a6858ddf
PoC PASSED — rule pattern-pydantic-model-dump-mode-json is evidence-backed.
```

- **Phase 4 runtime findings:** [`versionb/docs/runtime-findings.md`](../../versionb/docs/runtime-findings.md) — 6 runs runtime-verified, full 25-run refined matrix
- **Official Microsoft docs (gap):**
  - [Azure Cosmos DB NoSQL Python quickstart](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/quickstart-python) (updated 2026-03-25): `create_item` examples use plain `dict` literals; no Pydantic / datetime guidance
  - [azure-cosmos 4.15.0 README](https://learn.microsoft.com/en-us/python/api/overview/azure/cosmos-readme): has a "Boolean Data Type" Python-vs-JSON callout, but no equivalent for datetime/UUID/Decimal
- **Related AK rules checked:**
  - `sdk-serialization-enums`: covers Enum only (scope overlap resolved by narrowing this rule to non-Enum types)
  - `model-json-serialization`: Jackson/Java only, does not apply to Python
  - `sdk-python-async-deps`: aiohttp dependency only

## Documentation Gap

**Yes.** Companion issue recommended on `MicrosoftDocs/azure-databases-docs-pr` to add a Pydantic-integration callout in the Python quickstart, mirroring the existing "Boolean Data Type" callout in the SDK README.

## Related Out-of-Scope Bugs

Found during Phase 4 runtime validation but **do not belong to this rule**; each warrants a separate issue proposal:

| Bug | Observed in | Root cause |
|---|---|---|
| Custom `to_dict()` returning raw `datetime` | P03 R05, P05 R02 | Same SDK `json.dumps` limitation; pattern is a custom serializer not `model_dump` |
| URL-unsafe `#` in document `id` causes 401 auth-signing mismatch | P04 R02 | Cosmos REST signature is computed over ResourceLink; HTTP client strips fragment, server sees truncated id, signatures don't match |
| `match_condition="IfMatch"` passed as raw string instead of `MatchConditions.IfNotModified` enum | P05 R04 | `azure.core.MatchConditions` enum required; string rejected with `TypeError: Invalid match condition`. Note: .NET `IfMatch` maps to Python `IfNotModified` |
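
The `#` row can be illustrated with generic URL parsing. This sketch only shows how a standard URL parser treats `#` as the start of a fragment; it does not reproduce the SDK's or the HTTP client's actual code path, and the account host, resource link, and id below are hypothetical.

```python
from urllib.parse import urlsplit

doc_id = "player#42"  # hypothetical id containing '#'
link = f"dbs/games/colls/scores/docs/{doc_id}"  # hypothetical resource link
url = f"https://example.documents.azure.com/{link}"

parts = urlsplit(url)
path_seen = parts.path     # everything before '#'
fragment = parts.fragment  # everything after '#'

print(path_seen)
print(fragment)
# The path no longer matches the link that a signature would be computed over.
print(path_seen.lstrip("/") == link)
```

If the signature covers the full link but the request path stops at `#`, the two sides disagree on the id, which is consistent with the 401 described in the table.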
