SCOPE-PY-010: Update model-id-constraints Rule — Disallow URL-reserved Characters in Document id
Repository: AzureCosmosDB/cosmosdb-agent-kit
Labels: SCOPE, enhancement, rule:model
Affected Rule: rules/model-id-constraints.md
Severity: HIGH
## Summary
Document `id` values that contain URL-reserved characters (`#`, `?`, `/`, `\`, space) cause Cosmos DB REST request-signing failures. The SDK builds both the request URI and the HMAC signature from the ResourceLink (`dbs/{db}/colls/{coll}/docs/{id}`) using the raw `id`. When the underlying HTTP client transmits the URI, it strips everything after a `#` (the URL fragment delimiter per RFC 3986), so the server receives a truncated ResourceLink, computes a different signature, and returns `401 Unauthorized: "The input authorization token can't serve the request"`.

The failure surfaces only on operations whose ResourceLink includes the item id (`read_item`, `replace_item`, `delete_item`, `patch_item`), not on `create_item`, whose ResourceLink is the parent collection.

The existing `model-id-constraints` rule recommends "only alphanumeric ASCII + `-` + `_`" as a best practice (which would prevent the bug) and lists `/` and `\` as forbidden, but it neither calls out `#` or `?` explicitly nor explains the REST-signing root cause, so agents that deviate from the best practice rediscover the bug at runtime. Observed in 1 of 25 (4%) V-B Python Gaming Leaderboard L5 runs (P04 R02), where the agent constructs composite ids like `best#<player>#<week>#<region>`.
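The truncation step can be reproduced offline with Python's standard URL parser. The sketch below is illustrative only: the host `example.documents.azure.com`, the database name `gamedb`, and the container name `scores` are placeholders, and `urllib.parse` merely stands in for whatever URL handling the SDK's HTTP transport actually performs.

```python
from urllib.parse import urlsplit

# Hypothetical ResourceLink for a document whose id contains '#'.
doc_id = "best#player1#2024-W01#eu"
resource_link = f"dbs/gamedb/colls/scores/docs/{doc_id}"
url = f"https://example.documents.azure.com/{resource_link}"

parts = urlsplit(url)

# Everything after the first '#' is parsed as the URL fragment,
# which is not part of the path that reaches the server.
sent_path = parts.path.lstrip("/")
print(sent_path)       # dbs/gamedb/colls/scores/docs/best
print(parts.fragment)  # player1#2024-W01#eu
```

The path that survives parsing ends at `best`, which is no longer the ResourceLink the client signed.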
## Observed Behavior
Anti-pattern (versionb/profile04/run02/workspace/workspace/app/models/player.py):
# Composite id uses '#' as separator
doc_id = f"best#{player_id}#{week}#{region}"
await container.upsert_item(body={"id": doc_id, ...}) # succeeds
# Later:
await container.read_item(item=doc_id, partition_key=player_id)
# 💥 azure.cosmos.exceptions.CosmosHttpResponseError: (Unauthorized)
# The input authorization token can't serve the request.
Runtime evidence (Phase 4, P04 R02): POST /api/players returns 201 (create path), subsequent POST /api/scores (which does a read_item + replace_item on the best-score doc) returns 401. Emulator logs show auth-signing mismatch. Changing the separator from # to _ resolves the 401.
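The auth-signing mismatch can be illustrated without a Cosmos DB account. The following sketch assumes the canonical string layout described in the Azure REST access-control documentation (lowercased verb, lowercased resource type, ResourceLink, lowercased date, and an empty line, each newline-terminated); the helper name `sign`, the dummy key, and the database and container names are hypothetical, and the real SDK may differ in detail.

```python
import base64
import hashlib
import hmac


def sign(verb: str, resource_type: str, resource_link: str,
         date: str, key_b64: str) -> str:
    """Return a base64 HMAC-SHA256 signature over the canonical string."""
    payload = (
        f"{verb.lower()}\n{resource_type.lower()}\n"
        f"{resource_link}\n{date.lower()}\n\n"
    )
    key = base64.b64decode(key_b64)
    digest = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


# Dummy key for illustration only (not a real account key).
key_b64 = base64.b64encode(b"not-a-real-cosmos-key").decode("ascii")
date = "Tue, 01 Jan 2030 00:00:00 GMT"

# The client signs the full ResourceLink; the server only sees the
# portion that survives fragment stripping.
client_link = "dbs/gamedb/colls/scores/docs/best#player1#2024-W01#eu"
server_link = client_link.split("#", 1)[0]

client_sig = sign("GET", "docs", client_link, date, key_b64)
server_sig = sign("GET", "docs", server_link, date, key_b64)
print(client_sig == server_sig)  # False
```

Because the two sides hash different ResourceLinks, the signatures cannot match even though the key is identical, which is consistent with the 401 seen at runtime.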
## Expected Behavior
The agent should choose `id` separators that are URL-path-safe: `_`, `-`, or `:`. The rule should explicitly name `#` (fragment), `?` (query), `/` and `\` (path), and space as unsafe, and should cite the REST-signing root cause so agents don't re-derive the workaround each time.
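One way an agent could enforce this at the point where composite ids are built is a small allow-list check. This is a minimal sketch, not part of the rule or the SDK: the helper name `make_doc_id` and the choice of `:` as the default separator are assumptions, and the allow-list is deliberately conservative (ASCII letters, digits, `_`, `-`, `:`).

```python
import re

# Conservative allow-list: ASCII letters, digits, and the separators _ - :
_SAFE_ID = re.compile(r"[A-Za-z0-9_\-:]+")


def make_doc_id(*parts: str, sep: str = ":") -> str:
    """Join parts into a composite id, rejecting URL-unsafe characters."""
    doc_id = sep.join(parts)
    if not _SAFE_ID.fullmatch(doc_id):
        raise ValueError(f"unsafe character in document id: {doc_id!r}")
    return doc_id


print(make_doc_id("best", "player1", "2024-W01", "eu"))
# best:player1:2024-W01:eu
```

Failing fast at construction time moves the error from a confusing 401 on the first read or update to a clear `ValueError` when the document is first built.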
## Proposed Fix
Extend the "Forbidden characters" section of model-id-constraints.md:
### URL-reserved characters break Cosmos DB auth signing
Cosmos DB's REST protocol computes an HMAC signature over a canonical string
that includes the ResourceLink (`dbs/{db}/colls/{coll}/docs/{id}`). When the
SDK sends an HTTP request whose URL embeds a URL-reserved character in the
`id` segment, the HTTP transport may strip or reinterpret the URL (e.g. a `#`
is a fragment delimiter and is removed before the request leaves the client).
The server then recomputes the signature over the truncated ResourceLink and
returns **401 Unauthorized: "The input authorization token can't serve the
request"** — even though the key is correct.
The failure surfaces on `read_item`, `replace_item`, `delete_item`, and
`patch_item`. It does **not** surface on `create_item` (the id is not part of
the signed ResourceLink for creates — the parent collection is), so the bug
often hides until the first update or read.
**Never use any of these in `id`:**
| Char | Reason |
|---|---|
| `#` | URL fragment delimiter — HTTP client strips everything after `#` before sending; server sees truncated id, HMAC signature mismatch → 401 |
| `?` | URL query delimiter: everything after `?` is parsed as the query string rather than the path, so the id is truncated and the same signature mismatch as `#` occurs |
| `/` `\` | Path separators — change the ResourceLink structure |
**Avoid (interoperability / encoding risk):**
| Char | Reason |
|---|---|
| ` ` (space) | Percent-encoding inconsistency across SDKs and connectors |
| `%` | Ambiguous with percent-encoding sequences |
| Any non-ASCII | Encoded differently across clients; known issues in ADF / Spark / Kafka connectors |
**Safe synthetic-id separators:** `_`, `-`, `:`
**Incorrect:**
```python
doc_id = f"best#{player_id}#{week}#{region}"  # ❌ 401 on read/update
```
**Correct:**
```python
doc_id = f"best:{player_id}:{week}:{region}"  # ✅ works on all operations
```
See also: `partition-synthetic-keys` for synthetic-key construction patterns.
## Evidence
- **Runtime reproduction:** [`versionb/docs/runtime-findings.md`](../../versionb/docs/runtime-findings.md) — P04 R02 runtime validation section
- **Emulator error:** `azure.cosmos.exceptions.CosmosHttpResponseError (Unauthorized): The input authorization token can't serve the request`
- **Azure docs on REST signing:** [Access control on Cosmos DB resources](https://learn.microsoft.com/en-us/rest/api/cosmos-db/access-control-on-cosmosdb-resources) — `ResourceLink` embedded in the canonical string
- **Azure docs on id constraints:** [How to model and partition data](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/model-partition-example) — mentions forbidden `/ \ ? #` characters but does not cite the REST-signing root cause
- **Related AK rules checked:**
  - `model-id-constraints`: already lists `/` and `\` as forbidden and recommends alphanumeric+`-`+`_` best practice. This enhancement adds explicit `#` and `?` callouts with the fragment-stripping root cause, so agents who deviate from the best practice still produce correct code.
  - `partition-synthetic-keys`: relevant but orthogonal
## Documentation Gap
**Partial.** MS Learn documents forbidden characters but not the REST-signing root cause. A companion doc PR on `MicrosoftDocs/azure-databases-docs-pr` to cross-link id constraints → REST access control would help, but this rule change alone is sufficient to prevent the agent-generated bug.