model-id-constraints: Disallow URL-reserved characters (#, ?) in Python document id #115

@jaydestro

SCOPE-PY-010: Update model-id-constraints Rule — Disallow URL-reserved Characters in Document id

Repository: AzureCosmosDB/cosmosdb-agent-kit
Labels: SCOPE, enhancement, rule:model
Affected Rule: rules/model-id-constraints.md
Severity: HIGH


Summary

Document id values that contain URL-reserved characters (#, ?, /, \, space) cause Cosmos DB REST request signing failures. The SDK builds the request URI and the HMAC signature from the ResourceLink (dbs/{db}/colls/{coll}/docs/{id}) using the raw id. When the underlying HTTP client transmits the URI, it strips everything after a # (the URL fragment delimiter per RFC 3986), so the server receives a truncated ResourceLink, computes a different signature, and returns 401 Unauthorized: "The input authorization token can't serve the request".

The failure surfaces only on operations whose ResourceLink includes the item id (read_item, replace_item, delete_item, patch_item). It does not surface on create_item, whose ResourceLink is the parent collection.

The existing model-id-constraints rule recommends using only alphanumeric ASCII characters plus - and _ as a best practice (which would prevent the bug), and lists / and \ as forbidden. It does not explicitly call out # or ?, nor does it explain the REST-signing root cause, so agents that deviate from the best practice rediscover the bug at runtime.

Observed in 1 of 25 (4%) V-B Python Gaming Leaderboard L5 runs (P04 R02), where the agent constructs composite ids like best#<player>#<week>#<region>.
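
The truncation can be reproduced without a Cosmos DB account. The sketch below uses the standard-library `urllib.parse.urlsplit` to show which part of a ResourceLink remains in the URL path once `#` or `?` appears in the id. The endpoint, database name, and container name are placeholders, and `path_after_url_parsing` is a hypothetical helper. Whether a particular SDK transport applies exactly this parsing is an assumption; the code only illustrates the RFC 3986 splitting rule.

```python
from urllib.parse import urlsplit

# Placeholder endpoint, used only for illustration.
BASE = "https://example.invalid/"


def path_after_url_parsing(doc_id: str, db: str = "gamedb", coll: str = "players") -> str:
    """Return the part of the ResourceLink that remains in the URL path."""
    resource_link = f"dbs/{db}/colls/{coll}/docs/{doc_id}"
    parts = urlsplit(BASE + resource_link)
    # Anything after '#' ends up in parts.fragment and anything after '?'
    # ends up in parts.query; neither is part of the path.
    return parts.path.lstrip("/")


print(path_after_url_parsing("best#p1#w12#eu"))  # dbs/gamedb/colls/players/docs/best
print(path_after_url_parsing("best?p1"))         # dbs/gamedb/colls/players/docs/best
print(path_after_url_parsing("best:p1:w12:eu"))  # dbs/gamedb/colls/players/docs/best:p1:w12:eu
```

With `#` or `?` in the id, the path ends at `docs/best`, which is not the ResourceLink the client signed.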

Observed Behavior

Anti-pattern (versionb/profile04/run02/workspace/workspace/app/models/player.py):

# Composite id uses '#' as separator
doc_id = f"best#{player_id}#{week}#{region}"
await container.upsert_item(body={"id": doc_id, ...})   # succeeds

# Later:
await container.read_item(item=doc_id, partition_key=player_id)
# 💥 azure.cosmos.exceptions.CosmosHttpResponseError: (Unauthorized)
#    The input authorization token can't serve the request.

Runtime evidence (Phase 4, P04 R02): POST /api/players returns 201 (create path). A subsequent POST /api/scores, which does a read_item + replace_item on the best-score doc, returns 401. Emulator logs show an auth-signing mismatch. Changing the separator from # to _ resolves the 401.

Expected Behavior

The agent should choose id separators that are URL-path-safe: _, -, :, or |. The rule should explicitly name # (fragment), ? (query), / and \ (path), and space as unsafe, and should cite the REST signing root cause so agents don't re-derive the workaround each time.
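
One way an application could enforce this at id-construction time is a small guard, sketched below. The names `validate_document_id` and `make_composite_id` are hypothetical and not part of the Azure SDK or the agent kit. The rejected character set mirrors the characters named in this issue (`#`, `?`, `/`, `\`, space, `%`, non-ASCII) and is an assumption about where to draw the line.

```python
import re

# Characters this issue proposes to forbid or avoid in a document id:
# '#', '?', '/', '\', space, '%', and anything outside ASCII.
_UNSAFE_ID_CHARS = re.compile(r"[#?/\\ %]|[^\x00-\x7F]")


def validate_document_id(doc_id: str) -> str:
    """Return doc_id unchanged, or raise ValueError on an unsafe character."""
    match = _UNSAFE_ID_CHARS.search(doc_id)
    if match:
        raise ValueError(
            f"unsafe character {match.group()!r} in document id {doc_id!r}"
        )
    return doc_id


def make_composite_id(*parts: str, sep: str = ":") -> str:
    """Join parts with a separator and validate the resulting id."""
    return validate_document_id(sep.join(parts))


print(make_composite_id("best", "p1", "w12", "eu"))  # best:p1:w12:eu
```

Passing `sep="#"` raises `ValueError` before the id ever reaches the container, so the mistake shows up at write time instead of as a 401 on the first read.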

Proposed Fix

Extend the "Forbidden characters" section of model-id-constraints.md:

### URL-reserved characters break Cosmos DB auth signing

Cosmos DB's REST protocol computes an HMAC signature over a canonical string
that includes the ResourceLink (`dbs/{db}/colls/{coll}/docs/{id}`). When the
SDK sends an HTTP request whose URL embeds a URL-reserved character in the
`id` segment, the HTTP transport may strip or reinterpret the URL (e.g. a `#`
is a fragment delimiter and is removed before the request leaves the client).
The server then recomputes the signature over the truncated ResourceLink and
returns **401 Unauthorized: "The input authorization token can't serve the
request"** — even though the key is correct.

The failure surfaces on `read_item`, `replace_item`, `delete_item`, and
`patch_item`. It does **not** surface on `create_item` (the id is not part of
the signed ResourceLink for creates — the parent collection is), so the bug
often hides until the first update or read.
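
As a rough illustration of the mismatch described above, the sketch below signs the same request twice: once over the ResourceLink the client intended, and once over the link that survives URL parsing. It is a simplified model of master-key signing based on the string-to-sign layout in the linked access-control documentation, not the SDK's actual code. The key, date, endpoint, and resource names are placeholders, `sign` and `link_after_url_parsing` are hypothetical helpers, and modelling the server as re-signing over the truncated path is this issue's explanation, taken here as an assumption.

```python
import base64
import hashlib
import hmac
from urllib.parse import urlsplit

# Placeholder values; not a real account key or endpoint.
KEY = base64.b64encode(b"not-a-real-key").decode("ascii")
DATE = "Tue, 01 Jan 2030 00:00:00 GMT"
BASE = "https://example.invalid/"


def sign(verb: str, resource_type: str, resource_link: str) -> str:
    """HMAC-SHA256 over verb, resource type, ResourceLink, and date."""
    payload = f"{verb.lower()}\n{resource_type.lower()}\n{resource_link}\n{DATE.lower()}\n\n"
    digest = hmac.new(base64.b64decode(KEY), payload.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def link_after_url_parsing(resource_link: str) -> str:
    """Model of the link a server would see: only the URL path survives."""
    return urlsplit(BASE + resource_link).path.lstrip("/")


coll_link = "dbs/gamedb/colls/players"
doc_link = f"{coll_link}/docs/best#p1#w12#eu"

# Read: the id is part of the signed ResourceLink, so truncation changes the signature.
read_matches = sign("GET", "docs", doc_link) == sign("GET", "docs", link_after_url_parsing(doc_link))

# Create: the signed ResourceLink is the parent collection, so the id plays no role.
create_matches = sign("POST", "docs", coll_link) == sign("POST", "docs", link_after_url_parsing(coll_link))

print(read_matches)    # False
print(create_matches)  # True
```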

**Never use any of these in `id`:**

| Char | Reason |
|---|---|
| `#` | URL fragment delimiter — HTTP client strips everything after `#` before sending; server sees truncated id, HMAC signature mismatch → 401 |
| `?` | URL query delimiter; everything after `?` is treated as the query string, so the id in the path is truncated just as with `#` |
| `/` `\` | Path separators — change the ResourceLink structure |

**Avoid (interoperability / encoding risk):**

| Char | Reason |
|---|---|
| ` ` (space) | Percent-encoding inconsistency across SDKs and connectors |
| `%` | Ambiguous with percent-encoding sequences |
| Any non-ASCII | Encoded differently across clients; known issues in ADF / Spark / Kafka connectors |

**Safe synthetic-id separators:** `_`, `-`, `:`

**Incorrect:**

```python
doc_id = f"best#{player_id}#{week}#{region}"   # ❌ 401 on read/update
```

**Correct:**

```python
doc_id = f"best:{player_id}:{week}:{region}"   # ✅ works on all operations
```

See also: partition-synthetic-keys for synthetic-key construction patterns.


## Evidence

- **Runtime reproduction:** [`versionb/docs/runtime-findings.md`](../../versionb/docs/runtime-findings.md) — P04 R02 runtime validation section
- **Emulator error:** `azure.cosmos.exceptions.CosmosHttpResponseError (Unauthorized): The input authorization token can't serve the request`
- **Azure docs on REST signing:** [Access control on Cosmos DB resources](https://learn.microsoft.com/en-us/rest/api/cosmos-db/access-control-on-cosmosdb-resources) — `ResourceLink` embedded in the canonical string
- **Azure docs on id constraints:** [How to model and partition data](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/model-partition-example) — mentions forbidden `/ \ ? #` characters but does not cite the REST-signing root cause
- **Related AK rules checked:**
  - `model-id-constraints` — already lists `/` and `\` as forbidden and recommends alphanumeric+`-`+`_` best practice. This enhancement adds explicit `#` and `?` callouts with the fragment-stripping root cause, so agents who deviate from the best practice still produce correct code.
  - `partition-synthetic-keys` — relevant but orthogonal

## Documentation Gap

**Partial.** MS Learn documents forbidden characters but not the REST-signing root cause. A companion doc PR on `MicrosoftDocs/azure-databases-docs-pr` to cross-link id constraints → REST access control would help, but this rule change alone is sufficient to prevent the agent-generated bug.
