query-pagination: Add Python SDK continuation token examples #105

@jaydestro

Summary

The query-pagination rule ("Use Continuation Tokens for Pagination") contains excellent anti-OFFSET guidance with C# and Java examples, but it has no Python examples. OFFSET appears in 22% of runs overall (7/32): 12% of AK-profile runs (2/17) and 33% of non-AK runs (5/15). Adding Python by_page() continuation token examples should further reduce OFFSET usage in AK profiles.

Rule text (from rule): "Use continuation tokens to paginate through large result sets efficiently. Never use OFFSET/LIMIT for deep pagination — it is a common anti-pattern with severe performance implications."

Important Python limitation (from MS Learn): "For the Python SDK, continuation tokens are only supported for single partition queries. The partition key must be specified in the options object."

Observed Behavior

| Scope | OFFSET Runs | Rate |
| --- | --- | --- |
| AK profiles (P02-P04, P08) | 2/17 | 12% |
| Non-AK profiles (P01, P05, P06) | 5/15 | 33% |
| Total | 7/32 | 22% |
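
For anyone re-checking the figures, here is a minimal sketch that recomputes the rates from the run counts in the table above. The dictionary keys and the `rate_percent` helper are illustrative names, not part of the SCOPE tooling.

```python
# Counts copied from the table above: (runs that used OFFSET, total runs).
counts = {
    "AK profiles": (2, 17),
    "Non-AK profiles": (5, 15),
}


def rate_percent(offset_runs: int, total_runs: int) -> int:
    """Share of runs that used OFFSET, rounded to a whole percent."""
    return round(100 * offset_runs / total_runs)


# The "Total" row is the sum of the two scopes.
total_offset = sum(offset for offset, _ in counts.values())
total_runs = sum(total for _, total in counts.values())

rates = {scope: rate_percent(o, t) for scope, (o, t) in counts.items()}
rates["Total"] = rate_percent(total_offset, total_runs)

print(rates)  # {'AK profiles': 12, 'Non-AK profiles': 33, 'Total': 22}
```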

Anti-pattern (from SCOPE evaluation P06 R01 leaderboard_repo.py — neighbor query):

# OFFSET/LIMIT for neighbor queries — RU cost scales with rank depth
docs = await self._query_all(
    query=(
        "SELECT * FROM c "
        "WHERE c.pk = @pk AND c.doc_type = 'leaderboard_entry' "
        "ORDER BY c.doc_type ASC, c.score DESC "
        "OFFSET @offset LIMIT @limit"
    ),
    parameters=[
        {"name": "@pk",     "value": pk},
        {"name": "@offset", "value": above_offset},
        {"name": "@limit",  "value": above_limit},
    ],
    partition_key=pk,
)

Note: In this specific case (neighbor query within a single partition, bounded window of ±10), OFFSET is acceptable per the rule's exception ("total result set is small"). The concern is that agents learn the OFFSET pattern and apply it to unbounded queries too.
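
To make the concern about unbounded queries concrete, here is a rough sketch of why depth matters. It is a toy model that counts documents read, on the assumption that OFFSET still has to read the rows it skips; it does not measure RUs, and real charges also depend on indexing, document size, and the query shape. All function names are hypothetical.

```python
def offset_docs_read(page: int, page_size: int) -> int:
    """Toy model: OFFSET/LIMIT reads the skipped documents plus the page itself."""
    skipped = (page - 1) * page_size
    return skipped + page_size


def continuation_docs_read(page: int, page_size: int) -> int:
    """Toy model: a continuation token resumes where the previous page ended."""
    return page_size


def total_docs_read(per_page_cost, pages: int, page_size: int) -> int:
    """Documents read when a client walks pages 1..pages in order."""
    return sum(per_page_cost(p, page_size) for p in range(1, pages + 1))


# A single deep page: page 50 at 20 items per page.
print(offset_docs_read(50, 20))        # 1000
print(continuation_docs_read(50, 20))  # 20

# Walking all 50 pages from the start.
print(total_docs_read(offset_docs_read, 50, 20))        # 25500
print(total_docs_read(continuation_docs_read, 50, 20))  # 1000
```

Under this model a bounded ±10 window stays cheap either way, which matches the exception noted above; the gap only opens up as the offset grows.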

Proposed Fix

Add a Python section to query-pagination.md:

# ❌ OFFSET cost scales linearly with page depth
async def get_scores_page(container, player_id, page: int, page_size: int = 20):
    offset = (page - 1) * page_size
    query = f"SELECT * FROM c ORDER BY c.submittedAt DESC OFFSET {offset} LIMIT {page_size}"
    items = container.query_items(query=query, partition_key=player_id)
    return [item async for item in items]

# ✅ Every page costs the same RU regardless of depth
async def get_scores_page(container, player_id, page_size: int = 20,
                          continuation_token: str = None):
    query = "SELECT * FROM c ORDER BY c.submittedAt DESC"
    results = container.query_items(
        query=query,
        partition_key=player_id,  # Required for Python continuation tokens
        max_item_count=page_size,
    )
    pager = results.by_page(continuation_token)
    page = await pager.__anext__()
    items = [item async for item in page]
    return {"items": items, "continuation_token": pager.continuation_token}
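
A caller would typically hand the returned token back on the next request. The sketch below shows that round trip against an in-memory stand-in, so it runs without a Cosmos DB account. `fake_get_scores_page`, `walk_pages`, and `SCORES` are hypothetical, the fake token is just a list index (real tokens are opaque strings), and the loop assumes the token comes back as `None` once the last page has been read.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical stand-in data; a real caller would query Cosmos DB instead.
SCORES = [{"id": str(i), "score": 100 - i} for i in range(45)]


async def fake_get_scores_page(page_size: int = 20,
                               continuation_token: Optional[str] = None) -> dict:
    """Mimics the return shape of get_scores_page using an in-memory list."""
    start = int(continuation_token) if continuation_token else 0
    end = start + page_size
    # Assumption: no token is returned once the result set is exhausted.
    next_token = str(end) if end < len(SCORES) else None
    return {"items": SCORES[start:end], "continuation_token": next_token}


async def walk_pages(get_page: Callable[..., Awaitable[dict]],
                     page_size: int = 20) -> list:
    """Request pages one at a time, passing each token back unchanged."""
    page_lengths = []
    token = None
    while True:
        result = await get_page(page_size=page_size, continuation_token=token)
        page_lengths.append(len(result["items"]))
        token = result["continuation_token"]
        if token is None:  # nothing left to fetch
            break
    return page_lengths


print(asyncio.run(walk_pages(fake_get_scores_page)))  # [20, 20, 5]
```

In a web API the loop would usually live on the client side: each response carries the token, and the next request sends it back untouched.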

Python SDK limitation: Continuation tokens are only supported for single-partition queries. Always provide partition_key= when using by_page(). For cross-partition result sets, use max_item_count to limit per-request size and iterate all pages.
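
For the cross-partition case, "iterate all pages" could look roughly like the sketch below. `drain_pages` accepts any async iterator of async-iterable pages; with the azure-cosmos async client that would presumably be the result of `container.query_items(query=..., max_item_count=n).by_page()`, called without a continuation token. The `fake_pager` helper is a hypothetical stand-in so the example runs without the SDK.

```python
import asyncio
from typing import List


async def drain_pages(pager) -> List[list]:
    """Consume every page from an async pager, one bounded page at a time."""
    pages = []
    async for page in pager:
        # Each page is itself an async iterator over items.
        pages.append([item async for item in page])
    return pages


async def _fake_page(items):
    """Hypothetical page: yields its items asynchronously."""
    for item in items:
        yield item


async def fake_pager(items, max_item_count: int):
    """Hypothetical pager: yields pages of at most max_item_count items."""
    for start in range(0, len(items), max_item_count):
        yield _fake_page(items[start:start + max_item_count])


pages = asyncio.run(drain_pages(fake_pager(list(range(7)), max_item_count=3)))
print(pages)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Because no token is saved between calls, this pattern only suits jobs that can read the whole result set in one pass.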

Labels

SCOPE (Issues generated by SCOPE tool), enhancement (New feature or request), rule:query (Cosmos DB query rule enhancement)
