[Cosmos] [Embedding V0] VectorEmbeddingPolicy: add EmbeddingSource TypedDict and typed policy models #46764

@ananth7592

[Cosmos] [Embedding V0] VectorEmbeddingPolicy: document and add typed support for embeddingSource

Parent: #46729

Background

The vectorEmbeddingPolicy on a container now supports an optional embeddingSource block inside each entry of vectorEmbeddings. This block carries the endpoint, deployment name, auth type, and source paths that the new azure-cosmos-embedding package reads to construct an AzureOpenAIEmbeddingGenerator.

Example container policy JSON:

{
  "vectorEmbeddingPolicy": {
    "vectorEmbeddings": [
      {
        "path": "/embedding",
        "dataType": "float32",
        "dimensions": 1536,
        "distanceFunction": "cosine",
        "embeddingSource": {
          "sourcePaths": ["/title", "/abstract"],
          "deploymentName": "text-embedding-3-small",
          "modelName": "text-embedding-3-small",
          "endpoint": "https://embedding-south-central.cognitiveservices.azure.com/",
          "authType": "ApiKey"
        }
      }
    ]
  }
}
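
To make the shape of the optional block concrete, here is a minimal sketch of how a consumer might pull every embeddingSource out of container properties using today's untyped dicts. The helper name embedding_sources is hypothetical and is not part of azure-cosmos or azure-cosmos-embedding; the policy is the example above.

```python
import json
from typing import Any, Dict, List

# Policy document mirroring the example JSON above.
POLICY_JSON = """
{
  "vectorEmbeddingPolicy": {
    "vectorEmbeddings": [
      {
        "path": "/embedding",
        "dataType": "float32",
        "dimensions": 1536,
        "distanceFunction": "cosine",
        "embeddingSource": {
          "sourcePaths": ["/title", "/abstract"],
          "deploymentName": "text-embedding-3-small",
          "modelName": "text-embedding-3-small",
          "endpoint": "https://embedding-south-central.cognitiveservices.azure.com/",
          "authType": "ApiKey"
        }
      }
    ]
  }
}
"""


def embedding_sources(container_properties: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical helper: collect every embeddingSource block in a policy."""
    policy = container_properties.get("vectorEmbeddingPolicy") or {}
    return [
        entry["embeddingSource"]
        for entry in policy.get("vectorEmbeddings", [])
        if "embeddingSource" in entry  # the block is optional per entry
    ]


sources = embedding_sources(json.loads(POLICY_JSON))
for source in sources:
    print(source["deploymentName"], source["authType"], source["sourcePaths"])
```

Entries without embeddingSource are skipped rather than treated as errors, which matches the requirement that existing policies keep working.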

Scope

In the Python SDK, the vector_embedding_policy keyword argument is currently typed as dict[str, Any] and passed through unmodified. This issue adds typed support for the new embeddingSource sub-object without breaking the existing raw-dict path.

1. Add TypedDict models

In azure/cosmos/_models.py (or documents.py; confirm the preferred location against SDK conventions):

from typing import List, Literal
from typing_extensions import TypedDict

# total=False makes every key optional, so partial dicts still type-check.
class EmbeddingSource(TypedDict, total=False):
    # Read by azure-cosmos-embedding to construct an AzureOpenAIEmbeddingGenerator.
    sourcePaths: List[str]
    deploymentName: str
    modelName: str
    endpoint: str
    authType: Literal["ApiKey", "Entra"]

class VectorEmbedding(TypedDict, total=False):
    path: str
    dataType: Literal["float32", "float16", "uint8", "int8"]
    dimensions: int
    distanceFunction: Literal["cosine", "dotproduct", "euclidean"]
    embeddingSource: EmbeddingSource   # NEW

class VectorEmbeddingPolicy(TypedDict, total=False):
    vectorEmbeddings: List[VectorEmbedding]
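
As a usage sketch, the models above can annotate the same policy shown in Background. The classes are redefined locally because they are not exported from azure.cosmos yet, and typing.TypedDict (Python 3.8+) is used here only so the snippet runs without typing_extensions; both choices are assumptions of this example, not part of the proposal.

```python
import json
from typing import List, Literal, TypedDict  # typing.TypedDict requires Python 3.8+


class EmbeddingSource(TypedDict, total=False):
    sourcePaths: List[str]
    deploymentName: str
    modelName: str
    endpoint: str
    authType: Literal["ApiKey", "Entra"]


class VectorEmbedding(TypedDict, total=False):
    path: str
    dataType: Literal["float32", "float16", "uint8", "int8"]
    dimensions: int
    distanceFunction: Literal["cosine", "dotproduct", "euclidean"]
    embeddingSource: EmbeddingSource


class VectorEmbeddingPolicy(TypedDict, total=False):
    vectorEmbeddings: List[VectorEmbedding]


source: EmbeddingSource = {
    "sourcePaths": ["/title", "/abstract"],
    "deploymentName": "text-embedding-3-small",
    "modelName": "text-embedding-3-small",
    "endpoint": "https://embedding-south-central.cognitiveservices.azure.com/",
    "authType": "ApiKey",
}

policy: VectorEmbeddingPolicy = {
    "vectorEmbeddings": [
        {
            "path": "/embedding",
            "dataType": "float32",
            "dimensions": 1536,
            "distanceFunction": "cosine",
            "embeddingSource": source,
        }
    ]
}

# A TypedDict value is an ordinary dict at runtime, so it serializes as-is.
print(type(policy) is dict)
print(json.dumps(policy, indent=2))

# A type checker such as mypy should reject a value outside the Literal, e.g.:
# bad: EmbeddingSource = {"authType": "Password"}
```

Because the annotations exist only for static checking, nothing about the serialized payload changes when callers switch from plain dicts to these types.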

2. Update database.py (sync + async)

Update all vector_embedding_policy keyword parameter type annotations from dict[str, Any] to VectorEmbeddingPolicy — no behavioral change, just stronger typing.
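
A rough sketch of why the annotation change should be behavior-neutral. build_container_body is a hypothetical stand-in for whatever the create-container methods do internally, not the actual database.py code; the models are redefined locally so the snippet runs on its own.

```python
from typing import Any, Dict, List, Literal, Optional, TypedDict


class EmbeddingSource(TypedDict, total=False):
    sourcePaths: List[str]
    deploymentName: str
    modelName: str
    endpoint: str
    authType: Literal["ApiKey", "Entra"]


class VectorEmbedding(TypedDict, total=False):
    path: str
    dataType: Literal["float32", "float16", "uint8", "int8"]
    dimensions: int
    distanceFunction: Literal["cosine", "dotproduct", "euclidean"]
    embeddingSource: EmbeddingSource


class VectorEmbeddingPolicy(TypedDict, total=False):
    vectorEmbeddings: List[VectorEmbedding]


def build_container_body(
    container_id: str,
    *,
    vector_embedding_policy: Optional[VectorEmbeddingPolicy] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in for how a create-container call assembles its body."""
    body: Dict[str, Any] = {"id": container_id}
    if vector_embedding_policy is not None:
        # Passed through untouched; the annotation only affects static checking.
        body["vectorEmbeddingPolicy"] = vector_embedding_policy
    return body


# A policy without embeddingSource, written as a plain dict literal.
legacy_body = build_container_body(
    "docs",
    vector_embedding_policy={
        "vectorEmbeddings": [
            {
                "path": "/embedding",
                "dataType": "float32",
                "dimensions": 1536,
                "distanceFunction": "cosine",
            }
        ]
    },
)

# The same call with the new optional block included.
typed_policy: VectorEmbeddingPolicy = {
    "vectorEmbeddings": [
        {
            "path": "/embedding",
            "dataType": "float32",
            "dimensions": 1536,
            "distanceFunction": "cosine",
            "embeddingSource": {"deploymentName": "text-embedding-3-small", "authType": "ApiKey"},
        }
    ]
}
typed_body = build_container_body("docs", vector_embedding_policy=typed_policy)
print(legacy_body)
print(typed_body)
```

One thing worth checking during review: a variable explicitly annotated as dict[str, Any] is generally not assignable to a TypedDict parameter under mypy, so callers that annotate their policies that way may see new type errors even though runtime behavior is unchanged.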

3. Update ContainerProperties docstring (if applicable)

Update the docstring for vector_embedding_policy in ContainerProperties / database.py to document the new embeddingSource schema.

Acceptance criteria

  • TypedDict models are exported from azure.cosmos (or a supported sub-module).
  • mypy passes on a usage like:
    source: EmbeddingSource = {"endpoint": "...", "deploymentName": "...", "authType": "ApiKey"}
  • Existing containers without embeddingSource continue to work unchanged (round-trip through dict[str, Any] still valid).
  • Unit test: create a VectorEmbedding TypedDict with and without embeddingSource, verify json.dumps round-trips correctly.
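
The round-trip unit test from the last criterion could look roughly like the sketch below. It is written as plain pytest-style functions; the test names, the round_trip helper, and the local model definitions are all illustrative assumptions, and the real test would import the models from wherever they end up being exported.

```python
import json
from typing import List, Literal, TypedDict


class EmbeddingSource(TypedDict, total=False):
    sourcePaths: List[str]
    deploymentName: str
    modelName: str
    endpoint: str
    authType: Literal["ApiKey", "Entra"]


class VectorEmbedding(TypedDict, total=False):
    path: str
    dataType: Literal["float32", "float16", "uint8", "int8"]
    dimensions: int
    distanceFunction: Literal["cosine", "dotproduct", "euclidean"]
    embeddingSource: EmbeddingSource


def round_trip(embedding: VectorEmbedding) -> VectorEmbedding:
    """Serialize to JSON and parse back (illustrative helper)."""
    return json.loads(json.dumps(embedding))


def test_round_trip_without_embedding_source() -> None:
    embedding: VectorEmbedding = {
        "path": "/embedding",
        "dataType": "float32",
        "dimensions": 1536,
        "distanceFunction": "cosine",
    }
    restored = round_trip(embedding)
    assert restored == embedding
    assert "embeddingSource" not in restored


def test_round_trip_with_embedding_source() -> None:
    embedding: VectorEmbedding = {
        "path": "/embedding",
        "dataType": "float32",
        "dimensions": 1536,
        "distanceFunction": "cosine",
        "embeddingSource": {
            "sourcePaths": ["/title", "/abstract"],
            "deploymentName": "text-embedding-3-small",
            "modelName": "text-embedding-3-small",
            "endpoint": "https://embedding-south-central.cognitiveservices.azure.com/",
            "authType": "ApiKey",
        },
    }
    restored = round_trip(embedding)
    assert restored == embedding
    assert restored["embeddingSource"]["sourcePaths"] == ["/title", "/abstract"]


if __name__ == "__main__":
    test_round_trip_without_embedding_source()
    test_round_trip_with_embedding_source()
    print("ok")
```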

Files likely touched

  • sdk/cosmos/azure-cosmos/azure/cosmos/_models.py (or documents.py) — new TypedDict classes
  • sdk/cosmos/azure-cosmos/azure/cosmos/__init__.py — export new types
  • sdk/cosmos/azure-cosmos/azure/cosmos/database.py — updated type annotations (sync)
  • sdk/cosmos/azure-cosmos/azure/cosmos/aio/_database.py — updated type annotations (async)

Dependencies

None — pure model/typing change, no behavioral change.
