Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 10 additions & 11 deletions airbyte_cdk/destinations/vector_db_based/embedder.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,10 +7,10 @@
from dataclasses import dataclass
from typing import List, Optional, Union, cast

from langchain.embeddings.cohere import CohereEmbeddings
from langchain.embeddings.fake import FakeEmbeddings
from langchain.embeddings.localai import LocalAIEmbeddings
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain_cohere import CohereEmbeddings
from langchain_community.embeddings import FakeEmbeddings, LocalAIEmbeddings
from langchain_openai import AzureOpenAIEmbeddings, OpenAIEmbeddings
from pydantic import SecretStr

from airbyte_cdk.destinations.vector_db_based.config import (
AzureOpenAIEmbeddingConfigModel,
Expand Down Expand Up @@ -118,14 +118,13 @@ class AzureOpenAIEmbedder(BaseOpenAIEmbedder):
def __init__(self, config: AzureOpenAIEmbeddingConfigModel, chunk_size: int):
# Azure OpenAI API has — as of 20230927 — a limit of 16 documents per request
super().__init__(
OpenAIEmbeddings( # type: ignore [call-arg]
openai_api_key=config.openai_key,
AzureOpenAIEmbeddings( # type: ignore [call-arg]
api_key=SecretStr(config.openai_key),
chunk_size=16,
max_retries=15,
openai_api_type="azure",
openai_api_version="2023-05-15",
openai_api_base=config.api_base,
deployment=config.deployment,
api_version="2023-05-15",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Should the Azure OpenAI API version be configurable or more recent?

The api_version is hardcoded to "2023-05-15", which is over 2 years old. Azure OpenAI has released newer API versions since then. This could limit access to newer features or cause compatibility issues.

Consider either:

  1. Making the API version configurable via AzureOpenAIEmbeddingConfigModel
  2. Updating to a more recent stable API version (2024-x)

Wdyt about making this configurable for flexibility?


🌐 Web query:

What are the latest Azure OpenAI API versions and which version is recommended for langchain-openai 0.2.x?

💡 Result:

Short answer

  • Latest Azure OpenAI API surfaces (as of Oct 23, 2025): the new unified v1 OpenAI-style API (opt‑in since mid‑2025) plus dateed preview versions (notably 2025-03-01-preview for Responses/features) and the most recent date-based GA/preview releases (examples: 2024-06-01 / 2024-10-21 / 2025-03-01-preview). [1][2][3]

Recommended version for langchain-openai 0.2.x

  • Use the API version LangChain v0.2 docs show: set OPENAI_API_VERSION = 2023-12-01-preview (LangChain v0.2 guidance). If you can move to the newer unified v1/OpenAI-style endpoints and your LangChain version supports them, prefer the v1 spec for access to the latest features; otherwise pin to 2023-12-01-preview for stability with langchain-openai 0.2.x. [4][1]

Sources

  • Microsoft: Responses API / Azure OpenAI docs (v1 + preview notes). [1][2]
  • Microsoft: Azure OpenAI API lifecycle (unified v1 rollout / api-version guidance). [1]
  • LangChain v0.2 Azure OpenAI docs (recommended OPENAI_API_VERSION = 2023-12-01-preview). [4]

References (links shown as numbered sources above).


Update the Azure OpenAI API version to match LangChain compatibility requirements.

The api_version is hardcoded to "2023-05-15", but LangChain v0.2 docs recommend setting OPENAI_API_VERSION = 2023-12-01-preview for langchain-openai 0.2.x. This 7-month gap may cause compatibility issues or limit access to features.

Would updating to 2023-12-01-preview work for your use case? Additionally, consider making this configurable via AzureOpenAIEmbeddingConfigModel for flexibility across different deployment scenarios.

🤖 Prompt for AI Agents
In airbyte_cdk/destinations/vector_db_based/embedder.py around line 124, the
api_version is hardcoded to "2023-05-15"; update it to "2023-12-01-preview" to
match LangChain v0.2 expectations and avoid compatibility issues, and make this
value configurable by adding a new field (e.g., openai_api_version) to
AzureOpenAIEmbeddingConfigModel with default "2023-12-01-preview"; then replace
the hardcoded string with the config value when constructing the Azure OpenAI
client so existing behavior remains unchanged if the config is not set.

azure_endpoint=config.api_base,
azure_deployment=config.deployment,
disallowed_special=(),
Comment on lines +121 to 128
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify Azure OpenAI parameter mappings with the new SDK.

The parameter renames look aligned with Azure-specific naming conventions (e.g., azure_endpoint, azure_deployment), but since this is a major dependency upgrade, it's worth double-checking these match the langchain-openai 0.2.x API exactly.

As you noted in the PR description, testing with real Azure OpenAI credentials would help confirm the mappings are correct.


🏁 Script executed:

#!/bin/bash
# Description: Check for actual usage patterns of AzureOpenAIEmbeddings in the codebase to ensure parameter names are used correctly

# Search for any other references to AzureOpenAIEmbeddings or AzureOpenAIEmbedder
rg -nP -C5 'AzureOpenAI(Embeddings|Embedder)' --type=py

Length of output: 6311


Let me verify the exact parameter names for langchain-openai 0.2.x:

Let me check the actual implementation more closely and whether the model parameter is being provided:

Add missing model parameter to AzureOpenAIEmbeddings initialization.

The model parameter is required in langchain-openai 0.2.x AzureOpenAIEmbeddings—every official example shows it as the first argument. Your code passes api_key, chunk_size, max_retries, api_version, azure_endpoint, azure_deployment, and disallowed_special, but omits model.

The Azure OpenAI Service embeddings operation supports text-embedding-ada-002 (Version 2), text-embedding-3-large, and text-embedding-3-small models. Wdyt about adding something like model="text-embedding-3-large" (or whichever model your Azure deployment uses) to the initialization?

🤖 Prompt for AI Agents
In airbyte_cdk/destinations/vector_db_based/embedder.py around lines 120 to 127,
the AzureOpenAIEmbeddings initialization is missing the required model parameter
for langchain-openai 0.2.x; add a model argument (e.g.,
model="text-embedding-3-large" or the specific embedding model your Azure
deployment uses) to the constructor call so the embeddings client is created
with the correct model; ensure the chosen model matches the Azure deployment
configuration (deployment/model) and update any config or docs if you introduce
a new default.

),
chunk_size,
Expand All @@ -140,7 +139,7 @@ def __init__(self, config: CohereEmbeddingConfigModel):
super().__init__()
# Client is set internally
self.embeddings = CohereEmbeddings(
cohere_api_key=config.cohere_key, model="embed-english-light-v2.0"
cohere_api_key=SecretStr(config.cohere_key), model="embed-english-light-v2.0"
) # type: ignore

def check(self) -> Optional[str]:
Expand Down
Loading
Loading