Skip to content

FEAT: add text embedder#1694

Closed
rhajou wants to merge 1 commit into
deepset-ai:mainfrom
rhajou:add-gemini-embedder
Closed

FEAT: add text embedder#1694
rhajou wants to merge 1 commit into
deepset-ai:mainfrom
rhajou:add-gemini-embedder

Conversation

@rhajou
Copy link
Copy Markdown
Contributor

@rhajou rhajou commented May 2, 2025

Related Issues

Proposed Changes:

Added the Google Text Embedder

How did you test it?

Unit tests

Notes for the reviewer

Didn't add the Document Embedder, is it needed?

Checklist

@rhajou rhajou requested a review from a team as a code owner May 2, 2025 15:16
@rhajou rhajou requested review from mpangrazzi and removed request for a team May 2, 2025 15:16
@github-actions github-actions Bot added integration:google-ai type:documentation Improvements or additions to documentation labels May 2, 2025
Copy link
Copy Markdown
Contributor

@mpangrazzi mpangrazzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some comments! I recommend reading first the official docs, then update the implementation and the tests. I also recommend adding an example (and test it out) or an integration test. LMK if something is not clear! 😉

"Programming Language :: Python :: Implementation :: PyPy",
]
dependencies = ["haystack-ai>=2.9.0", "google-generativeai>=0.3.1"]
dependencies = ["haystack-ai>=2.9.0", "google-generativeai>=0.3.1", "google-genai==1.13.0"]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding google-genai==1.13.0 with exact version pinning could cause conflicts. What about google-genai>=1.13.0?

:param model: The name of the Google AI embedding model to use.
Defaults to "models/embedding-001".
:param api_key: The Google AI API key. It can be explicitly provided or automatically read from the
`GOOGLE_API_KEY` environment variable.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: above you're initializing this from GEMINI_API_KEY, but here you are referring to GOOGLE_API_KEY.

configs.title = self.title
elif self.title and self.task_type != "retrieval_document":
warnings.warn(
UserWarning("Warning: Title 'Should Be Ignored' is ignored because task_type is 'retrieval_query'"),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be f"Warning: title '{self.title}' is ignored..."?

raise RuntimeError(msg) from e

# Extract embeddings - result.embedding should be the list of lists
embeddings = result.get("embedding") # Use .get for safety, returns None if key missing
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to docs, result should be an object and not a dict, so you should do result.embeddings.

I see that in the tests you're mocking this response (so tests are actually passing), but have you tried it outside tests (e.g. in an integration tests or an example?)

texts = ["text 1", "text 2"]
expected_embeddings = [[0.1, 0.2], [0.3, 0.4]]
# Configure the mock embed_content method to return a successful response
mock_client_instance.models.embed_content.return_value = {"embedding": expected_embeddings}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the wrong mocking I was mentioning above.

According to docs, a correct mock should be something like:

mock_response = MagicMock()
mock_response.embeddings = None # or e.g. [[0.1, 0.2], [0.3, 0.4]]
mock_client_instance.models.embed_content.return_value = mock_response

Can you please update tests accordingly?

@anakin87
Copy link
Copy Markdown
Member

Google Generative AI SDK is deprecated and will reach EOL in September 2025.

Google GenAI SDK should be used instead, for which we already introduced Embedders in #1783.

For this reason, I am closing this PR and the related issue.

@anakin87 anakin87 closed this Jul 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

integration:google-ai type:documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add a GoogleAIGeminiDocumentEmbedder and GoogleAIGeminiTextEmbedder

3 participants