FEAT: add text embedder by rhajou · Pull Request #1694 · deepset-ai/haystack-core-integrations

rhajou · 2025-05-02T15:16:13Z

Related Issues

Fixes #1534 (Add a GoogleAIGeminiDocumentEmbedder and GoogleAIGeminiTextEmbedder) and #1611 (Gemini embedder models)

Proposed Changes:

Added the Google Text Embedder

How did you test it?

Unit tests

Notes for the reviewer

I didn't add the Document Embedder. Is it needed?

Checklist

I have read the contributors guidelines and the code of conduct
I have updated the related issue with new insights and changes
I added unit tests and updated the docstrings
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.

mpangrazzi

I've left some comments! I recommend reading the official docs first, then updating the implementation and the tests. I also recommend adding an example (and testing it out) or an integration test. LMK if something is not clear! 😉

mpangrazzi · 2025-06-04T07:58:09Z

  "Programming Language :: Python :: Implementation :: PyPy",
 ]
-dependencies = ["haystack-ai>=2.9.0", "google-generativeai>=0.3.1"]
+dependencies = ["haystack-ai>=2.9.0", "google-generativeai>=0.3.1", "google-genai==1.13.0"]


Adding google-genai==1.13.0 with exact version pinning could cause conflicts. What about google-genai>=1.13.0?

mpangrazzi · 2025-06-04T08:06:50Z

+        :param model: The name of the Google AI embedding model to use.
+                      Defaults to "models/embedding-001".
+        :param api_key: The Google AI API key. It can be explicitly provided or automatically read from the
+                        `GOOGLE_API_KEY` environment variable.


Note: above you're initializing this from GEMINI_API_KEY, but here you are referring to GOOGLE_API_KEY.
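One way to keep the docstring and the initializer in agreement is to resolve the key from a single ordered list of variable names. The sketch below uses only the standard library and is hypothetical: `resolve_api_key`, `API_KEY_ENV_VARS`, and the fallback order are assumptions for illustration, not code from this PR.

```python
import os
from typing import Mapping, Optional, Sequence

# Assumption: accept both names, preferring the one the docstring documents.
API_KEY_ENV_VARS = ("GOOGLE_API_KEY", "GEMINI_API_KEY")


def resolve_api_key(
    explicit: Optional[str] = None,
    env_vars: Sequence[str] = API_KEY_ENV_VARS,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the explicit key, else the first non-empty variable in env_vars."""
    if explicit:
        return explicit
    env = os.environ if environ is None else environ
    for name in env_vars:
        value = env.get(name)
        if value:
            return value
    raise ValueError(f"No API key found; set one of: {', '.join(env_vars)}")
```

Whichever variable names are chosen, the docstring should list exactly the names in that tuple so the two cannot drift apart.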

mpangrazzi · 2025-06-04T08:10:32Z

+            configs.title = self.title
+        elif self.title and self.task_type != "retrieval_document":
+            warnings.warn(
+                UserWarning("Warning: Title 'Should Be Ignored' is ignored because task_type is 'retrieval_query'"),


Shouldn't this be f"Warning: title '{self.title}' is ignored..."?
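A minimal sketch of the interpolated warning suggested above. The helper name `warn_if_title_ignored` is hypothetical; the condition mirrors the `elif` branch in the diff, and the message is built from the actual `title` and `task_type` instead of hardcoded test values.

```python
import warnings
from typing import Optional


def warn_if_title_ignored(title: Optional[str], task_type: str) -> None:
    """Warn when a title is set but the task type does not use it."""
    if title and task_type != "retrieval_document":
        warnings.warn(
            f"Title '{title}' is ignored because task_type is '{task_type}'",
            UserWarning,
            stacklevel=2,
        )
```

Passing the message string and the category separately to `warnings.warn` also avoids wrapping the text in a `UserWarning(...)` instance by hand.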

mpangrazzi · 2025-06-04T08:25:03Z

+            raise RuntimeError(msg) from e
+
+        # Extract embeddings - result.embedding should be the list of lists
+        embeddings = result.get("embedding")  # Use .get for safety, returns None if key missing


According to docs, result should be an object and not a dict, so you should do result.embeddings.

I see that in the tests you're mocking this response (so the tests are actually passing), but have you tried it outside tests (e.g. in an integration test or an example)?
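To illustrate why attribute access matters here, the sketch below uses stand-in classes instead of the real SDK. `FakeEmbedding`, `FakeResponse`, and `extract_embeddings` are hypothetical names, and the assumption that each item in `embeddings` exposes a `.values` list should be checked against the official docs before relying on it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeEmbedding:
    # Assumption: one embedding object carries its vector in `.values`.
    values: List[float]


@dataclass
class FakeResponse:
    # Stand-in for the response object; note it has no dict-style `.get`.
    embeddings: Optional[List[FakeEmbedding]] = None


def extract_embeddings(result) -> List[List[float]]:
    """Read vectors via attribute access and fail loudly when none are present."""
    embeddings = getattr(result, "embeddings", None)
    if not embeddings:
        raise RuntimeError("No embeddings returned by the API")
    return [list(e.values) for e in embeddings]
```

Because `FakeResponse` has no `.get` method, the original `result.get("embedding")` call would raise `AttributeError` on it, which is the failure a dict-shaped mock hides.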

mpangrazzi · 2025-06-04T08:28:11Z

+    texts = ["text 1", "text 2"]
+    expected_embeddings = [[0.1, 0.2], [0.3, 0.4]]
+    # Configure the mock embed_content method to return a successful response
+    mock_client_instance.models.embed_content.return_value = {"embedding": expected_embeddings}


This is the incorrect mock I mentioned above.

According to docs, a correct mock should be something like:

mock_response = MagicMock()
mock_response.embeddings = None  # or e.g. [[0.1, 0.2], [0.3, 0.4]]
mock_client_instance.models.embed_content.return_value = mock_response

Can you please update tests accordingly?
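The mock above can be expanded into a small runnable check. This is only a sketch: `embed_texts` is a hypothetical stand-in for the component's call into the client, the `model=` and `contents=` keyword names are assumptions to verify against the SDK docs, and the embeddings are plain lists as in the snippet above.

```python
from unittest.mock import MagicMock


def embed_texts(client, texts, model="models/embedding-001"):
    """Hypothetical stand-in for the embedder's call into the client."""
    result = client.models.embed_content(model=model, contents=texts)
    # Attribute access: the response is treated as an object, not a dict.
    if result.embeddings is None:
        raise RuntimeError("No embeddings returned by the API")
    return result.embeddings


# Build the mocked client so embed_content returns an object with `.embeddings`.
mock_client_instance = MagicMock()
mock_response = MagicMock()
mock_response.embeddings = [[0.1, 0.2], [0.3, 0.4]]
mock_client_instance.models.embed_content.return_value = mock_response

embeddings = embed_texts(mock_client_instance, ["text 1", "text 2"])
```

Setting `mock_response.embeddings = None` instead exercises the error path, so both the success and the missing-embeddings cases can be covered with the same fixture.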

anakin87 · 2025-07-24T13:55:26Z

Google Generative AI SDK is deprecated and will reach EOL in September 2025.

Google GenAI SDK should be used instead, for which we already introduced Embedders in #1783.

For this reason, I am closing this PR and the related issue.

add text embedder

f7b1419

rhajou requested a review from a team as a code owner May 2, 2025 15:16

rhajou requested review from mpangrazzi and removed request for a team May 2, 2025 15:16

github-actions bot added the integration:google-ai and type:documentation labels May 2, 2025

anakin87 mentioned this pull request May 24, 2025

feat: Add GoogleAITextEmbedder and GoogleAIDocumentEmbedder components #1783

Merged

mpangrazzi requested changes Jun 4, 2025

anakin87 closed this Jul 24, 2025

rhajou wants to merge 1 commit into deepset-ai:main from rhajou:add-gemini-embedder


3 participants
