feat(celery Wave 6 #39): per-provider multimodal embedding input payload + fix self.base_url bug #1739
Merged
…oad + fix self.base_url bug
Wave 5 P2 chunk 1 shipped `EmbeddingService._embed_image_via_litellm`
with a single LiteLLM-shaped `input=[{"image_url": {"url": "data:..."}},
{"text": "..."}]` payload that mirrors OpenAI's multimodal *chat
completion* request. That shape is **not** the canonical embedding
input shape for the documented multimodal embedders (Voyage AI,
Jina, Cohere) — LiteLLM may translate it for some providers but
ships no guarantee.
This PR introduces `aperag/llm/embed/multimodal_input.py` —
`build_multimodal_input_payload(provider, image_data_url, alt_text)` —
that dispatches to provider-specific shapes per their documented
embedding wire format:
* **Voyage AI** (`voyage_ai` / `voyageai` / `voyage`):
`[{"content": [{"type": "image_base64", "image_base64": ...},
{"type": "text", "text": ...}]}]`
* **Jina** (`jina_ai` / `jinaai` / `jina`): flat single-key list
`[{"image": ...}, {"text": ...}]` — fused embedding
* **Cohere** (`cohere`): same flat-list shape as Jina
* **OpenAI** (`openai` / `openai_multimodal`): chat-multimodal
envelope with `image_url` / `text` parts (closest documented
shape; OpenAI text-embedding endpoints don't accept images yet
so the failure surfaces as a provider-side 4xx with a clear
message)
* **Unknown provider**: falls back to the Wave 5 P2 baseline shape
so prior behaviour is preserved (hard-cut directive: no shim,
but also no regression for unmapped providers)
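
The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the shipped `multimodal_input.py`: the function name and parameters come from this PR, while the alias table layout, the `strip().lower()` normalization, and the choice to treat `provider=None` as the unknown-provider fallback are assumptions. The OpenAI shape follows the `type`-tagged parts listed in the PR description.

```python
from typing import Any, Optional

# Alias table built from the provider list above; its layout is an assumption.
_PROVIDER_ALIASES = {
    "voyage_ai": "voyage", "voyageai": "voyage", "voyage": "voyage",
    "jina_ai": "jina", "jinaai": "jina", "jina": "jina",
    "cohere": "cohere",
    "openai": "openai", "openai_multimodal": "openai",
}


def build_multimodal_input_payload(
    provider: Optional[str], image_data_url: str, alt_text: Optional[str]
) -> list[dict[str, Any]]:
    # Assumed normalization: trim and lower-case so "VOYAGE" and " Voyage " match.
    key = _PROVIDER_ALIASES.get((provider or "").strip().lower(), "unknown")
    # Empty or whitespace-only alt text means the text part is skipped entirely.
    has_text = bool(alt_text and alt_text.strip())

    if key == "voyage":
        content: list[dict[str, Any]] = [
            {"type": "image_base64", "image_base64": image_data_url}
        ]
        if has_text:
            content.append({"type": "text", "text": alt_text})
        return [{"content": content}]

    if key in ("jina", "cohere"):
        parts: list[dict[str, Any]] = [{"image": image_data_url}]
        if has_text:
            parts.append({"text": alt_text})
        return parts

    if key == "openai":
        parts = [{"type": "image_url", "image_url": {"url": image_data_url}}]
        if has_text:
            parts.append({"type": "text", "text": alt_text})
        return parts

    # Unknown provider (and, in this sketch, provider=None): Wave 5 P2 baseline.
    parts = [{"image_url": {"url": image_data_url}}]
    if has_text:
        parts.append({"text": alt_text})
    return parts
```

Keeping the fallback as the final branch means an unmapped provider string never raises; it simply receives the same payload it did before this change.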
Also fixes a latent Wave 5 P2 chunk 1 bug:
`_embed_image_via_litellm` referenced `self.base_url` (undefined)
where it should be `self.api_base` (the constructor-set attribute
matching the text path). The first real production
`embed_image()` call would have raised `AttributeError`; this PR
makes the production code path actually reachable.
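
The failure mode is easy to reproduce in isolation. The class below is a hypothetical stand-in, not the real `EmbeddingService`; only the attribute names (`api_base` set in the constructor, `base_url` never set) are taken from this PR, and the URL is a placeholder.

```python
class EmbeddingServiceSketch:
    """Hypothetical stand-in; only the attribute naming mirrors the PR."""

    def __init__(self, embedding_service_url: str) -> None:
        # The constructor sets api_base; nothing ever assigns base_url.
        self.api_base = embedding_service_url

    def image_call_before_fix(self) -> str:
        # Reads an attribute that was never assigned, so this raises.
        return self.base_url

    def image_call_after_fix(self) -> str:
        # Reads the same attribute the text path uses.
        return self.api_base


svc = EmbeddingServiceSketch("https://embeddings.example.invalid/v1")

error_message = ""
try:
    svc.image_call_before_fix()
except AttributeError as exc:
    error_message = str(exc)

fixed_base = svc.image_call_after_fix()
```

Because Python resolves attributes only when the line executes, a bug like this stays invisible until the image path is actually exercised, which is why it survived until the first real call.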
Tests
-----
`tests/unit_test/llm/test_multimodal_input.py` — 11 tests:
* Voyage / Jina / Cohere / OpenAI / unknown-provider shape pinning
* alias matching (`voyage` / `voyageai` / `VOYAGE` / ` Voyage `)
* empty / whitespace `alt_text` skips the text part on every
provider (otherwise pairing the image with " " changes the
cache key and may confuse the embedder)
* `provider=None` resolves to the default
Full unit suite: **1038 passed, 29 skipped**, ruff + format clean.
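
To see why a blank text part matters for caching, consider a key derived from the serialized payload. The hashing scheme below (SHA-256 over canonical JSON) is purely an assumption for illustration; the actual cache key from task #37 is not shown in this PR and may be computed differently.

```python
import hashlib
import json


def payload_cache_key(payload: list) -> str:
    """Hypothetical cache key: SHA-256 over a canonical JSON dump of the payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


image_part = {"image": "data:image/png;base64,AAAA"}

# Same image, once alone and once paired with a whitespace-only text part.
key_image_only = payload_cache_key([image_part])
key_with_blank = payload_cache_key([image_part, {"text": " "}])

# The two keys differ even though the image content is identical.
keys_differ = key_image_only != key_with_blank
```

Under any scheme that hashes the full payload, the whitespace-only part would split what is logically one image into two cache entries; dropping it before the payload is built avoids that.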
Out of scope (per task #39 boundary)
------------------------------------
* No new operator config — provider keyword already drives the
rest of the embedding stack (cache key, text path, error wrap).
* No backend rename / migration — task #36 territory.
* No cache-layer changes — task #37 already shipped that wiring.
Summary
Wave 6 task #39 — per-provider multimodal embedding input payload dispatcher + latent Wave 5 P2 chunk 1 bug fix.
Wave 5 P2 chunk 1 shipped `EmbeddingService._embed_image_via_litellm` with a single LiteLLM-shaped payload that mirrored OpenAI's multimodal chat completion request (not the embedding request). Voyage / Jina / Cohere each document a different embedding input shape; LiteLLM may translate transparently for some but ships no guarantee. This PR adds `aperag/llm/embed/multimodal_input.py:build_multimodal_input_payload` to emit the canonical shape per provider, with the Wave 5 baseline as the unknown-provider fallback.

Also fixes `self.base_url` → `self.api_base` in the same call (constructor sets `self.api_base = embedding_service_url`; the existing reference would have raised `AttributeError` on the first real production `embed_image` call).

Per-provider shapes

| Provider | Payload shape |
| --- | --- |
| `voyage_ai` / `voyageai` / `voyage` | `[{"content": [{"type":"image_base64","image_base64":<data url>}, {"type":"text","text":<alt>}]}]` |
| `jina_ai` / `jinaai` / `jina` | `[{"image":<data url>}, {"text":<alt>}]` (fused vector) |
| `cohere` | `[{"image":<data url>}, {"text":<alt>}]` |
| `openai` / `openai_multimodal` | `[{"type":"image_url","image_url":{"url":<data url>}}, {"type":"text","text":<alt>}]` |
| unknown provider | `[{"image_url":{"url":<data url>}}, {"text":<alt>}]` (Wave 5 P2 baseline preserved) |

Empty / whitespace `alt_text` skips the text part on every provider: pairing the image with `" "` would change the cache key (#37) and potentially confuse the embedder.

Bug fix

`embedding_service.py:_embed_image_via_litellm` referenced `self.base_url` (undefined) in the LiteLLM call; it should be `self.api_base`. Verified via grep: zero other `self.base_url` references in this file. The text path at line 323 already uses `self.api_base` correctly. Without this fix, the first production `embed_image()` call raises `AttributeError: 'EmbeddingService' object has no attribute 'base_url'`.

Tests

`tests/unit_test/llm/test_multimodal_input.py`, 11 tests:
* `content` envelope + `image_base64` part + alias matching (`voyage` / `voyageai` / `VOYAGE` / `Voyage`)
* `openai_multimodal` resolves same shape
* `provider=None` resolves to default
* whitespace-only `alt_text` treated as empty across all providers (regression guard)

Full unit suite: 1038 passed, 29 skipped (16 new + 1022 from main), ruff + format clean.
Out of scope
Simple-stable 4 guardrail check
Test plan
* `uv run pytest tests/unit_test/`: 1038 passed
* uvx ruff check + format: clean
* `0edc82a6` (post-Wave 5 + feat(celery Wave 6 #37): wire EmbeddingService.embed_image into application cache #1735 + docs(celery Wave 6 §K.11): full spec amendment — 7 items + 2-PR strategy + double pre-check lock #1736 merged main)