feat(celery Wave 6 #39): per-provider multimodal embedding input payload + fix self.base_url bug #1739

Merged: earayu merged 1 commit into main from ming-shu/wave6-39-provider-formats on Apr 27, 2026

Conversation

@earayu earayu commented Apr 27, 2026

Summary

Wave 6 task #39: per-provider multimodal embedding input payload dispatcher, plus a fix for a latent Wave 5 P2 chunk 1 bug.

Wave 5 P2 chunk 1 shipped EmbeddingService._embed_image_via_litellm with a single LiteLLM-shaped payload that mirrored OpenAI's multimodal chat completion request (not the embedding request). Voyage / Jina / Cohere each document a different embedding input shape; LiteLLM may translate transparently for some but ships no guarantee. This PR adds aperag/llm/embed/multimodal_input.py:build_multimodal_input_payload to emit the canonical shape per provider, with the Wave 5 baseline as the unknown-provider fallback.

Also fixes self.base_url → self.api_base in the same call (the constructor sets self.api_base = embedding_service_url; the existing reference would have raised AttributeError on the first real production embed_image call).

Per-provider shapes

| Provider | Aliases | Shape |
| --- | --- | --- |
| Voyage AI | voyage_ai / voyageai / voyage | `[{"content": [{"type":"image_base64","image_base64":<data url>}, {"type":"text","text":<alt>}]}]` |
| Jina | jina_ai / jinaai / jina | `[{"image":<data url>}, {"text":<alt>}]` (fused vector) |
| Cohere | cohere | `[{"image":<data url>}, {"text":<alt>}]` |
| OpenAI | openai / openai_multimodal | `[{"type":"image_url","image_url":{"url":<data url>}}, {"type":"text","text":<alt>}]` |
| Unknown | (fallback) | `[{"image_url":{"url":<data url>}}, {"text":<alt>}]` (Wave 5 P2 baseline preserved) |

Empty / whitespace alt_text skips the text part on every provider — pairing the image with " " would change the cache key (#37) and potentially confuse the embedder.
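The dispatch rules above can be sketched roughly as follows. This is not the code from aperag/llm/embed/multimodal_input.py; the function name `build_payload_sketch` and the alias constants are hypothetical, and two behaviours are assumptions: that `provider=None` takes the fallback branch, and that non-empty alt text is passed through unmodified.

```python
from typing import Any, Dict, List, Optional

# Alias groups copied from the shapes listed above; constant names are hypothetical.
VOYAGE_ALIASES = {"voyage_ai", "voyageai", "voyage"}
FLAT_LIST_ALIASES = {"jina_ai", "jinaai", "jina", "cohere"}
OPENAI_ALIASES = {"openai", "openai_multimodal"}


def build_payload_sketch(
    provider: Optional[str],
    image_data_url: str,
    alt_text: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Return a provider-specific embedding input list (illustrative only)."""
    # Case-insensitive, whitespace-tolerant alias match.
    key = (provider or "").strip().lower()
    # Empty or whitespace-only alt text means "no text part" on every provider.
    has_text = bool(alt_text and alt_text.strip())

    if key in VOYAGE_ALIASES:
        # Voyage: one item wrapping a "content" list of typed parts.
        content: List[Dict[str, Any]] = [
            {"type": "image_base64", "image_base64": image_data_url}
        ]
        if has_text:
            content.append({"type": "text", "text": alt_text})
        return [{"content": content}]

    if key in FLAT_LIST_ALIASES:
        # Jina / Cohere: flat list of single-key dicts.
        parts: List[Dict[str, Any]] = [{"image": image_data_url}]
        if has_text:
            parts.append({"text": alt_text})
        return parts

    if key in OPENAI_ALIASES:
        # OpenAI: chat-multimodal style typed parts.
        parts = [{"type": "image_url", "image_url": {"url": image_data_url}}]
        if has_text:
            parts.append({"type": "text", "text": alt_text})
        return parts

    # Unknown provider (assumed to include None): Wave 5 P2 baseline shape.
    parts = [{"image_url": {"url": image_data_url}}]
    if has_text:
        parts.append({"text": alt_text})
    return parts
```

A pure function with plain set lookups keeps the mapping easy to pin in unit tests, which matches the "no abstraction layer" intent stated in this PR.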

Bug fix

embedding_service.py:_embed_image_via_litellm referenced self.base_url (undefined) in the LiteLLM call; should be self.api_base. Verified via grep: zero other self.base_url references in this file. The text path at line 323 already uses self.api_base correctly. Without this fix, the first production embed_image() call raises AttributeError: 'EmbeddingService' object has no attribute 'base_url'.
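A minimal reproduction of the failure mode, assuming nothing about the real class beyond what is stated above (the constructor stores the URL as `self.api_base`). The class and method names here are hypothetical stand-ins, not the actual EmbeddingService code.

```python
class EmbeddingServiceSketch:
    """Stand-in for the real service; only the attribute naming matters here."""

    def __init__(self, embedding_service_url: str) -> None:
        # The constructor only ever sets api_base.
        self.api_base = embedding_service_url

    def image_call_kwargs_before_fix(self) -> dict:
        # Reads an attribute that was never assigned, so this raises AttributeError.
        return {"api_base": self.base_url}

    def image_call_kwargs_after_fix(self) -> dict:
        # Reads the attribute the constructor actually set.
        return {"api_base": self.api_base}


svc = EmbeddingServiceSketch("http://embeddings.internal")
try:
    svc.image_call_kwargs_before_fix()
except AttributeError as exc:
    print(f"before fix: {type(exc).__name__}")
print("after fix:", svc.image_call_kwargs_after_fix())
```

Because the attribute is only read inside the image path, the error would not surface at construction time or on the text path, which is consistent with it staying latent until the first embed_image() call.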

Tests

tests/unit_test/llm/test_multimodal_input.py (11 tests):

  • ✅ Voyage content envelope + image_base64 part + alias matching (voyage / voyageai / VOYAGE / Voyage)
  • ✅ Voyage / Jina / Cohere / OpenAI / unknown empty-alt_text path skips text part
  • ✅ Jina + Cohere flat-list shape
  • ✅ OpenAI chat-multimodal envelope + alias openai_multimodal resolves same shape
  • ✅ Unknown provider falls back to LiteLLM-baseline shape
  • ✅ provider=None resolves to default
  • ✅ Whitespace-only alt_text treated as empty across all providers (regression guard)

Full unit suite: 1038 passed, 29 skipped (16 new + 1022 from main), ruff + format clean.

Out of scope

Simple-stable 4 guardrail check

| Guardrail | #39 verdict |
| --- | --- |
| Don't expand scope without limit | ✅ scope = single dispatcher module + 1-line callsite swap + 1-char bug fix |
| Make the functionality solid first | ✅ each provider gets the documented embedding wire shape; latent bug fixed so production path is reachable |
| Simple and stable | ✅ pure function dispatcher, string-match by lowercase, no abstraction layer |
| Private deployment, maintenance-free | ✅ provider keyword already operator-set; no new knob |

Test plan

…oad + fix self.base_url bug

Wave 5 P2 chunk 1 shipped `EmbeddingService._embed_image_via_litellm`
with a single LiteLLM-shaped `input=[{"image_url": {"url": "data:..."}},
{"text": "..."}]` payload that mirrors OpenAI's multimodal *chat
completion* request. That shape is **not** the canonical embedding
input shape for the documented multimodal embedders (Voyage AI,
Jina, Cohere) — LiteLLM may translate it for some providers but
ships no guarantee.

This PR introduces `aperag/llm/embed/multimodal_input.py` —
`build_multimodal_input_payload(provider, image_data_url, alt_text)` —
that dispatches to provider-specific shapes per their documented
embedding wire format:

* **Voyage AI** (`voyage_ai` / `voyageai` / `voyage`):
  `[{"content": [{"type": "image_base64", "image_base64": ...},
                 {"type": "text", "text": ...}]}]`
* **Jina** (`jina_ai` / `jinaai` / `jina`): flat single-key list
  `[{"image": ...}, {"text": ...}]` — fused embedding
* **Cohere** (`cohere`): same flat-list shape as Jina
* **OpenAI** (`openai` / `openai_multimodal`): chat-multimodal
  envelope with `image_url` / `text` parts (closest documented
  shape; OpenAI text-embedding endpoints don't accept images yet
  so the failure surfaces as a provider-side 4xx with a clear
  message)
* **Unknown provider**: falls back to the Wave 5 P2 baseline shape
  so prior behaviour is preserved (hard-cut directive: no shim,
  but also no regression for unmapped providers)

Also fixes a latent Wave 5 P2 chunk 1 bug:
`_embed_image_via_litellm` referenced `self.base_url` (undefined)
where it should be `self.api_base` (the constructor-set attribute
matching the text path). The first real production
`embed_image()` call would have raised `AttributeError`; this PR
makes the production code path actually reachable.

Tests
-----
`tests/unit_test/llm/test_multimodal_input.py` — 11 tests:
* Voyage / Jina / Cohere / OpenAI / unknown-provider shape pinning
* alias matching (`voyage` / `voyageai` / `VOYAGE` / ` Voyage `)
* empty / whitespace `alt_text` skips the text part on every
  provider (otherwise pairing the image with " " changes the
  cache key and may confuse the embedder)
* `provider=None` resolves to the default
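
To illustrate why a blank text part matters for caching, here is a small sketch. The real cache key from task #37 is not shown in this PR, so `cache_key_sketch` and its hashing scheme (SHA-256 over a canonical JSON dump) are assumptions made purely for illustration.

```python
import hashlib
import json
from typing import Any, Dict, List


def cache_key_sketch(provider: str, payload: List[Dict[str, Any]]) -> str:
    """Hypothetical cache key: hash of provider plus canonical JSON of the payload."""
    blob = json.dumps(
        {"provider": provider, "input": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


url = "data:image/png;base64,AAAA"
image_only = [{"image": url}]
image_plus_blank = [{"image": url}, {"text": " "}]

# Same image, but the blank text part yields a different key under this scheme.
print(cache_key_sketch("jina", image_only) == cache_key_sketch("jina", image_plus_blank))
```

Under any key derived from the full input payload, the same image would be cached twice if the blank text part were kept, so dropping it before building the payload avoids the duplicate entry.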

Full unit suite: **1038 passed, 29 skipped**, ruff + format clean.

Out of scope (per task #39 boundary)
------------------------------------
* No new operator config — provider keyword already drives the
  rest of the embedding stack (cache key, text path, error wrap).
* No backend rename / migration — task #36 territory.
* No cache-layer changes — task #37 already shipped that wiring.
@earayu earayu merged commit 74327c0 into main Apr 27, 2026
4 checks passed
@earayu earayu deleted the ming-shu/wave6-39-provider-formats branch April 27, 2026 11:42