feat(celery Wave 6 #39): per-provider multimodal embedding input payload + fix self.base_url bug by earayu · Pull Request #1739 · apecloud/ApeRAG

earayu · 2026-04-27T11:32:00Z

Summary

Wave 6 task #39 — per-provider multimodal embedding input payload dispatcher + latent Wave 5 P2 chunk 1 bug fix.

Wave 5 P2 chunk 1 shipped EmbeddingService._embed_image_via_litellm with a single LiteLLM-shaped payload that mirrored OpenAI's multimodal chat completion request (not an embedding request). Voyage, Jina, and Cohere each document a different embedding input shape; LiteLLM may translate transparently for some providers but makes no guarantee. This PR adds aperag/llm/embed/multimodal_input.py:build_multimodal_input_payload to emit the canonical shape for each provider, keeping the Wave 5 baseline as the fallback for unknown providers.

Also fixes self.base_url → self.api_base in the same call (constructor sets self.api_base = embedding_service_url; the existing reference would have raised AttributeError on the first real production embed_image call).

Per-provider shapes

Provider	Aliases	Shape
Voyage AI	`voyage_ai` / `voyageai` / `voyage`	`[{"content": [{"type":"image_base64","image_base64":<data url>}, {"type":"text","text":<alt>}]}]`
Jina	`jina_ai` / `jinaai` / `jina`	`[{"image":<data url>}, {"text":<alt>}]` (fused vector)
Cohere	`cohere`	`[{"image":<data url>}, {"text":<alt>}]`
OpenAI	`openai` / `openai_multimodal`	`[{"type":"image_url","image_url":{"url":<data url>}}, {"type":"text","text":<alt>}]`
Unknown	(fallback)	`[{"image_url":{"url":<data url>}}, {"text":<alt>}]` (Wave 5 P2 baseline preserved)

Empty or whitespace-only alt_text skips the text part on every provider: pairing the image with " " would change the cache key (#37) and could confuse the embedder.
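A minimal sketch of the dispatch described above, written from the shapes table rather than from the actual `multimodal_input.py`. The alias dictionary, the single `"fallback"` key, passing non-empty alt text through unchanged, and treating `provider=None` like an unmapped provider are all assumptions of this sketch; the real module's "default" for `None` may differ.

```python
from typing import Any, Optional

# Alias -> canonical provider, copied from the shapes table (illustrative only).
_ALIASES: dict[str, str] = {
    "voyage_ai": "voyage", "voyageai": "voyage", "voyage": "voyage",
    "jina_ai": "jina", "jinaai": "jina", "jina": "jina",
    "cohere": "cohere",
    "openai": "openai", "openai_multimodal": "openai",
}


def build_multimodal_input_payload(
    provider: Optional[str], image_data_url: str, alt_text: Optional[str]
) -> list[dict[str, Any]]:
    """Sketch: build the embedding `input` list for one image plus optional alt text."""
    # Trim + lowercase string match; None and unmapped names take the fallback branch.
    key = _ALIASES.get((provider or "").strip().lower(), "fallback")
    # Empty or whitespace-only alt_text means "no text part" on every provider.
    has_text = bool(alt_text and alt_text.strip())

    if key == "voyage":
        # Voyage: one input object wrapping a typed `content` list.
        content: list[dict[str, Any]] = [
            {"type": "image_base64", "image_base64": image_data_url}
        ]
        if has_text:
            content.append({"type": "text", "text": alt_text})
        return [{"content": content}]

    parts: list[dict[str, Any]]
    if key in ("jina", "cohere"):
        # Jina / Cohere: flat list of single-key objects.
        parts = [{"image": image_data_url}]
        if has_text:
            parts.append({"text": alt_text})
    elif key == "openai":
        # OpenAI: chat-multimodal style typed parts.
        parts = [{"type": "image_url", "image_url": {"url": image_data_url}}]
        if has_text:
            parts.append({"type": "text", "text": alt_text})
    else:
        # Unknown provider: Wave 5 P2 baseline shape, preserved as-is.
        parts = [{"image_url": {"url": image_data_url}}]
        if has_text:
            parts.append({"text": alt_text})
    return parts
```

A flat dictionary lookup plus `if` branches keeps the dispatcher a pure function with no provider class hierarchy, which matches the "no abstraction layer" choice stated for this PR.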

Bug fix

embedding_service.py:_embed_image_via_litellm referenced self.base_url (undefined) in the LiteLLM call; should be self.api_base. Verified via grep: zero other self.base_url references in this file. The text path at line 323 already uses self.api_base correctly. Without this fix, the first production embed_image() call raises AttributeError: 'EmbeddingService' object has no attribute 'base_url'.
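The failure mode can be reproduced in isolation. This is a hypothetical, stripped-down stand-in: `EmbeddingServiceSketch` and `fake_litellm_embedding` are not ApeRAG or LiteLLM code, and the `model=` argument is an assumption; only the constructor assigning `self.api_base = embedding_service_url` and the `base_url` / `api_base` mix-up come from this PR.

```python
from typing import Any, Callable


def fake_litellm_embedding(**kwargs: Any) -> dict[str, Any]:
    """Hypothetical stand-in for the LiteLLM embedding call: echoes its kwargs."""
    return kwargs


class EmbeddingServiceSketch:
    """Hypothetical, stripped-down stand-in for EmbeddingService."""

    def __init__(
        self,
        embedding_service_url: str,
        model: str,
        embed_fn: Callable[..., dict[str, Any]] = fake_litellm_embedding,
    ) -> None:
        # Mirrors the constructor described above: only `api_base` is ever set.
        self.api_base = embedding_service_url
        self.model = model
        self._embed_fn = embed_fn

    def embed_image_before_fix(self, payload: list[dict[str, Any]]) -> dict[str, Any]:
        # `self.base_url` was never assigned, so this line raises AttributeError.
        return self._embed_fn(model=self.model, input=payload, api_base=self.base_url)

    def embed_image_after_fix(self, payload: list[dict[str, Any]]) -> dict[str, Any]:
        # Same attribute the text path already uses.
        return self._embed_fn(model=self.model, input=payload, api_base=self.api_base)


svc = EmbeddingServiceSketch("http://embedder.local/v1", "demo-multimodal-model")
try:
    svc.embed_image_before_fix([])
except AttributeError as exc:
    print(exc)  # message names the missing 'base_url' attribute
print(svc.embed_image_after_fix([])["api_base"])  # http://embedder.local/v1
```

Because the attribute lookup only happens when the image path runs, a unit suite that never calls `embed_image()` end to end would not catch the mismatch.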

Tests

tests/unit_test/llm/test_multimodal_input.py — 11 tests:

✅ Voyage content envelope + image_base64 part + alias matching (voyage / voyageai / VOYAGE / " Voyage " with surrounding whitespace)
✅ Voyage / Jina / Cohere / OpenAI / unknown empty-alt_text path skips text part
✅ Jina + Cohere flat-list shape
✅ OpenAI chat-multimodal envelope + alias openai_multimodal resolves same shape
✅ Unknown provider falls back to LiteLLM-baseline shape
✅ provider=None resolves to default
✅ Whitespace-only alt_text treated as empty across all providers (regression guard)

Full unit suite: 1038 passed, 29 skipped (16 new + 1022 from main), ruff + format clean.

Out of scope

No new operator config: the provider keyword already drives the rest of the embedding stack (cache key #37, text path, error wrap).
No backend / column rename (task #36 territory).
No cache-layer changes (task #37 wiring already shipped).

Simple-stable 4 guardrail check

Guardrail	#39 verdict
Don't expand scope indefinitely	✅ scope = a single dispatcher module + a one-line call-site swap + a one-attribute bug fix
Make the feature actually work first	✅ each provider gets its documented embedding wire shape; the latent bug is fixed so the production path is reachable
Simple and stable	✅ pure-function dispatcher, lowercase string match, no abstraction layer
Private deployment, maintenance-free	✅ the provider keyword is already operator-set; no new knob

Test plan

uv run pytest tests/unit_test/ — 1038 passed
uvx ruff check + format — clean
Based on 0edc82a6: main after Wave 5, with #1735 (feat(celery Wave 6 #37): wire EmbeddingService.embed_image into application cache) and #1736 (docs(celery Wave 6 §K.11): full spec amendment, 7 items + 2-PR strategy + double pre-check lock) merged.

…oad + fix self.base_url bug

Wave 5 P2 chunk 1 shipped `EmbeddingService._embed_image_via_litellm` with a single LiteLLM-shaped `input=[{"image_url": {"url": "data:..."}}, {"text": "..."}]` payload that mirrors OpenAI's multimodal *chat completion* request. That shape is **not** the canonical embedding input shape for the documented multimodal embedders (Voyage AI, Jina, Cohere); LiteLLM may translate it for some providers but ships no guarantee.

This PR introduces `aperag/llm/embed/multimodal_input.py` with `build_multimodal_input_payload(provider, image_data_url, alt_text)`, which dispatches to provider-specific shapes per their documented embedding wire format:

* **Voyage AI** (`voyage_ai` / `voyageai` / `voyage`): `[{"content": [{"type": "image_base64", "image_base64": ...}, {"type": "text", "text": ...}]}]`
* **Jina** (`jina_ai` / `jinaai` / `jina`): flat single-key list `[{"image": ...}, {"text": ...}]` (fused embedding)
* **Cohere** (`cohere`): same flat-list shape as Jina
* **OpenAI** (`openai` / `openai_multimodal`): chat-multimodal envelope with `image_url` / `text` parts (the closest documented shape; OpenAI text-embedding endpoints don't accept images yet, so the failure surfaces as a provider-side 4xx with a clear message)
* **Unknown provider**: falls back to the Wave 5 P2 baseline shape so prior behaviour is preserved (hard-cut directive: no shim, but also no regression for unmapped providers)

Also fixes a latent Wave 5 P2 chunk 1 bug: `_embed_image_via_litellm` referenced `self.base_url` (undefined) where it should be `self.api_base` (the constructor-set attribute matching the text path). The first real production `embed_image()` call would have raised `AttributeError`; this PR makes the production code path actually reachable.

Tests
-----

`tests/unit_test/llm/test_multimodal_input.py`, 11 tests:

* Voyage / Jina / Cohere / OpenAI / unknown-provider shape pinning
* alias matching (`voyage` / `voyageai` / `VOYAGE` / ` Voyage `)
* empty / whitespace `alt_text` skips the text part on every provider (otherwise pairing the image with " " changes the cache key and may confuse the embedder)
* `provider=None` resolves to the default

Full unit suite: **1038 passed, 29 skipped**, ruff + format clean.

Out of scope (per task #39 boundary)
------------------------------------

* No new operator config: the provider keyword already drives the rest of the embedding stack (cache key, text path, error wrap).
* No backend rename / migration (task #36 territory).
* No cache-layer changes (task #37 already shipped that wiring).

earayu merged commit 74327c0 into main Apr 27, 2026
4 checks passed

earayu deleted the ming-shu/wave6-39-provider-formats branch April 27, 2026 11:42

earayu merged 1 commit into main from ming-shu/wave6-39-provider-formats