
fix: bypass LiteLLM for Ollama embeddings to resolve 400 Bad Request (#1425) #1438

Open

GratefulDave wants to merge 1 commit into agent0ai:main from GratefulDave:fix/ollama-embedding-400-bad-request

Conversation


GratefulDave commented on Apr 4, 2026

Fixes #1425 — persistent `httpx.HTTPStatusError: 400 Bad Request` on Ollama `/api/embed` during memory similarity search.

Root Cause

Two distinct causes, both fixed (a payload sketch follows the list):

  1. LiteLLM's Ollama handler sends a malformed request — leaks the `ollama/` prefix into the model name field and forwards unsupported kwargs (e.g. `encoding_format: null`) that Ollama 0.18.x+ rejects with 400.

  2. `None` values in the embedding input array — when a `None` ends up in the texts list (e.g. from a failed upstream LLM call or race condition), it serialises to JSON `null`, which Ollama rejects with `{"error": "invalid input type"}` → 400.
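
For concreteness, a minimal sketch of the rejected vs. accepted request bodies (field names per Ollama's `/api/embed`; the exact kwargs LiteLLM forwards vary by version, so the bad payload is illustrative):

```python
# Illustrative only: the precise fields LiteLLM sends depend on its version.
bad_payload = {
    "model": "ollama/nomic-embed-text",  # "ollama/" prefix leaked into the model name
    "input": ["recall similar memories", None],  # None -> JSON null -> "invalid input type"
    "encoding_format": None,  # unsupported kwarg that Ollama 0.18.x+ rejects with 400
}

# What the new helper sends instead:
good_payload = {
    "model": "nomic-embed-text",  # prefix stripped
    "input": ["recall similar memories", ""],  # None coerced to "" (logged with a warning)
}
```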

Changes

`models.py`

  • `_ollama_embed()` — new helper on `LiteLLMEmbeddingWrapper` that calls Ollama's `/api/embed` directly via `httpx` (already a transitive dependency), bypassing LiteLLM entirely (condensed sketch after this list)
    • Strips the `ollama/` prefix from the model name
    • Sanitises input: converts `None` → `""` and any non-str → `str()` before sending, with a `logging.warning` that records which inputs were coerced
    • Retries only on transient errors (429, 503) with exponential backoff; raises immediately on 400 (retrying a bad payload is pointless)
    • Logs the HTTP status, Ollama response body, and first 100 chars of each input on non-200 responses
  • `embed_query` / `embed_documents` — route through `_ollama_embed()` when `provider == "ollama"`
  • `_is_ollama()` — helper predicate
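
A condensed sketch of the helper, written as a standalone function for readability; the real method lives on `LiteLLMEmbeddingWrapper`, and `OLLAMA_BASE_URL` here is a stand-in for whatever base URL the wrapper reads from config:

```python
import logging
import time

import httpx

OLLAMA_BASE_URL = "http://localhost:11434"  # stand-in; the real helper reads this from config


def ollama_embed(model: str, texts: list, max_retries: int = 3) -> list[list[float]]:
    model = model.removeprefix("ollama/")  # the leaked prefix was the malformed-request root cause

    # Sanitise input: None -> "", non-str -> str(), recording which indices were coerced.
    coerced = [i for i, t in enumerate(texts) if not isinstance(t, str)]
    if coerced:
        logging.warning("coerced non-str embedding inputs at indices %s", coerced)
    clean = ["" if t is None else t if isinstance(t, str) else str(t) for t in texts]

    for attempt in range(max_retries):
        resp = httpx.post(
            f"{OLLAMA_BASE_URL}/api/embed",
            json={"model": model, "input": clean},
            timeout=30.0,
        )
        if resp.status_code in (429, 503) and attempt < max_retries - 1:
            time.sleep(2**attempt)  # exponential backoff, transient errors only
            continue
        if resp.status_code != 200:
            # Log status, response body, and the first 100 chars of each input.
            logging.error(
                "Ollama /api/embed failed: status=%s body=%s inputs=%s",
                resp.status_code, resp.text, [t[:100] for t in clean],
            )
        resp.raise_for_status()  # raises immediately on 400: retrying a bad payload is pointless
        return resp.json()["embeddings"]
```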

`plugins/_memory/helpers/memory.py`

  • `search_similarity_threshold` — wraps `asearch` in `try/except` so any embedding failure returns `[]` (no memories this turn) instead of propagating and crashing the agent's monologue loop (see the sketch below)
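
A minimal sketch of the guard, assuming a LangChain-style `asearch` on the vector store (argument names are illustrative):

```python
import logging


async def search_similarity_threshold(db, query: str, limit: int, threshold: float):
    try:
        return await db.asearch(
            query,
            search_type="similarity_score_threshold",
            k=limit,
            score_threshold=threshold,
        )
    except Exception:
        # An embedding failure now means "no memories this turn",
        # not a crashed monologue loop.
        logging.exception("similarity search failed; returning no memories")
        return []
```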

Test Plan

  • Start agent-zero with `ollama/nomic-embed-text` as the embedding model
  • Trigger a memory operation — confirm no 400 errors
  • Verify memory similarity search returns results
  • Verify `embed_documents` path works (store and retrieve a memory)
  • Verify agent continues normally when Ollama is temporarily unreachable (returns empty memories, does not crash)

Tested locally on Docker with `nomic-embed-text` (768-dim), Ollama 0.18.3, `host.docker.internal:11434`.
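
A quick manual check of the endpoint from inside the container (host, port, and model per the setup above):

```python
import httpx

resp = httpx.post(
    "http://host.docker.internal:11434/api/embed",
    json={"model": "nomic-embed-text", "input": ["hello world"]},
)
resp.raise_for_status()
print(len(resp.json()["embeddings"][0]))  # expect 768 for nomic-embed-text
```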

Commit bf6954f:

LiteLLM's Ollama embedding handler sends a malformed request to Ollama's
`/api/embed` endpoint, causing a 400 Bad Request error on Ollama 0.18.x.

- Add `_ollama_embed()` to `LiteLLMEmbeddingWrapper` that calls Ollama's
  `/api/embed` directly via httpx, stripping the "ollama/" prefix from
  the model name (the root cause of the malformed request)
- Route `embed_query` and `embed_documents` through this helper when
  provider == "ollama", bypassing LiteLLM entirely
- Wrap `search_similarity_threshold` in try/except so an embedding
  failure returns [] instead of crashing the agent

Fixes agent0ai#1425

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GratefulDave force-pushed the fix/ollama-embedding-400-bad-request branch from eca2212 to bf6954f on April 9, 2026 at 12:06
