Embedding calls crash with 400 when stored memory exceeds model context window #1436

@PaoloC68

Description

When Agent Zero stores or searches memory documents that exceed the embedding model's context window (e.g. BAAI/bge-m3: 8192 tokens), the embedding call crashes with a 400 error:

```
litellm.BadRequestError: Error code: 400 - You passed 8193 input tokens and requested 0 output tokens.
However, the model's context length is only 8192 tokens.
```

This is a hard crash: the agent errors out and memory operations fail completely.

Root Cause

Two separate issues in LiteLLMEmbeddingWrapper:

Issue 1: LiteLLM ≥1.80.11 sends encoding_format: null in embedding requests when the parameter is not set. Strict OpenAI-compatible validators (DeepInfra, vLLM, HuggingFace TEI) reject null with 422. (Upstream LiteLLM issue: BerriAI/litellm#19174)

Issue 2: There is no input truncation before embedding calls. If a memory document exceeds the model's context window, the API returns 400. Additionally, even once truncation is added, counting tokens with cl100k_base (the GPT tokenizer) can undercount relative to the model's own tokenizer (e.g. bge-m3's SentencePiece), so input truncated to exactly the apparent limit can still trigger a 400.

Steps to Reproduce

  1. Configure an OpenAI-compatible embedding provider via api_base (e.g. DeepInfra with BAAI/bge-m3)
  2. Ask the agent to memorize or search a long document (>8192 tokens)
  3. The embedding call raises BadRequestError: 400

Fix

In models.py, LiteLLMEmbeddingWrapper.embed_documents and embed_query:

  1. Default encoding_format to "float" before merging kwargs (prevents null being sent)
  2. Truncate input to ctx_length - 500 tokens before embedding (the 500-token margin accounts for tokenizer divergence between cl100k_base used for counting and the model's actual tokenizer)
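
For illustration, the two steps above could be sketched roughly as follows. This is not the actual patch to models.py: the helper names `with_default_encoding_format` and `truncate_to_budget` are hypothetical, and the tokenizer is passed in as a pair of `encode`/`decode` callables so the sketch stays self-contained (a whitespace tokenizer stands in for cl100k_base here).

```python
from typing import Any, Callable, Dict, List, Sequence


def with_default_encoding_format(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: default encoding_format to "float" before merging.

    Caller-supplied kwargs still win, so an explicit "base64" is preserved.
    """
    return {"encoding_format": "float", **kwargs}


def truncate_to_budget(
    text: str,
    ctx_length: int,
    encode: Callable[[str], List[Any]],
    decode: Callable[[Sequence[Any]], str],
    margin: int = 500,
) -> str:
    """Hypothetical helper: cut text down to ctx_length - margin tokens.

    The margin absorbs divergence between the counting tokenizer and the
    model's real tokenizer, as described in step 2.
    """
    budget = max(ctx_length - margin, 0)
    tokens = encode(text)
    if len(tokens) <= budget:
        return text  # already within budget, leave untouched
    return decode(tokens[:budget])


# Stand-in tokenizer for the sketch: one token per whitespace-separated word.
toy_encode = str.split
toy_decode = " ".join

long_doc = " ".join(["word"] * 9000)
safe_doc = truncate_to_budget(long_doc, 8192, toy_encode, toy_decode)
request_kwargs = with_default_encoding_format({})
```

In a real setup the `encode`/`decode` pair would presumably come from the counting tokenizer, e.g. `tiktoken.get_encoding("cl100k_base")` and its `encode`/`decode` methods, which is exactly why the margin is needed when the provider counts with a different tokenizer.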

PR: #PLACEHOLDER
