Failed to run local embedding model using llama.cpp #22210
Unanswered
Wei-W2025-code asked this question in Q&A
Replies: 1 comment
"Invalid tokens" on embeddings usually points at the tokenizer/vocab, not the request format. Likely causes: vocab mismatch if the GGUF was re-quantized without matching |
Start code:
My Python code:
Service output:
My cmd answer:
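For comparison, a minimal client request against llama-server's OpenAI-compatible `/v1/embeddings` endpoint might look like the sketch below. The endpoint path, the `--embeddings` server flag mentioned in the comment, and the `local-embed` model name are assumptions; availability depends on the llama.cpp build.

```python
import json
import urllib.request

def embedding_request(texts, model="local-embed", base_url="http://localhost:8080"):
    """Build a POST request for llama-server's OpenAI-compatible
    /v1/embeddings endpoint ('local-embed' is a placeholder name)."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        base_url + "/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# To actually send it, a server started with embeddings enabled must be
# running (e.g. `llama-server -m model.gguf --embeddings`):
# with urllib.request.urlopen(embedding_request(["hello world"])) as resp:
#     data = json.loads(resp.read())
#     vector = data["data"][0]["embedding"]
```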