
fix: prewarm boundary tiktoken encoding at API startup #285

Open
rendigua2025-gif wants to merge 1 commit into EverMind-AI:main from rendigua2025-gif:codex/prewarm-boundary-tokenizer

Conversation

@rendigua2025-gif

Summary

Fixes #277.

The first /api/v1/memory/add request could trigger an on-demand tiktoken download for the boundary tokenizer encoding (o200k_base). If that network/TLS request failed, users saw a generic 500 from the write path even though no memory data had been written yet.

This PR adds a small FastAPI lifespan provider that prewarms the boundary tokenizer during API startup.

Scope

  • Resolve tiktoken.get_encoding("o200k_base") during API startup.
  • Fail before serving traffic if the encoding cannot be resolved.
  • Keep /add from being the first place that discovers this cold-start dependency.
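
For readers who want the shape of the idea, here is a minimal sketch of a startup prewarm written as a plain async context manager. It is not the provider added in this PR: `make_prewarm_lifespan`, `TokenizerPrewarmError`, and the injectable `loader` argument are hypothetical names chosen for illustration, and the repository's own lifespan provider interface may look different.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Callable

BOUNDARY_ENCODING = "o200k_base"  # encoding name taken from the PR description


class TokenizerPrewarmError(RuntimeError):
    """Hypothetical error raised when the encoding cannot be resolved at startup."""


def _load_with_tiktoken(name: str) -> Any:
    # Imported lazily so this module can be imported without tiktoken installed.
    import tiktoken

    return tiktoken.get_encoding(name)


def make_prewarm_lifespan(loader: Callable[[str], Any] = _load_with_tiktoken):
    """Build a lifespan that resolves the boundary encoding before serving traffic."""

    @asynccontextmanager
    async def lifespan(app: Any) -> AsyncIterator[None]:
        try:
            # Resolving the encoding may download it on a cold cache,
            # so run the blocking call in a worker thread.
            await asyncio.to_thread(loader, BOUNDARY_ENCODING)
        except Exception as exc:
            # Fail startup with a specific message instead of a later 500 on /add.
            raise TokenizerPrewarmError(
                f"could not prewarm tokenizer encoding {BOUNDARY_ENCODING!r}"
            ) from exc
        yield  # the application serves requests while suspended here

    return lifespan
```

With FastAPI this could be wired as `FastAPI(lifespan=make_prewarm_lifespan())`. The loader is injectable only so that tests can simulate a failed download without touching the network.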

This is not full offline support. If the encoding is not already cached and the environment cannot reach the tokenizer asset, startup will fail with a clearer tokenizer prewarm error. A separate change would be needed to bundle/cache tokenizer assets for fully offline deployments.

Tests

  • tests/unit/test_entrypoints/test_api/test_lifespans/test_boundary_tokenizer.py
  • tests/unit/test_entrypoints/test_api/test_lifespans/test_llm.py
  • tests/unit/test_entrypoints/test_api/test_lifespans/test_cascade.py

Note: I verified locally on Windows with a test-process-only fcntl stub because the current repository still imports POSIX fcntl during test collection on Windows.
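
A stub along the following lines is one way to get past that import on Windows. It is a sketch, not the exact stub used for this verification: `install_fcntl_stub` is a hypothetical helper, and the set of attributes the repository actually needs from `fcntl` is an assumption.

```python
import sys
import types


def install_fcntl_stub() -> types.ModuleType:
    """Return the real fcntl module if available, else register a no-op stub."""
    try:
        import fcntl  # present on POSIX, missing on Windows

        return fcntl
    except ImportError:
        stub = types.ModuleType("fcntl")
        # Constants mirror the usual Linux values; the stub never acts on them.
        stub.LOCK_SH = 1
        stub.LOCK_EX = 2
        stub.LOCK_NB = 4
        stub.LOCK_UN = 8
        # No-op lock: acceptable for a single test process, not for production.
        stub.flock = lambda fd, operation: None
        sys.modules["fcntl"] = stub
        return stub
```

Calling this from a local-only `conftest.py` before the package is imported keeps the stub confined to the test process.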

Development

Successfully merging this pull request may close these issues.

First /add can fail when tiktoken downloads o200k_base at runtime
