Skip to content

fix(llm): thread per-scope LLM timeout into constructed providers (#2452)#2471

Open
r266-tech wants to merge 1 commit into
vectorize-io:mainfrom
r266-tech:fix/llm-per-scope-timeout-2452
Open

fix(llm): thread per-scope LLM timeout into constructed providers (#2452)#2471
r266-tech wants to merge 1 commit into
vectorize-io:mainfrom
r266-tech:fix/llm-per-scope-timeout-2452

Conversation

@r266-tech

Copy link
Copy Markdown
Contributor

Summary

The per-scope retain/reflect/consolidation LLM timeouts are resolved into config but never threaded into the constructed LLM provider, so they are silently ignored: a retain call always falls back to the global HINDSIGHT_API_LLM_TIMEOUT / 120s default regardless of HINDSIGHT_API_RETAIN_LLM_TIMEOUT. Fixes #2452.

Root cause

config.retain_llm_timeout (and the reflect/consolidation variants) are parsed in config.py, but the per-op LLMConfig builds in memory_engine.py never passed timeout= to the provider. LLMProvider / create_llm_provider had no timeout parameter, so LiteLLMLLM.__init__ fell back to os.getenv(ENV_LLM_TIMEOUT, DEFAULT_LLM_TIMEOUT) — the global value — no matter the per-scope setting. HINDSIGHT_API_RETAIN_LLM_TIMEOUT=300 was therefore a no-op; a long retain call still aborted at 120s, producing the LiteLLM call exceeded timeout=120s symptom from the issue.

Fix

Thread timeout through the same plumbing the service-tier / default-headers config already uses:

  • create_llm_provider and LLMProvider.__init__ gain a timeout: float | None param, forwarded into the litellm and bedrock LiteLLM constructions (LiteLLMLLM.__init__ already accepts timeout).
  • The per-op LLMConfig builds in memory_engine.py pass timeout=config.<op>_llm_timeout if not None else config.llm_timeout for retain/reflect/consolidation; the default/member builds and from_env pass the global llm_timeout. The is not None check keeps an explicit 0 honored and never resolves to None (which would disable the existing asyncio.wait_for hard-timeout backstop).

Non-LiteLLM providers ignore timeout (the same way they ignore service_tier); the reported symptom and the existing hard-timeout backstop are both LiteLLM-side.

Tests

tests/test_llm_timeout_threading.py mirrors test_bedrock_service_tier.py: it asserts the timeout reaches LiteLLMLLM via both LLMConfig and create_llm_provider (litellm + bedrock alias) and that an unset timeout still resolves to the finite default (never None). The new tests + test_bedrock_service_tier.py + test_litellm_timeout.py pass; broader test_llm_provider / test_config_validation / test_llm_per_op_concurrency are green; ruff check and ruff format --check clean. (Like the service-tier wiring tests, coverage is at the LLMProvider -> provider_impl bridge; the memory_engine.py per-op timeout= lines sit alongside the existing service_tier= lines.)

The retain/reflect/consolidation LLM timeouts were resolved into config
but never passed to the provider, so they were silently ignored and every
call fell back to the global HINDSIGHT_API_LLM_TIMEOUT / 120s default.

Thread timeout through create_llm_provider and LLMProvider into the LiteLLM
(and bedrock-alias) constructions, and pass the resolved per-op value from
the MemoryEngine builds (global llm_timeout fallback). Add threading tests.

Fixes vectorize-io#2452
@r266-tech r266-tech force-pushed the fix/llm-per-scope-timeout-2452 branch from d36d219 to 146819b Compare June 30, 2026 09:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug(retain): per-scope retain timeout does not propagate to the underlying provider implementation

1 participant