fix(llm): thread per-scope LLM timeout into constructed providers (#2452)#2471
Open
r266-tech wants to merge 1 commit into
Open
fix(llm): thread per-scope LLM timeout into constructed providers (#2452)#2471r266-tech wants to merge 1 commit into
r266-tech wants to merge 1 commit into
Conversation
The retain/reflect/consolidation LLM timeouts were resolved into config but never passed to the provider, so they were silently ignored and every call fell back to the global HINDSIGHT_API_LLM_TIMEOUT / 120s default. Thread timeout through create_llm_provider and LLMProvider into the LiteLLM (and bedrock-alias) constructions, and pass the resolved per-op value from the MemoryEngine builds (global llm_timeout fallback). Add threading tests. Fixes vectorize-io#2452
d36d219 to
146819b
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The per-scope retain/reflect/consolidation LLM timeouts are resolved into config but never threaded into the constructed LLM provider, so they are silently ignored: a retain call always falls back to the global
HINDSIGHT_API_LLM_TIMEOUT/ 120s default regardless ofHINDSIGHT_API_RETAIN_LLM_TIMEOUT. Fixes #2452.Root cause
config.retain_llm_timeout(and the reflect/consolidation variants) are parsed inconfig.py, but the per-opLLMConfigbuilds inmemory_engine.pynever passedtimeout=to the provider.LLMProvider/create_llm_providerhad notimeoutparameter, soLiteLLMLLM.__init__fell back toos.getenv(ENV_LLM_TIMEOUT, DEFAULT_LLM_TIMEOUT)— the global value — no matter the per-scope setting.HINDSIGHT_API_RETAIN_LLM_TIMEOUT=300was therefore a no-op; a long retain call still aborted at 120s, producing theLiteLLM call exceeded timeout=120ssymptom from the issue.Fix
Thread
timeoutthrough the same plumbing the service-tier / default-headers config already uses:create_llm_providerandLLMProvider.__init__gain atimeout: float | Noneparam, forwarded into thelitellmandbedrockLiteLLM constructions (LiteLLMLLM.__init__already acceptstimeout).LLMConfigbuilds inmemory_engine.pypasstimeout=config.<op>_llm_timeout if not None else config.llm_timeoutfor retain/reflect/consolidation; the default/member builds andfrom_envpass the globalllm_timeout. Theis not Nonecheck keeps an explicit0honored and never resolves toNone(which would disable the existingasyncio.wait_forhard-timeout backstop).Non-LiteLLM providers ignore
timeout(the same way they ignoreservice_tier); the reported symptom and the existing hard-timeout backstop are both LiteLLM-side.Tests
tests/test_llm_timeout_threading.pymirrorstest_bedrock_service_tier.py: it asserts the timeout reachesLiteLLMLLMvia bothLLMConfigandcreate_llm_provider(litellm + bedrock alias) and that an unset timeout still resolves to the finite default (neverNone). The new tests +test_bedrock_service_tier.py+test_litellm_timeout.pypass; broadertest_llm_provider/test_config_validation/test_llm_per_op_concurrencyare green;ruff checkandruff format --checkclean. (Like the service-tier wiring tests, coverage is at theLLMProvider -> provider_implbridge; thememory_engine.pyper-optimeout=lines sit alongside the existingservice_tier=lines.)