fix(llm): propagate per-scope LLM timeout + retry policy to the provider (#2452)#2470
Merged
Merged
Conversation
cde6e1a to
e12b659
Compare
…der (#2452) The per-operation LLM request settings were resolved into HindsightConfig but never reached the provider that uses them, so configuring them was a silent no-op: - *_llm_timeout (retain/reflect/consolidation) and the global llm_timeout never reached the provider impl; it fell back to HINDSIGHT_API_LLM_TIMEOUT/120s, so HINDSIGHT_API_RETAIN_LLM_TIMEOUT=300 did nothing ("LiteLLM call exceeded timeout=120.0s"). - reflect_llm_max_retries/initial_backoff/max_backoff and consolidation_llm_initial_backoff/max_backoff were never consumed; reflect and consolidation used the hardcoded call()/call_with_tools() defaults (10/5), ignoring the documented "falls back to llm_max_retries" contract. Fix: resolve each operation's effective request defaults (per-op override else global) in MemoryEngine and carry them on the LLMProvider: - timeout is threaded config -> LLMProvider -> create_llm_provider -> provider impl for the providers that honour a configurable request timeout (LiteLLM, LiteLLM Router, OpenAI-compatible, Nous). None preserves each provider's own default, so Anthropic/Gemini keep their bespoke timeouts and the no-config path is byte-identical. - max_retries/initial_backoff/max_backoff become LLMProvider instance defaults that call()/call_with_tools() use when the per-call arg is omitted. Explicit per-call args (retain's resolved values, reflect's fast structured-extraction path) still win; providers built without config (from_env, tests) keep the 10/5 method fallback. The four operation scopes (default/retain/reflect/consolidation) and multi-LLM chain members all share their operation's resolved values via a small _LLMCallDefaults bundle. max_concurrent is intentionally left as-is (process-global semaphores read from env at startup, server-level only); the docs are clarified to call out that distinction. Also fixes a pre-existing breakage in test_llm_router_provider's __new__-based helper (missing _default_headers after #2466) so the suite is green. Tests: tests/test_llm_timeout_propagation.py covers provider-impl timeout threading, the call() retry-policy fallback/override, and per-op resolution/fallback in MemoryEngine.
e12b659 to
e33d93a
Compare
Contributor
|
I did a focused local pass on this branch: cd hindsight-api-slim
env -u ALL_PROXY -u all_proxy -u HTTPS_PROXY -u https_proxy -u HTTP_PROXY -u http_proxy \
uv run pytest tests/test_llm_timeout_propagation.py tests/test_multi_llm_config.py -q
uv run ruff check hindsight_api/engine/llm_wrapper.py hindsight_api/engine/memory_engine.py tests/test_llm_timeout_propagation.py tests/test_multi_llm_config.pyResult: The current A rerun may pass if Vertex AI is faster. If it keeps flaking, the smallest CI-only fix would be to give this smoke startup loop more headroom or skip startup LLM verification in |
oliverv
pushed a commit
to oliverv/collabmind-hindsight
that referenced
this pull request
Jul 1, 2026
…der (vectorize-io#2452) (vectorize-io#2470) The per-operation LLM request settings were resolved into HindsightConfig but never reached the provider that uses them, so configuring them was a silent no-op: - *_llm_timeout (retain/reflect/consolidation) and the global llm_timeout never reached the provider impl; it fell back to HINDSIGHT_API_LLM_TIMEOUT/120s, so HINDSIGHT_API_RETAIN_LLM_TIMEOUT=300 did nothing ("LiteLLM call exceeded timeout=120.0s"). - reflect_llm_max_retries/initial_backoff/max_backoff and consolidation_llm_initial_backoff/max_backoff were never consumed; reflect and consolidation used the hardcoded call()/call_with_tools() defaults (10/5), ignoring the documented "falls back to llm_max_retries" contract. Fix: resolve each operation's effective request defaults (per-op override else global) in MemoryEngine and carry them on the LLMProvider: - timeout is threaded config -> LLMProvider -> create_llm_provider -> provider impl for the providers that honour a configurable request timeout (LiteLLM, LiteLLM Router, OpenAI-compatible, Nous). None preserves each provider's own default, so Anthropic/Gemini keep their bespoke timeouts and the no-config path is byte-identical. - max_retries/initial_backoff/max_backoff become LLMProvider instance defaults that call()/call_with_tools() use when the per-call arg is omitted. Explicit per-call args (retain's resolved values, reflect's fast structured-extraction path) still win; providers built without config (from_env, tests) keep the 10/5 method fallback. The four operation scopes (default/retain/reflect/consolidation) and multi-LLM chain members all share their operation's resolved values via a small _LLMCallDefaults bundle. max_concurrent is intentionally left as-is (process-global semaphores read from env at startup, server-level only); the docs are clarified to call out that distinction. Also fixes a pre-existing breakage in test_llm_router_provider's __new__-based helper (missing _default_headers after vectorize-io#2466) so the suite is green. Tests: tests/test_llm_timeout_propagation.py covers provider-impl timeout threading, the call() retry-policy fallback/override, and per-op resolution/fallback in MemoryEngine.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #2452. Several per-operation LLM request settings were resolved into
HindsightConfigbut never reached the provider that uses them, so configuring them was a silent no-op. This PR threads them through for all operation scopes (default / retain / reflect / consolidation).What was dead
*_llm_timeout(retain/reflect/consolidation) and the globalllm_timeoutnever reached the provider impl — it fell back toHINDSIGHT_API_LLM_TIMEOUT/120s, soHINDSIGHT_API_RETAIN_LLM_TIMEOUT=300did nothing (LiteLLM call exceeded timeout=120.0s).reflect_llm_max_retries/initial_backoff/max_backoffandconsolidation_llm_initial_backoff/max_backoffwere never consumed. Reflect and consolidation used the hardcodedcall()/call_with_tools()defaults (10/5), ignoring the documented "falls back tollm_max_retries" contract.Fix
MemoryEngineresolves each operation's effective request defaults (per-op override else global) into a small_LLMCallDefaultsbundle and carries them on theLLMProvider:config → LLMProvider → create_llm_provider → provider implfor the providers that honour a configurable request timeout (LiteLLM, LiteLLM Router, OpenAI-compatible, Nous).Nonepreserves each provider's own default, so Anthropic/Gemini keep their bespoke timeouts and the no-config path is byte-identical.LLMProviderinstance defaults thatcall()/call_with_tools()use when the per-call argument is omitted. Explicit per-call args still win (retain's resolved values; reflect's deliberately-fast structured-extraction call). Providers built without config (from_env, tests) keep the 10/5 method fallback.The four operation scopes and multi-LLM chain members all share their operation's resolved values.
Behaviour note
Reflect/consolidation calls that previously fell through to the hardcoded
call()default of 10 retries now follow the documentedllm_max_retries(default 3) — i.e. the config the docs always claimed governed them. Per-op overrides (HINDSIGHT_API_REFLECT_LLM_MAX_RETRIES, …) now take effect.max_concurrent— left as-is + doc clarifiedThe per-op
*_LLM_MAX_CONCURRENTcaps are already wired (process-global composed semaphores, added in a prior PR). They're intentionally not moved onto the per-op config path — they're server-level, env-only, fixed at startup. The docs are updated to call out that distinction vs. the timeout/retry/backoff knobs.Drive-by
Fixes a pre-existing breakage in
test_llm_router_provider's__new__-based helper (missing_default_headersafter #2466) so the suite is green on this branch.Tests
tests/test_llm_timeout_propagation.py:_provider_impl.timeout(LiteLLM + OpenAI-compatible);None→DEFAULT_LLM_TIMEOUTregression guard;call()retry policy: falls back to the provider's configured default, an explicit per-call arg overrides it, and an unconfigured provider keeps the 10 method fallback;MemoryEngineresolves each per-op timeout + retry/backoff override and falls back to global / default.Verified locally: full focused LLM/provider/per-op/retry suite green (115 passed);
ruff/tyclean.