fix(llm): propagate per-scope LLM timeout + retry policy to the provider (#2452) by nicoloboschi · Pull Request #2470 · vectorize-io/hindsight

nicoloboschi · 2026-06-30T08:53:43Z

Summary

Fixes #2452. Several per-operation LLM request settings were resolved into HindsightConfig but never reached the provider that uses them, so configuring them was a silent no-op. This PR threads them through for all operation scopes (default / retain / reflect / consolidation).

What was dead

*_llm_timeout (retain/reflect/consolidation) and the global llm_timeout never reached the provider impl — it fell back to HINDSIGHT_API_LLM_TIMEOUT/120s, so HINDSIGHT_API_RETAIN_LLM_TIMEOUT=300 did nothing (LiteLLM call exceeded timeout=120.0s).
reflect_llm_max_retries / initial_backoff / max_backoff and consolidation_llm_initial_backoff / max_backoff were never consumed. Reflect and consolidation used the hardcoded call()/call_with_tools() defaults (10/5), ignoring the documented "falls back to llm_max_retries" contract.

Fix

MemoryEngine resolves each operation's effective request defaults (per-op override else global) into a small _LLMCallDefaults bundle and carries them on the LLMProvider:

timeout → threaded config → LLMProvider → create_llm_provider → provider impl for the providers that honour a configurable request timeout (LiteLLM, LiteLLM Router, OpenAI-compatible, Nous). None preserves each provider's own default, so Anthropic/Gemini keep their bespoke timeouts and the no-config path is byte-identical.
max_retries / initial_backoff / max_backoff → become LLMProvider instance defaults that call()/call_with_tools() use when the per-call argument is omitted. Explicit per-call args still win (retain's resolved values; reflect's deliberately-fast structured-extraction call). Providers built without config (from_env, tests) keep the 10/5 method fallback.

The four operation scopes and multi-LLM chain members all share their operation's resolved values.

Behaviour note

Reflect/consolidation calls that previously fell through to the hardcoded call() default of 10 retries now follow the documented llm_max_retries (default 3) — i.e. the config the docs always claimed governed them. Per-op overrides (HINDSIGHT_API_REFLECT_LLM_MAX_RETRIES, …) now take effect.

`max_concurrent` — left as-is + doc clarified

The per-op *_LLM_MAX_CONCURRENT caps are already wired (process-global composed semaphores, added in a prior PR). They're intentionally not moved onto the per-op config path — they're server-level, env-only, fixed at startup. The docs are updated to call out that distinction vs. the timeout/retry/backoff knobs.

Drive-by

Fixes a pre-existing breakage in test_llm_router_provider's __new__-based helper (missing _default_headers after #2466) so the suite is green on this branch.

Tests

tests/test_llm_timeout_propagation.py:

timeout reaches _provider_impl.timeout (LiteLLM + OpenAI-compatible); None → DEFAULT_LLM_TIMEOUT regression guard;
call() retry policy: falls back to the provider's configured default, an explicit per-call arg overrides it, and an unconfigured provider keeps the 10 method fallback;
MemoryEngine resolves each per-op timeout + retry/backoff override and falls back to global / default.

Verified locally: full focused LLM/provider/per-op/retry suite green (115 passed); ruff/ty clean.

…der (#2452) The per-operation LLM request settings were resolved into HindsightConfig but never reached the provider that uses them, so configuring them was a silent no-op: - *_llm_timeout (retain/reflect/consolidation) and the global llm_timeout never reached the provider impl; it fell back to HINDSIGHT_API_LLM_TIMEOUT/120s, so HINDSIGHT_API_RETAIN_LLM_TIMEOUT=300 did nothing ("LiteLLM call exceeded timeout=120.0s"). - reflect_llm_max_retries/initial_backoff/max_backoff and consolidation_llm_initial_backoff/max_backoff were never consumed; reflect and consolidation used the hardcoded call()/call_with_tools() defaults (10/5), ignoring the documented "falls back to llm_max_retries" contract. Fix: resolve each operation's effective request defaults (per-op override else global) in MemoryEngine and carry them on the LLMProvider: - timeout is threaded config -> LLMProvider -> create_llm_provider -> provider impl for the providers that honour a configurable request timeout (LiteLLM, LiteLLM Router, OpenAI-compatible, Nous). None preserves each provider's own default, so Anthropic/Gemini keep their bespoke timeouts and the no-config path is byte-identical. - max_retries/initial_backoff/max_backoff become LLMProvider instance defaults that call()/call_with_tools() use when the per-call arg is omitted. Explicit per-call args (retain's resolved values, reflect's fast structured-extraction path) still win; providers built without config (from_env, tests) keep the 10/5 method fallback. The four operation scopes (default/retain/reflect/consolidation) and multi-LLM chain members all share their operation's resolved values via a small _LLMCallDefaults bundle. max_concurrent is intentionally left as-is (process-global semaphores read from env at startup, server-level only); the docs are clarified to call out that distinction. Also fixes a pre-existing breakage in test_llm_router_provider's __new__-based helper (missing _default_headers after #2466) so the suite is green. Tests: tests/test_llm_timeout_propagation.py covers provider-impl timeout threading, the call() retry-policy fallback/override, and per-op resolution/fallback in MemoryEngine.

koriyoshi2041 · 2026-06-30T11:48:54Z

I did a focused local pass on this branch:

cd hindsight-api-slim
env -u ALL_PROXY -u all_proxy -u HTTPS_PROXY -u https_proxy -u HTTP_PROXY -u http_proxy \
  uv run pytest tests/test_llm_timeout_propagation.py tests/test_multi_llm_config.py -q
uv run ruff check hindsight_api/engine/llm_wrapper.py hindsight_api/engine/memory_engine.py tests/test_llm_timeout_propagation.py tests/test_multi_llm_config.py

Result: 39 passed for the focused tests; ruff clean.

The current test-pip-slim CI failure looks timing-related rather than a test/assertion failure. The log shows the server starts at 09:32:17, Vertex AI verification takes 46.726s, migrations complete and the DB pool is created at 09:33:16.273, and the 60s health loop exits at 09:33:16.697. So the app was still finishing startup right at the health-check deadline.

A rerun may pass if Vertex AI is faster. If it keeps flaking, the smallest CI-only fix would be to give this smoke startup loop more headroom or skip startup LLM verification in test-pip-slim if the test only needs retain/recall smoke coverage.

…der (vectorize-io#2452) (vectorize-io#2470) The per-operation LLM request settings were resolved into HindsightConfig but never reached the provider that uses them, so configuring them was a silent no-op: - *_llm_timeout (retain/reflect/consolidation) and the global llm_timeout never reached the provider impl; it fell back to HINDSIGHT_API_LLM_TIMEOUT/120s, so HINDSIGHT_API_RETAIN_LLM_TIMEOUT=300 did nothing ("LiteLLM call exceeded timeout=120.0s"). - reflect_llm_max_retries/initial_backoff/max_backoff and consolidation_llm_initial_backoff/max_backoff were never consumed; reflect and consolidation used the hardcoded call()/call_with_tools() defaults (10/5), ignoring the documented "falls back to llm_max_retries" contract. Fix: resolve each operation's effective request defaults (per-op override else global) in MemoryEngine and carry them on the LLMProvider: - timeout is threaded config -> LLMProvider -> create_llm_provider -> provider impl for the providers that honour a configurable request timeout (LiteLLM, LiteLLM Router, OpenAI-compatible, Nous). None preserves each provider's own default, so Anthropic/Gemini keep their bespoke timeouts and the no-config path is byte-identical. - max_retries/initial_backoff/max_backoff become LLMProvider instance defaults that call()/call_with_tools() use when the per-call arg is omitted. Explicit per-call args (retain's resolved values, reflect's fast structured-extraction path) still win; providers built without config (from_env, tests) keep the 10/5 method fallback. The four operation scopes (default/retain/reflect/consolidation) and multi-LLM chain members all share their operation's resolved values via a small _LLMCallDefaults bundle. max_concurrent is intentionally left as-is (process-global semaphores read from env at startup, server-level only); the docs are clarified to call out that distinction. Also fixes a pre-existing breakage in test_llm_router_provider's __new__-based helper (missing _default_headers after vectorize-io#2466) so the suite is green. Tests: tests/test_llm_timeout_propagation.py covers provider-impl timeout threading, the call() retry-policy fallback/override, and per-op resolution/fallback in MemoryEngine.

nicoloboschi force-pushed the fix/llm-timeout-propagation-2452 branch from cde6e1a to e12b659 Compare June 30, 2026 09:19

nicoloboschi changed the title ~~fix(retain): propagate per-scope LLM timeout to the provider impl (#2452)~~ fix(llm): propagate per-scope LLM timeout + retry policy to the provider (#2452) Jun 30, 2026

nicoloboschi force-pushed the fix/llm-timeout-propagation-2452 branch from e12b659 to e33d93a Compare June 30, 2026 09:31

nicoloboschi merged commit 40fe7aa into main Jun 30, 2026
94 of 101 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(llm): propagate per-scope LLM timeout + retry policy to the provider (#2452)#2470

fix(llm): propagate per-scope LLM timeout + retry policy to the provider (#2452)#2470
nicoloboschi merged 1 commit into
mainfrom
fix/llm-timeout-propagation-2452

nicoloboschi commented Jun 30, 2026 •

edited

Loading

Uh oh!

koriyoshi2041 commented Jun 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

nicoloboschi commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What was dead

Fix

Behaviour note

max_concurrent — left as-is + doc clarified

Drive-by

Tests

Uh oh!

koriyoshi2041 commented Jun 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

nicoloboschi commented Jun 30, 2026 •

edited

Loading

`max_concurrent` — left as-is + doc clarified