fix(llm): wire default_headers into LiteLLM-backed providers (#2458)#2466
Merged
nicoloboschi merged 4 commits intoJun 30, 2026
Merged
Conversation
…ze-io#2458) HINDSIGHT_API_LLM_DEFAULT_HEADERS is documented and parsed but only wired into the Anthropic provider, so it silently no-ops for the litellm / litellmrouter / bedrock providers -- the proxy-routing providers where custom headers (auditing, policy, request-tracing) matter most. The create_llm_provider docstring even noted "other providers may opt in as needed"; this opts the LiteLLM-backed providers in. Forward the configured headers to litellm.acompletion via the extra_headers kwarg, mirroring the existing Anthropic default_headers wiring. setdefault keeps any explicit per-call extra_headers authoritative, and the dict is defensively copied on construction and per call to avoid cross-request contamination. LiteLLMRouterLLM inherits this through its **kwargs forward to the shared LiteLLM base. Adds regression tests covering storage, the acompletion extra_headers path, the no-headers omission, router forwarding, and copy-isolation. Closes vectorize-io#2458
The Router subclass overrides _build_common_kwargs without calling super(), so stored default_headers never reached acompletion for the litellmrouter provider. Inject extra_headers in the override too, and replace the storage-only router test with call()-driven coverage.
Newer ruff collapses two multi-line log strings that now fit the line length. The file was byte-identical to main; this brings it in sync with the lint gate so verify-generated-files passes.
nicoloboschi
added a commit
that referenced
this pull request
Jun 30, 2026
…der (#2452) The per-operation LLM request settings were resolved into HindsightConfig but never reached the provider that uses them, so configuring them was a silent no-op: - *_llm_timeout (retain/reflect/consolidation) and the global llm_timeout never reached the provider impl; it fell back to HINDSIGHT_API_LLM_TIMEOUT/120s, so HINDSIGHT_API_RETAIN_LLM_TIMEOUT=300 did nothing ("LiteLLM call exceeded timeout=120.0s"). - reflect_llm_max_retries/initial_backoff/max_backoff and consolidation_llm_initial_backoff/max_backoff were never consumed; reflect and consolidation used the hardcoded call()/call_with_tools() defaults (10/5), ignoring the documented "falls back to llm_max_retries" contract. Fix: resolve each operation's effective request defaults (per-op override else global) in MemoryEngine and carry them on the LLMProvider: - timeout is threaded config -> LLMProvider -> create_llm_provider -> provider impl for the providers that honour a configurable request timeout (LiteLLM, LiteLLM Router, OpenAI-compatible, Nous). None preserves each provider's own default, so Anthropic/Gemini keep their bespoke timeouts and the no-config path is byte-identical. - max_retries/initial_backoff/max_backoff become LLMProvider instance defaults that call()/call_with_tools() use when the per-call arg is omitted. Explicit per-call args (retain's resolved values, reflect's fast structured-extraction path) still win; providers built without config (from_env, tests) keep the 10/5 method fallback. The four operation scopes (default/retain/reflect/consolidation) and multi-LLM chain members all share their operation's resolved values via a small _LLMCallDefaults bundle. max_concurrent is intentionally left as-is (process-global semaphores read from env at startup, server-level only); the docs are clarified to call out that distinction. Also fixes a pre-existing breakage in test_llm_router_provider's __new__-based helper (missing _default_headers after #2466) so the suite is green. Tests: tests/test_llm_timeout_propagation.py covers provider-impl timeout threading, the call() retry-policy fallback/override, and per-op resolution/fallback in MemoryEngine.
nicoloboschi
added a commit
that referenced
this pull request
Jun 30, 2026
…der (#2452) The per-operation LLM request settings were resolved into HindsightConfig but never reached the provider that uses them, so configuring them was a silent no-op: - *_llm_timeout (retain/reflect/consolidation) and the global llm_timeout never reached the provider impl; it fell back to HINDSIGHT_API_LLM_TIMEOUT/120s, so HINDSIGHT_API_RETAIN_LLM_TIMEOUT=300 did nothing ("LiteLLM call exceeded timeout=120.0s"). - reflect_llm_max_retries/initial_backoff/max_backoff and consolidation_llm_initial_backoff/max_backoff were never consumed; reflect and consolidation used the hardcoded call()/call_with_tools() defaults (10/5), ignoring the documented "falls back to llm_max_retries" contract. Fix: resolve each operation's effective request defaults (per-op override else global) in MemoryEngine and carry them on the LLMProvider: - timeout is threaded config -> LLMProvider -> create_llm_provider -> provider impl for the providers that honour a configurable request timeout (LiteLLM, LiteLLM Router, OpenAI-compatible, Nous). None preserves each provider's own default, so Anthropic/Gemini keep their bespoke timeouts and the no-config path is byte-identical. - max_retries/initial_backoff/max_backoff become LLMProvider instance defaults that call()/call_with_tools() use when the per-call arg is omitted. Explicit per-call args (retain's resolved values, reflect's fast structured-extraction path) still win; providers built without config (from_env, tests) keep the 10/5 method fallback. The four operation scopes (default/retain/reflect/consolidation) and multi-LLM chain members all share their operation's resolved values via a small _LLMCallDefaults bundle. max_concurrent is intentionally left as-is (process-global semaphores read from env at startup, server-level only); the docs are clarified to call out that distinction. Also fixes a pre-existing breakage in test_llm_router_provider's __new__-based helper (missing _default_headers after #2466) so the suite is green. Tests: tests/test_llm_timeout_propagation.py covers provider-impl timeout threading, the call() retry-policy fallback/override, and per-op resolution/fallback in MemoryEngine.
nicoloboschi
added a commit
that referenced
this pull request
Jun 30, 2026
…der (#2452) (#2470) The per-operation LLM request settings were resolved into HindsightConfig but never reached the provider that uses them, so configuring them was a silent no-op: - *_llm_timeout (retain/reflect/consolidation) and the global llm_timeout never reached the provider impl; it fell back to HINDSIGHT_API_LLM_TIMEOUT/120s, so HINDSIGHT_API_RETAIN_LLM_TIMEOUT=300 did nothing ("LiteLLM call exceeded timeout=120.0s"). - reflect_llm_max_retries/initial_backoff/max_backoff and consolidation_llm_initial_backoff/max_backoff were never consumed; reflect and consolidation used the hardcoded call()/call_with_tools() defaults (10/5), ignoring the documented "falls back to llm_max_retries" contract. Fix: resolve each operation's effective request defaults (per-op override else global) in MemoryEngine and carry them on the LLMProvider: - timeout is threaded config -> LLMProvider -> create_llm_provider -> provider impl for the providers that honour a configurable request timeout (LiteLLM, LiteLLM Router, OpenAI-compatible, Nous). None preserves each provider's own default, so Anthropic/Gemini keep their bespoke timeouts and the no-config path is byte-identical. - max_retries/initial_backoff/max_backoff become LLMProvider instance defaults that call()/call_with_tools() use when the per-call arg is omitted. Explicit per-call args (retain's resolved values, reflect's fast structured-extraction path) still win; providers built without config (from_env, tests) keep the 10/5 method fallback. The four operation scopes (default/retain/reflect/consolidation) and multi-LLM chain members all share their operation's resolved values via a small _LLMCallDefaults bundle. max_concurrent is intentionally left as-is (process-global semaphores read from env at startup, server-level only); the docs are clarified to call out that distinction. Also fixes a pre-existing breakage in test_llm_router_provider's __new__-based helper (missing _default_headers after #2466) so the suite is green. Tests: tests/test_llm_timeout_propagation.py covers provider-impl timeout threading, the call() retry-policy fallback/override, and per-op resolution/fallback in MemoryEngine.
oliverv
pushed a commit
to oliverv/collabmind-hindsight
that referenced
this pull request
Jul 1, 2026
…der (vectorize-io#2452) (vectorize-io#2470) The per-operation LLM request settings were resolved into HindsightConfig but never reached the provider that uses them, so configuring them was a silent no-op: - *_llm_timeout (retain/reflect/consolidation) and the global llm_timeout never reached the provider impl; it fell back to HINDSIGHT_API_LLM_TIMEOUT/120s, so HINDSIGHT_API_RETAIN_LLM_TIMEOUT=300 did nothing ("LiteLLM call exceeded timeout=120.0s"). - reflect_llm_max_retries/initial_backoff/max_backoff and consolidation_llm_initial_backoff/max_backoff were never consumed; reflect and consolidation used the hardcoded call()/call_with_tools() defaults (10/5), ignoring the documented "falls back to llm_max_retries" contract. Fix: resolve each operation's effective request defaults (per-op override else global) in MemoryEngine and carry them on the LLMProvider: - timeout is threaded config -> LLMProvider -> create_llm_provider -> provider impl for the providers that honour a configurable request timeout (LiteLLM, LiteLLM Router, OpenAI-compatible, Nous). None preserves each provider's own default, so Anthropic/Gemini keep their bespoke timeouts and the no-config path is byte-identical. - max_retries/initial_backoff/max_backoff become LLMProvider instance defaults that call()/call_with_tools() use when the per-call arg is omitted. Explicit per-call args (retain's resolved values, reflect's fast structured-extraction path) still win; providers built without config (from_env, tests) keep the 10/5 method fallback. The four operation scopes (default/retain/reflect/consolidation) and multi-LLM chain members all share their operation's resolved values via a small _LLMCallDefaults bundle. max_concurrent is intentionally left as-is (process-global semaphores read from env at startup, server-level only); the docs are clarified to call out that distinction. Also fixes a pre-existing breakage in test_llm_router_provider's __new__-based helper (missing _default_headers after vectorize-io#2466) so the suite is green. Tests: tests/test_llm_timeout_propagation.py covers provider-impl timeout threading, the call() retry-policy fallback/override, and per-op resolution/fallback in MemoryEngine.
oliverv
pushed a commit
to oliverv/collabmind-hindsight
that referenced
this pull request
Jul 1, 2026
…ze-io#2458) (vectorize-io#2466) * fix(llm): wire default_headers into LiteLLM-backed providers (vectorize-io#2458) HINDSIGHT_API_LLM_DEFAULT_HEADERS is documented and parsed but only wired into the Anthropic provider, so it silently no-ops for the litellm / litellmrouter / bedrock providers -- the proxy-routing providers where custom headers (auditing, policy, request-tracing) matter most. The create_llm_provider docstring even noted "other providers may opt in as needed"; this opts the LiteLLM-backed providers in. Forward the configured headers to litellm.acompletion via the extra_headers kwarg, mirroring the existing Anthropic default_headers wiring. setdefault keeps any explicit per-call extra_headers authoritative, and the dict is defensively copied on construction and per call to avoid cross-request contamination. LiteLLMRouterLLM inherits this through its **kwargs forward to the shared LiteLLM base. Adds regression tests covering storage, the acompletion extra_headers path, the no-headers omission, router forwarding, and copy-isolation. Closes vectorize-io#2458 * fix(llm): forward default_headers from LiteLLM Router call path The Router subclass overrides _build_common_kwargs without calling super(), so stored default_headers never reached acompletion for the litellmrouter provider. Inject extra_headers in the override too, and replace the storage-only router test with call()-driven coverage. * style: apply ruff format to migrations.py (pre-existing lint drift) Newer ruff collapses two multi-line log strings that now fit the line length. The file was byte-identical to main; this brings it in sync with the lint gate so verify-generated-files passes. --------- Co-authored-by: Nicolò Boschi <boschi1997@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
HINDSIGHT_API_LLM_DEFAULT_HEADERSis a documented, parsed config knob — itsllm_default_headersfield description advertises it "for proxies / request tracing". But the factory only forwards it to the Anthropic provider; thelitellm,litellmrouter, andbedrock(LiteLLM-alias) branches drop it, so it silently no-ops for exactly the proxy-routing providers where custom headers (auditing, policy, request-tracing) matter most.The
create_llm_providerdocstring already anticipated this: "Currently wired into the Anthropic provider; other providers may opt in as needed." This PR opts the LiteLLM-backed providers in.How
LiteLLMLLM.__init__acceptsdefault_headers(appended afterbedrock_service_tierto preserve positional ABI) and stores a defensive copy._build_common_kwargsforwards them tolitellm.acompletionas theextra_headerskwarg, usingsetdefaultso an explicit per-callextra_headersstays authoritative, and a per-call copy so LiteLLM/downstream can't mutate the stored dict.default_headersinto thelitellm,litellmrouter, andbedrockbranches.LiteLLMRouterLLMinherits it through its existing**kwargsforward to the shared base.Scoped to the LiteLLM-backed providers per the issue.
openai_compatibleandgeminiremain unwired and can opt in as a follow-up if needed.Tests
tests/test_llm_extra_body.pymirrors the existingextra_bodycoverage: storage, theacompletionextra_headerspath, the no-headers omission, router forwarding, and copy-isolation (caller dict + per-call). All 26 tests pass locally (base deps, no ML extras);ruff check/ruff formatclean.Closes #2458