fix(llm): wire default_headers into LiteLLM-backed providers (#2458) by r266-tech · Pull Request #2466 · vectorize-io/hindsight

r266-tech · 2026-06-30T00:46:47Z

What

HINDSIGHT_API_LLM_DEFAULT_HEADERS is a documented, parsed config knob — its llm_default_headers field description advertises it "for proxies / request tracing". But the factory only forwards it to the Anthropic provider; the litellm, litellmrouter, and bedrock (LiteLLM-alias) branches drop it, so it silently no-ops for exactly the proxy-routing providers where custom headers (auditing, policy, request-tracing) matter most.

The create_llm_provider docstring already anticipated this: "Currently wired into the Anthropic provider; other providers may opt in as needed." This PR opts the LiteLLM-backed providers in.

How

LiteLLMLLM.__init__ accepts default_headers (appended after bedrock_service_tier to preserve positional ABI) and stores a defensive copy.
_build_common_kwargs forwards them to litellm.acompletion as the extra_headers kwarg, using setdefault so an explicit per-call extra_headers stays authoritative, and a per-call copy so LiteLLM/downstream can't mutate the stored dict.
The factory passes default_headers into the litellm, litellmrouter, and bedrock branches. LiteLLMRouterLLM inherits it through its existing **kwargs forward to the shared base.
Docstring updated to match.

Scoped to the LiteLLM-backed providers per the issue. openai_compatible and gemini remain unwired and can opt in as a follow-up if needed.

Tests

tests/test_llm_extra_body.py mirrors the existing extra_body coverage: storage, the acompletion extra_headers path, the no-headers omission, router forwarding, and copy-isolation (caller dict + per-call). All 26 tests pass locally (base deps, no ML extras); ruff check / ruff format clean.

Note: touches litellm_llm.py / test_llm_extra_body.py alongside open #2462 (omit temperature for reasoning models) but in disjoint regions — should rebase cleanly.

Closes #2458

…ze-io#2458) HINDSIGHT_API_LLM_DEFAULT_HEADERS is documented and parsed but only wired into the Anthropic provider, so it silently no-ops for the litellm / litellmrouter / bedrock providers -- the proxy-routing providers where custom headers (auditing, policy, request-tracing) matter most. The create_llm_provider docstring even noted "other providers may opt in as needed"; this opts the LiteLLM-backed providers in. Forward the configured headers to litellm.acompletion via the extra_headers kwarg, mirroring the existing Anthropic default_headers wiring. setdefault keeps any explicit per-call extra_headers authoritative, and the dict is defensively copied on construction and per call to avoid cross-request contamination. LiteLLMRouterLLM inherits this through its **kwargs forward to the shared LiteLLM base. Adds regression tests covering storage, the acompletion extra_headers path, the no-headers omission, router forwarding, and copy-isolation. Closes vectorize-io#2458

The Router subclass overrides _build_common_kwargs without calling super(), so stored default_headers never reached acompletion for the litellmrouter provider. Inject extra_headers in the override too, and replace the storage-only router test with call()-driven coverage.

…eaders-2458

Newer ruff collapses two multi-line log strings that now fit the line length. The file was byte-identical to main; this brings it in sync with the lint gate so verify-generated-files passes.

…der (#2452) The per-operation LLM request settings were resolved into HindsightConfig but never reached the provider that uses them, so configuring them was a silent no-op: - *_llm_timeout (retain/reflect/consolidation) and the global llm_timeout never reached the provider impl; it fell back to HINDSIGHT_API_LLM_TIMEOUT/120s, so HINDSIGHT_API_RETAIN_LLM_TIMEOUT=300 did nothing ("LiteLLM call exceeded timeout=120.0s"). - reflect_llm_max_retries/initial_backoff/max_backoff and consolidation_llm_initial_backoff/max_backoff were never consumed; reflect and consolidation used the hardcoded call()/call_with_tools() defaults (10/5), ignoring the documented "falls back to llm_max_retries" contract. Fix: resolve each operation's effective request defaults (per-op override else global) in MemoryEngine and carry them on the LLMProvider: - timeout is threaded config -> LLMProvider -> create_llm_provider -> provider impl for the providers that honour a configurable request timeout (LiteLLM, LiteLLM Router, OpenAI-compatible, Nous). None preserves each provider's own default, so Anthropic/Gemini keep their bespoke timeouts and the no-config path is byte-identical. - max_retries/initial_backoff/max_backoff become LLMProvider instance defaults that call()/call_with_tools() use when the per-call arg is omitted. Explicit per-call args (retain's resolved values, reflect's fast structured-extraction path) still win; providers built without config (from_env, tests) keep the 10/5 method fallback. The four operation scopes (default/retain/reflect/consolidation) and multi-LLM chain members all share their operation's resolved values via a small _LLMCallDefaults bundle. max_concurrent is intentionally left as-is (process-global semaphores read from env at startup, server-level only); the docs are clarified to call out that distinction. Also fixes a pre-existing breakage in test_llm_router_provider's __new__-based helper (missing _default_headers after #2466) so the suite is green. Tests: tests/test_llm_timeout_propagation.py covers provider-impl timeout threading, the call() retry-policy fallback/override, and per-op resolution/fallback in MemoryEngine.

…der (#2452) (#2470) The per-operation LLM request settings were resolved into HindsightConfig but never reached the provider that uses them, so configuring them was a silent no-op: - *_llm_timeout (retain/reflect/consolidation) and the global llm_timeout never reached the provider impl; it fell back to HINDSIGHT_API_LLM_TIMEOUT/120s, so HINDSIGHT_API_RETAIN_LLM_TIMEOUT=300 did nothing ("LiteLLM call exceeded timeout=120.0s"). - reflect_llm_max_retries/initial_backoff/max_backoff and consolidation_llm_initial_backoff/max_backoff were never consumed; reflect and consolidation used the hardcoded call()/call_with_tools() defaults (10/5), ignoring the documented "falls back to llm_max_retries" contract. Fix: resolve each operation's effective request defaults (per-op override else global) in MemoryEngine and carry them on the LLMProvider: - timeout is threaded config -> LLMProvider -> create_llm_provider -> provider impl for the providers that honour a configurable request timeout (LiteLLM, LiteLLM Router, OpenAI-compatible, Nous). None preserves each provider's own default, so Anthropic/Gemini keep their bespoke timeouts and the no-config path is byte-identical. - max_retries/initial_backoff/max_backoff become LLMProvider instance defaults that call()/call_with_tools() use when the per-call arg is omitted. Explicit per-call args (retain's resolved values, reflect's fast structured-extraction path) still win; providers built without config (from_env, tests) keep the 10/5 method fallback. The four operation scopes (default/retain/reflect/consolidation) and multi-LLM chain members all share their operation's resolved values via a small _LLMCallDefaults bundle. max_concurrent is intentionally left as-is (process-global semaphores read from env at startup, server-level only); the docs are clarified to call out that distinction. Also fixes a pre-existing breakage in test_llm_router_provider's __new__-based helper (missing _default_headers after #2466) so the suite is green. Tests: tests/test_llm_timeout_propagation.py covers provider-impl timeout threading, the call() retry-policy fallback/override, and per-op resolution/fallback in MemoryEngine.

…der (vectorize-io#2452) (vectorize-io#2470) The per-operation LLM request settings were resolved into HindsightConfig but never reached the provider that uses them, so configuring them was a silent no-op: - *_llm_timeout (retain/reflect/consolidation) and the global llm_timeout never reached the provider impl; it fell back to HINDSIGHT_API_LLM_TIMEOUT/120s, so HINDSIGHT_API_RETAIN_LLM_TIMEOUT=300 did nothing ("LiteLLM call exceeded timeout=120.0s"). - reflect_llm_max_retries/initial_backoff/max_backoff and consolidation_llm_initial_backoff/max_backoff were never consumed; reflect and consolidation used the hardcoded call()/call_with_tools() defaults (10/5), ignoring the documented "falls back to llm_max_retries" contract. Fix: resolve each operation's effective request defaults (per-op override else global) in MemoryEngine and carry them on the LLMProvider: - timeout is threaded config -> LLMProvider -> create_llm_provider -> provider impl for the providers that honour a configurable request timeout (LiteLLM, LiteLLM Router, OpenAI-compatible, Nous). None preserves each provider's own default, so Anthropic/Gemini keep their bespoke timeouts and the no-config path is byte-identical. - max_retries/initial_backoff/max_backoff become LLMProvider instance defaults that call()/call_with_tools() use when the per-call arg is omitted. Explicit per-call args (retain's resolved values, reflect's fast structured-extraction path) still win; providers built without config (from_env, tests) keep the 10/5 method fallback. The four operation scopes (default/retain/reflect/consolidation) and multi-LLM chain members all share their operation's resolved values via a small _LLMCallDefaults bundle. max_concurrent is intentionally left as-is (process-global semaphores read from env at startup, server-level only); the docs are clarified to call out that distinction. Also fixes a pre-existing breakage in test_llm_router_provider's __new__-based helper (missing _default_headers after vectorize-io#2466) so the suite is green. Tests: tests/test_llm_timeout_propagation.py covers provider-impl timeout threading, the call() retry-policy fallback/override, and per-op resolution/fallback in MemoryEngine.

…ze-io#2458) (vectorize-io#2466) * fix(llm): wire default_headers into LiteLLM-backed providers (vectorize-io#2458) HINDSIGHT_API_LLM_DEFAULT_HEADERS is documented and parsed but only wired into the Anthropic provider, so it silently no-ops for the litellm / litellmrouter / bedrock providers -- the proxy-routing providers where custom headers (auditing, policy, request-tracing) matter most. The create_llm_provider docstring even noted "other providers may opt in as needed"; this opts the LiteLLM-backed providers in. Forward the configured headers to litellm.acompletion via the extra_headers kwarg, mirroring the existing Anthropic default_headers wiring. setdefault keeps any explicit per-call extra_headers authoritative, and the dict is defensively copied on construction and per call to avoid cross-request contamination. LiteLLMRouterLLM inherits this through its **kwargs forward to the shared LiteLLM base. Adds regression tests covering storage, the acompletion extra_headers path, the no-headers omission, router forwarding, and copy-isolation. Closes vectorize-io#2458 * fix(llm): forward default_headers from LiteLLM Router call path The Router subclass overrides _build_common_kwargs without calling super(), so stored default_headers never reached acompletion for the litellmrouter provider. Inject extra_headers in the override too, and replace the storage-only router test with call()-driven coverage. * style: apply ruff format to migrations.py (pre-existing lint drift) Newer ruff collapses two multi-line log strings that now fit the line length. The file was byte-identical to main; this brings it in sync with the lint gate so verify-generated-files passes. --------- Co-authored-by: Nicolò Boschi <boschi1997@gmail.com>

r266-tech and others added 4 commits June 30, 2026 08:45

Merge remote-tracking branch 'origin/main' into fix/litellm-default-h…

7029761

…eaders-2458

style: apply ruff format to migrations.py (pre-existing lint drift)

f25e4d4

Newer ruff collapses two multi-line log strings that now fit the line length. The file was byte-identical to main; this brings it in sync with the lint gate so verify-generated-files passes.

nicoloboschi merged commit 5e73d5f into vectorize-io:main Jun 30, 2026
87 checks passed

nicoloboschi mentioned this pull request Jun 30, 2026

fix(llm): propagate per-scope LLM timeout + retry policy to the provider (#2452) #2470

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(llm): wire default_headers into LiteLLM-backed providers (#2458)#2466

fix(llm): wire default_headers into LiteLLM-backed providers (#2458)#2466
nicoloboschi merged 4 commits into
vectorize-io:mainfrom
r266-tech:fix/litellm-default-headers-2458

r266-tech commented Jun 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

r266-tech commented Jun 30, 2026

What

How

Tests

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants