Skip to content

fix(llm): wire default_headers into LiteLLM-backed providers (#2458)#2466

Merged
nicoloboschi merged 4 commits into
vectorize-io:mainfrom
r266-tech:fix/litellm-default-headers-2458
Jun 30, 2026
Merged

fix(llm): wire default_headers into LiteLLM-backed providers (#2458)#2466
nicoloboschi merged 4 commits into
vectorize-io:mainfrom
r266-tech:fix/litellm-default-headers-2458

Conversation

@r266-tech

Copy link
Copy Markdown
Contributor

What

HINDSIGHT_API_LLM_DEFAULT_HEADERS is a documented, parsed config knob — its llm_default_headers field description advertises it "for proxies / request tracing". But the factory only forwards it to the Anthropic provider; the litellm, litellmrouter, and bedrock (LiteLLM-alias) branches drop it, so it silently no-ops for exactly the proxy-routing providers where custom headers (auditing, policy, request-tracing) matter most.

The create_llm_provider docstring already anticipated this: "Currently wired into the Anthropic provider; other providers may opt in as needed." This PR opts the LiteLLM-backed providers in.

How

  • LiteLLMLLM.__init__ accepts default_headers (appended after bedrock_service_tier to preserve positional ABI) and stores a defensive copy.
  • _build_common_kwargs forwards them to litellm.acompletion as the extra_headers kwarg, using setdefault so an explicit per-call extra_headers stays authoritative, and a per-call copy so LiteLLM/downstream can't mutate the stored dict.
  • The factory passes default_headers into the litellm, litellmrouter, and bedrock branches. LiteLLMRouterLLM inherits it through its existing **kwargs forward to the shared base.
  • Docstring updated to match.

Scoped to the LiteLLM-backed providers per the issue. openai_compatible and gemini remain unwired and can opt in as a follow-up if needed.

Tests

tests/test_llm_extra_body.py mirrors the existing extra_body coverage: storage, the acompletion extra_headers path, the no-headers omission, router forwarding, and copy-isolation (caller dict + per-call). All 26 tests pass locally (base deps, no ML extras); ruff check / ruff format clean.

Note: touches litellm_llm.py / test_llm_extra_body.py alongside open #2462 (omit temperature for reasoning models) but in disjoint regions — should rebase cleanly.

Closes #2458

r266-tech and others added 4 commits June 30, 2026 08:45
…ze-io#2458)

HINDSIGHT_API_LLM_DEFAULT_HEADERS is documented and parsed but only wired
into the Anthropic provider, so it silently no-ops for the litellm /
litellmrouter / bedrock providers -- the proxy-routing providers where
custom headers (auditing, policy, request-tracing) matter most. The
create_llm_provider docstring even noted "other providers may opt in as
needed"; this opts the LiteLLM-backed providers in.

Forward the configured headers to litellm.acompletion via the extra_headers
kwarg, mirroring the existing Anthropic default_headers wiring. setdefault
keeps any explicit per-call extra_headers authoritative, and the dict is
defensively copied on construction and per call to avoid cross-request
contamination. LiteLLMRouterLLM inherits this through its **kwargs forward
to the shared LiteLLM base.

Adds regression tests covering storage, the acompletion extra_headers path,
the no-headers omission, router forwarding, and copy-isolation.

Closes vectorize-io#2458
The Router subclass overrides _build_common_kwargs without calling super(),
so stored default_headers never reached acompletion for the litellmrouter
provider. Inject extra_headers in the override too, and replace the
storage-only router test with call()-driven coverage.
Newer ruff collapses two multi-line log strings that now fit the line
length. The file was byte-identical to main; this brings it in sync with
the lint gate so verify-generated-files passes.
@nicoloboschi nicoloboschi merged commit 5e73d5f into vectorize-io:main Jun 30, 2026
87 checks passed
nicoloboschi added a commit that referenced this pull request Jun 30, 2026
…der (#2452)

The per-operation LLM request settings were resolved into HindsightConfig but
never reached the provider that uses them, so configuring them was a silent
no-op:

- *_llm_timeout (retain/reflect/consolidation) and the global llm_timeout never
  reached the provider impl; it fell back to HINDSIGHT_API_LLM_TIMEOUT/120s, so
  HINDSIGHT_API_RETAIN_LLM_TIMEOUT=300 did nothing ("LiteLLM call exceeded
  timeout=120.0s").
- reflect_llm_max_retries/initial_backoff/max_backoff and
  consolidation_llm_initial_backoff/max_backoff were never consumed; reflect and
  consolidation used the hardcoded call()/call_with_tools() defaults (10/5),
  ignoring the documented "falls back to llm_max_retries" contract.

Fix: resolve each operation's effective request defaults (per-op override else
global) in MemoryEngine and carry them on the LLMProvider:

- timeout is threaded config -> LLMProvider -> create_llm_provider -> provider
  impl for the providers that honour a configurable request timeout (LiteLLM,
  LiteLLM Router, OpenAI-compatible, Nous). None preserves each provider's own
  default, so Anthropic/Gemini keep their bespoke timeouts and the no-config
  path is byte-identical.
- max_retries/initial_backoff/max_backoff become LLMProvider instance defaults
  that call()/call_with_tools() use when the per-call arg is omitted. Explicit
  per-call args (retain's resolved values, reflect's fast structured-extraction
  path) still win; providers built without config (from_env, tests) keep the
  10/5 method fallback.

The four operation scopes (default/retain/reflect/consolidation) and multi-LLM
chain members all share their operation's resolved values via a small
_LLMCallDefaults bundle.

max_concurrent is intentionally left as-is (process-global semaphores read from
env at startup, server-level only); the docs are clarified to call out that
distinction.

Also fixes a pre-existing breakage in test_llm_router_provider's __new__-based
helper (missing _default_headers after #2466) so the suite is green.

Tests: tests/test_llm_timeout_propagation.py covers provider-impl timeout
threading, the call() retry-policy fallback/override, and per-op
resolution/fallback in MemoryEngine.
nicoloboschi added a commit that referenced this pull request Jun 30, 2026
…der (#2452)

The per-operation LLM request settings were resolved into HindsightConfig but
never reached the provider that uses them, so configuring them was a silent
no-op:

- *_llm_timeout (retain/reflect/consolidation) and the global llm_timeout never
  reached the provider impl; it fell back to HINDSIGHT_API_LLM_TIMEOUT/120s, so
  HINDSIGHT_API_RETAIN_LLM_TIMEOUT=300 did nothing ("LiteLLM call exceeded
  timeout=120.0s").
- reflect_llm_max_retries/initial_backoff/max_backoff and
  consolidation_llm_initial_backoff/max_backoff were never consumed; reflect and
  consolidation used the hardcoded call()/call_with_tools() defaults (10/5),
  ignoring the documented "falls back to llm_max_retries" contract.

Fix: resolve each operation's effective request defaults (per-op override else
global) in MemoryEngine and carry them on the LLMProvider:

- timeout is threaded config -> LLMProvider -> create_llm_provider -> provider
  impl for the providers that honour a configurable request timeout (LiteLLM,
  LiteLLM Router, OpenAI-compatible, Nous). None preserves each provider's own
  default, so Anthropic/Gemini keep their bespoke timeouts and the no-config
  path is byte-identical.
- max_retries/initial_backoff/max_backoff become LLMProvider instance defaults
  that call()/call_with_tools() use when the per-call arg is omitted. Explicit
  per-call args (retain's resolved values, reflect's fast structured-extraction
  path) still win; providers built without config (from_env, tests) keep the
  10/5 method fallback.

The four operation scopes (default/retain/reflect/consolidation) and multi-LLM
chain members all share their operation's resolved values via a small
_LLMCallDefaults bundle.

max_concurrent is intentionally left as-is (process-global semaphores read from
env at startup, server-level only); the docs are clarified to call out that
distinction.

Also fixes a pre-existing breakage in test_llm_router_provider's __new__-based
helper (missing _default_headers after #2466) so the suite is green.

Tests: tests/test_llm_timeout_propagation.py covers provider-impl timeout
threading, the call() retry-policy fallback/override, and per-op
resolution/fallback in MemoryEngine.
nicoloboschi added a commit that referenced this pull request Jun 30, 2026
…der (#2452) (#2470)

The per-operation LLM request settings were resolved into HindsightConfig but
never reached the provider that uses them, so configuring them was a silent
no-op:

- *_llm_timeout (retain/reflect/consolidation) and the global llm_timeout never
  reached the provider impl; it fell back to HINDSIGHT_API_LLM_TIMEOUT/120s, so
  HINDSIGHT_API_RETAIN_LLM_TIMEOUT=300 did nothing ("LiteLLM call exceeded
  timeout=120.0s").
- reflect_llm_max_retries/initial_backoff/max_backoff and
  consolidation_llm_initial_backoff/max_backoff were never consumed; reflect and
  consolidation used the hardcoded call()/call_with_tools() defaults (10/5),
  ignoring the documented "falls back to llm_max_retries" contract.

Fix: resolve each operation's effective request defaults (per-op override else
global) in MemoryEngine and carry them on the LLMProvider:

- timeout is threaded config -> LLMProvider -> create_llm_provider -> provider
  impl for the providers that honour a configurable request timeout (LiteLLM,
  LiteLLM Router, OpenAI-compatible, Nous). None preserves each provider's own
  default, so Anthropic/Gemini keep their bespoke timeouts and the no-config
  path is byte-identical.
- max_retries/initial_backoff/max_backoff become LLMProvider instance defaults
  that call()/call_with_tools() use when the per-call arg is omitted. Explicit
  per-call args (retain's resolved values, reflect's fast structured-extraction
  path) still win; providers built without config (from_env, tests) keep the
  10/5 method fallback.

The four operation scopes (default/retain/reflect/consolidation) and multi-LLM
chain members all share their operation's resolved values via a small
_LLMCallDefaults bundle.

max_concurrent is intentionally left as-is (process-global semaphores read from
env at startup, server-level only); the docs are clarified to call out that
distinction.

Also fixes a pre-existing breakage in test_llm_router_provider's __new__-based
helper (missing _default_headers after #2466) so the suite is green.

Tests: tests/test_llm_timeout_propagation.py covers provider-impl timeout
threading, the call() retry-policy fallback/override, and per-op
resolution/fallback in MemoryEngine.
oliverv pushed a commit to oliverv/collabmind-hindsight that referenced this pull request Jul 1, 2026
…der (vectorize-io#2452) (vectorize-io#2470)

The per-operation LLM request settings were resolved into HindsightConfig but
never reached the provider that uses them, so configuring them was a silent
no-op:

- *_llm_timeout (retain/reflect/consolidation) and the global llm_timeout never
  reached the provider impl; it fell back to HINDSIGHT_API_LLM_TIMEOUT/120s, so
  HINDSIGHT_API_RETAIN_LLM_TIMEOUT=300 did nothing ("LiteLLM call exceeded
  timeout=120.0s").
- reflect_llm_max_retries/initial_backoff/max_backoff and
  consolidation_llm_initial_backoff/max_backoff were never consumed; reflect and
  consolidation used the hardcoded call()/call_with_tools() defaults (10/5),
  ignoring the documented "falls back to llm_max_retries" contract.

Fix: resolve each operation's effective request defaults (per-op override else
global) in MemoryEngine and carry them on the LLMProvider:

- timeout is threaded config -> LLMProvider -> create_llm_provider -> provider
  impl for the providers that honour a configurable request timeout (LiteLLM,
  LiteLLM Router, OpenAI-compatible, Nous). None preserves each provider's own
  default, so Anthropic/Gemini keep their bespoke timeouts and the no-config
  path is byte-identical.
- max_retries/initial_backoff/max_backoff become LLMProvider instance defaults
  that call()/call_with_tools() use when the per-call arg is omitted. Explicit
  per-call args (retain's resolved values, reflect's fast structured-extraction
  path) still win; providers built without config (from_env, tests) keep the
  10/5 method fallback.

The four operation scopes (default/retain/reflect/consolidation) and multi-LLM
chain members all share their operation's resolved values via a small
_LLMCallDefaults bundle.

max_concurrent is intentionally left as-is (process-global semaphores read from
env at startup, server-level only); the docs are clarified to call out that
distinction.

Also fixes a pre-existing breakage in test_llm_router_provider's __new__-based
helper (missing _default_headers after vectorize-io#2466) so the suite is green.

Tests: tests/test_llm_timeout_propagation.py covers provider-impl timeout
threading, the call() retry-policy fallback/override, and per-op
resolution/fallback in MemoryEngine.
oliverv pushed a commit to oliverv/collabmind-hindsight that referenced this pull request Jul 1, 2026
…ze-io#2458) (vectorize-io#2466)

* fix(llm): wire default_headers into LiteLLM-backed providers (vectorize-io#2458)

HINDSIGHT_API_LLM_DEFAULT_HEADERS is documented and parsed but only wired
into the Anthropic provider, so it silently no-ops for the litellm /
litellmrouter / bedrock providers -- the proxy-routing providers where
custom headers (auditing, policy, request-tracing) matter most. The
create_llm_provider docstring even noted "other providers may opt in as
needed"; this opts the LiteLLM-backed providers in.

Forward the configured headers to litellm.acompletion via the extra_headers
kwarg, mirroring the existing Anthropic default_headers wiring. setdefault
keeps any explicit per-call extra_headers authoritative, and the dict is
defensively copied on construction and per call to avoid cross-request
contamination. LiteLLMRouterLLM inherits this through its **kwargs forward
to the shared LiteLLM base.

Adds regression tests covering storage, the acompletion extra_headers path,
the no-headers omission, router forwarding, and copy-isolation.

Closes vectorize-io#2458

* fix(llm): forward default_headers from LiteLLM Router call path

The Router subclass overrides _build_common_kwargs without calling super(),
so stored default_headers never reached acompletion for the litellmrouter
provider. Inject extra_headers in the override too, and replace the
storage-only router test with call()-driven coverage.

* style: apply ruff format to migrations.py (pre-existing lint drift)

Newer ruff collapses two multi-line log strings that now fit the line
length. The file was byte-identical to main; this brings it in sync with
the lint gate so verify-generated-files passes.

---------

Co-authored-by: Nicolò Boschi <boschi1997@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Wire HINDSIGHT_API_LLM_DEFAULT_HEADERS into LiteLLM provider

2 participants