
feat(config): add per-scope HINDSIGHT_API_<SCOPE>_LLM_EXTRA_BODY #1607

Open

connorblack wants to merge 2 commits into vectorize-io:main from connorblack:feat/per-scope-llm-extra-body

Conversation

@connorblack
Contributor

Summary

Hindsight currently exposes only a single global HINDSIGHT_API_LLM_EXTRA_BODY. This forces an all-or-nothing choice for reasoning-model knobs like Qwen3's chat_template_kwargs.enable_thinking: disabling thinking helps retain/consolidation (structured JSON extraction, where thinking adds latency without quality gain) but breaks reflect (an agentic tool loop that needs reasoning to decide whether to call tools).

We hit this concretely with Qwen3.6-35B-A3B-FP8 served via vLLM: with enable_thinking=false set globally for consolidation throughput, every reflect call returned 0 tool calls and short-circuited to "I don't have information" without ever searching the bank.

Change

Add per-scope env vars matching the existing per-scope provider/model/timeout pattern:

  • HINDSIGHT_API_RETAIN_LLM_EXTRA_BODY
  • HINDSIGHT_API_REFLECT_LLM_EXTRA_BODY
  • HINDSIGHT_API_CONSOLIDATION_LLM_EXTRA_BODY

Each parses as JSON. When set, it overrides the global HINDSIGHT_API_LLM_EXTRA_BODY for that scope only. When unset, the scope falls back to the global value. When neither is set, no extra_body is sent (existing behavior).

Wiring is a single-line change at each LLMConfig instantiation in MemoryEngine.__init__:

extra_body=config.<scope>_llm_extra_body or config.llm_extra_body,

Test plan

  • 4 new tests in tests/test_per_scope_llm_extra_body.py:
    • unset → all fields None
    • per-scope env parses as JSON
    • global only → per-scope stays None (fallback happens at engine init, not parse time)
    • invalid JSON raises at parse time
  • All 13 existing tests in tests/test_per_operation_llm_config.py still pass
  • ruff check clean on touched files

Hindsight previously only supported a single global llm_extra_body, set
via HINDSIGHT_API_LLM_EXTRA_BODY. This forced an all-or-nothing choice
for reasoning-model knobs like enable_thinking — useful for retain and
consolidation (structured JSON extraction; thinking just adds latency)
but breaks reflect (an agentic tool loop that needs reasoning to decide
when to call tools).

Add per-scope env vars matching the existing per-scope provider/model/
timeout pattern:

  HINDSIGHT_API_RETAIN_LLM_EXTRA_BODY
  HINDSIGHT_API_REFLECT_LLM_EXTRA_BODY
  HINDSIGHT_API_CONSOLIDATION_LLM_EXTRA_BODY

Each parses as JSON and falls back to the global llm_extra_body when
unset. Wiring is a single-line change at each LLMConfig instantiation in
MemoryEngine.

Tests:
- 4 new tests in tests/test_per_scope_llm_extra_body.py covering parse,
  fallback, and invalid-JSON paths
- All 13 existing per_operation_llm_config tests still pass
Copilot AI review requested due to automatic review settings May 13, 2026 09:17

Copilot AI left a comment


Pull request overview

This PR adds per-scope overrides for LLMConfig.extra_body so retain/reflect/consolidation can independently supply JSON extra_body values via environment variables, falling back to the existing global HINDSIGHT_API_LLM_EXTRA_BODY when the per-scope var is not set.

Changes:

  • Added HINDSIGHT_API_<SCOPE>_LLM_EXTRA_BODY env vars and corresponding HindsightConfig fields parsed as JSON.
  • Wired per-scope extra_body into MemoryEngine’s per-operation LLMConfig construction with fallback to the global extra_body.
  • Added a new test module covering unset/global/per-scope parsing and invalid JSON.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Reviewed files:

  • hindsight-api-slim/hindsight_api/config.py: adds per-scope env var constants and config fields, parsed from env as JSON.
  • hindsight-api-slim/hindsight_api/engine/memory_engine.py: uses per-scope extra_body values (when present) when creating per-operation LLMConfigs.
  • hindsight-api-slim/tests/test_per_scope_llm_extra_body.py: introduces tests for per-scope parsing, fallback behavior, and invalid JSON handling.

Comments suppressed due to low confidence (2)

hindsight-api-slim/hindsight_api/engine/memory_engine.py:603

  • Same issue as above: config.reflect_llm_extra_body or config.llm_extra_body treats {} as falsy and will incorrectly fall back to the global extra_body. Use an explicit None check so an empty per-scope dict can intentionally override the global config.
        self._reflect_llm_config = LLMConfig(
            provider=reflect_provider,
            api_key=reflect_api_key,
            base_url=reflect_base_url,
            model=reflect_model,
            extra_body=config.reflect_llm_extra_body or config.llm_extra_body,
            default_headers=config.llm_default_headers,
            litellmrouter_config=config.reflect_llm_litellmrouter_config or config.llm_litellmrouter_config,
        )

hindsight-api-slim/hindsight_api/engine/memory_engine.py:627

  • Same issue as above: config.consolidation_llm_extra_body or config.llm_extra_body will not allow an explicit empty object ({}) to override the global extra_body. Switch to an explicit None check to distinguish “unset” from “set to empty dict”.
        self._consolidation_llm_config = LLMConfig(
            provider=consolidation_provider,
            api_key=consolidation_api_key,
            base_url=consolidation_base_url,
            model=consolidation_model,
            extra_body=config.consolidation_llm_extra_body or config.llm_extra_body,
            default_headers=config.llm_default_headers,
            litellmrouter_config=config.consolidation_llm_litellmrouter_config or config.llm_litellmrouter_config,
        )


Comment on lines +576 to 578

        extra_body=config.retain_llm_extra_body or config.llm_extra_body,
        default_headers=config.llm_default_headers,
        litellmrouter_config=config.retain_llm_litellmrouter_config or config.llm_litellmrouter_config,

        assert config.retain_llm_extra_body is None
        assert config.reflect_llm_extra_body is None
        assert config.consolidation_llm_extra_body is None

vLLM v0.20.2 + nightly still ship open bugs #35936 and #33965 where
tool_choice="required" and tool_choice={"type":"function","function":
{"name":...}} both bypass the configured --tool-call-parser and use
JSON-only validation. For Qwen3 models that emit XML-style tool calls
via the qwen3_coder parser, this silently returns tool_calls=[] while
finish_reason still reports "tool_calls". Hindsight reflect's first
three iterations use forced specific-name tool_choice to drive the
hierarchical retrieval path (mental_models -> observations -> recall),
which means those iterations are no-ops when run against a current vLLM
serving any Qwen3 instruct/coder/MoE model.

Add HINDSIGHT_API_REFLECT_DISABLE_FORCED_TOOL_CHOICE env var that, when
set, falls back to tool_choice="auto" for every iteration. The model
still calls retrieval tools when given factual queries — just without
API-level forcing.

Default behavior is unchanged. Operators on affected vLLM builds can
opt in until the upstream PRs land, at which point the env var becomes
a no-op and can be removed.
