feat(config): add per-scope HINDSIGHT_API_<SCOPE>_LLM_EXTRA_BODY #1607

Open · connorblack wants to merge 2 commits into
Conversation
Hindsight previously only supported a single global `llm_extra_body`, set via `HINDSIGHT_API_LLM_EXTRA_BODY`. This forced an all-or-nothing choice for reasoning-model knobs like `enable_thinking`: useful for retain and consolidation (structured JSON extraction, where thinking just adds latency) but breaks reflect (an agentic tool loop that needs reasoning to decide when to call tools).

Add per-scope env vars matching the existing per-scope provider/model/timeout pattern:

- `HINDSIGHT_API_RETAIN_LLM_EXTRA_BODY`
- `HINDSIGHT_API_REFLECT_LLM_EXTRA_BODY`
- `HINDSIGHT_API_CONSOLIDATION_LLM_EXTRA_BODY`

Each parses as JSON and falls back to the global `llm_extra_body` when unset. Wiring is a single-line change at each `LLMConfig` instantiation in `MemoryEngine`.

Tests:
- 4 new tests in `tests/test_per_scope_llm_extra_body.py` covering parse, fallback, and invalid-JSON paths
- All 13 existing per_operation_llm_config tests still pass
Pull request overview
This PR adds per-scope overrides for LLMConfig.extra_body so retain/reflect/consolidation can independently supply JSON extra_body values via environment variables, falling back to the existing global HINDSIGHT_API_LLM_EXTRA_BODY when the per-scope var is not set.
Changes:
- Added `HINDSIGHT_API_<SCOPE>_LLM_EXTRA_BODY` env vars and corresponding `HindsightConfig` fields parsed as JSON.
- Wired per-scope `extra_body` into `MemoryEngine`'s per-operation `LLMConfig` construction with fallback to the global extra_body.
- Added a new test module covering unset/global/per-scope parsing and invalid JSON.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `hindsight-api-slim/hindsight_api/config.py` | Adds per-scope env var constants + config fields and parses them from env as JSON. |
| `hindsight-api-slim/hindsight_api/engine/memory_engine.py` | Uses per-scope extra_body values (when present) when creating per-operation LLMConfigs. |
| `hindsight-api-slim/tests/test_per_scope_llm_extra_body.py` | Introduces tests for per-scope parsing, fallback behavior, and invalid JSON handling. |
Comments suppressed due to low confidence (2)
`hindsight-api-slim/hindsight_api/engine/memory_engine.py:603`

- Same issue as above: `config.reflect_llm_extra_body or config.llm_extra_body` treats `{}` as falsy and will incorrectly fall back to the global extra_body. Use an explicit `None` check so an empty per-scope dict can intentionally override the global config.

```python
self._reflect_llm_config = LLMConfig(
    provider=reflect_provider,
    api_key=reflect_api_key,
    base_url=reflect_base_url,
    model=reflect_model,
    extra_body=config.reflect_llm_extra_body or config.llm_extra_body,
    default_headers=config.llm_default_headers,
    litellmrouter_config=config.reflect_llm_litellmrouter_config or config.llm_litellmrouter_config,
)
```
`hindsight-api-slim/hindsight_api/engine/memory_engine.py:627`

- Same issue as above: `config.consolidation_llm_extra_body or config.llm_extra_body` will not allow an explicit empty object (`{}`) to override the global extra_body. Switch to an explicit `None` check to distinguish "unset" from "set to empty dict".

```python
self._consolidation_llm_config = LLMConfig(
    provider=consolidation_provider,
    api_key=consolidation_api_key,
    base_url=consolidation_base_url,
    model=consolidation_model,
    extra_body=config.consolidation_llm_extra_body or config.llm_extra_body,
    default_headers=config.llm_default_headers,
    litellmrouter_config=config.consolidation_llm_litellmrouter_config or config.llm_litellmrouter_config,
)
```
Comment on lines +576 to 578:

```python
extra_body=config.retain_llm_extra_body or config.llm_extra_body,
default_headers=config.llm_default_headers,
litellmrouter_config=config.retain_llm_litellmrouter_config or config.llm_litellmrouter_config,
```

```python
assert config.retain_llm_extra_body is None
assert config.reflect_llm_extra_body is None
assert config.consolidation_llm_extra_body is None
```
vLLM v0.20.2 + nightly still ship open bugs #35936 and #33965 where
tool_choice="required" and tool_choice={"type":"function","function":
{"name":...}} both bypass the configured --tool-call-parser and use
JSON-only validation. For Qwen3 models that emit XML-style tool calls
via the qwen3_coder parser, this silently returns tool_calls=[] while
finish_reason still reports "tool_calls". Hindsight reflect's first
three iterations use forced specific-name tool_choice to drive the
hierarchical retrieval path (mental_models -> observations -> recall),
which means those iterations are no-ops when run against a current vLLM
serving any Qwen3 instruct/coder/MoE model.
Add HINDSIGHT_API_REFLECT_DISABLE_FORCED_TOOL_CHOICE env var that, when
set, falls back to tool_choice="auto" for every iteration. The model
still calls retrieval tools when given factual queries — just without
API-level forcing.
Default behavior is unchanged. Operators on affected vLLM builds can
opt in until the upstream PRs land, at which point the env var becomes
a no-op and can be removed.
Summary

Hindsight currently exposes only a single global `HINDSIGHT_API_LLM_EXTRA_BODY`. This forces an all-or-nothing choice for reasoning-model knobs like Qwen3's `chat_template_kwargs.enable_thinking`: useful for retain/consolidation (structured JSON extraction; thinking adds latency without quality gain) but breaks reflect (an agentic tool loop that needs reasoning to decide whether to call tools).

We hit this concretely with Qwen3.6-35B-A3B-FP8 served via vLLM: with `enable_thinking=false` set globally for consolidation throughput, every reflect call returned 0 tool calls and short-circuited to "I don't have information" without ever searching the bank.

Change

Add per-scope env vars matching the existing per-scope provider/model/timeout pattern:

- `HINDSIGHT_API_RETAIN_LLM_EXTRA_BODY`
- `HINDSIGHT_API_REFLECT_LLM_EXTRA_BODY`
- `HINDSIGHT_API_CONSOLIDATION_LLM_EXTRA_BODY`

Each parses as JSON. When set, it overrides the global `HINDSIGHT_API_LLM_EXTRA_BODY` for that scope only. When unset, it falls back to the global. When neither is set, no extra_body is sent (existing behavior).

Wiring is a single-line change at each `LLMConfig` instantiation in `MemoryEngine.__init__`.

Test plan

- `tests/test_per_scope_llm_extra_body.py`: 4 new tests covering parse, fallback, and invalid-JSON paths
- `tests/test_per_operation_llm_config.py` tests still pass
- `ruff check` clean on touched files
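An operator-side usage sketch for the Qwen3 scenario in the summary: disable thinking globally, then re-enable it for reflect only. The JSON payloads are illustrative values, not shipped defaults.

```shell
# Global default: no thinking (fast structured extraction for retain/consolidation).
export HINDSIGHT_API_LLM_EXTRA_BODY='{"chat_template_kwargs": {"enable_thinking": false}}'

# Reflect override: keep reasoning on so the agentic tool loop still calls tools.
export HINDSIGHT_API_REFLECT_LLM_EXTRA_BODY='{"chat_template_kwargs": {"enable_thinking": true}}'
```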