
feat(config): add per-scope HINDSIGHT_API_<SCOPE>_LLM_EXTRA_BODY #1607

Open

connorblack wants to merge 2 commits into vectorize-io:main from connorblack:feat/per-scope-llm-extra-body

Conversation

@connorblack
Contributor

Summary

Hindsight currently exposes only a single global HINDSIGHT_API_LLM_EXTRA_BODY. This forces an all-or-nothing choice for reasoning-model knobs like Qwen3's chat_template_kwargs.enable_thinking: disabling thinking helps retain/consolidation (structured JSON extraction, where thinking adds latency without quality gain) but breaks reflect (an agentic tool loop that needs reasoning to decide whether to call tools).

We hit this concretely with Qwen3.6-35B-A3B-FP8 served via vLLM: with enable_thinking=false set globally for consolidation throughput, every reflect call returned 0 tool calls and short-circuited to "I don't have information" without ever searching the bank.

Change

Add per-scope env vars matching the existing per-scope provider/model/timeout pattern:

  • HINDSIGHT_API_RETAIN_LLM_EXTRA_BODY
  • HINDSIGHT_API_REFLECT_LLM_EXTRA_BODY
  • HINDSIGHT_API_CONSOLIDATION_LLM_EXTRA_BODY

Each parses as JSON. When set, it overrides the global HINDSIGHT_API_LLM_EXTRA_BODY for that scope only. When unset, the scope falls back to the global value. When neither is set, no extra_body is sent (existing behavior).

Wiring is a single-line change at each LLMConfig instantiation in MemoryEngine.__init__:

extra_body=config.<scope>_llm_extra_body or config.llm_extra_body,

Test plan

  • 4 new tests in tests/test_per_scope_llm_extra_body.py:
    • unset → all fields None
    • per-scope env parses as JSON
    • global only → per-scope stays None (fallback happens at engine init, not parse time)
    • invalid JSON raises at parse time
  • All 13 existing tests in tests/test_per_operation_llm_config.py still pass
  • ruff check clean on touched files

Hindsight previously only supported a single global llm_extra_body, set
via HINDSIGHT_API_LLM_EXTRA_BODY. This forced an all-or-nothing choice
for reasoning-model knobs like enable_thinking — useful for retain and
consolidation (structured JSON extraction; thinking just adds latency)
but breaks reflect (an agentic tool loop that needs reasoning to decide
when to call tools).

Add per-scope env vars matching the existing per-scope provider/model/
timeout pattern:

  HINDSIGHT_API_RETAIN_LLM_EXTRA_BODY
  HINDSIGHT_API_REFLECT_LLM_EXTRA_BODY
  HINDSIGHT_API_CONSOLIDATION_LLM_EXTRA_BODY

Each parses as JSON and falls back to the global llm_extra_body when
unset. Wiring is a single-line change at each LLMConfig instantiation in
MemoryEngine.

Tests:
- 4 new tests in tests/test_per_scope_llm_extra_body.py covering parse,
  fallback, and invalid-JSON paths
- All 13 existing per_operation_llm_config tests still pass
Copilot AI review requested due to automatic review settings May 13, 2026 09:17

Copilot AI left a comment


Pull request overview

This PR adds per-scope overrides for LLMConfig.extra_body so retain/reflect/consolidation can independently supply JSON extra_body values via environment variables, falling back to the existing global HINDSIGHT_API_LLM_EXTRA_BODY when the per-scope var is not set.

Changes:

  • Added HINDSIGHT_API_<SCOPE>_LLM_EXTRA_BODY env vars and corresponding HindsightConfig fields parsed as JSON.
  • Wired per-scope extra_body into MemoryEngine’s per-operation LLMConfig construction with fallback to the global extra_body.
  • Added a new test module covering unset/global/per-scope parsing and invalid JSON.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Reviewed files:

  • hindsight-api-slim/hindsight_api/config.py: adds per-scope env var constants and config fields, parsed from env as JSON.
  • hindsight-api-slim/hindsight_api/engine/memory_engine.py: uses per-scope extra_body values (when present) when creating per-operation LLMConfigs.
  • hindsight-api-slim/tests/test_per_scope_llm_extra_body.py: introduces tests for per-scope parsing, fallback behavior, and invalid JSON handling.

Comments suppressed due to low confidence (2)

hindsight-api-slim/hindsight_api/engine/memory_engine.py:603

  • Same issue as above: config.reflect_llm_extra_body or config.llm_extra_body treats {} as falsy and will incorrectly fall back to the global extra_body. Use an explicit None check so an empty per-scope dict can intentionally override the global config.
        self._reflect_llm_config = LLMConfig(
            provider=reflect_provider,
            api_key=reflect_api_key,
            base_url=reflect_base_url,
            model=reflect_model,
            extra_body=config.reflect_llm_extra_body or config.llm_extra_body,
            default_headers=config.llm_default_headers,
            litellmrouter_config=config.reflect_llm_litellmrouter_config or config.llm_litellmrouter_config,
        )

hindsight-api-slim/hindsight_api/engine/memory_engine.py:627

  • Same issue as above: config.consolidation_llm_extra_body or config.llm_extra_body will not allow an explicit empty object ({}) to override the global extra_body. Switch to an explicit None check to distinguish “unset” from “set to empty dict”.
        self._consolidation_llm_config = LLMConfig(
            provider=consolidation_provider,
            api_key=consolidation_api_key,
            base_url=consolidation_base_url,
            model=consolidation_model,
            extra_body=config.consolidation_llm_extra_body or config.llm_extra_body,
            default_headers=config.llm_default_headers,
            litellmrouter_config=config.consolidation_llm_litellmrouter_config or config.llm_litellmrouter_config,
        )


Comment on lines +576 to 578

        extra_body=config.retain_llm_extra_body or config.llm_extra_body,
        default_headers=config.llm_default_headers,
        litellmrouter_config=config.retain_llm_litellmrouter_config or config.llm_litellmrouter_config,

        assert config.retain_llm_extra_body is None
        assert config.reflect_llm_extra_body is None
        assert config.consolidation_llm_extra_body is None

vLLM v0.20.2 + nightly still ship open bugs #35936 and #33965 where
tool_choice="required" and tool_choice={"type":"function","function":
{"name":...}} both bypass the configured --tool-call-parser and use
JSON-only validation. For Qwen3 models that emit XML-style tool calls
via the qwen3_coder parser, this silently returns tool_calls=[] while
finish_reason still reports "tool_calls". Hindsight reflect's first
three iterations use forced specific-name tool_choice to drive the
hierarchical retrieval path (mental_models -> observations -> recall),
which means those iterations are no-ops when run against a current vLLM
serving any Qwen3 instruct/coder/MoE model.

Add HINDSIGHT_API_REFLECT_DISABLE_FORCED_TOOL_CHOICE env var that, when
set, falls back to tool_choice="auto" for every iteration. The model
still calls retrieval tools when given factual queries — just without
API-level forcing.

Default behavior is unchanged. Operators on affected vLLM builds can
opt in until the upstream PRs land, at which point the env var becomes
a no-op and can be removed.
