Commit c799a2c

feat(dreamer): allow per-specialist provider and thinking budget overrides
Adds DEDUCTION_PROVIDER/INDUCTION_PROVIDER and matching THINKING_BUDGET_TOKENS settings so deduction and induction specialists can route to a different provider than the main DREAM config. Also propagates thinking_budget_tokens into the LLM call and documents the CF gateway / Gemini thought_signature gotchas in CLAUDE.md.
1 parent 8fcf2f0 commit c799a2c

3 files changed

Lines changed: 41 additions & 4 deletions

CLAUDE.md

Lines changed: 7 additions & 0 deletions
@@ -84,6 +84,13 @@ All API routes follow the pattern: `/v1/{resource}/{id}/{action}`
 - Typechecking: `uv run basedpyright`
 - Format code: `uv run ruff format src/`

+### LLM provider gotchas (learned 2026-04-16 in k8s deploy)
+
+- **Structured outputs (`response_format={"type": "json_schema"}`) only work on providers whose upstream API natively honors them.** Google Gemini does (route via `cf` provider with base_url ending in `/openai`). Ollama Cloud (reached via the `custom` provider + `custom-ollama` CF gateway endpoint, or any direct Ollama endpoint) does **not** translate `response_format` into Ollama's native JSON-mode — every Ollama Cloud model (GLM-5.1, nemotron-3-nano, qwen3.5, devstral-small-2 confirmed) returns free-form text/markdown when a schema is requested, and `honcho_llm_call` bubbles a `ValidationError: Invalid JSON` out of pydantic parsing.
+- **Therefore: deriver (`src/deriver/deriver.py:126`) and summary (`src/utils/summarizer.py`) must stay on a Gemini-backed `cf` provider.** Dream, dialectic, and any free-form / tool-call path is free to use the `custom` provider.
+- **Gemini `thoughtSignature` round-tripping breaks on the CF `openai`-compat route.** Any call with `maxToolIterations > 1` AND `thinkingBudgetTokens > 0` will return `400 Function call is missing a thought_signature` on iteration 2+. If you need thinking on a multi-iteration tool loop, use the native Gemini provider, not the OpenAI-compat route — or set `thinkingBudgetTokens=0`.
+- **None of this is Cloudflare's fault.** CF AI Gateway is a transparent proxy in both the `openai` and `custom-ollama` routes. The limitations live at the upstream provider (Ollama Cloud's OpenAI-compat layer).
+
 ### Local LM Studio Setup

 - Honcho can use LM Studio for generation through the `custom` provider path.
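
The routing rules in the CLAUDE.md note above could be turned into a small pre-flight check. The sketch below is illustrative only: `Route` and `route_problems` are hypothetical names, not part of Honcho, and the flags (`gemini_backed`, `openai_compat`) are assumptions about how a call path might be described.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical description of one LLM call path (not a Honcho type)."""

    provider: str  # e.g. "cf" or "custom"
    gemini_backed: bool  # True when the upstream model is Google Gemini
    openai_compat: bool  # True when the call uses an OpenAI-compatible route
    needs_json_schema: bool = False
    max_tool_iterations: int = 1
    thinking_budget_tokens: int = 0


def route_problems(route: Route) -> list[str]:
    """Return the gotchas from the note that this route would run into."""
    problems: list[str] = []
    # Structured outputs: the note keeps these on a Gemini-backed `cf` provider.
    if route.needs_json_schema and not (route.provider == "cf" and route.gemini_backed):
        problems.append("structured output needs a Gemini-backed cf provider")
    # Thinking plus a multi-iteration tool loop fails on the OpenAI-compat route.
    if (
        route.gemini_backed
        and route.openai_compat
        and route.max_tool_iterations > 1
        and route.thinking_budget_tokens > 0
    ):
        problems.append("thinking plus multi-iteration tools loses thought_signature")
    return problems


# A deriver-style call routed to Ollama Cloud through the `custom` provider.
deriver_on_ollama = Route(
    provider="custom", gemini_backed=False, openai_compat=True, needs_json_schema=True
)
print(route_problems(deriver_on_ollama))
```

A check like this would only flag the two failure modes the note documents; it does not probe the provider itself.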

src/config.py

Lines changed: 4 additions & 2 deletions
@@ -561,12 +561,14 @@ class DreamSettings(BackupLLMSettingsMixin, HonchoSettings):
         16_384
     )

-    ## NOTE: specialist models use the same provider as the main model
-
     # Deduction Specialist: handles logical inference
     DEDUCTION_MODEL: str = "claude-haiku-4-5"
+    DEDUCTION_PROVIDER: SupportedProviders | None = None # falls back to PROVIDER
+    DEDUCTION_THINKING_BUDGET_TOKENS: int | None = None # falls back to THINKING_BUDGET_TOKENS
     # Induction Specialist: identifies patterns across observations
     INDUCTION_MODEL: str = "claude-haiku-4-5"
+    INDUCTION_PROVIDER: SupportedProviders | None = None # falls back to PROVIDER
+    INDUCTION_THINKING_BUDGET_TOKENS: int | None = None # falls back to THINKING_BUDGET_TOKENS

     # Surprisal-based sampling subsystem
     SURPRISAL: SurprisalSettings = Field(default_factory=SurprisalSettings)

src/dreamer/specialists.py

Lines changed: 30 additions & 2 deletions
@@ -74,6 +74,14 @@ def get_model(self) -> str:
         """Get the model to use for this specialist."""
         ...

+    def get_provider(self) -> str | None:
+        """Get the provider override for this specialist, or None to inherit from DREAM."""
+        return None
+
+    def get_thinking_budget(self) -> int | None:
+        """Get the thinking budget override, or None to inherit from DREAM."""
+        return None
+
     def get_max_tokens(self) -> int:
         """Get max output tokens for this specialist."""
         return 16384
@@ -196,9 +204,16 @@ async def run(
             parent_category="dream",
         )

-        # Get model with potential override
+        # Get model, provider, and thinking budget with potential overrides
         model = self.get_model()
-        llm_settings = settings.DREAM.model_copy(update={"MODEL": model})
+        provider = self.get_provider()
+        thinking_budget = self.get_thinking_budget()
+        overrides: dict[str, Any] = {"MODEL": model}
+        if provider is not None:
+            overrides["PROVIDER"] = provider
+        if thinking_budget is not None:
+            overrides["THINKING_BUDGET_TOKENS"] = thinking_budget
+        llm_settings = settings.DREAM.model_copy(update=overrides)

         # Track iterations via callback
         iteration_count = 0
@@ -219,6 +234,7 @@ def iteration_callback(data: Any) -> None:
             messages=messages,
             track_name=f"Dreamer/{self.name}",
             iteration_callback=iteration_callback,
+            thinking_budget_tokens=llm_settings.THINKING_BUDGET_TOKENS,
         )

         # Log metrics
@@ -308,6 +324,12 @@ def get_tools(self, *, peer_card_enabled: bool = True) -> list[dict[str, Any]]:
     def get_model(self) -> str:
         return settings.DREAM.DEDUCTION_MODEL

+    def get_provider(self) -> str | None:
+        return settings.DREAM.DEDUCTION_PROVIDER
+
+    def get_thinking_budget(self) -> int | None:
+        return settings.DREAM.DEDUCTION_THINKING_BUDGET_TOKENS
+
     def get_max_tokens(self) -> int:
         return 8192

@@ -451,6 +473,12 @@ def get_tools(self, *, peer_card_enabled: bool = True) -> list[dict[str, Any]]:
     def get_model(self) -> str:
         return settings.DREAM.INDUCTION_MODEL

+    def get_provider(self) -> str | None:
+        return settings.DREAM.INDUCTION_PROVIDER
+
+    def get_thinking_budget(self) -> int | None:
+        return settings.DREAM.INDUCTION_THINKING_BUDGET_TOKENS
+
     def get_max_tokens(self) -> int:
         return 8192

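The override flow in `src/dreamer/specialists.py` can be tried in isolation with a standard-library stand-in. This is a minimal sketch under stated assumptions: `DreamConfig`, `effective_settings`, and `NoThinkingSpecialist` are hypothetical names, the default values are invented for illustration, and `dataclasses.replace` stands in for the pydantic `model_copy(update=...)` call used in the commit.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Any


@dataclass(frozen=True)
class DreamConfig:
    """Stand-in for settings.DREAM; field names follow the diff, values are made up."""

    MODEL: str = "main-model"
    PROVIDER: str = "cf"
    THINKING_BUDGET_TOKENS: int = 1024


class BaseSpecialist:
    def get_model(self) -> str:
        return "claude-haiku-4-5"

    def get_provider(self) -> str | None:
        return None  # None means "inherit PROVIDER from the base config"

    def get_thinking_budget(self) -> int | None:
        return None  # None means "inherit THINKING_BUDGET_TOKENS"

    def effective_settings(self, base: DreamConfig) -> DreamConfig:
        """Hypothetical helper mirroring the override merge in run()."""
        overrides: dict[str, Any] = {"MODEL": self.get_model()}
        provider = self.get_provider()
        if provider is not None:
            overrides["PROVIDER"] = provider
        budget = self.get_thinking_budget()
        # `is not None` rather than truthiness, so a budget of 0 still overrides.
        if budget is not None:
            overrides["THINKING_BUDGET_TOKENS"] = budget
        return replace(base, **overrides)


class NoThinkingSpecialist(BaseSpecialist):
    """Hypothetical specialist routed to `custom` with thinking disabled."""

    def get_provider(self) -> str | None:
        return "custom"

    def get_thinking_budget(self) -> int | None:
        return 0


base = DreamConfig()
inherited = BaseSpecialist().effective_settings(base)
overridden = NoThinkingSpecialist().effective_settings(base)
print(inherited.PROVIDER, inherited.THINKING_BUDGET_TOKENS)
print(overridden.PROVIDER, overridden.THINKING_BUDGET_TOKENS)
```

The explicit `is not None` comparison is the detail worth noting: a truthiness check would silently discard an override of `0`, which is the value the CLAUDE.md note recommends for multi-iteration tool loops on the OpenAI-compat route.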