
Feat/add litellm provider #653

Open

RheagalFire wants to merge 3 commits into plastic-labs:main from RheagalFire:feat/add-litellm-provider


@RheagalFire commented May 5, 2026

Summary

  • Adds LiteLLM as a fourth provider backend alongside OpenAI, Anthropic, and Gemini
  • Enables access to 100+ LLM providers via provider-prefixed model names

Motivation

Honcho currently supports three LLM transports (OpenAI, Anthropic, Gemini) with dedicated SDK backends. LiteLLM adds a unified gateway that handles provider-specific parameter translation and authentication, letting users route to any of 100+ providers (Bedrock, Vertex AI, Groq, Mistral, etc.) without needing a dedicated backend for each.

Changes

  • src/llm/backends/litellm.py - New LiteLLMBackend implementing the ProviderBackend protocol with complete() and stream()
  • src/llm/backends/__init__.py - Exported LiteLLMBackend
  • src/llm/registry.py - Wired LiteLLM into backend_for_provider(), client_for_model_config(), and get_backend()
  • src/config.py - Added "litellm" to ModelTransport literal type
  • pyproject.toml - Added litellm>=1.65,<1.85 as optional dependency
  • tests/test_litellm_backend.py - 6 unit tests

Tests

1. Unit tests: uv run pytest tests/test_litellm_backend.py -v

tests/test_litellm_backend.py::test_complete_calls_acompletion PASSED
tests/test_litellm_backend.py::test_complete_omits_blank_credentials PASSED
tests/test_litellm_backend.py::test_complete_forwards_tools PASSED
tests/test_litellm_backend.py::test_complete_parses_tool_calls PASSED
tests/test_litellm_backend.py::test_complete_forwards_temperature PASSED
tests/test_litellm_backend.py::test_model_transport_includes_litellm PASSED
6 passed

(Note: tests require --override-ini="asyncio_mode=auto" or a standalone conftest to avoid the DB fixture in the root conftest. The test logic itself is self-contained with litellm mocked.)

Usage

Install:

pip install "honcho[litellm]"

Config (config.toml):

[models.dialectic]
transport = "litellm"
model = "anthropic/claude-sonnet-4-6"

[models.deriver]
transport = "litellm"
model = "openai/gpt-4o"

Environment variables:

# LiteLLM reads provider-specific env vars automatically
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...

Python (direct backend usage):

import asyncio

from src.llm.backends.litellm import LiteLLMBackend


async def main() -> None:
    # complete() is a coroutine, so it has to run inside an event loop.
    backend = LiteLLMBackend(api_key="sk-ant-...")
    result = await backend.complete(
        model="anthropic/claude-sonnet-4-6",
        messages=[{"role": "user", "content": "Summarize this conversation"}],
        max_tokens=1024,
        temperature=0.3,
    )
    print(result.content)
    print(f"Tokens: {result.input_tokens} in, {result.output_tokens} out")


asyncio.run(main())

Via ModelConfig (production path):

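import asyncio

from src.config import ModelConfig
from src.llm.registry import get_backend


async def main() -> None:
    config = ModelConfig(transport="litellm", model="gemini/gemini-2.5-flash")
    # get_backend() builds LiteLLMBackend(api_key=config.api_key, api_base=config.base_url)
    backend = get_backend(config)
    result = await backend.complete(
        model=config.model,
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=512,
    )
    print(result.content)


asyncio.run(main())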

Risk / Compatibility

  • Additive only: the existing OpenAI, Anthropic, and Gemini backends are untouched.
  • litellm is an optional dependency, so the base install is unaffected.
  • LiteLLM is lazy-imported inside complete() and stream() to avoid import errors when it is not installed.
  • drop_params=True is set by default, so LiteLLM drops kwargs that a given provider does not support instead of failing the request.
  • Uses OpenAIHistoryAdapter, since LiteLLM follows the OpenAI message format.
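The parameter handling described in this PR (drop_params=True by default, thinking_effort mapped to reasoning_effort, max_output_tokens overriding max_tokens, blank credentials omitted) can be sketched as a pure function. This is a minimal illustration, not the PR's _build_params(): the name build_litellm_params and its exact signature are hypothetical, and tools, stop, response_format, and extra_params are left out for brevity.

```python
from __future__ import annotations

from typing import Any


def build_litellm_params(
    model: str,
    messages: list[dict[str, Any]],
    max_tokens: int,
    *,
    api_key: str | None = None,
    api_base: str | None = None,
    temperature: float | None = None,
    thinking_effort: str | None = None,
    max_output_tokens: int | None = None,
) -> dict[str, Any]:
    """Assemble kwargs for litellm.acompletion (hypothetical helper)."""
    params: dict[str, Any] = {
        "model": model,  # provider-prefixed, e.g. "openai/gpt-4o"
        "messages": messages,  # OpenAI-format message dicts
        # A per-call max_output_tokens overrides the default max_tokens.
        "max_tokens": max_output_tokens or max_tokens,
        # Ask LiteLLM to drop kwargs the target provider does not support.
        "drop_params": True,
    }
    # Omit blank credentials so LiteLLM falls back to provider env vars.
    if api_key:
        params["api_key"] = api_key
    if api_base:
        params["api_base"] = api_base
    if temperature is not None:
        params["temperature"] = temperature
    # Honcho's thinking_effort maps onto LiteLLM's reasoning_effort.
    if thinking_effort is not None:
        params["reasoning_effort"] = thinking_effort
    return params
```

Keeping this step free of I/O means it can be unit-tested without touching litellm at all.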

Summary by CodeRabbit

  • New Features

    • Added LiteLLM provider support as a new model transport option for LLM access.
  • Dependency

    • Introduced an optional litellm dependency group to enable the new provider.
  • Tests

    • Added a comprehensive test suite validating the LiteLLM provider backend.
  • Chores

    • Relaxed the AttemptPlan.client type in src/llm/runtime.py so it can be None for the litellm transport, which has no dedicated SDK client.

@RheagalFire
Author

cc @VVoruganti

@coderabbitai
Contributor

coderabbitai Bot commented May 5, 2026

Walkthrough

Added LiteLLM as an optional LLM backend: new LiteLLMBackend wrapping litellm.acompletion, wired into the registry and exports, ModelTransport updated to accept "litellm", optional dependency declared, runtime plan client typing relaxed, and tests added with a stubbed litellm.

Changes

LiteLLM Backend Provider Integration

  • Dependency & Configuration (pyproject.toml, src/config.py): Added optional dependency group litellm = ["litellm>=1.65,<1.85"] and expanded the ModelTransport literal to include "litellm".
  • Backend Export (src/llm/backends/__init__.py): Exports LiteLLMBackend by importing it and adding it to __all__.
  • Backend Implementation (src/llm/backends/litellm.py): New LiteLLMBackend with __init__, complete(), stream(), _build_params(), _normalize_response(), and _convert_tools(). It builds request params (maps thinking_effort→reasoning_effort, handles the max_output_tokens override, applies selected extra_params), supports streaming and non-streaming flows, parses tool_calls (JSON-decoding), and maps responses to CompletionResult/StreamChunk.
  • Registry Wiring & History Adapter (src/llm/registry.py): Imports LiteLLMBackend; client_for_model_config() may return None for litellm; backend_for_provider() returns LiteLLMBackend() for litellm; history_adapter_for_provider() maps litellm to OpenAIHistoryAdapter; get_backend() has a litellm fast path constructing LiteLLMBackend(api_key=..., api_base=...).
  • Runtime Typing (src/llm/runtime.py): AttemptPlan.client annotated as `ProviderClient | None`.
  • Tests (tests/test_litellm_backend.py): New test suite with an autouse litellm stub (injects an async acompletion mock); tests parameter forwarding (model, api_key, drop_params), omission of blank credentials, tools/tool_choice forwarding, tool-call JSON decoding into ToolCallResult, temperature forwarding, and that ModelTransport includes "litellm".
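The autouse litellm stub works because the backend imports litellm lazily: an `import litellm` executed at call time resolves to whatever object sits in sys.modules["litellm"]. Below is a minimal sketch of the same idea using only the standard library (unittest.mock.patch.dict instead of a pytest fixture). make_litellm_stub and lazy_import_call are hypothetical names, and the canned return value is arbitrary.

```python
import asyncio
import sys
import types
from unittest.mock import AsyncMock, patch


def make_litellm_stub() -> types.ModuleType:
    """Build a fake `litellm` module whose acompletion is an AsyncMock."""
    stub = types.ModuleType("litellm")
    stub.acompletion = AsyncMock(return_value={"choices": []})
    return stub


async def lazy_import_call(model: str) -> object:
    # Hypothetical stand-in for a backend method that imports litellm lazily:
    # the import runs at call time, so it resolves to sys.modules["litellm"].
    import litellm

    return await litellm.acompletion(
        model=model,
        messages=[{"role": "user", "content": "Hello"}],
        drop_params=True,
    )


stub = make_litellm_stub()
# patch.dict restores sys.modules on exit, so the stub never leaks.
with patch.dict(sys.modules, {"litellm": stub}):
    result = asyncio.run(lazy_import_call("openai/gpt-4o"))

# The stub recorded the exact kwargs the code under test sent.
stub.acompletion.assert_awaited_once()
sent = stub.acompletion.await_args.kwargs
```

In a pytest suite the same stub would typically be installed with monkeypatch.setitem(sys.modules, "litellm", stub) inside an autouse fixture.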

Sequence Diagram

sequenceDiagram
    participant App as Client
    participant Reg as Registry
    participant Backend as LiteLLMBackend
    participant Lite as litellm Library
    participant Norm as Normalizer

    Note over App,Norm: Non-streaming complete()
    App->>Reg: request backend for transport "litellm"
    Reg-->>App: LiteLLMBackend(api_key, api_base)
    App->>Backend: complete(model, messages, max_tokens, ...)
    Backend->>Backend: _build_params() (map fields, apply extras)
    Backend->>Lite: acompletion(model, messages, max_tokens, ...)
    Lite-->>Backend: response {choices, usage, tool_calls?}
    Backend->>Norm: _normalize_response(response)
    Norm-->>Backend: CompletionResult
    Backend-->>App: CompletionResult

    Note over App,Norm: Streaming stream()
    App->>Backend: stream(...)
    Backend->>Lite: acompletion(..., stream=True)
    loop streamed chunks
        Lite-->>Backend: {delta, finish_reason?, usage?}
        Backend-->>App: StreamChunk(delta)
    end
    Backend-->>App: final StreamChunk(is_done=True, usage, finish_reason)
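The streaming half of the diagram, including the terminal-chunk behavior the review later asks to test (a final is_done=True chunk when usage arrives, and a fallback done chunk when only finish_reason was seen), might look like the sketch below. It assumes each chunk is a plain dict in the OpenAI streaming shape; real LiteLLM chunks are objects with attribute access, and StreamChunk here is a hypothetical stand-in for the project's type.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import Any, AsyncIterator


@dataclass
class StreamChunk:
    """Hypothetical stand-in for the project's StreamChunk type."""

    content: str = ""
    is_done: bool = False
    finish_reason: str | None = None
    input_tokens: int | None = None
    output_tokens: int | None = None


async def normalize_stream(
    chunks: AsyncIterator[dict[str, Any]],
) -> AsyncIterator[StreamChunk]:
    """Yield content deltas, then exactly one terminal is_done chunk."""
    finish_reason: str | None = None
    async for chunk in chunks:
        choices = chunk.get("choices") or []
        if choices:
            delta = choices[0].get("delta") or {}
            if delta.get("content"):
                yield StreamChunk(content=delta["content"])
            finish_reason = choices[0].get("finish_reason") or finish_reason
        usage = chunk.get("usage")
        if usage:
            # Usage normally rides on the last chunk: emit the terminal chunk now.
            yield StreamChunk(
                is_done=True,
                finish_reason=finish_reason,
                input_tokens=usage.get("prompt_tokens"),
                output_tokens=usage.get("completion_tokens"),
            )
            return
    # Fallback: the stream ended without usage, so close it with whatever
    # finish_reason was seen (possibly None).
    yield StreamChunk(is_done=True, finish_reason=finish_reason)


async def _replay(items: list[dict[str, Any]]) -> AsyncIterator[dict[str, Any]]:
    for item in items:
        yield item


async def collect(items: list[dict[str, Any]]) -> list[StreamChunk]:
    """Run canned chunks through normalize_stream and gather the output."""
    return [c async for c in normalize_stream(_replay(items))]
```

Always emitting exactly one is_done chunk gives callers a single place to read usage and finish_reason, whichever way the provider ends the stream.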

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nibble code paths, litellm in sight,
I map your params and stream through the night,
Tool calls unwrapped, tokens counted true,
A tiny backend hop—happy builds to you! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 19.05%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title 'Feat/add litellm provider' accurately summarizes the main change: adding LiteLLM as a new provider backend to the system.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/test_litellm_backend.py (1)

41-161: ⚡ Quick win

Add stream-path tests for terminal chunk semantics.

This suite validates complete() well, but LiteLLMBackend.stream() is currently untested. Please add focused tests for: (1) final is_done=True chunk when usage appears, and (2) fallback done chunk when only finish_reason is seen.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_litellm_backend.py` around lines 41 - 161, Add two async pytest
tests for LiteLLMBackend.stream: create test_stream_final_chunk_with_usage that
stubs litellm_stub.astream to yield normal partial chunks then a chunk
containing a non-empty usage object and assert the final yielded chunk has
is_done True and contains correct token counts; and create
test_stream_fallback_done_chunk_with_finish_reason that stubs astream to yield
partial chunks and a chunk with only finish_reason set, then assert the stream
emits a final is_done True chunk with appropriate finish_reason fallback
handling. Reference LiteLLMBackend.stream and the test stub litellm_stub.astream
(and existing helpers like _mock_response/_mock_stream_response) to locate where
to hook the stream outputs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/llm/backends/litellm.py`:
- Around line 54-55: The unguarded "import litellm" in the complete() and
stream() flows should be replaced with a small static helper (e.g.,
_import_litellm) that attempts to import litellm in a try/except and raises
ValidationException (from src/exceptions.py) with clear install guidance if
ModuleNotFoundError occurs; update both complete() and stream() to call this
helper instead of importing directly so callers receive the deterministic
ValidationException rather than a raw ModuleNotFoundError.

In `@src/llm/registry.py`:
- Around line 135-137: client_for_model_config currently returns None for
provider "litellm" despite its ProviderClient return type, causing a None to be
stored in AttemptPlan.client and later a runtime error in honcho_llm_call_inner;
to fix, change the "litellm" branch in client_for_model_config to raise a clear
exception (e.g., ValueError) explaining that LiteLLM credentials are managed via
get_backend and should not be resolved here, and update any call sites
(AttemptPlan.client creation and honcho_llm_call_inner) to not expect a None
from client_for_model_config for litellm so the control flow consistently uses
get_backend/CLIENTS for LiteLLM.
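Here is a minimal sketch of the guarded-import helper that the first inline comment describes. ValidationException is a local stand-in for src.exceptions.ValidationException, and the module_name parameter exists only so the failure path is easy to exercise. The helper re-raises when the missing module is something other than litellm itself, so a broken transitive dependency is not misreported as "litellm not installed".

```python
from __future__ import annotations

import importlib
from types import ModuleType


class ValidationException(Exception):
    """Hypothetical stand-in for src.exceptions.ValidationException."""


def import_litellm(module_name: str = "litellm") -> ModuleType:
    """Import litellm on demand; turn a missing install into a clear error.

    module_name is parameterized only so the failure path is easy to test.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        if exc.name != module_name:
            # Something *inside* the package is missing: surface it unchanged.
            raise
        raise ValidationException(
            "transport='litellm' requires the optional dependency: "
            "pip install 'honcho[litellm]'"
        ) from exc
```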


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9dcecdef-c691-4849-881a-e1332cb3d449

📥 Commits

Reviewing files that changed from the base of the PR and between ad7c1b3 and 738b058.

📒 Files selected for processing (6)
  • pyproject.toml
  • src/config.py
  • src/llm/backends/__init__.py
  • src/llm/backends/litellm.py
  • src/llm/registry.py
  • tests/test_litellm_backend.py

Comment thread src/llm/backends/litellm.py Outdated
Comment thread src/llm/registry.py

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/llm/registry.py (1)

140-153: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

backend_for_provider discards LiteLLM credentials; only get_backend plumbs them.

For litellm, honcho_llm_call_inner receives selected_config with api_key/base_url but never passes it to backend construction. Instead, it calls backend_for_provider("litellm", None) (since client_for_model_config returns None for litellm and CLIENTS has no litellm entry), which instantiates LiteLLMBackend() with no arguments. The backend then falls back to environment variables instead of using the provided credentials.

In contrast, get_backend(config) (lines 174-175) correctly passes credentials: LiteLLMBackend(api_key=config.api_key, api_base=config.base_url).

This means any litellm ModelConfig with custom api_key/base_url set by the user is silently dropped in the normal planning flow (honcho_llm_call → honcho_llm_call_inner → backend_for_provider), causing the backend to pick up the wrong provider key or fail.

Options:

  • Plumb credentials through backend_for_provider for litellm by checking selected_config, or
  • Have honcho_llm_call_inner call get_backend(selected_config) for litellm instead of backend_for_provider, or
  • Drop the litellm branch in backend_for_provider and enforce all litellm construction through get_backend.
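The second option (resolve litellm backends from the selected config rather than from a shared client) could be sketched as follows. ModelConfig and LiteLLMBackend are stripped-down stand-ins, and resolve_backend is a hypothetical helper, not an existing function in src/llm/registry.py.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class ModelConfig:
    """Hypothetical minimal stand-in for src.config.ModelConfig."""

    transport: str
    model: str
    api_key: str | None = None
    base_url: str | None = None


@dataclass
class LiteLLMBackend:
    """Stand-in that just records the credentials it was built with."""

    api_key: str | None = None
    api_base: str | None = None


def resolve_backend(selected_config: ModelConfig, clients: dict[str, Any]) -> Any:
    """Pick a backend for one attempt without dropping per-model credentials."""
    if selected_config.transport == "litellm":
        # Same construction get_backend() uses: credentials come from the
        # selected config, never from a shared client (CLIENTS has no litellm entry).
        return LiteLLMBackend(
            api_key=selected_config.api_key,
            api_base=selected_config.base_url,
        )
    client = clients.get(selected_config.transport)
    if client is None:
        raise ValueError(f"no client configured for {selected_config.transport!r}")
    return client  # real code would wrap this via backend_for_provider()
```

Because the litellm branch never consults the client map, client_for_model_config() would no longer need to return None for litellm at all.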
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llm/registry.py` around lines 140 - 153, The litellm credentials from
ModelConfig (selected_config.api_key / api_base) are being dropped because
backend_for_provider("litellm", None) constructs LiteLLMBackend() with no args;
fix by ensuring litellm construction receives the config credentials: either
(preferred) change honcho_llm_call_inner to call get_backend(selected_config)
for provider "litellm" (so LiteLLMBackend is created with api_key/api_base), or
update backend_for_provider to accept and detect a ModelConfig/credentials
parameter and pass them into LiteLLMBackend(api_key=..., api_base=...); update
references around honcho_llm_call_inner, backend_for_provider,
client_for_model_config, and LiteLLMBackend so litellm never falls back to env
vars when selected_config supplies credentials.
🧹 Nitpick comments (3)
src/llm/backends/litellm.py (2)

163-169: 💤 Low value

Redundant if/else — both branches do the same thing.

The isinstance(... BaseModel) check is dead — both branches assign response_format unchanged. Either drop the inner branching or differentiate the value (e.g. model_json_schema() for the non-BaseModel path) if the intent was to normalize.

♻️ Proposed simplification
-        if response_format is not None:
-            if isinstance(response_format, type) and issubclass(
-                response_format, BaseModel
-            ):
-                params["response_format"] = response_format
-            else:
-                params["response_format"] = response_format
+        if response_format is not None:
+            params["response_format"] = response_format
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llm/backends/litellm.py` around lines 163 - 169, The inner
isinstance/issubclass check around response_format is redundant because both
branches set params["response_format"] to the same value; simplify by removing
the branching and just assign params["response_format"] = response_format. If
the original intent was to normalize Pydantic types, instead set
params["response_format"] = response_format.model_json_schema() when
issubclass(response_format, BaseModel), otherwise assign the value directly;
update the code around the response_format handling in the function that builds
params (the response_format variable, BaseModel reference, and params dict)
accordingly.

209-223: 💤 Low value

_convert_tools only inspects the first tool.

tools[0].get("type") == "function" decides the format for the entire list. If the caller mixes Anthropic-shaped (name/description/input_schema) and OpenAI-shaped (type: function) entries — e.g. a partial migration — only one shape will be valid downstream. Iterate per-tool and convert each one whose shape doesn't already match.
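A per-tool version of the conversion, as the comment suggests. This is a sketch: convert_tools is a hypothetical name, and the empty-object default for a missing input_schema is an assumption rather than something the PR specifies.

```python
from __future__ import annotations

from typing import Any


def convert_tools(tools: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Normalize each tool to the OpenAI shape independently."""
    converted: list[dict[str, Any]] = []
    for tool in tools:
        if tool.get("type") == "function":
            converted.append(tool)  # already OpenAI-shaped, pass through
            continue
        # Anthropic shape: name / description / input_schema.
        converted.append(
            {
                "type": "function",
                "function": {
                    "name": tool["name"],  # required: fail loudly if absent
                    "description": tool.get("description", ""),
                    "parameters": tool.get(
                        "input_schema", {"type": "object", "properties": {}}
                    ),
                },
            }
        )
    return converted
```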

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llm/backends/litellm.py` around lines 209 - 223, The _convert_tools
function currently checks only tools[0] to decide format for the whole list;
update _convert_tools to process each tool individually: iterate over the tools
list and for each tool, if tool.get("type") == "function" keep it as-is,
otherwise convert that single tool into the OpenAI-shaped dict with keys
"type":"function" and a "function" dict containing "name" from tool["name"],
"description" from tool.get("description"), and "parameters" from
tool.get("input_schema"); return the newly built list and defensively handle
missing keys (use .get where appropriate) so mixed-format lists are normalized
per-tool.
src/llm/registry.py (1)

156-162: 💤 Low value

Lost assert_never exhaustiveness check on the new litellm branch.

history_adapter_for_provider presumably ended in assert_never(provider) before this change. The new fall-through, return OpenAIHistoryAdapter()  # litellm uses OpenAI message format, covers both "openai" and "litellm", but it also masks any future ModelTransport value added without an explicit branch: such a value would silently get the OpenAI adapter instead of failing the type checker or raising at runtime. Consider an explicit if provider in ("openai", "litellm") branch followed by assert_never(provider).
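A minimal sketch of the exhaustive form, assuming ModelTransport is exactly the four-member Literal shown (the PR only states that "litellm" was added); the adapter classes are empty stand-ins. It spells the shared branch as two equality checks, the most widely supported narrowing form across type checkers, and falls back to a local assert_never on Pythons older than 3.11.

```python
from __future__ import annotations

from typing import Literal, NoReturn, Union

try:
    from typing import assert_never  # Python 3.11+
except ImportError:  # older Pythons: minimal equivalent

    def assert_never(value: NoReturn) -> NoReturn:
        raise AssertionError(f"Expected code to be unreachable, but got: {value!r}")


# Assumed members; adjust to the real ModelTransport in src/config.py.
ModelTransport = Literal["openai", "anthropic", "gemini", "litellm"]


class OpenAIHistoryAdapter: ...


class AnthropicHistoryAdapter: ...


class GeminiHistoryAdapter: ...


HistoryAdapter = Union[OpenAIHistoryAdapter, AnthropicHistoryAdapter, GeminiHistoryAdapter]


def history_adapter_for_provider(provider: ModelTransport) -> HistoryAdapter:
    if provider == "anthropic":
        return AnthropicHistoryAdapter()
    if provider == "gemini":
        return GeminiHistoryAdapter()
    if provider == "openai" or provider == "litellm":
        # LiteLLM accepts OpenAI-format messages.
        return OpenAIHistoryAdapter()
    # A new ModelTransport member without a branch fails type-checking here
    # and raises AssertionError at runtime instead of silently using OpenAI.
    assert_never(provider)
```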

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llm/registry.py` around lines 156 - 162, The function
history_adapter_for_provider currently falls through to returning
OpenAIHistoryAdapter for any non-anthropic/gemini provider, which hides
missing-case bugs; change it to explicitly handle "openai" and "litellm" (e.g.
if provider in ("openai", "litellm"): return OpenAIHistoryAdapter()) and then
append assert_never(provider) at the end of history_adapter_for_provider so the
type-checker/runtime will catch any future unknown ModelTransport values instead
of silently using the OpenAI adapter.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 58eb7baa-9763-49be-99e1-342e5840e8e9

📥 Commits

Reviewing files that changed from the base of the PR and between 738b058 and 60ba21b.

📒 Files selected for processing (3)
  • src/llm/backends/litellm.py
  • src/llm/registry.py
  • src/llm/runtime.py

Comment on lines +61 to +79
        thinking_budget_tokens: int | None = None,
        thinking_effort: str | None = None,
        max_output_tokens: int | None = None,
        extra_params: dict[str, Any] | None = None,
    ) -> CompletionResult:
        litellm = self._import_litellm()

        params = self._build_params(
            model=model,
            messages=messages,
            max_tokens=max_output_tokens or max_tokens,
            temperature=temperature,
            stop=stop,
            tools=tools,
            tool_choice=tool_choice,
            response_format=response_format,
            thinking_effort=thinking_effort,
            extra_params=extra_params,
        )


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

What is the exact parameter name and shape for setting an Anthropic thinking budget through litellm.acompletion in litellm 1.65?

💡 Result:

In LiteLLM 1.65, when you use litellm.completion(...) / litellm.acompletion(...) with Anthropic extended thinking, the parameter is named thinking and its shape is thinking={"type": "enabled", "budget_tokens": <int>}. Example:

response = await litellm.acompletion(
    model="anthropic/claude-sonnet-4-6",
    messages=[{"role": "user", "content": "…"}],
    thinking={"type": "enabled", "budget_tokens": 1024},
)

LiteLLM also documents an OpenAI-style alternative mapping: reasoning_effort is mapped to Anthropic output_config.effort (not budget_tokens) for Claude 4.6/4.7 models. If you specifically need a token budget for thinking, use the thinking parameter above, not reasoning_effort. [1][2][3]

Citations:


🏁 Script executed:

# First, let's examine the litellm.py file to understand its structure
wc -l src/llm/backends/litellm.py

Repository: plastic-labs/honcho

Length of output: 95


🏁 Script executed:

# Read the file with line numbers to see the relevant sections
cat -n src/llm/backends/litellm.py | sed -n '50,180p'

Repository: plastic-labs/honcho

Length of output: 5776


🏁 Script executed:

# Also search for thinking_budget_tokens usage in the file
rg 'thinking_budget_tokens' src/llm/backends/litellm.py -n

Repository: plastic-labs/honcho

Length of output: 171


🏁 Script executed:

# Search for _build_params method to see what parameters it handles
rg '_build_params' src/llm/backends/litellm.py -A 30 -n

Repository: plastic-labs/honcho

Length of output: 3803


Forward thinking_budget_tokens parameter or document that it's ignored.

complete() and stream() accept thinking_budget_tokens but never forward it to _build_params() or use it. LiteLLM requires thinking={"type": "enabled", "budget_tokens": ...} for Anthropic extended thinking with token budgets. Without plumbing this parameter, callers will silently get incorrect behavior. Either construct and pass the thinking parameter based on thinking_budget_tokens, or explicitly log/document that this backend ignores it.

Also applies to: stream() (lines 95–113) and _build_params() (lines 135–176)
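One way to plumb the budget through, assuming the thinking={"type": "enabled", "budget_tokens": ...} shape from the query result above. thinking_params is a hypothetical helper whose output would be merged into the kwargs from _build_params(). Giving an explicit budget precedence over thinking_effort (rather than sending both) is a design assumption, not something the PR or LiteLLM prescribes.

```python
from __future__ import annotations

from typing import Any


def thinking_params(
    thinking_budget_tokens: int | None = None,
    thinking_effort: str | None = None,
) -> dict[str, Any]:
    """Translate Honcho's thinking knobs into litellm.acompletion kwargs."""
    params: dict[str, Any] = {}
    if thinking_budget_tokens is not None:
        # Explicit token budget: LiteLLM's Anthropic-style `thinking` parameter.
        params["thinking"] = {"type": "enabled", "budget_tokens": thinking_budget_tokens}
    elif thinking_effort is not None:
        # No budget given: fall back to the OpenAI-style effort knob.
        params["reasoning_effort"] = thinking_effort
    return params
```

Inside complete() and stream(), this would amount to params.update(thinking_params(thinking_budget_tokens, thinking_effort)) after _build_params() returns.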

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llm/backends/litellm.py` around lines 61 - 79, complete() and stream()
accept thinking_budget_tokens but never forward it to _build_params() or
construct the LiteLLM "thinking" dict, so Anthropic-style token budgets are
ignored; update complete(), stream(), and _build_params() to accept/handle a
thinking parameter by: when thinking_budget_tokens is not None build a thinking
dict like {"type":"enabled","budget_tokens": thinking_budget_tokens} (optionally
include thinking_effort if present) and pass it into _build_params() (or have
_build_params() accept thinking_budget_tokens and assemble the thinking dict
there), ensuring the final params sent to LiteLLM include this thinking field;
alternatively, if you choose not to support it, add an explicit
log/documentation in complete()/stream() and _build_params() stating
thinking_budget_tokens is ignored.
