Feat/add litellm provider#653
Conversation
|
cc @VVoruganti |
WalkthroughAdded LiteLLM as an optional LLM backend: new ChangesLiteLLM Backend Provider Integration
Sequence DiagramsequenceDiagram
participant App as Client
participant Reg as Registry
participant Backend as LiteLLMBackend
participant Lite as litellm Library
participant Norm as Normalizer
Note over App,Norm: Non-streaming complete()
App->>Reg: request backend for transport "litellm"
Reg-->>App: LiteLLMBackend(api_key, api_base)
App->>Backend: complete(model, messages, max_tokens, ...)
Backend->>Backend: _build_params() (map fields, apply extras)
Backend->>Lite: acompletion(model, messages, max_tokens, ...)
Lite-->>Backend: response {choices, usage, tool_calls?}
Backend->>Norm: _normalize_response(response)
Norm-->>Backend: CompletionResult
Backend-->>App: CompletionResult
Note over App,Norm: Streaming stream()
App->>Backend: stream(...)
Backend->>Lite: acompletion(..., stream=True)
loop streamed chunks
Lite-->>Backend: {delta, finish_reason?, usage?}
Backend-->>App: StreamChunk(delta)
end
Backend-->>App: final StreamChunk(is_done=True, usage, finish_reason)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/test_litellm_backend.py (1)
41-161: ⚡ Quick winAdd stream-path tests for terminal chunk semantics.
This suite validates
complete()well, butLiteLLMBackend.stream()is currently untested. Please add focused tests for: (1) finalis_done=Truechunk when usage appears, and (2) fallback done chunk when onlyfinish_reasonis seen.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_litellm_backend.py` around lines 41 - 161, Add two async pytest tests for LiteLLMBackend.stream: create test_stream_final_chunk_with_usage that stubs litellm_stub.astream to yield normal partial chunks then a chunk containing a non-empty usage object and assert the final yielded chunk has is_done True and contains correct token counts; and create test_stream_fallback_done_chunk_with_finish_reason that stubs astream to yield partial chunks and a chunk with only finish_reason set, then assert the stream emits a final is_done True chunk with appropriate finish_reason fallback handling. Reference LiteLLMBackend.stream and the test stub litellm_stub.astream (and existing helpers like _mock_response/_mock_stream_response) to locate where to hook the stream outputs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/llm/backends/litellm.py`:
- Around line 54-55: The unguarded "import litellm" in the complete() and
stream() flows should be replaced with a small static helper (e.g.,
_import_litellm) that attempts to import litellm in a try/except and raises
ValidationException (from src/exceptions.py) with clear install guidance if
ModuleNotFoundError occurs; update both complete() and stream() to call this
helper instead of importing directly so callers receive the deterministic
ValidationException rather than a raw ModuleNotFoundError.
In `@src/llm/registry.py`:
- Around line 135-137: client_for_model_config currently returns None for
provider "litellm" despite its ProviderClient return type, causing a None to be
stored in AttemptPlan.client and later a runtime error in honcho_llm_call_inner;
to fix, change the "litellm" branch in client_for_model_config to raise a clear
exception (e.g., ValueError) explaining that LiteLLM credentials are managed via
get_backend and should not be resolved here, and update any call sites
(AttemptPlan.client creation and honcho_llm_call_inner) to not expect a None
from client_for_model_config for litellm so the control flow consistently uses
get_backend/CLIENTS for LiteLLM.
---
Nitpick comments:
In `@tests/test_litellm_backend.py`:
- Around line 41-161: Add two async pytest tests for LiteLLMBackend.stream:
create test_stream_final_chunk_with_usage that stubs litellm_stub.astream to
yield normal partial chunks then a chunk containing a non-empty usage object and
assert the final yielded chunk has is_done True and contains correct token
counts; and create test_stream_fallback_done_chunk_with_finish_reason that stubs
astream to yield partial chunks and a chunk with only finish_reason set, then
assert the stream emits a final is_done True chunk with appropriate
finish_reason fallback handling. Reference LiteLLMBackend.stream and the test
stub litellm_stub.astream (and existing helpers like
_mock_response/_mock_stream_response) to locate where to hook the stream
outputs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9dcecdef-c691-4849-881a-e1332cb3d449
📒 Files selected for processing (6)
pyproject.tomlsrc/config.pysrc/llm/backends/__init__.pysrc/llm/backends/litellm.pysrc/llm/registry.pytests/test_litellm_backend.py
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/llm/registry.py (1)
140-153:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
backend_for_providerdiscards LiteLLM credentials; onlyget_backendplumbs them.For litellm,
honcho_llm_call_innerreceivesselected_configwithapi_key/base_urlbut never passes it to backend construction. Instead, it callsbackend_for_provider("litellm", None)(sinceclient_for_model_configreturnsNonefor litellm andCLIENTShas no litellm entry), which instantiatesLiteLLMBackend()with no arguments. The backend then falls back to environment variables instead of using the provided credentials.In contrast,
get_backend(config)(lines 174-175) correctly passes credentials:LiteLLMBackend(api_key=config.api_key, api_base=config.base_url).This means any litellm
ModelConfigwith customapi_key/base_urlset by the user is silently dropped in the normal planning flow (honcho_llm_call → honcho_llm_call_inner → backend_for_provider), causing the backend to pick up the wrong provider key or fail.Options:
- Plumb credentials through
backend_for_providerfor litellm by checkingselected_config, or- Have
honcho_llm_call_innercallget_backend(selected_config)for litellm instead ofbackend_for_provider, or- Drop the litellm branch in
backend_for_providerand enforce all litellm construction throughget_backend.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llm/registry.py` around lines 140 - 153, The litellm credentials from ModelConfig (selected_config.api_key / api_base) are being dropped because backend_for_provider("litellm", None) constructs LiteLLMBackend() with no args; fix by ensuring litellm construction receives the config credentials: either (preferred) change honcho_llm_call_inner to call get_backend(selected_config) for provider "litellm" (so LiteLLMBackend is created with api_key/api_base), or update backend_for_provider to accept and detect a ModelConfig/credentials parameter and pass them into LiteLLMBackend(api_key=..., api_base=...); update references around honcho_llm_call_inner, backend_for_provider, client_for_model_config, and LiteLLMBackend so litellm never falls back to env vars when selected_config supplies credentials.
🧹 Nitpick comments (3)
src/llm/backends/litellm.py (2)
163-169: 💤 Low valueRedundant
if/else— both branches do the same thing.The
isinstance(... BaseModel)check is dead — both branches assignresponse_formatunchanged. Either drop the inner branching or differentiate the value (e.g.model_json_schema()for the non-BaseModel path) if the intent was to normalize.♻️ Proposed simplification
- if response_format is not None: - if isinstance(response_format, type) and issubclass( - response_format, BaseModel - ): - params["response_format"] = response_format - else: - params["response_format"] = response_format + if response_format is not None: + params["response_format"] = response_format🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llm/backends/litellm.py` around lines 163 - 169, The inner isinstance/issubclass check around response_format is redundant because both branches set params["response_format"] to the same value; simplify by removing the branching and just assign params["response_format"] = response_format. If the original intent was to normalize Pydantic types, instead set params["response_format"] = response_format.model_json_schema() when issubclass(response_format, BaseModel), otherwise assign the value directly; update the code around the response_format handling in the function that builds params (the response_format variable, BaseModel reference, and params dict) accordingly.
209-223: 💤 Low value
_convert_toolsonly inspects the first tool.
tools[0].get("type") == "function"decides the format for the entire list. If the caller mixes Anthropic-shaped (name/description/input_schema) and OpenAI-shaped (type: function) entries — e.g. a partial migration — only one shape will be valid downstream. Iterate per-tool and convert each one whose shape doesn't already match.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llm/backends/litellm.py` around lines 209 - 223, The _convert_tools function currently checks only tools[0] to decide format for the whole list; update _convert_tools to process each tool individually: iterate over the tools list and for each tool, if tool.get("type") == "function" keep it as-is, otherwise convert that single tool into the OpenAI-shaped dict with keys "type":"function" and a "function" dict containing "name" from tool["name"], "description" from tool.get("description"), and "parameters" from tool.get("input_schema"); return the newly built list and defensively handle missing keys (use .get where appropriate) so mixed-format lists are normalized per-tool.src/llm/registry.py (1)
156-162: 💤 Low valueLost
assert_neverexhaustiveness check on the newlitellmbranch.
history_adapter_for_providerpreviously presumably ended inassert_never(provider); the new fallthroughreturn OpenAIHistoryAdapter() # litellm uses OpenAI message formatopportunistically covers"openai"and"litellm"together but also masks any futureModelTransportvalue added without an explicit branch (it will silently use the OpenAI adapter rather than failing the type checker / at runtime). Consider an explicitif provider in ("openai", "litellm")followed byassert_never(provider).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llm/registry.py` around lines 156 - 162, The function history_adapter_for_provider currently falls through to returning OpenAIHistoryAdapter for any non-anthropic/gemini provider, which hides missing-case bugs; change it to explicitly handle "openai" and "litellm" (e.g. if provider in ("openai", "litellm"): return OpenAIHistoryAdapter()) and then append assert_never(provider) at the end of history_adapter_for_provider so the type-checker/runtime will catch any future unknown ModelTransport values instead of silently using the OpenAI adapter.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/llm/backends/litellm.py`:
- Around line 61-79: complete() and stream() accept thinking_budget_tokens but
never forward it to _build_params() or construct the LiteLLM "thinking" dict, so
Anthropic-style token budgets are ignored; update complete(), stream(), and
_build_params() to accept/handle a thinking parameter by: when
thinking_budget_tokens is not None build a thinking dict like
{"type":"enabled","budget_tokens": thinking_budget_tokens} (optionally include
thinking_effort if present) and pass it into _build_params() (or have
_build_params() accept thinking_budget_tokens and assemble the thinking dict
there), ensuring the final params sent to LiteLLM include this thinking field;
alternatively, if you choose not to support it, add an explicit
log/documentation in complete()/stream() and _build_params() stating
thinking_budget_tokens is ignored.
---
Outside diff comments:
In `@src/llm/registry.py`:
- Around line 140-153: The litellm credentials from ModelConfig
(selected_config.api_key / api_base) are being dropped because
backend_for_provider("litellm", None) constructs LiteLLMBackend() with no args;
fix by ensuring litellm construction receives the config credentials: either
(preferred) change honcho_llm_call_inner to call get_backend(selected_config)
for provider "litellm" (so LiteLLMBackend is created with api_key/api_base), or
update backend_for_provider to accept and detect a ModelConfig/credentials
parameter and pass them into LiteLLMBackend(api_key=..., api_base=...); update
references around honcho_llm_call_inner, backend_for_provider,
client_for_model_config, and LiteLLMBackend so litellm never falls back to env
vars when selected_config supplies credentials.
---
Nitpick comments:
In `@src/llm/backends/litellm.py`:
- Around line 163-169: The inner isinstance/issubclass check around
response_format is redundant because both branches set params["response_format"]
to the same value; simplify by removing the branching and just assign
params["response_format"] = response_format. If the original intent was to
normalize Pydantic types, instead set params["response_format"] =
response_format.model_json_schema() when issubclass(response_format, BaseModel),
otherwise assign the value directly; update the code around the response_format
handling in the function that builds params (the response_format variable,
BaseModel reference, and params dict) accordingly.
- Around line 209-223: The _convert_tools function currently checks only
tools[0] to decide format for the whole list; update _convert_tools to process
each tool individually: iterate over the tools list and for each tool, if
tool.get("type") == "function" keep it as-is, otherwise convert that single tool
into the OpenAI-shaped dict with keys "type":"function" and a "function" dict
containing "name" from tool["name"], "description" from tool.get("description"),
and "parameters" from tool.get("input_schema"); return the newly built list and
defensively handle missing keys (use .get where appropriate) so mixed-format
lists are normalized per-tool.
In `@src/llm/registry.py`:
- Around line 156-162: The function history_adapter_for_provider currently falls
through to returning OpenAIHistoryAdapter for any non-anthropic/gemini provider,
which hides missing-case bugs; change it to explicitly handle "openai" and
"litellm" (e.g. if provider in ("openai", "litellm"): return
OpenAIHistoryAdapter()) and then append assert_never(provider) at the end of
history_adapter_for_provider so the type-checker/runtime will catch any future
unknown ModelTransport values instead of silently using the OpenAI adapter.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 58eb7baa-9763-49be-99e1-342e5840e8e9
📒 Files selected for processing (3)
src/llm/backends/litellm.pysrc/llm/registry.pysrc/llm/runtime.py
| thinking_budget_tokens: int | None = None, | ||
| thinking_effort: str | None = None, | ||
| max_output_tokens: int | None = None, | ||
| extra_params: dict[str, Any] | None = None, | ||
| ) -> CompletionResult: | ||
| litellm = self._import_litellm() | ||
|
|
||
| params = self._build_params( | ||
| model=model, | ||
| messages=messages, | ||
| max_tokens=max_output_tokens or max_tokens, | ||
| temperature=temperature, | ||
| stop=stop, | ||
| tools=tools, | ||
| tool_choice=tool_choice, | ||
| response_format=response_format, | ||
| thinking_effort=thinking_effort, | ||
| extra_params=extra_params, | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
What is the exact parameter name and shape for setting an Anthropic thinking budget through litellm.acompletion in litellm 1.65?
💡 Result:
In liteLLM 1.65, when you use litellm.completion(...)/litellm.acompletion(...) with Anthropic extended thinking, the exact parameter is named thinking and its shape is: thinking={"type": "enabled", "budget_tokens": } Example: response = litellm.acompletion( model="anthropic/claude-sonnet-4-6", messages=[{"role": "user", "content": "…"}], thinking={"type": "enabled", "budget_tokens": 1024},) liteLLM also documents an OpenAI-style alternative mapping: reasoning_effort is mapped to Anthropic output_config.effort (not budget_tokens) for Claude 4.6/4.7 models. If you specifically need a token budget for thinking, use the thinking parameter above, not reasoning_effort. [1][2][3]
Citations:
- 1: https://docs.litellm.ai/docs/providers/anthropic_effort
- 2: https://docs.litellm.ai/docs/providers/anthropic
- 3: https://docs.litellm.ai/docs/reasoning_content
🏁 Script executed:
# First, let's examine the litellm.py file to understand its structure
wc -l src/llm/backends/litellm.pyRepository: plastic-labs/honcho
Length of output: 95
🏁 Script executed:
# Read the file with line numbers to see the relevant sections
cat -n src/llm/backends/litellm.py | sed -n '50,180p'Repository: plastic-labs/honcho
Length of output: 5776
🏁 Script executed:
# Also search for thinking_budget_tokens usage in the file
rg 'thinking_budget_tokens' src/llm/backends/litellm.py -nRepository: plastic-labs/honcho
Length of output: 171
🏁 Script executed:
# Search for _build_params method to see what parameters it handles
rg '_build_params' src/llm/backends/litellm.py -A 30 -nRepository: plastic-labs/honcho
Length of output: 3803
Forward thinking_budget_tokens parameter or document that it's ignored.
complete() and stream() accept thinking_budget_tokens but never forward it to _build_params() or use it. LiteLLM requires thinking={"type": "enabled", "budget_tokens": ...} for Anthropic extended thinking with token budgets. Without plumbing this parameter, callers will silently get incorrect behavior. Either construct and pass the thinking parameter based on thinking_budget_tokens, or explicitly log/document that this backend ignores it.
Also applies to: stream() (lines 95–113) and _build_params() (lines 135–176)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/llm/backends/litellm.py` around lines 61 - 79, complete() and stream()
accept thinking_budget_tokens but never forward it to _build_params() or
construct the LiteLLM "thinking" dict, so Anthropic-style token budgets are
ignored; update complete(), stream(), and _build_params() to accept/handle a
thinking parameter by: when thinking_budget_tokens is not None build a thinking
dict like {"type":"enabled","budget_tokens": thinking_budget_tokens} (optionally
include thinking_effort if present) and pass it into _build_params() (or have
_build_params() accept thinking_budget_tokens and assemble the thinking dict
there), ensuring the final params sent to LiteLLM include this thinking field;
alternatively, if you choose not to support it, add an explicit
log/documentation in complete()/stream() and _build_params() stating
thinking_budget_tokens is ignored.
Summary
Motivation
Honcho currently supports three LLM transports (OpenAI, Anthropic, Gemini) with dedicated SDK backends. LiteLLM adds a unified gateway that handles provider-specific parameter translation and authentication,
letting users route to any of 100+ providers (Bedrock, Vertex AI, Groq, Mistral, etc.) without needing a dedicated backend for each.
Changes
src/llm/backends/litellm.py- NewLiteLLMBackendimplementing theProviderBackendprotocol withcomplete()andstream()src/llm/backends/__init__.py- ExportedLiteLLMBackendsrc/llm/registry.py- Wired LiteLLM intobackend_for_provider(),client_for_model_config(), andget_backend()src/config.py- Added"litellm"toModelTransportliteral typepyproject.toml- Addedlitellm>=1.65,<1.85as optional dependencytests/test_litellm_backend.py- 6 unit testsTests
1. Unit tests:
uv run pytest tests/test_litellm_backend.py -v(Note: tests require
--override-ini="asyncio_mode=auto"or a standalone conftest to avoid the DB fixture in the root conftest. The test logic itself is self-contained with litellm mocked.)Usage
Install:
Config (config.toml):
Environment variables:
Python (direct backend usage):
Via ModelConfig (production path):
Risk / Compatibility
litellmis an optional dependency. Base install unaffected.complete()andstream()to avoid import errors when not installed.drop_params=Trueby default for cross-provider kwarg compatibility.OpenAIHistoryAdaptersince LiteLLM follows OpenAI message format.Summary by CodeRabbit
New Features
Dependency
Tests
Chores