Commit d0afddf

Add GitHub Actions workflow for automated Docker image publishing
- Created build-and-publish workflow to publish images to ghcr.io
- Updated docker-compose.yml to use published image by default
- Added docker-compose.override.yml for local development builds
- Updated README with simpler setup using pre-built images
- Added download instructions for quick start without cloning
- Moved archived files to .archive directory
- Updated documentation to reference base container project

Users can now get started with just docker-compose.yml and .env without needing to clone or build the repository.
1 parent ddab8f0 commit d0afddf

38 files changed

Lines changed: 4569 additions & 15 deletions
Lines changed: 234 additions & 0 deletions
@@ -0,0 +1,234 @@
# Investigation Report: LiteLLM Anthropic Provider vs Claude Code SDK Provider

## 1. Missing Parameters

Our current implementation is missing several important parameters that the official Anthropic provider supports:

### Core Parameters
- **`temperature`** - Controls randomness (0-1)
- **`top_p`** - Nucleus sampling parameter
- **`top_k`** - Top-k sampling parameter
- **`stop`/`stop_sequences`** - Stop sequences for generation
- **`max_tokens`/`max_completion_tokens`** - Maximum tokens to generate
- **`system`** - System message support
- **`metadata`** - User tracking and custom metadata
- **`response_format`** - JSON mode support
- **`tools`/`tool_choice`** - Function/tool calling support
- **`parallel_tool_calls`** - Parallel tool execution control
- **`stream_options`** - Streaming configuration (e.g., `include_usage`)
- **`extra_headers`** - Custom HTTP headers
- **`timeout`** - Request timeout control
- **`user`** - User ID for tracking

### Advanced Anthropic Features
- **`thinking`/`reasoning_effort`** - Claude's reasoning/thinking tokens
- **`web_search_options`** - Web search integration
- **`cache_control`** - Prompt caching support
- **MCP servers** - Model Context Protocol support
- **Container upload** - Code execution environments
- **Citations** - Source attribution support

## 2. Implementation Patterns We Should Adopt

### A. Proper Message Formatting
```python
# Current (basic)
def format_messages_to_prompt(self, messages: List[Dict]) -> str:
    prompt_parts = []
    for message in messages:
        # Simple string concatenation
        ...
    return "\n\n".join(prompt_parts)

# Better (Anthropic style)
def transform_request(self, model, messages, optional_params, litellm_params, headers):
    # 1. Extract and transform system messages separately
    anthropic_system_message_list = self.translate_system_message(messages)

    # 2. Use proper Anthropic message formatting
    anthropic_messages = anthropic_messages_pt(
        model=model,
        messages=messages,
        llm_provider="anthropic",
    )

    # 3. Build proper request structure and hand it back to the caller
    data = {
        "model": model,
        "messages": anthropic_messages,
        "system": anthropic_system_message_list,
        **optional_params,
    }
    return data
```

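To make step 1 of the "better" pattern concrete, here is a minimal, dependency-free sketch of pulling system messages out of an OpenAI-style message list. The helper name `split_system_messages` and the `{"type": "text", ...}` block shape are assumptions for illustration; this is not LiteLLM's `translate_system_message`, only an approximation of what such a step does.

```python
from typing import Any, Dict, List, Tuple

Message = Dict[str, Any]


def split_system_messages(
    messages: List[Message],
) -> Tuple[List[Dict[str, str]], List[Message]]:
    """Separate system messages from the conversation turns.

    Returns (system_blocks, conversation). system_blocks is a list of
    {"type": "text", "text": ...} dicts (assumed shape); conversation keeps
    every non-system message in its original order.
    """
    system_blocks: List[Dict[str, str]] = []
    conversation: List[Message] = []
    for message in messages:
        if message.get("role") != "system":
            conversation.append(message)
            continue
        content = message.get("content")
        if isinstance(content, str):
            if content:
                system_blocks.append({"type": "text", "text": content})
        elif isinstance(content, list):
            # OpenAI-style content parts: keep only the non-empty text parts.
            for part in content:
                if isinstance(part, dict) and part.get("type") == "text" and part.get("text"):
                    system_blocks.append({"type": "text", "text": part["text"]})
    return system_blocks, conversation
```

The system blocks would then go into the request's `system` field and the remaining turns into `messages`, instead of everything being concatenated into one prompt string.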
### B. Streaming Response Handling
```python
# Current (basic chunking)
async def astreaming(self, model: str, messages: List[Dict], **kwargs):
    # Manual text splitting
    if len(content) > 50:
        words = content.split(' ')
        # Manual chunking logic

# Better (proper streaming chunks)
class ModelResponseIterator:
    def chunk_parser(self, chunk: dict) -> ModelResponseStream:
        # Handle different chunk types properly
        type_chunk = chunk.get("type", "")
        usage = None  # filled in when a message_delta chunk carries usage
        if type_chunk == "content_block_delta":
            ...  # Process text, tool calls, thinking blocks
        elif type_chunk == "content_block_start":
            ...  # Handle block initialization
        elif type_chunk == "message_delta":
            ...  # Process usage, finish reasons

        # Return properly formatted streaming chunk
        return ModelResponseStream(
            choices=[StreamingChoices(...)],
            usage=usage,
        )
```

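The dispatch on chunk type can be sketched without any LiteLLM types. The example below assumes events shaped like the Anthropic Messages streaming API (`content_block_delta` carrying a `text_delta`, `message_delta` carrying `stop_reason` and `usage`); the Claude Code SDK may emit a different format, so treat the field names as assumptions to verify. `ParsedChunk`, `parse_event` and `iter_chunks` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, Optional


@dataclass
class ParsedChunk:
    text: str = ""
    finish_reason: Optional[str] = None
    output_tokens: Optional[int] = None


def parse_event(event: Dict[str, Any]) -> Optional[ParsedChunk]:
    """Turn one streaming event into a chunk, or None if it carries nothing to emit."""
    event_type = event.get("type", "")
    if event_type == "content_block_delta":
        delta = event.get("delta", {})
        if delta.get("type") == "text_delta":
            return ParsedChunk(text=delta.get("text", ""))
        return None  # tool-input and thinking deltas are not handled in this sketch
    if event_type == "message_delta":
        delta = event.get("delta", {})
        usage = event.get("usage", {})
        return ParsedChunk(
            finish_reason=delta.get("stop_reason"),
            output_tokens=usage.get("output_tokens"),
        )
    return None  # message_start, content_block_start, message_stop, ping, ...


def iter_chunks(events: Iterable[Dict[str, Any]]) -> Iterator[ParsedChunk]:
    """Yield only the events that produce text, a finish reason, or usage."""
    for event in events:
        chunk = parse_event(event)
        if chunk is not None:
            yield chunk
```

In the real provider each `ParsedChunk` would be converted into a `ModelResponseStream`; keeping the parsing step separate makes it easy to unit-test against recorded events.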
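### C. Error Handling
```python
# Current (none)
# No error handling

# Better
from typing import Optional

import httpx
from litellm.llms.base_llm.chat.transformation import BaseLLMException

class AnthropicError(BaseLLMException):
    def __init__(self, status_code: int, message: str, headers: Optional[httpx.Headers] = None):
        self.status_code = status_code
        self.message = message
        self.headers = headers
        # Pass everything through by keyword; the base class requires
        # status_code and message, so super().__init__(message) would fail.
        super().__init__(status_code=status_code, message=message, headers=headers)
```
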
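For our provider, the same idea could look like the sketch below. It is deliberately standalone: `ClaudeCodeError` subclasses plain `Exception` here so the example runs without LiteLLM, whereas the real class would presumably subclass `BaseLLMException`. The wrapper name `call_with_error_mapping` and the 408/500 status codes are assumptions chosen for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional, TypeVar

T = TypeVar("T")


class ClaudeCodeError(Exception):
    """Provider-level error carrying an HTTP-like status code (hypothetical)."""

    def __init__(self, status_code: int, message: str, headers: Optional[Dict[str, str]] = None):
        super().__init__(message)
        self.status_code = status_code
        self.message = message
        self.headers = headers or {}


async def call_with_error_mapping(
    call: Callable[[], Awaitable[T]], timeout: Optional[float] = None
) -> T:
    """Run an SDK call and convert any failure into ClaudeCodeError."""
    try:
        return await asyncio.wait_for(call(), timeout=timeout)
    except asyncio.TimeoutError as exc:
        # 408 is an assumed mapping for timeouts.
        raise ClaudeCodeError(408, f"request timed out after {timeout}s") from exc
    except ClaudeCodeError:
        raise  # already in the provider's error type
    except Exception as exc:
        # 500 is an assumed catch-all for unexpected SDK failures.
        raise ClaudeCodeError(500, f"Claude Code SDK call failed: {exc}") from exc
```

Wrapping every SDK call this way gives callers one exception type to catch, and it covers the `timeout` parameter at the same time.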
### D. Usage Tracking
```python
# Current (hardcoded)
usage = Usage(prompt_tokens=100, completion_tokens=50, total_tokens=150)

# Better (calculated)
def calculate_usage(self, usage_object: dict, reasoning_content: Optional[str]) -> Usage:
    prompt_tokens = usage_object.get("input_tokens", 0)
    completion_tokens = usage_object.get("output_tokens", 0)

    # Handle cache tokens
    cache_creation_input_tokens = usage_object.get("cache_creation_input_tokens", 0)
    cache_read_input_tokens = usage_object.get("cache_read_input_tokens", 0)

    # Report cached prompt tokens in the prompt token details
    prompt_tokens_details = PromptTokensDetailsWrapper(
        cached_tokens=cache_read_input_tokens
    )

    # Handle reasoning tokens
    completion_token_details = CompletionTokensDetailsWrapper(
        reasoning_tokens=token_counter(text=reasoning_content, count_response_tokens=True)
    ) if reasoning_content else None

    return Usage(
        prompt_tokens=prompt_tokens,
        completion_tokens=completion_tokens,
        total_tokens=prompt_tokens + completion_tokens,
        prompt_tokens_details=prompt_tokens_details,
        completion_tokens_details=completion_token_details,
        cache_creation_input_tokens=cache_creation_input_tokens,
        cache_read_input_tokens=cache_read_input_tokens,
    )
```

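The arithmetic itself is small enough to check in isolation. The following sketch uses a plain dataclass instead of LiteLLM's `Usage` so that it runs on its own; `UsageSummary` and `summarize_usage` are hypothetical names, and treating a missing or `None` counter as zero is an assumption about how the SDK reports usage.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class UsageSummary:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    cache_creation_input_tokens: int
    cache_read_input_tokens: int


def summarize_usage(usage_object: Dict[str, Any]) -> UsageSummary:
    """Compute token totals from an Anthropic-style usage dict."""

    def as_int(key: str) -> int:
        # Missing keys and explicit None values both count as zero.
        return int(usage_object.get(key) or 0)

    prompt_tokens = as_int("input_tokens")
    completion_tokens = as_int("output_tokens")
    return UsageSummary(
        prompt_tokens=prompt_tokens,
        completion_tokens=completion_tokens,
        total_tokens=prompt_tokens + completion_tokens,
        cache_creation_input_tokens=as_int("cache_creation_input_tokens"),
        cache_read_input_tokens=as_int("cache_read_input_tokens"),
    )
```

For example, `{"input_tokens": 12, "output_tokens": 30}` yields a total of 42 with both cache counters at 0. Cache tokens are reported separately here rather than added to `prompt_tokens`; whether they should be folded into the prompt count is a policy decision this sketch leaves open.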
## 3. Specific Recommendations

### A. Inherit from BaseConfig
```python
from litellm.llms.base_llm.chat.transformation import BaseConfig

class ClaudeCodeConfig(BaseConfig):
    @property
    def custom_llm_provider(self) -> str:
        return "claude-code-sdk"

    def get_supported_openai_params(self, model: str):
        return [
            "stream", "stop", "temperature", "top_p", "max_tokens",
            "tools", "tool_choice", "extra_headers", "timeout",
            "response_format", "user", "metadata"
        ]

    def map_openai_params(self, non_default_params, optional_params, model, drop_params):
        # Map OpenAI params to Claude SDK format
        for param, value in non_default_params.items():
            if param == "temperature":
                optional_params["temperature"] = value
            # ... etc
        return optional_params
```

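The `# ... etc` branch in `map_openai_params` could be filled in along the lines of the sketch below. It is standalone and does not use `BaseConfig`; the rename table follows the `stop`/`stop_sequences` and `max_tokens`/`max_completion_tokens` pairs listed in section 1, while the function name `map_params`, the exception `UnsupportedParamError`, and the exact `drop_params` behaviour are assumptions.

```python
from typing import Any, Dict

# Parameters this sketch accepts (an assumed subset of the list above).
SUPPORTED_PARAMS = {
    "stream", "stop", "temperature", "top_p",
    "max_tokens", "max_completion_tokens", "timeout", "user", "metadata",
}

# OpenAI-style name -> target name.
RENAMES = {"stop": "stop_sequences", "max_completion_tokens": "max_tokens"}


class UnsupportedParamError(ValueError):
    """Raised when a parameter is not supported and drop_params is False."""


def map_params(non_default_params: Dict[str, Any], drop_params: bool = False) -> Dict[str, Any]:
    """Translate OpenAI-style parameters, dropping or rejecting unknown ones."""
    mapped: Dict[str, Any] = {}
    for name, value in non_default_params.items():
        if value is None:
            continue  # unset parameters are not forwarded
        if name not in SUPPORTED_PARAMS:
            if drop_params:
                continue
            raise UnsupportedParamError(f"unsupported parameter: {name}")
        target = RENAMES.get(name, name)
        if target == "stop_sequences" and isinstance(value, str):
            value = [value]  # a single stop string becomes a one-element list
        mapped[target] = value
    return mapped
```

Calling `map_params({"temperature": 0.2, "stop": "END", "max_completion_tokens": 256})` returns `{"temperature": 0.2, "stop_sequences": ["END"], "max_tokens": 256}`. A table-driven mapping like this keeps the supported list and the rename rules in one place, which is easier to audit than a chain of `if param == ...` branches.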
### B. Implement Proper Response Transformation
```python
def transform_response(self, model, raw_response, model_response, logging_obj, **kwargs):
    # Extract content, tool calls, usage
    text_content, tool_calls, usage = self.extract_response_content(raw_response)

    # Build proper message object
    message = litellm.Message(
        content=text_content,
        tool_calls=tool_calls,
        role="assistant"
    )

    # Set on model response
    model_response.choices[0].message = message
    model_response.usage = usage

    return model_response
```

### C. Add Parameter Support in ClaudeCodeOptions
```python
# When creating options
options = ClaudeCodeOptions(
    model=claude_model,
    temperature=kwargs.get("temperature"),
    max_tokens=kwargs.get("max_tokens"),
    stop_sequences=kwargs.get("stop"),
    # ... other params if Claude SDK supports them
)
```

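Because it is not yet confirmed which of these keyword arguments `ClaudeCodeOptions` accepts, one defensive approach is to forward only the parameters the options class actually declares. The sketch below assumes the options class is a dataclass; `ExampleOptions` is a hypothetical stand-in with made-up fields, not the real `ClaudeCodeOptions`, and `build_options` is an illustrative helper name.

```python
import dataclasses
from typing import Any, List, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


@dataclasses.dataclass
class ExampleOptions:
    """Hypothetical stand-in for the SDK's options class."""

    model: Optional[str] = None
    system_prompt: Optional[str] = None


def build_options(options_cls: Type[T], **params: Any) -> Tuple[T, List[str]]:
    """Instantiate options_cls with the params it declares.

    Returns (options, ignored), where ignored lists the names of non-None
    params that the options class does not define.
    """
    accepted = {field.name for field in dataclasses.fields(options_cls)}
    kwargs = {k: v for k, v in params.items() if k in accepted and v is not None}
    ignored = sorted(k for k, v in params.items() if k not in accepted and v is not None)
    return options_cls(**kwargs), ignored
```

With the stand-in class, `build_options(ExampleOptions, model="sonnet", temperature=0.2)` returns `ExampleOptions(model="sonnet")` together with `["temperature"]`, so unsupported parameters can be logged or rejected instead of raising a `TypeError` at construction time.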
### D. Implement Async HTTP Client Usage
```python
from litellm.llms.custom_httpx.http_handler import AsyncHTTPHandler, get_async_httpx_client

async def acompletion(self, model, messages, client=None, **kwargs):
    if client is None:
        # Assumes a CLAUDE_CODE_SDK member has been added to litellm.LlmProviders
        client = get_async_httpx_client(llm_provider=litellm.LlmProviders.CLAUDE_CODE_SDK)

    # Use client for HTTP operations if needed
    ...
```

## 4. Limitations & Considerations

1. **Claude Code SDK Limitations**: Many advanced features (tools, web search, MCP) may not be supported by the Claude Code SDK itself. We should verify what the SDK actually supports before implementing any of them.

2. **OAuth vs API Key**: The official provider authenticates with API keys, while we use OAuth. This is a fundamental difference that we need to maintain.

3. **Streaming Format**: The Claude Code SDK's streaming format may differ from the official Anthropic API's, in which case it needs custom adaptation.

4. **Model Naming**: We need to keep our model extraction logic, since we route through the SDK rather than directly to Anthropic.

## 5. Priority Implementation Order

### High Priority (Basic functionality):
- Add `temperature`, `top_p`, `max_tokens` support
- Implement proper error handling with `ClaudeCodeError`
- Fix usage calculation to use actual token counts
- Add `stop_sequences` support

### Medium Priority (Common features):
- Support system messages properly
- Add `metadata` and `user` tracking
- Implement `timeout` handling
- Add `extra_headers` support

### Low Priority (Advanced features):
- Tool calling support (if SDK supports it)
- Response format / JSON mode
- Caching support
- Web search integration

This investigation shows that our current provider is quite basic compared to the official implementation. The main improvements would come from proper parameter handling, better error management, accurate usage tracking, and more sophisticated message formatting.
