Skip to content

Commit a2ecac7

Browse files
committed
fix: Docs rewrite, file cleanup, and provider bug fixes
- Remove stale scripts/litellm-entrypoint.sh (referenced deleted startup.py) - Remove unused API key placeholders from .env.example, fix CLI command - Replace ASCII art with Mermaid diagrams in README, fix step numbering - Fix npm package name (@anthropic-ai/claude-code, not claude-agent-sdk) - Add parameter handling docs to USAGE-EXAMPLES.md (drop_params behavior) - Fix SECURITY.md key verification to not leak secret value - Fix provider: dict-based usage parsing with input_tokens/output_tokens - Fix provider: asyncio.run() with thread-pool fallback for running loops - Fix provider: ResultMessage handling in streaming with len//4 fallback - Fix provider: try/except with APIError around query() calls - Fix CI: test job builds local image instead of pulling unpushed registry image
1 parent b34df09 commit a2ecac7

7 files changed

Lines changed: 135 additions & 118 deletions

File tree

.env.example

Lines changed: 1 addition & 34 deletions
Original file line numberDiff line numberDiff line change
@@ -14,46 +14,13 @@ DATABASE_URL=postgresql://llmproxy:your-secure-database-password-here@db:5432/li
1414
STORE_MODEL_IN_DB=True
1515

1616
# REQUIRED for Claude Pro/Max users: OAuth token
17-
# Generate with: claude oauth start
17+
# Generate with: claude setup-token
1818
# Token format: sk-ant-oat01-...
1919
CLAUDE_CODE_OAUTH_TOKEN=
2020

2121
# Logging
2222
LITELLM_LOG=INFO
2323

24-
# OpenAI (not used but may be referenced)
25-
OPENAI_API_KEY=""
26-
OPENAI_BASE_URL=""
27-
28-
# Anthropic (not used but may be referenced)
29-
ANTHROPIC_API_KEY=""
30-
31-
# Cohere
32-
COHERE_API_KEY=""
33-
34-
# Azure
35-
AZURE_API_BASE=""
36-
AZURE_API_VERSION=""
37-
AZURE_API_KEY=""
38-
39-
# Replicate
40-
REPLICATE_API_KEY=""
41-
REPLICATE_API_TOKEN=""
42-
43-
# OpenRouter
44-
OR_SITE_URL=""
45-
OR_APP_NAME="LiteLLM Claude Code Provider"
46-
OR_API_KEY=""
47-
48-
# Infisical
49-
INFISICAL_TOKEN=""
50-
51-
# Novita AI
52-
NOVITA_API_KEY=""
53-
54-
# INFINITY
55-
INFINITY_API_KEY=""
56-
5724
# Open WebUI Configuration (for compose-openwebui.yaml)
5825
# Port for Open WebUI interface (default: 8090)
5926
OPEN_WEBUI_PORT=8090

.github/workflows/build-and-publish.yml

Lines changed: 8 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -73,35 +73,37 @@ jobs:
7373
BUILDKIT_INLINE_CACHE=1
7474
7575
test:
76-
needs: build
7776
if: github.event_name == 'pull_request'
7877
runs-on: ubuntu-latest
7978

8079
steps:
8180
- name: Checkout repository
8281
uses: actions/checkout@v4
8382

83+
- name: Build test image
84+
run: docker build -t litellm-claude-test .
85+
8486
- name: Test container
8587
run: |
8688
# Test that the container can start
8789
docker run --rm \
8890
-e LITELLM_MASTER_KEY=sk-test-key \
8991
-e CLAUDE_CODE_OAUTH_TOKEN=sk-ant-oat01-test \
90-
${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:pr-${{ github.event.number }} \
92+
litellm-claude-test \
9193
python --version
92-
94+
9395
# Test that LiteLLM is installed
9496
docker run --rm \
9597
-e LITELLM_MASTER_KEY=sk-test-key \
9698
-e CLAUDE_CODE_OAUTH_TOKEN=sk-ant-oat01-test \
97-
${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:pr-${{ github.event.number }} \
99+
litellm-claude-test \
98100
litellm --version || echo "LiteLLM version check"
99-
101+
100102
# Test that the Claude Agent SDK is importable
101103
docker run --rm \
102104
-e LITELLM_MASTER_KEY=sk-test-key \
103105
-e CLAUDE_CODE_OAUTH_TOKEN=sk-ant-oat01-test \
104-
${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:pr-${{ github.event.number }} \
106+
litellm-claude-test \
105107
python -c "import claude_agent_sdk; print('Claude Agent SDK available')"
106108
107109
release:

README.md

Lines changed: 17 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -5,13 +5,10 @@
55

66
Dockerized LiteLLM with custom provider that makes Claude Agent SDK available through the standard OpenAI-compatible API interface. Based on Anthropic's official [Claude Agent SDK](https://docs.anthropic.com/en/docs/claude-agent-sdk) documentation:
77

8-
```
9-
┌─────────────────┐ ╭──────────────╮ ┌─────────────────┐
10-
│ │ │ │ │ Open WebUI, │
11-
│ Claude Agent │ ◄─────► │ LiteLLM │ ◄─────► │ Grafiti, │
12-
│ SDK │ │ │ │ LangChain, etc. │
13-
└─────────────────┘ ╰──────────────╯ └─────────────────┘
14-
OAuth/API Translation OpenAI Compatible App
8+
```mermaid
9+
graph LR
10+
A["Claude Agent SDK<br/>OAuth/API"] <--> B["LiteLLM<br/>Translation"]
11+
B <--> C["Open WebUI, Graphiti,<br/>LangChain, etc."]
1512
```
1613

1714
## Available Image
@@ -56,18 +53,18 @@ based on our [Claude Code SDK Docker images](https://github.com/cabinlab/claude-
5653
cp .env.example .env
5754
```
5855

59-
3. **Set your master key** (REQUIRED):
56+
2. **Set your master key** (REQUIRED):
6057
```bash
6158
# Edit .env and update LITELLM_MASTER_KEY
6259
LITELLM_MASTER_KEY=sk-your-desired-custom-key
6360
```
6461

6562
See [Security Guide](docs/SECURITY.md) for key generation best practices
6663

67-
4. **Get your Claude OAuth token** (wherever you have Claude Code installed):
64+
3. **Get your Claude OAuth token** (wherever you have Claude Code installed):
6865
```bash
6966
# If you don't have the Claude CLI installed:
70-
npm install -g @anthropic-ai/claude-agent-sdk
67+
npm install -g @anthropic-ai/claude-code
7168

7269
# Generate a long-lived token
7370
claude setup-token
@@ -77,18 +74,18 @@ based on our [Claude Code SDK Docker images](https://github.com/cabinlab/claude-
7774
```
7875

7976

80-
5. **Add the token to your .env file**:
77+
4. **Add the token to your .env file**:
8178
```bash
8279
# Edit .env and add your token:
8380
CLAUDE_CODE_OAUTH_TOKEN=sk-ant-oat01-your-token-here
8481
```
8582

86-
6. **Start the services**:
83+
5. **Start the services**:
8784
```bash
8885
docker-compose up -d
8986
```
9087

91-
7. **Verify it's working**:
88+
6. **Verify it's working**:
9289

9390
### Web UI
9491
Navigate to `http://localhost:4000/ui/` and select Test Key:
@@ -139,7 +136,7 @@ docker-compose restart litellm
139136
1. **Long-lived OAuth Tokens** (Recommended for Claude Pro/Max users)
140137
- Generate with `claude setup-token` on your host machine
141138
- Set `CLAUDE_CODE_OAUTH_TOKEN` in your `.env` file
142-
- Tokens start with `sk-ant-oat01-` and last for 1 year
139+
- Tokens start with `sk-ant-oat01-`
143140
- Authentication persists across container restarts via Docker volume
144141

145142
2. **Interactive Authentication** (Alternative)
@@ -178,8 +175,12 @@ This ensures authentication persists across container restarts.
178175
179176
## Architecture
180177
181-
```
182-
Client Application → LiteLLM Proxy → Claude Agent SDK Provider → Claude Agent SDK → Claude API
178+
```mermaid
179+
graph LR
180+
A[Client App] --> B[LiteLLM Proxy]
181+
B --> C[Claude Agent SDK Provider]
182+
C --> D[Claude Agent SDK]
183+
D --> E[Claude API]
183184
```
184185

185186
The provider:

docs/SECURITY.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -67,8 +67,8 @@ This system uses OAuth for Claude authentication (stored in Docker volume) and A
6767
To verify your security setup:
6868

6969
```bash
70-
# Check if custom key is set (should not show default)
71-
docker-compose exec litellm env | grep LITELLM_MASTER_KEY
70+
# Confirm the key is set (does not print the actual value)
71+
docker-compose exec litellm sh -c 'echo "LITELLM_MASTER_KEY is set (${#LITELLM_MASTER_KEY} chars)"'
7272

7373
# Check startup logs for warnings
7474
docker-compose logs litellm | grep "WARNING"

docs/USAGE-EXAMPLES.md

Lines changed: 32 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -80,6 +80,8 @@ curl -X POST http://localhost:4000/v1/chat/completions \
8080
}'
8181
```
8282

83+
> **Note:** The `temperature` parameter in this example is silently dropped due to `drop_params: true` in `litellm_config.yaml`. See [Parameter Handling](#parameter-handling) below.
84+
8385
## JavaScript/TypeScript
8486

8587
```javascript
@@ -111,10 +113,39 @@ main();
111113
4. **Features Supported**:
112114
- Chat completions (`/v1/chat/completions`)
113115
- Model listing (`/v1/models`)
114-
- Standard OpenAI parameters (temperature, max_tokens, etc.)
116+
- Streaming responses (`"stream": true`)
115117

116118
5. **Features NOT Supported**:
117119
- Embeddings
120+
- OpenAI-specific parameters (see [Parameter Handling](#parameter-handling) below)
121+
122+
## Parameter Handling
123+
124+
Because LiteLLM is configured with `drop_params: true` and the Claude Agent SDK manages its own parameters, most OpenAI-specific parameters are silently dropped.
125+
126+
### Parameters that work
127+
128+
| Parameter | Description |
129+
|-----------|-------------|
130+
| `model` | Model selection (`sonnet`, `opus`, `haiku`) |
131+
| `messages` | Conversation messages array |
132+
| `stream` | Enable streaming responses (`true`/`false`) |
133+
134+
### Parameters silently dropped
135+
136+
The following parameters are accepted without error but have **no effect**:
137+
138+
| Parameter | Why |
139+
|-----------|-----|
140+
| `temperature` | Claude Agent SDK manages sampling internally |
141+
| `top_p` | Claude Agent SDK manages sampling internally |
142+
| `max_tokens` | Claude Agent SDK manages output length internally |
143+
| `frequency_penalty` | Not supported by Claude Agent SDK |
144+
| `presence_penalty` | Not supported by Claude Agent SDK |
145+
| `stop` | Not supported by Claude Agent SDK |
146+
| `tools` / `tool_choice` | Not supported through this provider |
147+
148+
This is configured via `drop_params: true` in `config/litellm_config.yaml`. Without this setting, unsupported parameters would cause errors.
118149

119150

120151
## Environment Variables for Your App

providers/claude_agent_provider.py

Lines changed: 75 additions & 33 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,5 @@
11
import asyncio
2+
import concurrent.futures
23
from typing import Dict, List, Iterator, AsyncIterator
34
import uuid
45
from datetime import datetime
@@ -68,12 +69,18 @@ def create_litellm_response(
6869

6970
def completion(self, model: str, messages: List[Dict], **kwargs) -> ModelResponse:
7071
"""Sync completion wrapper."""
71-
loop = asyncio.new_event_loop()
72-
asyncio.set_event_loop(loop)
7372
try:
74-
return loop.run_until_complete(self.acompletion(model, messages, **kwargs))
75-
finally:
76-
loop.close()
73+
loop = asyncio.get_running_loop()
74+
except RuntimeError:
75+
loop = None
76+
77+
if loop and loop.is_running():
78+
with concurrent.futures.ThreadPoolExecutor() as pool:
79+
return pool.submit(
80+
asyncio.run, self.acompletion(model, messages, **kwargs)
81+
).result()
82+
else:
83+
return asyncio.run(self.acompletion(model, messages, **kwargs))
7784

7885
async def acompletion(self, model: str, messages: List[Dict], **kwargs) -> ModelResponse:
7986
"""Async completion using Claude Agent SDK with model selection."""
@@ -86,15 +93,28 @@ async def acompletion(self, model: str, messages: List[Dict], **kwargs) -> Model
8693
prompt_tokens = 0
8794
completion_tokens = 0
8895

89-
async for message in query(prompt=prompt, options=options):
90-
if isinstance(message, AssistantMessage):
91-
for block in message.content:
92-
if isinstance(block, TextBlock):
93-
response_content += block.text
94-
elif isinstance(message, ResultMessage):
95-
if hasattr(message, "usage") and message.usage:
96-
prompt_tokens = getattr(message.usage, "prompt_tokens", 0) or 0
97-
completion_tokens = getattr(message.usage, "completion_tokens", 0) or 0
96+
try:
97+
async for message in query(prompt=prompt, options=options):
98+
if isinstance(message, AssistantMessage):
99+
for block in message.content:
100+
if isinstance(block, TextBlock):
101+
response_content += block.text
102+
elif isinstance(message, ResultMessage):
103+
if hasattr(message, "usage") and message.usage:
104+
usage_data = message.usage
105+
if isinstance(usage_data, dict):
106+
prompt_tokens = usage_data.get("input_tokens", 0) or 0
107+
completion_tokens = usage_data.get("output_tokens", 0) or 0
108+
else:
109+
prompt_tokens = getattr(usage_data, "input_tokens", 0) or 0
110+
completion_tokens = getattr(usage_data, "output_tokens", 0) or 0
111+
except Exception as e:
112+
raise litellm.exceptions.APIError(
113+
status_code=500,
114+
message=f"Claude Agent SDK query failed: {e}",
115+
model=model,
116+
llm_provider="claude-agent-sdk",
117+
)
98118

99119
return self.create_litellm_response(
100120
response_content, model, prompt_tokens, completion_tokens
@@ -112,23 +132,45 @@ async def astreaming(self, model: str, messages: List[Dict], **kwargs) -> AsyncI
112132
options = ClaudeAgentOptions(model=claude_model)
113133

114134
total_content = ""
135+
prompt_tokens = 0
136+
completion_tokens = 0
115137

116-
async for message in query(prompt=prompt, options=options):
117-
if isinstance(message, AssistantMessage):
118-
for block in message.content:
119-
if isinstance(block, TextBlock):
120-
content = block.text
121-
total_content += content
122-
123-
chunk: GenericStreamingChunk = {
124-
"text": content,
125-
"is_finished": False,
126-
"finish_reason": None,
127-
"index": 0,
128-
"tool_use": None,
129-
"usage": None,
130-
}
131-
yield chunk
138+
try:
139+
async for message in query(prompt=prompt, options=options):
140+
if isinstance(message, AssistantMessage):
141+
for block in message.content:
142+
if isinstance(block, TextBlock):
143+
content = block.text
144+
total_content += content
145+
146+
chunk: GenericStreamingChunk = {
147+
"text": content,
148+
"is_finished": False,
149+
"finish_reason": None,
150+
"index": 0,
151+
"tool_use": None,
152+
"usage": None,
153+
}
154+
yield chunk
155+
elif isinstance(message, ResultMessage):
156+
if hasattr(message, "usage") and message.usage:
157+
usage_data = message.usage
158+
if isinstance(usage_data, dict):
159+
prompt_tokens = usage_data.get("input_tokens", 0) or 0
160+
completion_tokens = usage_data.get("output_tokens", 0) or 0
161+
else:
162+
prompt_tokens = getattr(usage_data, "input_tokens", 0) or 0
163+
completion_tokens = getattr(usage_data, "output_tokens", 0) or 0
164+
except Exception as e:
165+
raise litellm.exceptions.APIError(
166+
status_code=500,
167+
message=f"Claude Agent SDK streaming query failed: {e}",
168+
model=model,
169+
llm_provider="claude-agent-sdk",
170+
)
171+
172+
if not prompt_tokens and not completion_tokens:
173+
completion_tokens = len(total_content) // 4
132174

133175
final_chunk: GenericStreamingChunk = {
134176
"text": "",
@@ -137,9 +179,9 @@ async def astreaming(self, model: str, messages: List[Dict], **kwargs) -> AsyncI
137179
"index": 0,
138180
"tool_use": None,
139181
"usage": {
140-
"completion_tokens": len(total_content.split()),
141-
"prompt_tokens": 0,
142-
"total_tokens": len(total_content.split()),
182+
"completion_tokens": completion_tokens,
183+
"prompt_tokens": prompt_tokens,
184+
"total_tokens": prompt_tokens + completion_tokens,
143185
},
144186
}
145187

0 commit comments

Comments
 (0)