fix: Docs rewrite, file cleanup, and provider bug fixes

cabinlab · cabinlab · commit a2ecac711741 · 2026-02-12T02:10:13.000-09:00
- Remove stale scripts/litellm-entrypoint.sh (referenced deleted startup.py)
- Remove unused API key placeholders from .env.example, fix CLI command
- Replace ASCII art with Mermaid diagrams in README, fix step numbering
- Fix npm package name (@anthropic-ai/claude-code, not claude-agent-sdk)
- Add parameter handling docs to USAGE-EXAMPLES.md (drop_params behavior)
- Fix SECURITY.md key verification to not leak secret value
- Fix provider: dict-based usage parsing with input_tokens/output_tokens
- Fix provider: asyncio.run() with thread-pool fallback for running loops
- Fix provider: ResultMessage handling in streaming with len//4 fallback
- Fix provider: try/except with APIError around query() calls
- Fix CI: test job builds local image instead of pulling unpushed registry image
diff --git a/.env.example b/.env.example
@@ -14,46 +14,13 @@ DATABASE_URL=postgresql://llmproxy:your-secure-database-password-here@db:5432/li
 STORE_MODEL_IN_DB=True
 
 # REQUIRED for Claude Pro/Max users: OAuth token
-# Generate with: claude oauth start
+# Generate with: claude setup-token
 # Token format: sk-ant-oat01-...
 CLAUDE_CODE_OAUTH_TOKEN=
 
 # Logging
 LITELLM_LOG=INFO
 
-# OpenAI (not used but may be referenced)
-OPENAI_API_KEY=""
-OPENAI_BASE_URL=""
-
-# Anthropic (not used but may be referenced)
-ANTHROPIC_API_KEY=""
-
-# Cohere
-COHERE_API_KEY=""
-
-# Azure
-AZURE_API_BASE=""
-AZURE_API_VERSION=""
-AZURE_API_KEY=""
-
-# Replicate
-REPLICATE_API_KEY=""
-REPLICATE_API_TOKEN=""
-
-# OpenRouter
-OR_SITE_URL=""
-OR_APP_NAME="LiteLLM Claude Code Provider"
-OR_API_KEY=""
-
-# Infisical
-INFISICAL_TOKEN=""
-
-# Novita AI
-NOVITA_API_KEY=""
-
-# INFINITY
-INFINITY_API_KEY=""
-
 # Open WebUI Configuration (for compose-openwebui.yaml)
 # Port for Open WebUI interface (default: 8090)
 OPEN_WEBUI_PORT=8090
diff --git a/.github/workflows/build-and-publish.yml b/.github/workflows/build-and-publish.yml
@@ -73,35 +73,37 @@ jobs:
             BUILDKIT_INLINE_CACHE=1
 
   test:
-    needs: build
     if: github.event_name == 'pull_request'
     runs-on: ubuntu-latest
 
     steps:
       - name: Checkout repository
         uses: actions/checkout@v4
 
+      - name: Build test image
+        run: docker build -t litellm-claude-test .
+
       - name: Test container
         run: |
           # Test that the container can start
           docker run --rm \
             -e LITELLM_MASTER_KEY=sk-test-key \
             -e CLAUDE_CODE_OAUTH_TOKEN=sk-ant-oat01-test \
-            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:pr-${{ github.event.number }} \
+            litellm-claude-test \
             python --version
-          
+
           # Test that LiteLLM is installed
           docker run --rm \
             -e LITELLM_MASTER_KEY=sk-test-key \
             -e CLAUDE_CODE_OAUTH_TOKEN=sk-ant-oat01-test \
-            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:pr-${{ github.event.number }} \
+            litellm-claude-test \
             litellm --version || echo "LiteLLM version check"
-          
+
           # Test that the Claude Agent SDK is importable
           docker run --rm \
             -e LITELLM_MASTER_KEY=sk-test-key \
             -e CLAUDE_CODE_OAUTH_TOKEN=sk-ant-oat01-test \
-            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:pr-${{ github.event.number }} \
+            litellm-claude-test \
             python -c "import claude_agent_sdk; print('Claude Agent SDK available')"
 
   release:
diff --git a/README.md b/README.md
@@ -5,13 +5,10 @@
 
 Dockerized LiteLLM with custom provider that makes Claude Agent SDK available through the standard OpenAI-compatible API interface. Based on Anthropic's official [Claude Agent SDK](https://docs.anthropic.com/en/docs/claude-agent-sdk) documentation:
 
-```
-┌─────────────────┐         ╭──────────────╮         ┌─────────────────┐
-│                 │         │              │         │   Open WebUI,   │
-│ Claude Agent    │ ◄─────► │   LiteLLM    │ ◄─────► │    Grafiti,     │
-│     SDK         │         │              │         │ LangChain, etc. │
-└─────────────────┘         ╰──────────────╯         └─────────────────┘
-     OAuth/API                 Translation          OpenAI Compatible App
+```mermaid
+graph LR
+    A["Claude Agent SDK<br/>OAuth/API"] <--> B["LiteLLM<br/>Translation"]
+    B <--> C["Open WebUI, Graphiti,<br/>LangChain, etc."]
 ```
 
 ## Available Image
@@ -56,18 +53,18 @@ based on our [Claude Code SDK Docker images](https://github.com/cabinlab/claude-
    cp .env.example .env
    ```
 
-3. **Set your master key** (REQUIRED):
+2. **Set your master key** (REQUIRED):
    ```bash
    # Edit .env and update LITELLM_MASTER_KEY
    LITELLM_MASTER_KEY=sk-your-desired-custom-key
    ```
 
    See [Security Guide](docs/SECURITY.md) for key generation best practices
 
-4. **Get your Claude OAuth token** (wherever you have Claude Code installed):
+3. **Get your Claude OAuth token** (wherever you have Claude Code installed):
    ```bash
    # If you don't have the Claude CLI installed:
-   npm install -g @anthropic-ai/claude-agent-sdk
+   npm install -g @anthropic-ai/claude-code
 
    # Generate a long-lived token
    claude setup-token
@@ -77,18 +74,18 @@ based on our [Claude Code SDK Docker images](https://github.com/cabinlab/claude-
    ```
 
 
-5. **Add the token to your .env file**:
+4. **Add the token to your .env file**:
    ```bash
    # Edit .env and add your token:
    CLAUDE_CODE_OAUTH_TOKEN=sk-ant-oat01-your-token-here
    ```
 
-6. **Start the services**:
+5. **Start the services**:
    ```bash
    docker-compose up -d
    ```
 
-7. **Verify it's working**:
+6. **Verify it's working**:
 
    ### Web UI
    Navigate to `http://localhost:4000/ui/` and select Test Key:
@@ -139,7 +136,7 @@ docker-compose restart litellm
 1. **Long-lived OAuth Tokens** (Recommended for Claude Pro/Max users)
    - Generate with `claude setup-token` on your host machine
    - Set `CLAUDE_CODE_OAUTH_TOKEN` in your `.env` file
-   - Tokens start with `sk-ant-oat01-` and last for 1 year
+   - Tokens start with `sk-ant-oat01-`
    - Authentication persists across container restarts via Docker volume
 
 2. **Interactive Authentication** (Alternative)
@@ -178,8 +175,12 @@ This ensures authentication persists across container restarts.
 
 ## Architecture
 
-```
-Client Application → LiteLLM Proxy → Claude Agent SDK Provider → Claude Agent SDK → Claude API
+```mermaid
+graph LR
+    A[Client App] --> B[LiteLLM Proxy]
+    B --> C[Claude Agent SDK Provider]
+    C --> D[Claude Agent SDK]
+    D --> E[Claude API]
 ```
 
 The provider:
diff --git a/docs/SECURITY.md b/docs/SECURITY.md
@@ -67,8 +67,8 @@ This system uses OAuth for Claude authentication (stored in Docker volume) and A
 To verify your security setup:
 
 ```bash
-# Check if custom key is set (should not show default)
-docker-compose exec litellm env | grep LITELLM_MASTER_KEY
+# Confirm the key is set (does not print the actual value)
+docker-compose exec litellm sh -c 'echo "LITELLM_MASTER_KEY is set (${#LITELLM_MASTER_KEY} chars)"'
 
 # Check startup logs for warnings
 docker-compose logs litellm | grep "WARNING"
diff --git a/docs/USAGE-EXAMPLES.md b/docs/USAGE-EXAMPLES.md
@@ -80,6 +80,8 @@ curl -X POST http://localhost:4000/v1/chat/completions \
   }'
 ```
 
+> **Note:** The `temperature` parameter in this example is silently dropped due to `drop_params: true` in `litellm_config.yaml`. See [Parameter Handling](#parameter-handling) below.
+
 ## JavaScript/TypeScript
 
 ```javascript
@@ -111,10 +113,39 @@ main();
 4. **Features Supported**:
    - Chat completions (`/v1/chat/completions`)
    - Model listing (`/v1/models`)
-   - Standard OpenAI parameters (temperature, max_tokens, etc.)
+   - Streaming responses (`"stream": true`)
 
 5. **Features NOT Supported**:
    - Embeddings
+   - OpenAI-specific parameters (see [Parameter Handling](#parameter-handling) below)
+
+## Parameter Handling
+
+Because LiteLLM is configured with `drop_params: true` and the Claude Agent SDK manages its own parameters, most OpenAI-specific parameters are silently dropped.
+
+### Parameters that work
+
+| Parameter | Description |
+|-----------|-------------|
+| `model` | Model selection (`sonnet`, `opus`, `haiku`) |
+| `messages` | Conversation messages array |
+| `stream` | Enable streaming responses (`true`/`false`) |
+
+### Parameters silently dropped
+
+The following parameters are accepted without error but have **no effect**:
+
+| Parameter | Why |
+|-----------|-----|
+| `temperature` | Claude Agent SDK manages sampling internally |
+| `top_p` | Claude Agent SDK manages sampling internally |
+| `max_tokens` | Claude Agent SDK manages output length internally |
+| `frequency_penalty` | Not supported by Claude Agent SDK |
+| `presence_penalty` | Not supported by Claude Agent SDK |
+| `stop` | Not supported by Claude Agent SDK |
+| `tools` / `tool_choice` | Not supported through this provider |
+
+This is configured via `drop_params: true` in `config/litellm_config.yaml`. Without this setting, unsupported parameters would cause errors.
 
 
 ## Environment Variables for Your App
diff --git a/providers/claude_agent_provider.py b/providers/claude_agent_provider.py
@@ -1,4 +1,5 @@
 import asyncio
+import concurrent.futures
 from typing import Dict, List, Iterator, AsyncIterator
 import uuid
 from datetime import datetime
@@ -68,12 +69,18 @@ def create_litellm_response(
 
     def completion(self, model: str, messages: List[Dict], **kwargs) -> ModelResponse:
         """Sync completion wrapper."""
-        loop = asyncio.new_event_loop()
-        asyncio.set_event_loop(loop)
         try:
-            return loop.run_until_complete(self.acompletion(model, messages, **kwargs))
-        finally:
-            loop.close()
+            loop = asyncio.get_running_loop()
+        except RuntimeError:
+            loop = None
+
+        if loop and loop.is_running():
+            with concurrent.futures.ThreadPoolExecutor() as pool:
+                return pool.submit(
+                    asyncio.run, self.acompletion(model, messages, **kwargs)
+                ).result()
+        else:
+            return asyncio.run(self.acompletion(model, messages, **kwargs))
 
     async def acompletion(self, model: str, messages: List[Dict], **kwargs) -> ModelResponse:
         """Async completion using Claude Agent SDK with model selection."""
@@ -86,15 +93,28 @@ async def acompletion(self, model: str, messages: List[Dict], **kwargs) -> Model
         prompt_tokens = 0
         completion_tokens = 0
 
-        async for message in query(prompt=prompt, options=options):
-            if isinstance(message, AssistantMessage):
-                for block in message.content:
-                    if isinstance(block, TextBlock):
-                        response_content += block.text
-            elif isinstance(message, ResultMessage):
-                if hasattr(message, "usage") and message.usage:
-                    prompt_tokens = getattr(message.usage, "prompt_tokens", 0) or 0
-                    completion_tokens = getattr(message.usage, "completion_tokens", 0) or 0
+        try:
+            async for message in query(prompt=prompt, options=options):
+                if isinstance(message, AssistantMessage):
+                    for block in message.content:
+                        if isinstance(block, TextBlock):
+                            response_content += block.text
+                elif isinstance(message, ResultMessage):
+                    if hasattr(message, "usage") and message.usage:
+                        usage_data = message.usage
+                        if isinstance(usage_data, dict):
+                            prompt_tokens = usage_data.get("input_tokens", 0) or 0
+                            completion_tokens = usage_data.get("output_tokens", 0) or 0
+                        else:
+                            prompt_tokens = getattr(usage_data, "input_tokens", 0) or 0
+                            completion_tokens = getattr(usage_data, "output_tokens", 0) or 0
+        except Exception as e:
+            raise litellm.exceptions.APIError(
+                status_code=500,
+                message=f"Claude Agent SDK query failed: {e}",
+                model=model,
+                llm_provider="claude-agent-sdk",
+            )
 
         return self.create_litellm_response(
             response_content, model, prompt_tokens, completion_tokens
@@ -112,23 +132,45 @@ async def astreaming(self, model: str, messages: List[Dict], **kwargs) -> AsyncI
         options = ClaudeAgentOptions(model=claude_model)
 
         total_content = ""
+        prompt_tokens = 0
+        completion_tokens = 0
 
-        async for message in query(prompt=prompt, options=options):
-            if isinstance(message, AssistantMessage):
-                for block in message.content:
-                    if isinstance(block, TextBlock):
-                        content = block.text
-                        total_content += content
-
-                        chunk: GenericStreamingChunk = {
-                            "text": content,
-                            "is_finished": False,
-                            "finish_reason": None,
-                            "index": 0,
-                            "tool_use": None,
-                            "usage": None,
-                        }
-                        yield chunk
+        try:
+            async for message in query(prompt=prompt, options=options):
+                if isinstance(message, AssistantMessage):
+                    for block in message.content:
+                        if isinstance(block, TextBlock):
+                            content = block.text
+                            total_content += content
+
+                            chunk: GenericStreamingChunk = {
+                                "text": content,
+                                "is_finished": False,
+                                "finish_reason": None,
+                                "index": 0,
+                                "tool_use": None,
+                                "usage": None,
+                            }
+                            yield chunk
+                elif isinstance(message, ResultMessage):
+                    if hasattr(message, "usage") and message.usage:
+                        usage_data = message.usage
+                        if isinstance(usage_data, dict):
+                            prompt_tokens = usage_data.get("input_tokens", 0) or 0
+                            completion_tokens = usage_data.get("output_tokens", 0) or 0
+                        else:
+                            prompt_tokens = getattr(usage_data, "input_tokens", 0) or 0
+                            completion_tokens = getattr(usage_data, "output_tokens", 0) or 0
+        except Exception as e:
+            raise litellm.exceptions.APIError(
+                status_code=500,
+                message=f"Claude Agent SDK streaming query failed: {e}",
+                model=model,
+                llm_provider="claude-agent-sdk",
+            )
+
+        if not prompt_tokens and not completion_tokens:
+            completion_tokens = len(total_content) // 4
 
         final_chunk: GenericStreamingChunk = {
             "text": "",
@@ -137,9 +179,9 @@ async def astreaming(self, model: str, messages: List[Dict], **kwargs) -> AsyncI
             "index": 0,
             "tool_use": None,
             "usage": {
-                "completion_tokens": len(total_content.split()),
-                "prompt_tokens": 0,
-                "total_tokens": len(total_content.split()),
+                "completion_tokens": completion_tokens,
+                "prompt_tokens": prompt_tokens,
+                "total_tokens": prompt_tokens + completion_tokens,
             },
         }
 
diff --git a/scripts/litellm-entrypoint.sh b/scripts/litellm-entrypoint.sh