Claude Max OpenAI HTTP Proxy

Turn your Claude Max subscription into a local API server


A drop-in dual API proxy that serves both OpenAI-compatible and Anthropic-compatible endpoints,
powered by the Claude Code CLI and your Claude Max subscription. No API key required.

Any tool, library, or application that speaks the OpenAI or Anthropic API can now use Claude models locally.

Quick Start • Usage Examples • Configuration • API Reference


How It Works

                         ┌──────────────────────────┐
 OpenAI SDK clients ────▶│  /openai/v1/*            │
                         │                          │     ┌─────────────────┐
                         │  FastAPI Proxy           │────▶│  Claude Code    │
                         │  (server.py)             │     │  CLI (claude -p)│
                         │  Port 4000               │◀────│  Max Sub Auth   │
 Anthropic SDK clients ─▶│                          │     └─────────────────┘
                         │  /anthropic/v1/*         │
                         └──────────────────────────┘
  1. Your app sends a standard API request to the proxy (OpenAI or Anthropic format)
  2. The proxy translates it and calls claude -p (pipe mode) with your Max subscription auth
  3. Claude's response is formatted back into the correct API response schema
  4. Your app receives a response identical to what the real API would return
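
For intuition, here is a simplified sketch of steps 2–3 (illustrative only, not the proxy's actual `server.py` code): OpenAI-style chat messages are flattened into a single prompt suitable for piping to `claude -p`, and the CLI's raw text output is wrapped back into an OpenAI-shaped `chat.completion` response.

```python
import time
import uuid

def messages_to_prompt(messages):
    """Flatten OpenAI-style chat messages into one prompt string
    that could be piped to `claude -p` (illustrative only)."""
    return "\n\n".join(f"{m['role'].capitalize()}: {m['content']}" for m in messages)

def wrap_openai_response(model, text):
    """Wrap raw CLI text output in an OpenAI chat.completion response shape."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": text},
            "finish_reason": "stop",
        }],
    }
```

The real proxy also handles streaming, tool calls, and error mapping, but the round trip reduces to this translate-call-wrap shape.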

Features

OpenAI API   /openai/v1/

  • Chat completions (streaming + non-streaming)
  • Legacy text completions
  • Function / tool calling
  • Model listing with GPT aliases
  • Proper OpenAI error format
  • Backwards-compatible at /v1/*
  • 501 stubs for unsupported endpoints

Anthropic API   /anthropic/v1/

  • Messages API (streaming + non-streaming)
  • Native content blocks (text, tool_use)
  • System prompt (string or block array)
  • tool_choice with auto / any / none / tool
  • Stop sequences support
  • Token count estimation
  • Cache token tracking

Shared across both APIs: Real streaming via SSE • All three Claude tiers (Opus, Sonnet, Haiku) • Multi-turn conversations • CORS enabled • Graceful shutdown • Multi-worker support • Concurrency limiting with 429 backpressure • Request ID tracing • Health check with CLI verification


Prerequisites

| Requirement | Details |
|---|---|
| Claude Code CLI | Installed and authenticated with `claude auth login` |
| Claude Max | Active subscription |
| Python | 3.10 or later |
| Linux | With systemd, for auto-start only (the proxy itself runs anywhere Python runs) |

Quick Start

Option 1: Automated Install (Recommended)

git clone https://github.com/RandomSynergy17/claudeMax-OpenAI-HTTP-Proxy.git
cd claudeMax-OpenAI-HTTP-Proxy
chmod +x install.sh
./install.sh

The installer creates a virtual environment, installs dependencies, configures a systemd user service, and starts the proxy automatically.

Option 2: Manual Setup

git clone https://github.com/RandomSynergy17/claudeMax-OpenAI-HTTP-Proxy.git
cd claudeMax-OpenAI-HTTP-Proxy
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python server.py --port 4000 --host 0.0.0.0

Verify it's running

curl http://localhost:4000/health

Usage Examples

Python — OpenAI SDK

from openai import OpenAI

client = OpenAI(
    base_url="http://192.168.x.x:4000/openai/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.choices[0].message.content)

Python — Anthropic SDK

import anthropic

client = anthropic.Anthropic(
    base_url="http://192.168.x.x:4000/anthropic",
    api_key="not-needed"
)

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(message.content[0].text)

Streaming (OpenAI)

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Streaming (Anthropic)

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Tool / Function Calling (OpenAI)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"]
            }
        }
    }]
)

Tool Use (Anthropic)

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=[{
        "name": "get_weather",
        "description": "Get current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
        }
    }],
    messages=[{"role": "user", "content": "What's the weather in Paris?"}]
)

curl

# OpenAI format
curl http://localhost:4000/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-6","messages":[{"role":"user","content":"Hello!"}]}'

# Anthropic format
curl http://localhost:4000/anthropic/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: not-needed" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model":"claude-sonnet-4-6","max_tokens":1024,"messages":[{"role":"user","content":"Hello!"}]}'

JavaScript / TypeScript

// OpenAI
import OpenAI from "openai";
const openai = new OpenAI({ baseURL: "http://host:4000/openai/v1", apiKey: "na" });
const resp = await openai.chat.completions.create({
    model: "claude-sonnet-4-6",
    messages: [{ role: "user", content: "Hi" }]
});

// Anthropic
import Anthropic from "@anthropic-ai/sdk";
const anthropic = new Anthropic({ baseURL: "http://host:4000/anthropic", apiKey: "na" });
const msg = await anthropic.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    messages: [{ role: "user", content: "Hi" }]
});

Works With

| Tool | Configuration |
|---|---|
| LangChain (OpenAI) | `ChatOpenAI(base_url="http://host:4000/openai/v1", api_key="na")` |
| LangChain (Anthropic) | `ChatAnthropic(base_url="http://host:4000/anthropic", api_key="na")` |
| LlamaIndex | `OpenAI(api_base="http://host:4000/openai/v1", api_key="na")` |
| Continue (VS Code) | Set API base to `http://host:4000/openai/v1` |
| Open WebUI | Add as OpenAI provider with `http://host:4000/openai/v1` |
| Cursor | Set OpenAI base URL to `http://host:4000/openai/v1` |

API Reference

OpenAI Endpoints   /openai/v1/   (also /v1/)

| Endpoint | Method | Status |
|---|---|---|
| `/openai/v1/chat/completions` | POST | Full support (streaming, tools, system messages) |
| `/openai/v1/completions` | POST | Full support (streaming + non-streaming) |
| `/openai/v1/models` | GET | Lists Claude models + OpenAI aliases |
| `/openai/v1/models/{id}` | GET | Retrieve individual model |
| `/openai/v1/embeddings` | POST | 501 — Not available |
| `/openai/v1/images/*` | POST | 501 — Not available |
| `/openai/v1/audio/*` | POST | 501 — Not available |
| `/openai/v1/fine_tuning/jobs` | POST | 501 — Not available |
| `/openai/v1/moderations` | POST | 501 — Not available |

Anthropic Endpoints   /anthropic/v1/

| Endpoint | Method | Status |
|---|---|---|
| `/anthropic/v1/messages` | POST | Full support (streaming, tools, system, stop_sequences) |
| `/anthropic/v1/models` | GET | Lists all Claude models |
| `/anthropic/v1/models/{id}` | GET | Retrieve individual model |
| `/anthropic/v1/messages/count_tokens` | POST | Token count estimation |

Other

| Endpoint | Description |
|---|---|
| `GET /health` | Health check — verifies CLI availability, reports active/max concurrent requests and CLI version |
| `GET /` | Service info and endpoint directory |

Model Mapping

Use Claude model names directly on both APIs. The OpenAI surface also accepts these aliases:

| OpenAI Alias | Routes To | Tier |
|---|---|---|
| `gpt-4`, `gpt-4-turbo`, `gpt-4o` | `claude-sonnet-4-6` | Balanced |
| `o1-mini`, `o3-mini`, `o4-mini` | `claude-sonnet-4-6` | Balanced |
| `gpt-4o-mini`, `gpt-3.5-turbo` | `claude-haiku-4-5` | Fast |
| `o1`, `o1-preview`, `o3` | `claude-opus-4-6` | Most capable |

All models: 200K context window • 32K max output tokens
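
The alias table reduces to a simple lookup; unknown names pass through unchanged. A client-side sketch (the helper is hypothetical, but the mapping values come from the table above):

```python
# Alias table from the Model Mapping section; unknown names pass through.
OPENAI_ALIASES = {
    "gpt-4": "claude-sonnet-4-6",
    "gpt-4-turbo": "claude-sonnet-4-6",
    "gpt-4o": "claude-sonnet-4-6",
    "o1-mini": "claude-sonnet-4-6",
    "o3-mini": "claude-sonnet-4-6",
    "o4-mini": "claude-sonnet-4-6",
    "gpt-4o-mini": "claude-haiku-4-5",
    "gpt-3.5-turbo": "claude-haiku-4-5",
    "o1": "claude-opus-4-6",
    "o1-preview": "claude-opus-4-6",
    "o3": "claude-opus-4-6",
}

def resolve_model(name: str) -> str:
    """Map an OpenAI alias to its Claude model; Claude names pass through."""
    return OPENAI_ALIASES.get(name, name)
```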


Configuration

CLI Arguments

python server.py [OPTIONS]

| Flag | Default | Description |
|---|---|---|
| `--port` | 4000 | Port to listen on |
| `--host` | 0.0.0.0 | Host to bind to (0.0.0.0 = all interfaces) |
| `--timeout` | 300 | CLI subprocess timeout in seconds |
| `--max-concurrent` | 10 | Max concurrent requests per worker |
| `--workers` | 1 | Number of uvicorn workers |
| `--log-level` | INFO | Log level (DEBUG, INFO, WARNING, ERROR) |
| `--no-access-log` | | Disable uvicorn access log |

Environment Variables

Environment variables can be set via shell export, in a .env file in the project directory, or in a systemd EnvironmentFile. The proxy loads .env automatically on startup using python-dotenv.

To get started, copy the template:

cp .env.example .env
# Edit .env with your settings

| Variable | Default | Description |
|---|---|---|
| `CLAUDE_PROXY_API_KEY` | (unset) | API key for authentication. When set, all requests must provide this key via `Authorization: Bearer <key>` (OpenAI) or `x-api-key: <key>` (Anthropic). When unset, no auth is required. |
| `CLAUDE_PROXY_LOG_LEVEL` | INFO | Log level (alternative to `--log-level`) |
| `CLAUDE_PROXY_CORS_ORIGINS` | `*` | Comma-separated allowed CORS origins |
| `CLAUDE_PROXY_MAX_BODY_BYTES` | 10485760 | Max request body size (10 MB) |
| `CLAUDE_PROXY_ACCESS_LOG` | true | Enable/disable access log |
| `CLAUDE_PROXY_DRAIN_SECONDS` | 5 | Graceful shutdown drain period |
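
Assuming the variable names above, a minimal `.env` might look like this (the values shown are examples only):

```shell
# Require a key on every request (leave unset to disable auth)
CLAUDE_PROXY_API_KEY=change-me

# Restrict CORS to a specific origin instead of the default *
CLAUDE_PROXY_CORS_ORIGINS=http://localhost:3000

CLAUDE_PROXY_LOG_LEVEL=INFO
CLAUDE_PROXY_DRAIN_SECONDS=5
```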

Multi-Worker Mode

For higher throughput, run with multiple workers:

python server.py --workers 4 --max-concurrent 5

Each worker gets its own concurrency pool.
Total capacity = workers × max-concurrent (e.g., 4 × 5 = 20 concurrent requests).

Rate Limiting

When all concurrency slots are full, the proxy returns HTTP 429 with a rate_limit_exceeded error. Clients should back off and retry.
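
A minimal client-side retry sketch (the helper and policy are illustrative, not part of the proxy): exponential backoff with a little jitter between attempts.

```python
import random
import time

def with_backoff(call, max_attempts=5, base_delay=0.5):
    """Invoke `call` (which returns a (status, result) pair) and retry
    on HTTP 429, sleeping with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        status, result = call()
        if status != 429:
            return result
        if attempt < max_attempts - 1:
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    raise RuntimeError("still rate limited after retries")
```

The same idea applies to SDK clients: catch the rate-limit error, sleep, and retry with a growing delay.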


System Service (Linux)

Automated Install

./install.sh

The script handles venv creation, dependency install, systemd service setup, and boot persistence.

Manual Install

mkdir -p ~/.config/systemd/user
cp claude-proxy.service ~/.config/systemd/user/
# Edit paths in the service file if needed
systemctl --user daemon-reload
systemctl --user enable claude-proxy.service
systemctl --user start claude-proxy.service
loginctl enable-linger $USER

Managing the Service

systemctl --user status  claude-proxy    # Check status
systemctl --user restart claude-proxy    # Restart
systemctl --user stop    claude-proxy    # Stop
journalctl --user -u claude-proxy -f     # Tail logs

Updating

cd claudeMax-OpenAI-HTTP-Proxy
git pull
cp server.py ~/claude-proxy/server.py
systemctl --user restart claude-proxy

Troubleshooting

| Problem | Solution |
|---|---|
| 502 errors | Check `claude auth login` and subscription status. Test: `echo "hi" \| claude -p` |
| 429 errors | Server at capacity. Increase `--max-concurrent` or `--workers`, or retry with backoff |
| Connection refused from LAN | Ensure `--host 0.0.0.0`. Check firewall: `sudo ufw allow 4000/tcp` |
| Service won't start at boot | Run `loginctl enable-linger $USER` |
| Port already in use | `fuser -k 4000/tcp && systemctl --user restart claude-proxy` |
| High latency | Normal ~1–2 s overhead per subprocess. Use Haiku for faster responses |
| Rate limiting | Bound by Claude Max subscription limits. Reduce concurrent requests |

Limitations

| Limitation | Detail |
|---|---|
| Subprocess latency | ~1–2 s overhead per request (subprocess spawn) |
| Rate limits | Bound by Claude Max subscription limits |
| No embeddings | Claude does not provide embedding models |
| No image generation | Claude does not generate images |
| No audio | Claude does not provide TTS/STT |
| Image input | Image URLs/base64 are noted in the prompt but not passed through to Claude |
| Tool calling | Implemented via prompt injection — works well but not native |
| `n > 1` | Only `n=1` is supported (single completion per request) |
| Ignored parameters | OpenAI params like `temperature`, `top_p`, and `max_tokens` are accepted but not forwarded to the Claude CLI |

Security

| Feature | Detail |
|---|---|
| API key auth | Optional. Set `CLAUDE_PROXY_API_KEY` to require authentication; clients must send the key via `Authorization: Bearer` (OpenAI) or `x-api-key` (Anthropic). Uses timing-safe comparison. `/health` and `/` are exempt. When unset, all requests are accepted (auth rests on your Claude Max subscription). |
| No filesystem access | Claude Code tools are disabled (`--tools ""`); Claude cannot access your filesystem |
| CORS | Defaults to `*` for LAN convenience. Restrict via `CLAUDE_PROXY_CORS_ORIGINS` |
| Request size | 10 MB default, configurable via `CLAUDE_PROXY_MAX_BODY_BYTES` |
| Tool sanitization | Tool names and descriptions are sanitized to prevent prompt injection |
| Error sanitization | File paths are stripped from CLI error messages before they are returned to clients |
| Request tracing | Every request gets an `X-Request-ID` header |
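
The timing-safe comparison mentioned above is what Python's standard-library `hmac.compare_digest` provides. A sketch of such a check (not necessarily the proxy's exact code):

```python
import hmac
import os

def check_api_key(provided: str) -> bool:
    """Compare the client-supplied key against CLAUDE_PROXY_API_KEY in
    constant time, so the comparison leaks no timing information."""
    expected = os.environ.get("CLAUDE_PROXY_API_KEY")
    if expected is None:
        return True  # auth disabled when the variable is unset
    return hmac.compare_digest(provided.encode(), expected.encode())
```

A plain `==` on strings can short-circuit at the first differing byte; `compare_digest` takes the same time regardless of where the keys differ.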

License

MIT


Report a Bug • Request a Feature • Contribute

Made with Claude Code CLI
