MatteoMori/litellm-mcp-server


LiteLLM MCP Server


Give your AI agent read-only visibility into your LiteLLM proxy to query users, teams, API keys, and spend data through the Model Context Protocol.

Important

This MCP server exposes read-only tools by design. It will never mutate your proxy. Configuration changes should go through your LiteLLM config.yaml.

Why use this?

Your LiteLLM proxy manages access, spend, and model routing across your entire org, but querying it manually means digging through the UI or making raw API calls. This MCP server bridges that gap:

  • Ask in plain English. Let an AI agent cross-reference teams, users, and keys for you instead of writing queries yourself.
  • Audit at a glance. Instantly surface who has access to what models, which keys are orphaned, and where spend is concentrated.
  • Safe by design. Every tool is read-only. No accidental mutations, no write permissions to hand out.
  • Composable. Any MCP-compatible client (Claude Code, Claude Desktop, custom agents) can connect over HTTP with zero extra setup.

Available tools

This list will grow over time.

Tool                 Description
get_teams_list       List all teams
get_users_list       List all users with resolved team names
get_keys_list        List all API keys with resolved team names
get_team_spend_info  Aggregated spend breakdown by model for a specific team and date range
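
For clients that talk to the server directly rather than through Claude, each tool above is invoked with an MCP `tools/call` request, which is a JSON-RPC 2.0 message. The sketch below only builds that message. The helper name `build_tool_call` and the argument names passed to `get_team_spend_info` are assumptions for illustration; the real argument names come from the tool's input schema, which this README does not list.

```python
import json
from typing import Any, Optional


def build_tool_call(request_id: int, tool: str,
                    arguments: Optional[dict[str, Any]] = None) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments or {}},
    }
    return json.dumps(message)


# A listing tool called with no arguments.
print(build_tool_call(1, "get_teams_list"))

# Hypothetical argument names: check the tool's input schema before relying on them.
print(build_tool_call(2, "get_team_spend_info", {
    "team_id": "engineering",
    "start_date": "2025-01-01",
    "end_date": "2025-01-31",
}))
```

In practice an MCP client library handles the session handshake and transport; this only shows the shape of the request a tool call boils down to.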

Quick start

1. Configure

cp .env.example .env

Edit .env with your proxy details:

LITELLM_BASE_URL=http://localhost:8081
LITELLM_API_KEY=your_admin_key_here

# Optional: defaults shown
MCP_SERVER_LOGGING_LEVEL=INFO
MCP_SERVER_PORT=8000
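
As a rough illustration of how these four variables could be read at startup, here is a minimal sketch. It is not the project's actual loader: the `Settings` class and `load_settings` function are hypothetical names, and it assumes the two LiteLLM variables are required because only the other two are described as optional.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    base_url: str
    api_key: str
    logging_level: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables from .env, applying the documented defaults."""
    # Assumption: the proxy URL and admin key have no default and must be set.
    missing = [name for name in ("LITELLM_BASE_URL", "LITELLM_API_KEY")
               if not env.get(name)]
    if missing:
        raise ValueError(f"missing required settings: {', '.join(missing)}")
    return Settings(
        base_url=env["LITELLM_BASE_URL"],
        api_key=env["LITELLM_API_KEY"],
        logging_level=env.get("MCP_SERVER_LOGGING_LEVEL", "INFO"),
        port=int(env.get("MCP_SERVER_PORT", "8000")),
    )
```

Passing the environment in as a mapping keeps the function easy to exercise with a plain dict instead of the real process environment.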

2. Install

uv sync

3. Run

make run

This runs uv run python ./src/litellm_mcp_server/mcp_server.py and starts the server on port 8000.


Connect to Claude Code

Add the server to your Claude Code MCP settings:

{
  "mcpServers": {
    "litellm-mcp": {
      "type": "http",
      "url": "http://localhost:8000/mcp"
    }
  }
}

Or add it via the CLI:

claude mcp add --transport http litellm-mcp "http://localhost:8000/mcp" --scope user
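
If MCP_SERVER_PORT is changed from its default, the URL in the settings above has to change with it. The following sketch generates the same settings structure for a given host and port; `claude_mcp_config` is a hypothetical helper, not part of this project or of Claude Code.

```python
import json


def claude_mcp_config(name: str, host: str = "localhost", port: int = 8000) -> str:
    """Render an mcpServers entry pointing at the server's /mcp HTTP endpoint."""
    entry = {"type": "http", "url": f"http://{host}:{port}/mcp"}
    return json.dumps({"mcpServers": {name: entry}}, indent=2)


# With the defaults, this prints the same structure as the settings shown above.
print(claude_mcp_config("litellm-mcp"))
```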

Then ask Claude something like:

"Give me a full overview of this LiteLLM proxy: list all teams, users, and API keys. Summarize who has access to what."


Deployment

Docker

# Build
make docker-build

# Run (port 8000 exposed)
make docker-run

The image is built on python:3.14-alpine and managed with uv. Environment variables are passed at runtime via -e flags or an env file.

Kubernetes (Helm)

A Helm chart lives in chart/. The default setup targets a KIND cluster named homelab, but any cluster works.

1. Deploy

make deploy

This builds the Docker image, loads it into the KIND cluster, and runs helm upgrade --install into the mcp namespace.

2. Pass secrets at install time

helm upgrade --install litellm-mcp-server chart/ \
  --namespace mcp --create-namespace \
  --set secret.litellmBaseUrl=http://litellm:8081 \
  --set secret.litellmApiKey=your-admin-key

3. Enable ServiceMonitor (optional)

helm upgrade --install litellm-mcp-server chart/ \
  --namespace mcp --create-namespace \
  --set serviceMonitor.enabled=true

See chart/values.yaml for all configurable values.


Example prompt

Using the available tools, produce a single summary table of my LiteLLM gateway with one row per team. Columns: Team, Models Allowed, Users & Keys (format as username → key alias, one per line), Total Spend, Total Requests. Below the table add 2–3 bullet points with the most important findings.

┌─────────────┬──────────────────┬───────────────────────────────────────────────┬─────────────┬────────────────┐
│    Team     │  Models Allowed  │                 Users & Keys                  │ Total Spend │ Total Requests │
├─────────────┼──────────────────┼───────────────────────────────────────────────┼─────────────┼────────────────┤
│ Admins      │ all-proxy-models │ admin-user → -                                │ $0.00       │ 0              │
│             │                  │ (service acct) → svc-admin                    │             │                │
├─────────────┼──────────────────┼───────────────────────────────────────────────┼─────────────┼────────────────┤
│ Developers  │ gpt-4.1-mini     │ software-eng → -                              │ $0.00       │ 0              │
├─────────────┼──────────────────┼───────────────────────────────────────────────┼─────────────┼────────────────┤
│ Engineering │ gpt-5.2          │ developer-1 → -                               │ $0.06       │ N/A            │
│             │                  │ engineer-1 → Codex                            │             │                │
├─────────────┼──────────────────┼───────────────────────────────────────────────┼─────────────┼────────────────┤
│ Platform    │ all-proxy-models │ developer-2 → dev-2-key                       │ $0.00       │ 0              │
│             │                  │ engineer-1 → eng-1-key                        │             │                │
└─────────────┴──────────────────┴───────────────────────────────────────────────┴─────────────┴────────────────┘

Key findings:
- Engineering is the only active team: 100% of gateway spend ($0.06) comes from the single Codex
  key, all on gpt-5.2.
- Access control gap: engineer-1 is a Platform member but holds the only Engineering key (Codex),
  giving them cross-team access to gpt-5.2 beyond their Platform entitlement.
- Admins team uses a keyless service account pattern: svc-admin has no user_id, making individual
  usage attribution impossible.
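
The access control gap reported above is the kind of cross-reference the agent performs across the user and key listings. A minimal sketch of that check follows, assuming simplified record shapes: the field names `user_id`, `teams`, `key_alias`, and `team` are hypothetical and may not match what the tools actually return.

```python
def cross_team_keys(users: list[dict], keys: list[dict]) -> list[tuple[str, str, str]]:
    """Return (user_id, key_alias, team) for keys issued under a team
    that the key's owner is not a member of."""
    membership = {u["user_id"]: set(u["teams"]) for u in users}
    findings = []
    for key in keys:
        owner = key.get("user_id")
        if owner is None:
            # Keys with no owner (service accounts) cannot be attributed to a user.
            continue
        if key["team"] not in membership.get(owner, set()):
            findings.append((owner, key["key_alias"], key["team"]))
    return findings


# Sample data modeled on the findings above (shapes are assumed).
users = [
    {"user_id": "engineer-1", "teams": ["Platform"]},
    {"user_id": "developer-2", "teams": ["Platform"]},
]
keys = [
    {"key_alias": "Codex", "user_id": "engineer-1", "team": "Engineering"},
    {"key_alias": "eng-1-key", "user_id": "engineer-1", "team": "Platform"},
    {"key_alias": "dev-2-key", "user_id": "developer-2", "team": "Platform"},
    {"key_alias": "svc-admin", "user_id": None, "team": "Admins"},
]

print(cross_team_keys(users, keys))
# prints [('engineer-1', 'Codex', 'Engineering')]
```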

📊 Grafana Dashboard

A pre-built Grafana dashboard is included in dashboard/grafana.json.

Project status

Capability                          Status
List teams                          ✅
List users with team resolution     ✅
List API keys with team resolution  ✅
Team spend breakdown by model       ✅
Docker image                        ✅
Kubernetes manifests                ✅
Per-user spend breakdown            🚧 Planned
Per-key spend breakdown             🚧 Planned
MCP authentication                  🚧 Planned

This project is not affiliated with, endorsed by, or sponsored by BerriAI. "LiteLLM" is a trademark of its respective owner and is used here for descriptive purposes only.
