Give your AI agent read-only visibility into your LiteLLM proxy to query users, teams, API keys, and spend data through the Model Context Protocol.
Important: This MCP server exposes read-only tools by design. It will never mutate your proxy. Configuration changes should go through your LiteLLM `config.yaml`.
Your LiteLLM proxy manages access, spend, and model routing across your entire org, but querying it manually means digging through the UI or making raw API calls. This MCP server bridges that gap:
- Ask in plain English. Let an AI agent cross-reference teams, users, and keys for you instead of writing queries yourself.
- Audit at a glance. Instantly surface who has access to what models, which keys are orphaned, and where spend is concentrated.
- Safe by design. Every tool is read-only. No accidental mutations, no write permissions to hand out.
- Composable. Any MCP-compatible client (Claude Code, Claude Desktop, custom agents) can connect over HTTP with zero extra setup.
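
For a custom agent, "connecting over HTTP" comes down to exchanging JSON-RPC 2.0 messages with the server. The sketch below only builds the `tools/call` request bodies an agent might send for a full overview; it does not open a connection, and a real client would first complete the MCP `initialize` handshake. The helper name `tool_call_request` is hypothetical, and it is an assumption that the list tools take no arguments.

```python
import itertools
import json

# Monotonic request ids, as JSON-RPC expects each request to carry its own id.
_ids = itertools.count(1)


def tool_call_request(tool_name, arguments=None):
    """Build a JSON-RPC 2.0 `tools/call` message in the shape MCP defines."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments or {}},
    }


# Tool names come from this README; passing no arguments is an assumption.
requests_for_overview = [
    tool_call_request("get_teams_list"),
    tool_call_request("get_users_list"),
    tool_call_request("get_keys_list"),
]
print(json.dumps(requests_for_overview[0], indent=2))
```

In practice an MCP client library handles this framing for you; the point is only that nothing beyond plain HTTP and JSON is involved.
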
This list will grow over time.

| Tool | Description |
|---|---|
| `get_teams_list` | List all teams |
| `get_users_list` | List all users with resolved team names |
| `get_keys_list` | List all API keys with resolved team names |
| `get_team_spend_info` | Aggregated spend breakdown by model for a specific team and date range |
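
"Resolved team names" and "spend breakdown by model" can be pictured as a join and a group-by. The sketch below is illustrative only and is not the server's actual implementation: the field names (`team_id`, `team_alias`, `key_alias`, `model`, `spend`) and both helper functions are assumptions.

```python
from collections import defaultdict


def resolve_team_names(items, teams):
    """Attach a readable team name to each user or key record.

    Assumes teams carry `team_id` and `team_alias`, and items may carry
    a `team_id` (hypothetical field names).
    """
    names = {t["team_id"]: t.get("team_alias") or t["team_id"] for t in teams}
    return [
        {**item, "team_name": names.get(item.get("team_id"), "(no team)")}
        for item in items
    ]


def spend_by_model(rows):
    """Sum spend per model over usage rows already filtered by team and date."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["model"]] += row["spend"]
    return dict(totals)


# Made-up sample data for illustration.
teams = [{"team_id": "t-1", "team_alias": "Engineering"}]
keys = [
    {"key_alias": "Codex", "team_id": "t-1"},
    {"key_alias": "orphan-key", "team_id": None},
]
resolved = resolve_team_names(keys, teams)
print(resolved)
print(spend_by_model([
    {"model": "gpt-5.2", "spend": 0.04},
    {"model": "gpt-5.2", "spend": 0.02},
]))
```

Keys whose `team_id` does not match any team fall back to "(no team)", which is one way orphaned keys could be surfaced.
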

Copy the example environment file:

    cp .env.example .env

Edit `.env` with your proxy details:

    LITELLM_BASE_URL=http://localhost:8081
    LITELLM_API_KEY=your_admin_key_here
    # Optional (defaults shown)
    MCP_SERVER_LOGGING_LEVEL=INFO
    MCP_SERVER_PORT=8000

Install dependencies and start the server:

    uv sync
    make run

This runs `uv run python ./src/litellm_mcp_server/mcp_server.py` and starts the server on port 8000.
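
To make the required and optional variables concrete, here is a minimal sketch of how a server could read them, with the two optional values falling back to the defaults listed above. The `Settings` class and `load_settings` function are hypothetical names used for illustration; the project's real loader may look different.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    litellm_base_url: str
    litellm_api_key: str
    log_level: str = "INFO"
    port: int = 8000


def load_settings(env=None):
    """Build Settings from environment variables, failing fast if a required one is missing."""
    env = os.environ if env is None else env
    missing = [k for k in ("LITELLM_BASE_URL", "LITELLM_API_KEY") if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return Settings(
        # Strip a trailing slash so paths can be appended consistently.
        litellm_base_url=env["LITELLM_BASE_URL"].rstrip("/"),
        litellm_api_key=env["LITELLM_API_KEY"],
        log_level=env.get("MCP_SERVER_LOGGING_LEVEL", "INFO").upper(),
        port=int(env.get("MCP_SERVER_PORT", "8000")),
    )


example = load_settings({
    "LITELLM_BASE_URL": "http://localhost:8081/",
    "LITELLM_API_KEY": "your_admin_key_here",
})
print(example.port, example.log_level)
```
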
Add the server to your Claude Code MCP settings:

    {
      "mcpServers": {
        "litellm-mcp": {
          "type": "http",
          "url": "http://localhost:8000/mcp"
        }
      }
    }

Or add it via the CLI:

    claude mcp add --transport http litellm-mcp "http://localhost:8000/mcp" --scope user

Then ask Claude something like:

"Give me a full overview of this LiteLLM proxy: list all teams, users, and API keys. Summarize who has access to what."

To run the server in Docker:

    # Build
    make docker-build

    # Run (port 8000 exposed)
    make docker-run

The image is built on `python:3.14-alpine` and managed with uv. Environment variables are passed at runtime via `-e` flags or an env file.
A Helm chart lives in chart/. The default setup targets a KIND cluster named homelab, but any cluster works.
1. Deploy

       make deploy

   This builds the Docker image, loads it into the KIND cluster, and runs `helm upgrade --install` into the `mcp` namespace.

2. Pass secrets at install time

       helm upgrade --install litellm-mcp-server chart/ \
         --namespace mcp --create-namespace \
         --set secret.litellmBaseUrl=http://litellm:8081 \
         --set secret.litellmApiKey=your-admin-key

3. Enable ServiceMonitor (optional)

       helm upgrade --install litellm-mcp-server chart/ \
         --namespace mcp --create-namespace \
         --set serviceMonitor.enabled=true

See `chart/values.yaml` for all configurable values.
Using the available tools, produce a single summary table of my LiteLLM gateway with one row per team. Columns: Team, Models Allowed, Users & Keys (format as username → key alias, one per line), Total Spend, Total Requests. Below the table add 2-3 bullet points with the most important findings.

| Team | Models Allowed | Users & Keys | Total Spend | Total Requests |
|---|---|---|---|---|
| Admins | all-proxy-models | admin-user → -<br>(service acct) → svc-admin | $0.00 | 0 |
| Developers | gpt-4.1-mini | software-eng → - | $0.00 | 0 |
| Engineering | gpt-5.2 | developer-1 → -<br>engineer-1 → Codex | $0.06 | N/A |
| Platform | all-proxy-models | developer-2 → dev-2-key<br>engineer-1 → eng-1-key | $0.00 | 0 |

Key findings:
- Engineering is the only active team: 100% of gateway spend ($0.06) comes from the single Codex key, all on gpt-5.2.
- Access control gap: engineer-1 is a Platform member but holds the only Engineering key (Codex), giving them cross-team access to gpt-5.2 beyond their Platform entitlement.
- Admins team uses a keyless service account pattern: svc-admin has no user_id, making individual usage attribution impossible.
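
The cross-team finding above is the kind of check that can also be done deterministically once the tool results are in hand. The following is a rough sketch under assumed data shapes: `memberships` maps a user to the teams they belong to, and each key record has `alias`, `user`, and `team` fields. These shapes and the function name are hypothetical, and the sample data simply mirrors the example output.

```python
def find_cross_team_keys(memberships, keys):
    """Return (owner, key alias, team) for keys whose owner is not in the key's team."""
    findings = []
    for key in keys:
        owner, team = key.get("user"), key.get("team")
        if owner is None or team is None:
            continue  # unowned or team-less keys need a separate check
        if team not in memberships.get(owner, set()):
            findings.append((owner, key["alias"], team))
    return findings


# Sample data mirroring the example output above (hypothetical shapes).
memberships = {
    "engineer-1": {"Platform"},
    "developer-2": {"Platform"},
}
keys = [
    {"alias": "Codex", "user": "engineer-1", "team": "Engineering"},
    {"alias": "eng-1-key", "user": "engineer-1", "team": "Platform"},
    {"alias": "dev-2-key", "user": "developer-2", "team": "Platform"},
    {"alias": "svc-admin", "user": None, "team": "Admins"},
]
gaps = find_cross_team_keys(memberships, keys)
print(gaps)  # [('engineer-1', 'Codex', 'Engineering')]
```
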
A pre-built Grafana dashboard is included in dashboard/grafana.json.
| Capability | Status |
|---|---|
| List teams | ✅ |
| List users with team resolution | ✅ |
| List API keys with team resolution | ✅ |
| Team spend breakdown by model | ✅ |
| Docker image | ✅ |
| Kubernetes manifests | ✅ |
| Per-user spend breakdown | 🚧 Planned |
| Per-key spend breakdown | 🚧 Planned |
| MCP authentication | 🚧 Planned |
This project is not affiliated with, endorsed by, or sponsored by BerriAI. "LiteLLM" is a trademark of its respective owner and is used here for descriptive purposes only.


