This shows how to use Llama Stack to proxy Ollama via an OpenAI-compatible API.
Start Ollama and your OpenTelemetry Collector via this repository's README.
Then, start Llama Stack:

```bash
docker compose up --force-recreate --remove-orphans
```

Clean up when finished, like this:

```bash
docker compose down
```

Once Llama Stack is running, use uv to make an OpenAI request via chat.py:
```bash
uv run --exact -q --env-file env.local ../chat.py
```
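For reference, the kind of request chat.py makes can be sketched like this. This is a minimal sketch, not the script itself: it assumes the OpenAI Python SDK, that env.local supplies `OPENAI_BASE_URL` (pointing at Llama Stack) and an `OPENAI_API_KEY` placeholder, and a made-up prompt; the model name follows the provider-prefix rule described in the notes below.

```python
# Minimal sketch (not the actual chat.py): send one chat completion through
# Llama Stack's OpenAI-compatible API. Assumes OPENAI_BASE_URL and
# OPENAI_API_KEY are set in the environment, e.g. via env.local.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_BASE_URL and OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="openai/qwen3:0.6b",  # provider-prefixed model name (see notes below)
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(completion.choices[0].message.content)
```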
Or, for the OpenAI Responses API (sketched after the notes below):

```bash
uv run --exact -q --env-file env.local ../chat.py --use-responses-api
```

There is also an agent example, agent.py, which uses the Responses API:

```bash
uv run --exact -q --env-file env.local ../agent.py --use-responses-api
```

Notes:

- Llama Stack's Responses API connects to MCP servers server-side (unlike aigw,
  which proxies MCP). The agent passes MCP configuration via `HostedMCPTool`
  (see the sketch after these notes).
- Uses the `starter` distribution with its built-in `remote::openai` provider,
  pointing to Ollama via the `OPENAI_BASE_URL` environment variable.
- Models require a `provider_id/` prefix (e.g., `openai/qwen3:0.6b`).
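As a rough illustration of the Responses API path exercised by `--use-responses-api`, here is a hedged sketch using the OpenAI Python SDK's responses endpoint against the same `OPENAI_BASE_URL`; the model name and prompt are placeholders, not values taken from this example.

```python
# Sketch only: the Responses API counterpart of the chat completion above,
# assuming Llama Stack serves the OpenAI Responses API at OPENAI_BASE_URL.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_BASE_URL and OPENAI_API_KEY from the environment

response = client.responses.create(
    model="openai/qwen3:0.6b",  # placeholder provider-prefixed model
    input="Say hello in one short sentence.",
)
print(response.output_text)
```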
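And a hedged sketch of how an agent can pass MCP configuration via `HostedMCPTool`, using the openai-agents SDK. This is not the repository's agent.py: the server label, server URL, model, and prompt are placeholder assumptions. The point it illustrates is that the MCP server details travel with the request, so Llama Stack (not the client) connects to the MCP server.

```python
# Sketch only: hand an MCP server to the model via HostedMCPTool so the
# backend connects to it server-side. Assumes the openai-agents package,
# OPENAI_BASE_URL/OPENAI_API_KEY in the environment, and a reachable MCP
# server at the placeholder URL below.
import asyncio

from agents import Agent, HostedMCPTool, Runner

agent = Agent(
    name="assistant",
    model="openai/qwen3:0.6b",  # placeholder provider-prefixed model
    tools=[
        HostedMCPTool(
            tool_config={
                "type": "mcp",
                "server_label": "example-mcp",              # placeholder label
                "server_url": "http://localhost:8000/mcp",  # placeholder URL
                "require_approval": "never",
            }
        )
    ],
)


async def main() -> None:
    result = await Runner.run(agent, "Use the MCP server to answer a question.")
    print(result.final_output)


asyncio.run(main())
```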