This shows how to use the Arch Gateway as an OpenAI LLM router, using
its tracing configuration for OpenTelemetry.
Arch Gateway does not serve OpenAI requests. Rather, it configures an Envoy proxy according to its configuration. Envoy handles requests, collects telemetry and forwards them to Ollama via the OpenAI API.
Start ollama and the otel collector via this repository's README.
Arch Gateway is a python command that internally runs Docker. Hence, you need a
working Docker configuration. Run archgw using uv run from uv:
uv run --python 3.12 --with archgw -- archgw up arch_config.yamlWhen finished, clean up like this:
uv run --python 3.12 --with archgw -- archgw downIf your OpenTelemetry backend is Elasticsearch, you can pump Prometheus metrics coming from Arch Gateway to Elasticsearch like this:
docker compose -f docker-compose-elastic.yml run --rm prometheus-pumpIf you are using otel-tui to visualize OpenTelemetry data, you can add Arch Gateway's Prometheus endpoint to it when starting, like this:
otel-tui --prom-target http://localhost:19901/stats?format=prometheusOnce Arch Gateway is running, use uv to make an OpenAI request via chat.py:
uv run --exact -q --env-file env.local ../chat.pyOpenTelemetry signals are a function of native Envoy support and anything added in Arch Gateway's wasm filter.
archgwinvokesenvoyin a Docker container, which is why this has no instructions to run from Docker (to avoid nested docker).- Traces come from Envoy, whose configuration is written by
archgw. At the moment, this hard-codes aspects including default ports. - Prometheus metrics show the cluster as "ollama_host" - the provider_interface plus the first segment of the hostname (dots truncate the rest). The "host" comes from "host.docker.internal".
- Until this resolves, don't use
--use-responses-api. - This example uses Python 3.12 until torch has wheels for 3.14.
The chat prompt was designed to be idempotent, but the results are not. You may see something besides 'South Atlantic Ocean.'. Just run it again until we find a way to make the results idempotent.