Skip to content

Latest commit

 

History

History
84 lines (60 loc) · 2.94 KB

File metadata and controls

84 lines (60 loc) · 2.94 KB

archgw

This shows how to use the Arch Gateway as an OpenAI LLM router, using its tracing configuration for OpenTelemetry.

Arch Gateway does not serve OpenAI requests. Rather, it configures an Envoy proxy according to its configuration. Envoy handles requests, collects telemetry and forwards them to Ollama via the OpenAI API.

Setup

Start ollama and the otel collector via this repository's README.

Run Arch Gateway

Arch Gateway is a python command that internally runs Docker. Hence, you need a working Docker configuration. Run archgw using uv run from uv:

uv run --python 3.12 --with archgw -- archgw up arch_config.yaml

When finished, clean up like this:

uv run --python 3.12 --with archgw -- archgw down

Start Prometheus Scraping

Elastic Stack

If your OpenTelemetry backend is Elasticsearch, you can pump Prometheus metrics coming from Arch Gateway to Elasticsearch like this:

docker compose -f docker-compose-elastic.yml run --rm prometheus-pump

otel-tui

If you are using otel-tui to visualize OpenTelemetry data, you can add Arch Gateway's Prometheus endpoint to it when starting, like this:

otel-tui --prom-target http://localhost:19901/stats?format=prometheus

Call Arch Gateway with python

Once Arch Gateway is running, use uv to make an OpenAI request via chat.py:

uv run --exact -q --env-file env.local ../chat.py

Notes

OpenTelemetry signals are a function of native Envoy support and anything added in Arch Gateway's wasm filter.

  • archgw invokes envoy in a Docker container, which is why this has no instructions to run from Docker (to avoid nested docker).
  • Traces come from Envoy, whose configuration is written by archgw. At the moment, this hard-codes aspects including default ports.
  • Prometheus metrics show the cluster as "ollama_host" - the provider_interface plus the first segment of the hostname (dots truncate the rest). The "host" comes from "host.docker.internal".
  • Until this resolves, don't use --use-responses-api.
  • This example uses Python 3.12 until torch has wheels for 3.14.

The chat prompt was designed to be idempotent, but the results are not. You may see something besides 'South Atlantic Ocean.'. Just run it again until we find a way to make the results idempotent.