|
1 | 1 | # Relay observability |
2 | 2 |
|
3 | | -The relay Alchemy stack owns Axiom resources for post-hoc diagnostics: |
| 3 | +The relay Alchemy stack owns a focused Axiom trace setup: |
4 | 4 |
|
5 | | -- `t3-code-relay-events` for Effect logs and spans |
6 | | -- `t3-code-relay-metrics` for Effect metrics |
7 | | -- `t3-code-relay-otel-ingest` for Worker OTLP ingest |
8 | | -- `t3-code-relay-readonly-query` for human/agent log lookup |
9 | | -- `T3 Code Relay Operations` dashboard |
10 | | -- starter views for recent logs and recent failures |
11 | | -- monitors for warning/error logs, APNS failures, managed tunnel provisioning failures, and quiet log ingestion |
| 5 | +- `t3-code-relay-traces`, an OpenTelemetry trace dataset for Worker requests |
| 6 | +- `t3-code-relay-otel-ingest`, a dataset-scoped ingest token bound to the Worker |
| 7 | +- `t3-code-relay-readonly-query`, a dataset-scoped token for scripted diagnostics |
| 8 | +- `t3-code-relay-recent-spans`, a view of recent request and endpoint spans |
12 | 9 |
|
13 | 10 | Deploy from `infra/relay` with the normal Alchemy workflow: |
14 | 11 |
|
15 | 12 | ```sh |
16 | 13 | bun run deploy |
17 | 14 | ``` |
18 | 15 |
|
19 | | -Alchemy resolves Axiom credentials through the Axiom provider. Use either environment credentials or `alchemy login --configure` before deploy. |
| 16 | +Alchemy resolves Axiom deployment credentials through its provider. At runtime, the Worker |
| 17 | +receives only the scoped ingest token; it does not receive the diagnostics query token. |
20 | 18 |
|
21 | | -Useful APL queries: |
| 19 | +The Worker emits Effect's built-in HTTP server spans plus endpoint and database child spans. |
| 20 | +Effect's OpenTelemetry exporter stores semantic HTTP attributes under the `attributes.` prefix.
| 21 | +For example: |
22 | 22 |
|
23 | 23 | ```apl |
24 | | -['t3-code-relay-events'] |
| 24 | +['t3-code-relay-traces'] |
| 25 | +| where name startswith 'http.server' |
| 26 | +| project _time, name, trace_id, duration, |
| 27 | + ['attributes.http.request.method'], |
| 28 | + ['attributes.url.path'], |
| 29 | + ['attributes.http.response.status_code'] |
25 | 30 | | order by _time desc |
26 | 31 | | limit 200 |
27 | 32 | ``` |
28 | 33 |
|
29 | | -```apl |
30 | | -['t3-code-relay-events'] |
31 | | -| extend logSeverity = column_ifexists('severityText', '') |
32 | | -| extend logBody = column_ifexists('body', '') |
33 | | -| where logSeverity in ("WARN", "WARNING", "ERROR", "FATAL") |
34 | | - or logBody contains "failed" |
35 | | - or logBody contains "error" |
36 | | -| order by _time desc |
37 | | -| limit 200 |
38 | | -``` |
39 | | - |
40 | | -Metrics intentionally capture product and state signals that are not just trace counts: |
41 | | - |
42 | | -- `relay_managed_tunnel_provisions_total`: managed tunnel provisioning outcomes, split by `created` versus `reused` |
43 | | -- `relay_environment_links_total`: link and unlink lifecycle operations |
44 | | -- `relay_managed_tunnels_active`: current active managed-tunnel links |
45 | | -- `relay_environment_links_active`: current active environment links |
46 | | -- `relay_mobile_devices_registered`: current registered mobile devices |
47 | | -- `relay_live_activity_targets_active`: current active Live Activity targets |
48 | | -- `relay_agent_activities_active`: current active agent activity rows |
49 | | -- `relay_agent_activity_publishes_total`: agent activity publish/replay lifecycle events |
50 | | -- `relay_apns_deliveries_total`: APNS enqueue/send outcomes for Live Activities and push notifications |
51 | | - |
52 | | -The `*_active` and `*_registered` values are gauges refreshed from the relay database, which is the source of truth for current state. Lifecycle counters are updated from the mutation path after successful writes or delivery outcomes. |
53 | | - |
54 | | -Useful metrics queries: |
55 | | - |
56 | | -```mpl |
57 | | -`t3-code-relay-metrics`:`relay_managed_tunnels_active` |
58 | | -| group using sum |
59 | | -``` |
60 | | - |
61 | | -```mpl |
62 | | -`t3-code-relay-metrics`:`relay_managed_tunnel_provisions_total` |
63 | | -| map increase |
64 | | -| align to 5m using sum |
65 | | -| group by outcome, tunnelProvisionKind using sum |
66 | | -``` |
| 34 | +Endpoint failure annotations and other relay-specific attributes are also emitted under |
| 35 | +`attributes.relay.*` when present on a span. |
67 | 36 |
|
68 | | -Agents should prefer Axiom views or APL queries for completed incidents instead of tailing the Cloudflare Worker. Use the read-only query token when scripted access is needed; keep the ingest token reserved for the Worker. |
| 37 | +Agents should prefer the provisioned view or APL queries for completed incidents instead of |
| 38 | +tailing the Cloudflare Worker. Use the read-only query token when scripted access is needed; |
| 39 | +keep the ingest token reserved for the Worker. |
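
As an illustration of the scripted access described in the last paragraph, the following is a minimal sketch of how a diagnostics script might build a query against the trace dataset using the read-only token. It is not part of this change. The helper name `buildRecentSpansRequest` is hypothetical, and the endpoint URL assumes Axiom's public APL query API on the default `api.axiom.co` host; an organization on a regional deployment would use a different base URL. The function only builds the request and does not send it, so no token is exercised here.

```typescript
// Hypothetical helper (not in the repository): describes an APL query request
// for the trace dataset without sending it.
interface AplQueryRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildRecentSpansRequest(queryToken: string, limit: number = 200): AplQueryRequest {
  // Same dataset and span filter as the APL example in the README.
  const apl = [
    "['t3-code-relay-traces']",
    "| where name startswith 'http.server'",
    "| order by _time desc",
    `| limit ${limit}`,
  ].join("\n");

  return {
    // Assumption: Axiom's APL query endpoint on the default API host.
    url: "https://api.axiom.co/v1/datasets/_apl?format=tabular",
    init: {
      method: "POST",
      headers: {
        // Use the read-only query token here, never the ingest token.
        Authorization: `Bearer ${queryToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ apl }),
    },
  };
}
```

A caller would pass the result to `fetch(url, init)`, reading the token from wherever the script keeps secrets (for example a hypothetical `AXIOM_QUERY_TOKEN` environment variable). Keeping request construction separate from the network call makes the query text easy to inspect and test.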