Commit 5b7a0ad

docs: document the project name dimension and access log transport options for HTTP traffic metering
1 parent 3400477 commit 5b7a0ad

1 file changed

Lines changed: 109 additions & 0 deletions

docs/enhancements/http-traffic-metering.md

@@ -127,6 +127,67 @@ Each metric will record the following dimensions:
- `gateway_class`: Underlying GatewayClass (for pricing class differentiation).
- `httproute_name`: The `HTTPRoute` resource name.
- `httproute_namespace`: The `HTTPRoute` namespace.
- `project_name`: Human-readable name of the project that owns the route (see [Surfacing Signals from the Edge](#surfacing-signals-from-the-edge)).

---

### Surfacing Signals from the Edge

All metering signals originate from the **edge cluster**, where the Envoy
Gateway proxies (`datum-downstream-gateway`) actually serve customer traffic.
There is no central collection point that observes individual requests: the
proxy is the only component that sees each request, so the signal must be
captured, enriched, and emitted at the edge before being forwarded to the
central Billing System.

The raw access log already carries everything the meters need *except* one
thing: the `route_name` field identifies the owning project only by its
control-plane namespace UID (e.g. `ns-<project-uid>`), not by the
human-readable project name. To populate the `project_name` dimension, three
components must be updated, all operating at the edge:

1. **Network Services Operator (controller).** When the operator reconciles a
   customer `HTTPRoute` into its downstream representation, it injects the
   project name as a request header (`x-datum-project-name`) via a
   `RequestHeaderModifier` filter on each route rule. The project name is read
   from the upstream cluster identity (the Milo project name) that the
   operator already holds while mapping upstream → downstream resources. Rules
   that already define a `RequestHeaderModifier` are merged into rather than
   duplicated, since Gateway API permits at most one such filter per rule.

2. **Envoy access log format.** The `EnvoyProxy` access log JSON format is
   extended with a `project_name` field sourced from the injected header:
   `project_name: "%REQ(X-DATUM-PROJECT-NAME)%"`. Because the header is set on
   the route before the access log is written, every logged request for a
   customer route carries the resolved project name. (We use `%REQ()%` rather
   than `%METADATA(ROUTE:...)%` because Envoy Gateway's JSON access log
   formatter does not register the metadata formatter, so route metadata is not
   accessible from JSON access logs.)

3. **Vector billing collector.** The `billing-usage-collector-vector` VRL
   transform reads the `project_name` field from each access log line and adds
   it as a dimension on all four emitted CloudEvents (requests, ingress-bytes,
   egress-bytes, connection-seconds) and to each event's subject. An absent or
   empty value (rendered by Envoy as `"-"`) is normalized to an empty string so
   unmatched routes do not pollute the dimension.
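The normalization described in step 3 lives in a VRL transform in the real collector. The Python sketch below only illustrates the intended behavior; the function names, the event dictionary shape, and the meter type strings are illustrative assumptions, not the collector's actual schema.

```python
# Hypothetical meter identifiers, one per CloudEvent named in step 3.
METERS = ("requests", "ingress-bytes", "egress-bytes", "connection-seconds")


def normalize_project_name(raw):
    """Map a missing, empty, or "-" placeholder value to an empty string."""
    if raw is None:
        return ""
    value = str(raw).strip()
    # Envoy renders an unset header as "-" in the access log.
    return "" if value == "-" else value


def build_events(log_line):
    """Return one event per meter, each carrying the project_name dimension."""
    project = normalize_project_name(log_line.get("project_name"))
    return [
        {"type": meter, "dimensions": {"project_name": project}}
        for meter in METERS
    ]
```

Normalizing once, before the events are built, means all four events always agree on the dimension value.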

This keeps the entire signal path (request handling, name resolution, log
emission, parsing, and CloudEvent forwarding) co-located on the edge cluster.
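The merge rule from step 1 (extend an existing `RequestHeaderModifier` instead of adding a second one) can be illustrated with plain dictionaries standing in for `HTTPRoute` rules. This is a sketch under stated assumptions, not the operator's code: `inject_project_header` is a hypothetical helper, and only the filter's `set` list is handled.

```python
import copy

HEADER = "x-datum-project-name"


def inject_project_header(rule, project_name):
    """Return a copy of an HTTPRoute rule with the project header set."""
    rule = copy.deepcopy(rule)  # leave the caller's rule untouched
    filters = rule.setdefault("filters", [])
    entry = {"name": HEADER, "value": project_name}

    for flt in filters:
        if flt.get("type") == "RequestHeaderModifier":
            # Merge into the existing filter: only one is allowed per rule.
            modifier = flt.setdefault("requestHeaderModifier", {})
            current = modifier.get("set") or []
            # Header names are case-insensitive; drop any stale value first.
            kept = [h for h in current if h.get("name", "").lower() != HEADER]
            modifier["set"] = kept + [entry]
            return rule

    # No existing filter on this rule: add a new one.
    filters.append(
        {"type": "RequestHeaderModifier", "requestHeaderModifier": {"set": [entry]}}
    )
    return rule
```

Dropping any pre-existing value for the header before appending keeps the operator's value authoritative even if a customer route tries to set the same header.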

#### Transport: how access logs reach Vector

The access log line must travel from the Envoy proxy to the
`billing-usage-collector-vector` agent. Two transports are viable; see
[Access Log Transport](#access-log-transport-file-sink-vs-otlp-sink) under
Alternatives for the trade-offs. In short:

- **File sink (stdout) + `kubernetes_logs`**: the current/baseline approach,
  where Envoy writes JSON to stdout and Vector tails the node's container logs.
  This requires Vector to run as a per-node DaemonSet co-located with the Envoy
  pod, which holds on edge clusters but not where Vector runs as an aggregator.
- **OpenTelemetry (OTLP) sink**: Envoy pushes access logs directly to Vector's
  OTLP receiver over the network, independent of pod/node topology. This is
  implemented in a draft PR (see below).

---

@@ -373,6 +434,54 @@ The following decisions are tracked for the implementation of this enhancement:

## Alternatives

### Access Log Transport: File Sink vs OTLP Sink

The signal-collection design above is independent of *how* the Envoy access log
line reaches the `billing-usage-collector-vector` agent. Two transports were
evaluated:

#### Option A1: File sink (stdout) + Vector `kubernetes_logs` (baseline)

Envoy keeps its existing `File` access log sink writing JSON to `/dev/stdout`.
The container runtime persists this to the node's container log files, and
Vector tails them via a `kubernetes_logs` source.

- *Pros:* No new ports or network hops; reuses the standard Kubernetes log
  collection pattern; the `File` sink is already present in the base
  `EnvoyProxy`; logs survive on disk if Vector is briefly down (checkpointed
  tailing).
- *Cons:* Requires Vector to run as a **per-node DaemonSet co-located** with the
  Envoy pod, because `kubernetes_logs` can only read logs from the node it runs
  on. This holds on **edge** clusters (Vector and Envoy are both DaemonSets),
  but breaks where the billing Vector runs as a **Stateless-Aggregator**
  (staging/prod), as a single aggregator pod cannot tail Envoy stdout on other
  nodes. It also needs a `kubernetes_logs` source plus a ClusterRole for pod
  metadata, pod-label filtering to avoid ingesting unrelated containers, and a
  `parse_json(.message)` step.
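The extra work the file-sink path needs (pod-label filtering and re-parsing the stringified message) can be sketched as follows. This is a Python illustration of the logic, not the VRL used by the collector. It assumes events shaped like those produced by Vector's `kubernetes_logs` source, with the raw line in `message` and pod labels under `kubernetes.pod_labels`; the label selector is supplied by the caller and is hypothetical.

```python
import json


def parse_file_sink_event(event, wanted_labels):
    """Return the parsed access log dict, or None if the event is skipped."""
    labels = event.get("kubernetes", {}).get("pod_labels", {})
    # Skip containers that do not carry every wanted label.
    if any(labels.get(key) != value for key, value in wanted_labels.items()):
        return None
    try:
        # Envoy's JSON line arrives as a string and must be parsed again.
        parsed = json.loads(event.get("message", ""))
    except json.JSONDecodeError:
        return None
    # Only JSON objects are valid access log lines.
    return parsed if isinstance(parsed, dict) else None
```

Returning `None` for both unrelated pods and unparseable lines keeps non-access-log output from Envoy's stdout (startup messages, for example) out of the billing stream.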

#### Option A2: OpenTelemetry (OTLP) sink (implemented in draft PR)

Envoy adds an `OpenTelemetry` access log sink alongside the existing `File`
sink, pushing access logs directly to Vector's OTLP receiver
(`opentelemetry` source, gRPC :4317). The JSON fields arrive as OTLP
log-record attributes, which the VRL transform normalizes to top-level fields.

- *Pros:* **Topology-independent**: works identically whether Vector is a
  DaemonSet or an aggregator, since it targets the Vector Service DNS and lets
  kube-proxy route. No `kubernetes_logs` source, no ClusterRole, no
  per-container filtering, no re-parsing of a stringified message. An OTel
  resource attribute (`service.name: nso-httproute-signals`) tags the stream.
- *Cons:* Adds a network sink and OTLP ports to the Vector Service; introduces a
  push dependency (mitigated by keeping the `File` sink in parallel as a
  fallback / for debugging).
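The attribute normalization mentioned for the OTLP path can be sketched in Python as below. The real transform is written in VRL; this sketch assumes each log event carries its record attributes under an `attributes` key, and `lift_attributes` is a hypothetical name.

```python
def lift_attributes(event):
    """Copy OTLP log-record attributes to top-level fields of a new dict."""
    attributes = event.get("attributes") or {}
    # Start from every field except the nested attributes map.
    flat = {key: value for key, value in event.items() if key != "attributes"}
    for key, value in attributes.items():
        # Existing top-level fields win over attributes with the same name.
        flat.setdefault(key, value)
    return flat
```

After this step the downstream logic can read `project_name` from the same place regardless of which transport delivered the line.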

This transport is implemented in a draft PR:

<!-- TODO: reference the draft PR here -->
PR:

### Option B: Prometheus Scrape Delta Calculation
Run an operator loop that polls the Envoy `/stats` endpoint periodically.
- *Rejected because:* Loses per-request granularity and increases NSO statefulness/risk of double-counting.
