@@ -127,6 +127,67 @@ Each metric will record the following dimensions:
127127- `gateway_class`: Underlying GatewayClass (for pricing class differentiation).
128128- `httproute_name`: The `HTTPRoute` resource name.
129129- `httproute_namespace`: The `HTTPRoute` namespace.
130+ - `project_name`: Human-readable name of the project that owns the route (see [Surfacing Signals from the Edge](#surfacing-signals-from-the-edge)).
131+
132+ ---
133+
134+ ### Surfacing Signals from the Edge
135+
136+ All metering signals originate from the **edge cluster**, where the Envoy
137+ Gateway proxies (`datum-downstream-gateway`) actually serve customer traffic.
138+ There is no central collection point that observes individual requests — the
139+ proxy is the only component that sees each request, so the signal must be
140+ captured, enriched, and emitted at the edge before being forwarded to the
141+ central Billing System.
142+
143+ The raw access log already carries everything the meters need *except* one
144+ thing: the `route_name` field identifies the owning project only by its
145+ control-plane namespace UID (e.g. `ns-<project-uid>`), not by the
146+ human-readable project name. To populate the `project_name` dimension, three
147+ components must be updated, all operating at the edge:
148+
149+ 1. **Network Services Operator (controller).** When the operator reconciles a
150+ customer `HTTPRoute` into its downstream representation, it injects the
151+ project name as a request header (`x-datum-project-name`) via a
152+ `RequestHeaderModifier` filter on each route rule. The project name is read
153+ from the upstream cluster identity (the Milo project name) that the
154+ operator already holds while mapping upstream → downstream resources. If a rule
155+ already defines a `RequestHeaderModifier`, the header is merged into it rather
156+ than added as a second filter, since Gateway API permits at most one per rule.
157+
158+ 2. **Envoy access log format.** The `EnvoyProxy` access log JSON format is
159+ extended with a `project_name` field sourced from the injected header:
160+ `project_name: "%REQ(X-DATUM-PROJECT-NAME)%"`. Because the header is set on
161+ the route before the access log is written, every logged request for a
162+ customer route carries the resolved project name. (We use `%REQ()%` rather
163+ than `%METADATA(ROUTE:...)%` because Envoy Gateway's JSON access log
164+ formatter does not register the metadata formatter, so route metadata is not
165+ accessible from JSON access logs.)
166+
167+ 3. **Vector billing collector.** The `billing-usage-collector-vector` VRL
168+ transform reads the `project_name` field from each access log line and adds
169+ it as a dimension on all four emitted CloudEvents (requests, ingress-bytes,
170+ egress-bytes, connection-seconds) and to the event subject. An absent or empty
171+ value (rendered by Envoy as `"-"`) is normalized to an empty string so
172+ unmatched routes do not pollute the dimension.
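
The merge behaviour in step 1 can be illustrated with a minimal Python sketch that treats a route rule as a plain dictionary shaped like the Gateway API `HTTPRouteRule`. This is not the operator's actual code: the function name `inject_project_header` is hypothetical, and using the `set` list (overwriting any caller-supplied value) rather than `add` is an assumption.

```python
from copy import deepcopy

PROJECT_HEADER = "x-datum-project-name"


def inject_project_header(rule: dict, project_name: str) -> dict:
    """Return a copy of an HTTPRoute rule with the project header set.

    Hypothetical helper: reuses the rule's existing RequestHeaderModifier
    filter if there is one, otherwise appends a new filter.
    """
    rule = deepcopy(rule)  # never mutate the customer's upstream object
    filters = rule.setdefault("filters", [])

    # Look for an existing RequestHeaderModifier on this rule.
    modifier = next(
        (f for f in filters if f.get("type") == "RequestHeaderModifier"), None
    )
    if modifier is None:
        modifier = {"type": "RequestHeaderModifier", "requestHeaderModifier": {}}
        filters.append(modifier)

    spec = modifier.setdefault("requestHeaderModifier", {})
    # Assumption: drop any existing entry for our header (header names are
    # case-insensitive), then set the resolved project name.
    entries = [h for h in spec.get("set", []) if h["name"].lower() != PROJECT_HEADER]
    entries.append({"name": PROJECT_HEADER, "value": project_name})
    spec["set"] = entries
    return rule
```

A rule that already carries a `RequestHeaderModifier` keeps its other header entries and still ends up with exactly one filter of that type.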
173+
174+ This keeps the entire signal path — request handling, name resolution, log
175+ emission, parsing, and CloudEvent forwarding — co-located on the edge cluster.
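
The collector's handling of `project_name` (item 3 above) can be modelled as follows. The real transform is written in VRL; this Python sketch only mirrors the logic. The event shape (`meter`, `dimensions`) and the function names are illustrative assumptions, not the actual CloudEvent schema.

```python
# The four meters named in the text; the event layout below is hypothetical.
METERS = ("requests", "ingress-bytes", "egress-bytes", "connection-seconds")


def normalize_project_name(raw) -> str:
    """Map a missing value or Envoy's "-" placeholder to an empty string."""
    if raw is None:
        return ""
    value = str(raw).strip()
    return "" if value == "-" else value


def to_events(log_line: dict) -> list:
    """Fan one parsed access log line out into one event per meter."""
    project = normalize_project_name(log_line.get("project_name"))
    return [
        {"meter": meter, "dimensions": {"project_name": project}}
        for meter in METERS
    ]
```

Normalizing before the fan-out keeps all four events for a request consistent with one another.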
176+
177+ #### Transport: how access logs reach Vector
178+
179+ The access log line must travel from the Envoy proxy to the
180+ `billing-usage-collector-vector` agent. Two transports are viable; see
181+ [Access Log Transport](#access-log-transport-file-sink-vs-otlp-sink) under
182+ Alternatives for the trade-offs. In short:
183+
184+ - **File sink (stdout) + `kubernetes_logs`**: the current/baseline approach,
185+ where Envoy writes JSON to stdout and Vector tails the node's container logs.
186+ This requires Vector to run as a per-node DaemonSet co-located with the Envoy
187+ pod, which holds on edge clusters but not where Vector runs as an aggregator.
188+ - **OpenTelemetry (OTLP) sink**: Envoy pushes access logs directly to Vector's
189+ OTLP receiver over the network, independent of pod/node topology. This is
190+ implemented in a draft PR (see below).
130191
131192---
132193
@@ -373,6 +434,54 @@ The following decisions are tracked for the implementation of this enhancement:
373434
374435## Alternatives
375436
437+ ### Access Log Transport: File Sink vs OTLP Sink
438+
439+ The signal-collection design above is independent of *how* the Envoy access log
440+ line reaches the `billing-usage-collector-vector` agent. Two transports were
441+ evaluated:
442+
443+ #### Option A1: File sink (stdout) + Vector `kubernetes_logs` (baseline)
444+
445+ Envoy keeps its existing `File` access log sink writing JSON to `/dev/stdout`.
446+ The container runtime persists this to the node's container log files, and
447+ Vector tails them via a `kubernetes_logs` source.
448+
449+ - *Pros:* No new ports or network hops; reuses the standard Kubernetes log
450+ collection pattern; the `File` sink is already present in the base
451+ `EnvoyProxy`; logs survive on disk if Vector is briefly down (checkpointed
452+ tailing).
453+ - *Cons:* Requires Vector to run as a **per-node DaemonSet co-located** with the
454+ Envoy pod, because `kubernetes_logs` can only read the node it runs on. This
455+ holds on **edge** clusters (Vector and Envoy are both DaemonSets), but breaks
456+ where the billing Vector runs as a **Stateless-Aggregator** (staging/prod), as
457+ a single aggregator pod cannot tail Envoy stdout on other nodes. It also needs
458+ a `kubernetes_logs` source plus a ClusterRole for pod metadata, pod-label
459+ filtering to avoid ingesting unrelated containers, and a `parse_json(.message)`
460+ step.
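
The extra work this option needs (pod-label filtering plus parsing the stringified message) can be sketched in Python. This assumes each tailed event carries the raw line under `message` and pod labels under `kubernetes.pod_labels`; the function name and the label used in the usage note are hypothetical.

```python
import json


def parse_envoy_line(event: dict, wanted_labels: dict):
    """Return the parsed access log dict, or None if the event is skipped.

    Skips events from pods that lack the expected labels and lines that
    are not a JSON object.
    """
    labels = event.get("kubernetes", {}).get("pod_labels", {})
    if any(labels.get(key) != value for key, value in wanted_labels.items()):
        return None  # unrelated container on the same node
    try:
        parsed = json.loads(event.get("message", ""))
    except (json.JSONDecodeError, TypeError):
        return None  # not an access log line (e.g. Envoy startup output)
    return parsed if isinstance(parsed, dict) else None
```

For example, `parse_envoy_line(event, {"app": "envoy"})` would keep only lines from pods labelled `app=envoy`, where that label is purely illustrative.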
461+
462+ #### Option A2: OpenTelemetry (OTLP) sink (implemented in draft PR)
463+
464+ Envoy adds an `OpenTelemetry` access log sink alongside the existing `File`
465+ sink, pushing access logs directly to Vector's OTLP receiver
466+ (`opentelemetry` source, gRPC :4317). The JSON fields arrive as OTLP
467+ log-record attributes, which the VRL transform normalizes to top-level fields.
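
The attribute normalization described above amounts to lifting keys out of a nested map. A minimal Python sketch of that step, assuming the received record exposes the access log fields under an `attributes` key (the real transform is VRL; the function name and the rule that existing top-level keys win on collision are assumptions):

```python
def flatten_attributes(record: dict) -> dict:
    """Lift OTLP log-record attributes to top-level fields."""
    # Start from everything except the nested attribute map.
    flat = {key: value for key, value in record.items() if key != "attributes"}
    for key, value in (record.get("attributes") or {}).items():
        # Assumption: do not overwrite a field the record already defines.
        flat.setdefault(key, value)
    return flat
```

After this step, downstream logic can read `project_name` the same way regardless of which transport delivered the line.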
468+
469+ - *Pros:* **Topology-independent** — works identically whether Vector is a
470+ DaemonSet or an aggregator, since it targets the Vector Service DNS and lets
471+ kube-proxy route. No `kubernetes_logs` source, no ClusterRole, no
472+ per-container filtering, no re-parsing of a stringified message. An OTel
473+ resource attribute (`service.name: nso-httproute-signals`) tags the stream.
474+ - *Cons:* Adds a network sink and OTLP ports to the Vector Service; introduces a
475+ push dependency (mitigated by keeping the `File` sink in parallel as a
476+ fallback / for debugging).
477+
478+ This transport is implemented in a draft PR:
479+
480+ <!-- TODO: reference the draft PR here -->
481+ PR:
482+
483+
484+
376485### Option B: Prometheus Scrape Delta Calculation
377486Run an operator loop that polls the Envoy `/stats` endpoint periodically.
378487- *Rejected because:* Loses per-request granularity and increases NSO statefulness/risk of double-counting.