
Commit a39e81b

alxckn and toddbaert authored
docs: Reorganize metrics documentation and add gRPC sync metrics docs (#1917)
## This PR

- documents metrics added in #1861
- reorganizes the documentation about metrics to avoid confusion between available metrics across services exposed by flagd

---------

Signed-off-by: Alexandre Chakroun <achakroun@macmail.fr>
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
Co-authored-by: Todd Baert <todd.baert@dynatrace.com>
1 parent 176866e commit a39e81b

3 files changed

Lines changed: 62 additions & 14 deletions


docs/reference/flagd-ofrep.md

Lines changed: 5 additions & 0 deletions
@@ -24,3 +24,8 @@ curl -X POST 'http://localhost:8016/ofrep/v1/evaluate/flags'
 ```
 
 See the [cheat sheet](./cheat-sheet.md#ofrep-api-http) for more OFREP examples including context-sensitive evaluation and selectors.
+
+## Monitoring
+
+The OFREP endpoint is instrumented with OpenTelemetry HTTP and flag evaluation metrics.
+See the [Monitoring reference](./monitoring.md#http-metrics) for the full list of exposed metrics and their attributes.

docs/reference/grpc-sync-service.md

Lines changed: 5 additions & 0 deletions
@@ -42,3 +42,8 @@ final FlagdProvider flagdProvider =
 ```
 
 See the [cheat sheet](./cheat-sheet.md#grpc-sync-api-syncproto) for `grpcurl` examples using `FetchAllFlags` and `SyncFlags`.
+
+## Monitoring
+
+The gRPC sync service is instrumented with OpenTelemetry metrics for monitoring active connections and stream lifecycles.
+See the [Monitoring reference](./monitoring.md#grpc-sync-metrics) for the full list of exposed metrics and their attributes.
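
For readers of this diff, the "active connections and stream lifecycles" wording can be illustrated with a small Python sketch (not part of the commit). It shows the bookkeeping such metrics imply: a value that rises while a stream is open, and a duration observation tagged with an exit reason when it closes. This is not flagd's implementation, which is written in Go and records metrics through OpenTelemetry. `StreamMetrics`, `track`, the in-memory storage, and the mapping from Python exceptions to reasons are all hypothetical. The metric names and `reason` values are taken from the `docs/reference/monitoring.md` changes in this commit.

```python
import time
from contextlib import contextmanager


class StreamMetrics:
    """Hypothetical in-memory stand-in for an up-down counter and a histogram."""

    def __init__(self):
        # Mirrors feature_flag.flagd.sync.active_streams.
        self.active_streams = 0
        # (seconds, attributes) pairs, mirrors feature_flag.flagd.sync.stream.duration.
        self.durations = []

    @contextmanager
    def track(self, selector=None, provider_id=None):
        # Attributes are only attached when the client supplied them.
        attrs = {}
        if selector:
            attrs["selector"] = selector
        if provider_id:
            attrs["provider_id"] = provider_id

        self.active_streams += 1
        start = time.monotonic()
        reason = "normal_close"
        try:
            yield
        except TimeoutError:
            # Assumption: a timeout stands in for an exceeded stream deadline.
            reason = "deadline_exceeded"
            raise
        except ConnectionError:
            # Assumption: a connection error stands in for a client disconnect.
            reason = "client_disconnect"
            raise
        except Exception:
            reason = "error"
            raise
        finally:
            # Runs on every exit path, so the count cannot leak.
            self.active_streams -= 1
            elapsed = time.monotonic() - start
            self.durations.append((elapsed, {**attrs, "reason": reason}))


metrics = StreamMetrics()
with metrics.track(provider_id="example-provider"):
    pass  # a real server would stream flag updates here
```

Doing the decrement and the duration observation in a single `finally` block keeps the two metrics consistent with each other: every stream that was counted as active also produces exactly one duration sample.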

docs/reference/monitoring.md

Lines changed: 52 additions & 14 deletions
@@ -1,5 +1,5 @@
 ---
-description: monitoring and telemetry flagd and flagd providers
+description: monitoring and telemetry for flagd HTTP, gRPC sync, and flagd providers
 ---
 
 # Monitoring
@@ -45,26 +45,64 @@ Given below is the current implementation overview of flagd telemetry internals,
 
 ## Metrics
 
-flagd exposes the following metrics:
+> Please note that metric names may vary based on the consuming monitoring tool naming requirements.
+> For example, the transformation of OTLP metrics to Prometheus is described [here](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/compatibility/prometheus_and_openmetrics.md#otlp-metric-points-to-prometheus).
+
+### HTTP Metrics
+
+These metrics apply to both the [flag evaluation](./specifications/protos.md) and [OFREP](./flagd-ofrep.md) endpoints. flagd uses the [OpenTelemetry Semantic Conventions for HTTP](https://opentelemetry.io/docs/specs/semconv/http/http-metrics/):
+
+- `http.server.request.duration` - Measures the duration of inbound HTTP requests (seconds). Histogram buckets: 5ms, 10ms, 25ms, 50ms, 75ms, 100ms, 250ms, 500ms, 750ms, 1s, 2.5s, 5s, 7.5s, 10s.
+- `http.server.request.body.size` - Measures the size of HTTP request messages (bytes)
+- `http.server.response.body.size` - Measures the size of HTTP response messages (bytes)
+
+For the full list of attributes on these metrics, see the [OpenTelemetry HTTP Server Metrics](https://opentelemetry.io/docs/specs/semconv/http/http-metrics/#http-server) semantic conventions.
+
+### Flag Evaluation Metrics
+
+These metrics are recorded on every [flag evaluation](./specifications/protos.md), regardless of transport (HTTP, gRPC, connect). Attribute names are inspired by the [OpenTelemetry Semantic Conventions for Feature Flags](https://opentelemetry.io/docs/specs/semconv/feature-flags/feature-flags-events/):
 
-- `http.server.request.duration` - Measures the duration of inbound HTTP requests
-- `http.server.response.body.size` - Measures the size of HTTP response messages
-- `http.server.active_requests` - Measures the number of concurrent HTTP requests that are currently in-flight
 - `feature_flag.flagd.impression` - Measures the number of evaluations for a given flag
 - `feature_flag.flagd.result.reason` - Measures the number of evaluations for a given reason
 
-> Please note that metric names may vary based on the consuming monitoring tool naming requirements.
-> For example, the transformation of OTLP metrics to Prometheus is described [here](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/compatibility/prometheus_and_openmetrics.md#otlp-metric-points-to-prometheus).
+**Attributes:**
+
+- `feature_flag.key` - The flag key being evaluated
+- `feature_flag.result.variant` - The variant returned by the evaluation
+- `feature_flag.provider.name` - The feature flag provider name (always `flagd`)
+- `feature_flag.reason` - The evaluation reason (e.g. `STATIC`, `TARGETING_MATCH`, `ERROR`)
+
+### gRPC Sync Metrics
+
+flagd instruments the [gRPC sync service](./grpc-sync-service.md) with standard RPC metrics and custom sync-specific metrics.
+
+#### Standard RPC metrics
+
+flagd uses the [OpenTelemetry Semantic Conventions for RPC](https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc):
+
+- `rpc.server.duration` - Measures the duration of inbound RPC calls (ms)
+- `rpc.server.request.size` - Measures the size of RPC request messages (bytes)
+- `rpc.server.response.size` - Measures the size of RPC response messages (bytes)
+- `rpc.server.requests_per_rpc` - Measures the number of requests received per RPC
+- `rpc.server.responses_per_rpc` - Measures the number of responses sent per RPC
+
+**Attributes:**
+
+- `rpc.system` - The RPC system (always `grpc`)
+- `rpc.service` - The fully-qualified RPC service name (e.g. `flagd.sync.v1.FlagSyncService`)
+- `rpc.method` - The RPC method name (e.g. `SyncFlags`, `FetchAllFlags`)
+- `rpc.grpc.status_code` - The gRPC status code (e.g. `OK`, `CANCELLED`, `DEADLINE_EXCEEDED`)
+
+#### Custom sync metrics
 
-### HTTP Metric Attributes
+- `feature_flag.flagd.sync.active_streams` - Measures the number of currently active gRPC sync streaming connections
+- `feature_flag.flagd.sync.stream.duration` - Measures the duration of gRPC sync streaming connections (seconds). Histogram buckets: 30s, 1min, 2min, 5min, 8min, 10min, 20min, 30min, 1h, 3h.
 
-flagd uses the following OpenTelemetry Semantic Conventions for HTTP metrics:
+**Attributes:**
 
-- `service.name` - The name of the service
-- `http.route` - The matched route (path template)
-- `http.request.method` - The HTTP request method (GET, POST, etc.)
-- `http.response.status_code` - The HTTP response status code
-- `url.scheme` - The URI scheme (http or https)
+- `selector` - The selector expression used by the sync stream, when specified in the request
+- `provider_id` - The provider ID of the connecting client, when specified in the request
+- `reason` - Stream exit reason: `normal_close`, `deadline_exceeded`, `client_disconnect`, or `error` (on `stream.duration` only)
 
 ## Traces
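
The note under `## Metrics` says metric names may change depending on the consuming tool. As an illustration (not part of the commit), the Python sketch below approximates the best-known case, the OTLP to Prometheus renaming: characters that are invalid in Prometheus names become underscores, a unit suffix is appended, and monotonic counters get `_total`. It is a simplified assumption-laden sketch, not the exporter's actual code. `prometheus_name` and `UNIT_SUFFIXES` are hypothetical names, the unit table is a small subset, and real exporters apply further rules and configuration options (for example, histograms are exposed as `_bucket`, `_sum`, and `_count` series). The units passed in the examples follow the units listed in this diff; treating `feature_flag.flagd.impression` as a monotonic counter is an assumption.

```python
import re

# Unit abbreviation -> suffix word (illustrative subset only).
UNIT_SUFFIXES = {"s": "seconds", "ms": "milliseconds", "By": "bytes"}


def prometheus_name(otlp_name, unit="", monotonic_counter=False):
    """Approximate the OTLP -> Prometheus metric name translation."""
    # Replace every character outside [a-zA-Z0-9_:] (such as '.') with '_'.
    name = re.sub(r"[^a-zA-Z0-9_:]", "_", otlp_name)

    # Append the unit as a word, unless the name already ends with it.
    suffix = UNIT_SUFFIXES.get(unit)
    if suffix and not name.endswith("_" + suffix):
        name = f"{name}_{suffix}"

    # Monotonic counters conventionally end in '_total'.
    if monotonic_counter and not name.endswith("_total"):
        name = f"{name}_total"
    return name


# returns "http_server_request_duration_seconds"
prometheus_name("http.server.request.duration", unit="s")

# returns "rpc_server_duration_milliseconds"
prometheus_name("rpc.server.duration", unit="ms")

# returns "feature_flag_flagd_impression_total" (counter type is assumed)
prometheus_name("feature_flag.flagd.impression", monotonic_counter=True)
```

When building dashboards or alerts, it is safer to check the names actually exposed by the scrape endpoint than to rely on a translation like this one.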
