|
1 | 1 | --- |
2 | | -description: monitoring and telemetry flagd and flagd providers |
| 2 | +description: monitoring and telemetry for flagd HTTP, gRPC sync, and flagd providers |
3 | 3 | --- |
4 | 4 |
|
5 | 5 | # Monitoring |
@@ -45,26 +45,64 @@ Given below is the current implementation overview of flagd telemetry internals, |
45 | 45 |
|
46 | 46 | ## Metrics |
47 | 47 |
|
48 | | -flagd exposes the following metrics: |
| 48 | +> Metric names may vary depending on the naming requirements of the consuming monitoring tool. |
| 49 | +> For example, the [OpenTelemetry Prometheus and OpenMetrics compatibility specification](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/compatibility/prometheus_and_openmetrics.md#otlp-metric-points-to-prometheus) describes how OTLP metrics are transformed into Prometheus metrics. |
| 50 | +
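To make the naming note concrete, the following is a minimal sketch of how an OTLP metric name might appear after export to Prometheus. It is a simplified approximation of the rules in the linked specification, not the exporter's actual code: `promName` and `unitSuffix` are hypothetical names introduced here, only three units are handled, and treating `feature_flag.flagd.impression` as a monotonic counter is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// unitSuffix maps a few OTLP (UCUM) units to the suffix Prometheus
// conventionally appends. Illustrative subset only.
var unitSuffix = map[string]string{
	"s":  "seconds",
	"ms": "milliseconds",
	"By": "bytes",
}

// promName approximates the OTLP-to-Prometheus name transformation:
// invalid characters become underscores, a unit suffix is appended,
// and monotonic counters get a "_total" suffix.
func promName(otelName, unit string, monotonicCounter bool) string {
	name := strings.Map(func(r rune) rune {
		valid := (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') ||
			(r >= '0' && r <= '9') || r == '_' || r == ':'
		if valid {
			return r
		}
		return '_'
	}, otelName)
	if s, ok := unitSuffix[unit]; ok && !strings.HasSuffix(name, "_"+s) {
		name += "_" + s
	}
	if monotonicCounter && !strings.HasSuffix(name, "_total") {
		name += "_total"
	}
	return name
}

func main() {
	fmt.Println(promName("http.server.request.duration", "s", false))
	fmt.Println(promName("http.server.request.body.size", "By", false))
	// Assumption: the impression metric is a monotonic counter.
	fmt.Println(promName("feature_flag.flagd.impression", "", true))
}
```

Histograms are additionally exposed in Prometheus as `_bucket`, `_sum`, and `_count` series, so a query for request duration would typically target a name such as `http_server_request_duration_seconds_bucket`.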
|
| 51 | +### HTTP Metrics |
| 52 | + |
| 53 | +These metrics apply to both the [flag evaluation](./specifications/protos.md) and [OFREP](./flagd-ofrep.md) endpoints. flagd uses the [OpenTelemetry Semantic Conventions for HTTP](https://opentelemetry.io/docs/specs/semconv/http/http-metrics/): |
| 54 | + |
| 55 | +- `http.server.request.duration` - Measures the duration of inbound HTTP requests (seconds). Histogram buckets: 5ms, 10ms, 25ms, 50ms, 75ms, 100ms, 250ms, 500ms, 750ms, 1s, 2.5s, 5s, 7.5s, 10s. |
| 56 | +- `http.server.request.body.size` - Measures the size of HTTP request messages (bytes) |
| 57 | +- `http.server.response.body.size` - Measures the size of HTTP response messages (bytes) |
| 58 | + |
| 59 | +For the full list of attributes on these metrics, see the [OpenTelemetry HTTP Server Metrics](https://opentelemetry.io/docs/specs/semconv/http/http-metrics/#http-server) semantic conventions. |
| 60 | + |
| 61 | +### Flag Evaluation Metrics |
| 62 | + |
| 63 | +These metrics are recorded on every [flag evaluation](./specifications/protos.md), regardless of transport (HTTP, gRPC, or Connect). Attribute names are inspired by the [OpenTelemetry Semantic Conventions for Feature Flags](https://opentelemetry.io/docs/specs/semconv/feature-flags/feature-flags-events/): |
49 | 64 |
|
50 | | -- `http.server.request.duration` - Measures the duration of inbound HTTP requests |
51 | | -- `http.server.response.body.size` - Measures the size of HTTP response messages |
52 | | -- `http.server.active_requests` - Measures the number of concurrent HTTP requests that are currently in-flight |
53 | 65 | - `feature_flag.flagd.impression` - Measures the number of evaluations for a given flag |
54 | 66 | - `feature_flag.flagd.result.reason` - Measures the number of evaluations for a given reason |
55 | 67 |
|
56 | | -> Please note that metric names may vary based on the consuming monitoring tool naming requirements. |
57 | | -> For example, the transformation of OTLP metrics to Prometheus is described [here](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/compatibility/prometheus_and_openmetrics.md#otlp-metric-points-to-prometheus). |
| 68 | +**Attributes:** |
| 69 | + |
| 70 | +- `feature_flag.key` - The flag key being evaluated |
| 71 | +- `feature_flag.result.variant` - The variant returned by the evaluation |
| 72 | +- `feature_flag.provider.name` - The feature flag provider name (always `flagd`) |
| 73 | +- `feature_flag.reason` - The evaluation reason (e.g. `STATIC`, `TARGETING_MATCH`, `ERROR`) |
| 74 | + |
| 75 | +### gRPC Sync Metrics |
| 76 | + |
| 77 | +flagd instruments the [gRPC sync service](./grpc-sync-service.md) with standard RPC metrics and custom sync-specific metrics. |
| 78 | + |
| 79 | +#### Standard RPC metrics |
| 80 | + |
| 81 | +flagd uses the [otelgrpc instrumentation](https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc), which emits metrics that follow the OpenTelemetry Semantic Conventions for RPC: |
| 82 | + |
| 83 | +- `rpc.server.duration` - Measures the duration of inbound RPC calls (ms) |
| 84 | +- `rpc.server.request.size` - Measures the size of RPC request messages (bytes) |
| 85 | +- `rpc.server.response.size` - Measures the size of RPC response messages (bytes) |
| 86 | +- `rpc.server.requests_per_rpc` - Measures the number of requests received per RPC |
| 87 | +- `rpc.server.responses_per_rpc` - Measures the number of responses sent per RPC |
| 88 | + |
| 89 | +**Attributes:** |
| 90 | + |
| 91 | +- `rpc.system` - The RPC system (always `grpc`) |
| 92 | +- `rpc.service` - The fully-qualified RPC service name (e.g. `flagd.sync.v1.FlagSyncService`) |
| 93 | +- `rpc.method` - The RPC method name (e.g. `SyncFlags`, `FetchAllFlags`) |
| 94 | +- `rpc.grpc.status_code` - The gRPC status code (e.g. `OK`, `CANCELLED`, `DEADLINE_EXCEEDED`) |
| 95 | + |
| 96 | +#### Custom sync metrics |
58 | 97 |
|
59 | | -### HTTP Metric Attributes |
| 98 | +- `feature_flag.flagd.sync.active_streams` - Measures the number of currently active gRPC sync streaming connections |
| 99 | +- `feature_flag.flagd.sync.stream.duration` - Measures the duration of gRPC sync streaming connections (seconds). Histogram buckets: 30s, 1min, 2min, 5min, 8min, 10min, 20min, 30min, 1h, 3h. |
60 | 100 |
|
61 | | -flagd uses the following OpenTelemetry Semantic Conventions for HTTP metrics: |
| 101 | +**Attributes:** |
62 | 102 |
|
63 | | -- `service.name` - The name of the service |
64 | | -- `http.route` - The matched route (path template) |
65 | | -- `http.request.method` - The HTTP request method (GET, POST, etc.) |
66 | | -- `http.response.status_code` - The HTTP response status code |
67 | | -- `url.scheme` - The URI scheme (http or https) |
| 103 | +- `selector` - The selector expression used by the sync stream, when specified in the request |
| 104 | +- `provider_id` - The provider ID of the connecting client, when specified in the request |
| 105 | +- `reason` - The stream exit reason: `normal_close`, `deadline_exceeded`, `client_disconnect`, or `error` (set on `feature_flag.flagd.sync.stream.duration` only) |
68 | 106 |
|
69 | 107 | ## Traces |
70 | 108 |
|
|