[opentelemetry-instrumentation-grpc] Add support for metrics #4621
lorenzoronzani wants to merge 9 commits into
Pull request overview
Adds OpenTelemetry RPC duration metrics to the opentelemetry-instrumentation-grpc package (sync + asyncio), aligning with the RPC metrics semantic conventions by emitting rpc.client.call.duration and rpc.server.call.duration histograms and validating them via new unit tests.
Changes:
- Added duration histogram creation and recording to gRPC client/server interceptors (sync + grpc.aio), including metric attributes like rpc.system.name, rpc.method, and rpc.response.status_code.
- Extended public interceptor factories to accept meter_provider, and plumbed meter/target information into client interceptors for server.address / server.port.
- Added new metric-focused test suites for sync and asyncio client/server paths, plus a changelog entry.
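As a rough illustration of the first change, the sketch below times a call and records the elapsed seconds together with the attributes named above. `InMemoryHistogram` and `timed_call` are hypothetical stand-ins that do not come from this PR; the actual interceptors would record into a histogram created from an OpenTelemetry meter, and the status values here are plain strings chosen for the example.

```python
from timeit import default_timer


class InMemoryHistogram:
    """Hypothetical stand-in for a metrics histogram; it only stores points."""

    def __init__(self, name):
        self.name = name
        self.points = []

    def record(self, amount, attributes=None):
        self.points.append((amount, dict(attributes or {})))


def timed_call(histogram, full_method, call):
    """Run call() and record its duration, even when it raises."""
    start = default_timer()
    status = "OK"
    try:
        return call()
    except Exception:
        # Assumption for this sketch: every failure is reported as UNKNOWN.
        status = "UNKNOWN"
        raise
    finally:
        # The finally block guarantees exactly one point per call.
        histogram.record(
            max(default_timer() - start, 0.0),
            {
                "rpc.system.name": "grpc",
                "rpc.method": full_method,
                "rpc.response.status_code": status,
            },
        )


duration = InMemoryHistogram("rpc.client.call.duration")
timed_call(duration, "pkg.Service/Method", lambda: "response")
```

Recording in `finally` keeps the success and failure paths symmetric, which is the property the review comments further down are concerned with.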
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| instrumentation/opentelemetry-instrumentation-grpc/src/opentelemetry/instrumentation/grpc/_server.py | Records rpc.server.call.duration with semconv attributes. |
| instrumentation/opentelemetry-instrumentation-grpc/src/opentelemetry/instrumentation/grpc/_client.py | Records rpc.client.call.duration, parses channel target for server attributes. |
| instrumentation/opentelemetry-instrumentation-grpc/src/opentelemetry/instrumentation/grpc/_aio_server.py | Adds duration recording to aio server interceptor paths. |
| instrumentation/opentelemetry-instrumentation-grpc/src/opentelemetry/instrumentation/grpc/_aio_client.py | Adds duration recording hooks for aio client interceptor paths. |
| instrumentation/opentelemetry-instrumentation-grpc/src/opentelemetry/instrumentation/grpc/__init__.py | Exposes meter_provider/target plumbing for interceptor factories and client instrumentors. |
| instrumentation/opentelemetry-instrumentation-grpc/tests/test_server_interceptor_metrics.py | New sync server duration metric tests (OK + error + streaming). |
| instrumentation/opentelemetry-instrumentation-grpc/tests/test_client_interceptor_metrics.py | New sync client duration metric tests (OK + error + streaming). |
| instrumentation/opentelemetry-instrumentation-grpc/tests/test_aio_server_interceptor_metrics.py | New aio server duration metric tests. |
| instrumentation/opentelemetry-instrumentation-grpc/tests/test_aio_client_interceptor_metrics.py | New aio client duration metric tests. |
| .changelog/4621.added | Changelog entry for adding gRPC RPC duration metrics. |
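The table mentions that the client interceptor parses the channel target to fill server.address and server.port. A minimal sketch of such parsing follows, assuming only a few common target shapes (`host:port`, `dns:///host:port`, `ipv4:...`, and bracketed IPv6). `parse_target` is a hypothetical helper written for illustration, not the function added in this PR, and it does not handle unix sockets or unbracketed IPv6 literals.

```python
def parse_target(target):
    """Return (address, port) for a gRPC target string; port may be None."""
    if not target:
        return None, None

    hostport = target
    scheme, sep, rest = target.partition(":")
    if sep and scheme in ("dns", "ipv4", "ipv6"):
        # "dns:///host:port" and "dns://authority/host:port" both end
        # with the host:port part after the last slash.
        hostport = rest.rsplit("/", 1)[-1]

    if hostport.startswith("["):
        # Bracketed IPv6 literal such as "[::1]:50051".
        host, _, tail = hostport[1:].partition("]")
        port_text = tail[1:] if tail.startswith(":") else ""
    else:
        host, sep, port_text = hostport.rpartition(":")
        if not sep:
            # No colon at all: the whole string is the host.
            host, port_text = hostport, ""

    port = int(port_text) if port_text.isdecimal() else None
    return host, port
```

For example, `parse_target("dns:///localhost:50051")` yields `("localhost", 50051)`, while a target without a port yields `None` for the port so the attribute can simply be omitted.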
Comments suppressed due to low confidence (3)
instrumentation/opentelemetry-instrumentation-grpc/src/opentelemetry/instrumentation/grpc/_client.py:239
- If invoker(...) raises a non-grpc.RpcError exception (e.g., a local serialization / interceptor error before the RPC completes), status_code remains StatusCode.OK, so the metric will incorrectly report an OK response. Consider mapping non-grpc.RpcError failures to grpc.StatusCode.UNKNOWN (or skip recording) so rpc.response.status_code / error.type reflect failure.
    except Exception as exc:
        if isinstance(exc, grpc.RpcError):
            status_code = exc.code()
            span.set_attribute(
                RPC_GRPC_STATUS_CODE,
                status_code.value[0],
            )
        span.set_status(
            Status(
                status_code=StatusCode.ERROR,
                description=f"{type(exc).__name__}: {exc}",
            )
        )
        span.record_exception(exc)
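One way to read the suggestion above is sketched here, assuming stand-in types so the example runs without grpc installed. `StatusCode` and `RpcError` mimic the shape of their grpc counterparts, and `status_code_for` is a hypothetical helper; none of these are the PR's actual code.

```python
import enum


class StatusCode(enum.Enum):
    """Stand-in for grpc.StatusCode (only three members shown)."""

    OK = (0, "ok")
    UNKNOWN = (2, "unknown")
    UNAVAILABLE = (14, "unavailable")


class RpcError(Exception):
    """Stand-in for an RPC error that carries a status code."""

    def __init__(self, code):
        super().__init__(code.name)
        self._code = code

    def code(self):
        return self._code


def status_code_for(exc):
    """Pick the status code to attach to the duration metric."""
    if exc is None:
        return StatusCode.OK
    if isinstance(exc, RpcError):
        return exc.code()
    # Any other exception never produced a gRPC status, so report
    # UNKNOWN instead of leaving the default OK in place.
    return StatusCode.UNKNOWN
```

With this mapping, a local serialization failure would be recorded as UNKNOWN rather than OK; skipping the recording entirely, the other option the comment mentions, would be an equally valid choice.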
instrumentation/opentelemetry-instrumentation-grpc/src/opentelemetry/instrumentation/grpc/_server.py:402
- Same as unary path: on an uncaught exception in a streaming handler, the wrapped context’s status code will remain OK, but gRPC will return UNKNOWN. This will cause rpc.server.call.duration to be tagged with rpc.response.status_code=OK. Consider ensuring a non-OK status (e.g., UNKNOWN) is recorded when exceptions escape the handler.
    self._record_duration(
        handler_call_details,
        start_time,
        context._code,
    )
instrumentation/opentelemetry-instrumentation-grpc/src/opentelemetry/instrumentation/grpc/_aio_server.py:160
- In the async streaming server interceptor, uncaught exceptions will leave context._self_code as OK, so the duration metric will be tagged with rpc.response.status_code=OK even though gRPC returns UNKNOWN. Consider updating the wrapped context code on exception before recording metrics.
    except Exception as error:
        # pylint:disable=unidiomatic-typecheck
        if type(error) != Exception:  # noqa: E721
            span.record_exception(error)
        raise error
    finally:
        self._record_duration(
            handler_call_details,
            start_time,
            context._self_code,
        )
    if result is None:
        span.end()
        self._record_duration(
            client_info.full_method, start_time, status_code
        )

    self._record_duration(
        handler_call_details,
        start_time,
        context._code,
    )

    except Exception as error:
        # Bare exceptions are likely to be gRPC aborts, which
        # we handle in our context wrapper.
        # Here, we're interested in uncaught exceptions.
        # pylint:disable=unidiomatic-typecheck
        if type(error) != Exception:  # noqa: E721
            span.record_exception(error)
        raise error
    finally:
        self._record_duration(
            handler_call_details,
            start_time,
            context._self_code,
        )

      def wrapper_fn(self, original_func, instance, args, kwargs):
          channel = original_func(*args, **kwargs)
    -     tracer_provider = kwargs.get("tracer_provider")
    -     request_hook = self._request_hook
    -     response_hook = self._response_hook
    +     target = args[0] if args else None
          return intercept_channel(
              channel,
              client_interceptor(
    -             tracer_provider=tracer_provider,
    +             tracer_provider=self._tracer_provider,
                  filter_=self._filter,
    -             request_hook=request_hook,
    -             response_hook=response_hook,
    +             request_hook=self._request_hook,
    +             response_hook=self._response_hook,
    +             meter_provider=self._meter_provider,
    +             target=target,
              ),

      def insecure(*args, **kwargs):
    -     kwargs = self._add_interceptors(tracer_provider, kwargs)
    +     target = args[0] if args else None
    +     kwargs = self._add_interceptors(
    +         tracer_provider, meter_provider, target, kwargs
    +     )
          return self._original_insecure(*args, **kwargs)

      def secure(*args, **kwargs):
    -     kwargs = self._add_interceptors(tracer_provider, kwargs)
    +     target = args[0] if args else None
    +     kwargs = self._add_interceptors(
    +         tracer_provider, meter_provider, target, kwargs
    +     )
          return self._original_secure(*args, **kwargs)

    meter = get_meter(
        __name__,
        __version__,
        meter_provider,
    )

    def test_unary_call_records_duration_metric(self):
        """A unary client RPC produces an rpc.client.call.duration histogram."""
        simple_method(self._stub)

        metrics = self.get_sorted_metrics()
        duration_metric = next(
            (m for m in metrics if m.name == RPC_CLIENT_CALL_DURATION),
            None,
        )

Just an update: I am integrating the review comments. Unfortunately, I did not have time to do that earlier.
Description
This PR adds metrics support to the gRPC server and client instrumentation, following the semantic conventions.
Server & Client:
- meter_provider is used to create the histogram.
I also updated the aio components to match the new changes.
Fixes issue 3375
Type of change
How Has This Been Tested?
My environment does not allow me to install all the required packages, so I used the following command to run the tests:
uv run --no-sync pytest instrumentation/opentelemetry-instrumentation-grpc/tests/ --ignore=instrumentation/opentelemetry-instrumentation-grpc/tests/protobuf/ -q
Does This PR Require a Core Repo Change?
Checklist:
See contributing.md for styleguide, changelog guidelines, and more.