OpenAI v2: completion hook support and minor fixes#4315
OpenAI v2: completion hook support and minor fixes#4315lmolkova wants to merge 6 commits intoopen-telemetry:mainfrom
Conversation
lmolkova
commented
Mar 7, 2026
- Call completion hook in telemetry handler (if passed in constructor)
- Add should_record_content method to handler (so that instrumentations don't need to implement their own)
- Test against latest OpenAI and fix issues (new Omit type for values that were not provided)
7271e7e to
fe0c61d
Compare
MikeGoldsmith
left a comment
There was a problem hiding this comment.
Sorry for delay getting this reviewed. It got lost and wasn't added to the PR review board.
|
I'm going to put it on hold and rebase on top of #4315 given the amount of conflicts it's going to introduce. |
Nik-Reddy
left a comment
There was a problem hiding this comment.
Reviewed thoroughly - this is a clean and well-structured refactor. A few observations:
What I like:
- Moving
should_capture_content()intoTelemetryHandleris the right call. Individual instrumentations (like the Cohere one I'm adding in #4418) shouldn't need to manage content capture logic themselves - this centralizes it properly. - The
_maybe_build_llm_event_recordrename (from_maybe_emit_llm_event) is a good separation of concerns - building the record vs emitting it lets the completion hook inspect it first. - Handling
openai.Omitinvalue_is_set()is a nice forward-compatibility fix for newer OpenAI SDK versions. _openai_response_format_to_output_typecorrectly mapsjson_object/json_schematojson.
One question:
In the TelemetryHandler.__init__ content capture logic (lines ~164-177), the fallback for non-experimental mode checks OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT == "true" only. Should this also handle the newer span_only/event_only/span_and_event values for consistency, or is that intentionally left as experimental-only?
Note: This PR supersedes my #4417 (content capture fix) - I'll close that one and reference this instead. Happy to help test once merged.
LGTM overall.
0ddc569 to
6d21ce7
Compare
@Nik-Reddy it makes sense, update the PR to support new valued on legacy path |
There was a problem hiding this comment.
Pull request overview
Adds completion hook support to the GenAI telemetry utilities and wires it into the OpenAI v2 instrumentation so consumers can capture inputs/outputs (and optionally stamp log records) after an invocation completes.
Changes:
- Add
completion_hookplumbing toTelemetryHandlerand pass it through all invocation types; call the hook on inference/workflow completion. - Introduce
TelemetryHandler.should_capture_content()to centralize content-capture decisions for instrumentations. - Update OpenAI v2 instrumentation/tests for the hook, latest OpenAI SDK (
openai==2.26.0),Omithandling, andresponse_formatmapping.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| util/opentelemetry-util-genai/tests/test_handler_completion_hook.py | New unit tests for hook invocation, log_record passing, and should_capture_content() behavior. |
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/handler.py | Adds completion_hook param, computes capture policy, and exposes should_capture_content(). |
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/_inference_invocation.py | Calls completion hook and passes created log record before emission. |
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/_workflow_invocation.py | Calls completion hook when workflows finish. |
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/_invocation.py | Threads completion_hook through base invocation class. |
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/_tool_invocation.py | Plumbs completion_hook through tool invocation construction. |
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/_embedding_invocation.py | Plumbs completion_hook through embedding invocation construction. |
| util/opentelemetry-util-genai/CHANGELOG.md | Documents completion hook support in util package. |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_completion_hook.py | New integration tests verifying hook invocation and content routing. |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/requirements.latest.txt | Bumps OpenAI SDK used in “latest” tests to 2.26.0. |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/conftest.py | Minor formatting-only fixture signature change. |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/src/opentelemetry/instrumentation/openai_v2/utils.py | Handles openai.Omit and normalizes response_format output types. |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/src/opentelemetry/instrumentation/openai_v2/patch.py | Uses handler.should_capture_content() in v2 wrappers. |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/src/opentelemetry/instrumentation/openai_v2/init.py | Loads hook (explicit kwarg preferred, else entry point) and passes into TelemetryHandler. |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/CHANGELOG.md | Documents hook support and OpenAI v2 fixes. |
| instrumentation-genai/AGENTS.md | Adds guidance on initializing TelemetryHandler with completion hooks. |
| def _call_completion_hook( | ||
| self, | ||
| log_record: LogRecord | None, | ||
| ) -> None: | ||
| self._completion_hook.on_completion( | ||
| inputs=self.input_messages, | ||
| outputs=self.output_messages, | ||
| system_instruction=self.system_instruction, | ||
| span=self.span, | ||
| log_record=log_record, | ||
| ) |
There was a problem hiding this comment.
CompletionHook.on_completion() is invoked without any error isolation. If a user-provided hook raises, it will bubble out of invocation.stop()/fail(), potentially converting a successful OpenAI call into a traced failure (and re-raising to user code). Please wrap the hook call in a broad try/except and log at debug/exception level so instrumentation is not impacted by hook errors.
| def _call_completion_hook(self) -> None: | ||
| self._completion_hook.on_completion( | ||
| inputs=self.input_messages, | ||
| outputs=self.output_messages, | ||
| system_instruction=[], | ||
| span=self.span, | ||
| log_record=None, | ||
| ) |
There was a problem hiding this comment.
CompletionHook.on_completion() is called directly during workflow finalization. A hook exception will propagate out of invocation.stop()/fail(), which can break normal control flow for callers using workflows. Please catch and log any hook exceptions so hooks cannot interfere with span completion.
| if is_experimental_mode(): | ||
| content_enabled = ( | ||
| get_content_capturing_mode() != ContentCapturingMode.NO_CONTENT | ||
| ) | ||
| else: | ||
| content_enabled = os.environ.get( | ||
| OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT, "" | ||
| ).lower() in ( | ||
| "true", | ||
| "span_only", | ||
| "event_only", | ||
| "span_and_event", | ||
| ) | ||
| self._capture_content = content_enabled or not isinstance( | ||
| self._completion_hook, _NoOpCompletionHook | ||
| ) |
There was a problem hiding this comment.
In experimental mode, should_capture_content() currently treats any mode other than NO_CONTENT as requiring content capture. This ignores the existing should_capture_content_on_spans_in_experimental_mode() behavior (see instrumentation-anthropic/.../patch.py:68) which avoids capturing content for EVENT_ONLY when should_emit_event() is false (to prevent unnecessary conversion/retention of message content). Consider reusing that helper (or equivalent logic) here so OpenAI v2 and other instrumentations get consistent privacy/behavior when events are disabled.
| def should_capture_content(self) -> bool: | ||
| """Returns True if content should be captured. | ||
|
|
||
| Content is captured when the content capturing mode requires it, or | ||
| when a real completion hook is configured (not a no-op). | ||
| """ | ||
| return self._capture_content |
There was a problem hiding this comment.
TelemetryHandler is documented in the PR description as adding a should_record_content method, but the implementation introduces should_capture_content(). This mismatch can confuse downstream instrumentations/readers; either rename the method to match the intended API or update the PR description/docs to reflect the final name.
| def get_telemetry_handler( | ||
| tracer_provider: TracerProvider | None = None, | ||
| meter_provider: MeterProvider | None = None, | ||
| logger_provider: LoggerProvider | None = None, | ||
| completion_hook: CompletionHook | None = None, | ||
| ) -> TelemetryHandler: | ||
| """ | ||
| Returns a singleton TelemetryHandler instance. | ||
| """ | ||
| handler: TelemetryHandler | None = getattr( | ||
| get_telemetry_handler, "_default_handler", None | ||
| ) | ||
| if handler is None: | ||
| handler = TelemetryHandler( | ||
| tracer_provider=tracer_provider, | ||
| meter_provider=meter_provider, | ||
| logger_provider=logger_provider, | ||
| completion_hook=completion_hook, | ||
| ) | ||
| setattr(get_telemetry_handler, "_default_handler", handler) | ||
| return handler |
There was a problem hiding this comment.
get_telemetry_handler() is a singleton and will silently ignore a non-None completion_hook (and other providers) passed after the first call. With the new completion_hook parameter, this is particularly surprising and can lead to hooks not being applied. Consider either (a) documenting that only the first call’s arguments are used, (b) warning/raising if subsequent calls pass different non-None values, or (c) providing a separate non-singleton constructor helper.
| def __init__( | ||
| self, | ||
| # Individual components instead of TelemetryHandler to avoid a circular | ||
| # import between handler.py and the invocation modules. | ||
| tracer: Tracer, | ||
| metrics_recorder: InvocationMetricsRecorder, | ||
| logger: Logger, | ||
| completion_hook: CompletionHook, | ||
| operation_name: str, | ||
| span_name: str, | ||
| span_kind: SpanKind = SpanKind.CLIENT, | ||
| attributes: dict[str, Any] | None = None, | ||
| metric_attributes: dict[str, Any] | None = None, | ||
| ) -> None: | ||
| self._tracer = tracer | ||
| self._metrics_recorder = metrics_recorder | ||
| self._logger = logger | ||
| self._completion_hook = completion_hook | ||
| self._operation_name: str = operation_name |
There was a problem hiding this comment.
Adding a new required completion_hook positional parameter to GenAIInvocation.__init__ forces all public invocation types (Inference/Embedding/Tool/Workflow) to take this required arg as well, which is a backwards-incompatible API change for any consumer that constructs invocations directly (even if discouraged). To preserve compatibility (util guideline 1000003 §3), make completion_hook optional with a default no-op, or accept completion_hook: CompletionHook | None = None and coerce to a no-op internally.
|
|
||
| def _openai_response_format_to_output_type(response_format_type: str) -> str: | ||
| if response_format_type in ("json_object", "json_schema"): | ||
| return "json" |
There was a problem hiding this comment.
_openai_response_format_to_output_type() hardcodes the semconv output type as the string literal "json". For gen_ai.output.type the semantic conventions define a well-known value set; per instrumentation-genai guideline 1000002 §4, prefer using GenAIAttributes.GenAiOutputTypeValues.JSON.value (and other enum values) instead of raw strings.
| return "json" | |
| return GenAIAttributes.GenAiOutputTypeValues.JSON.value |