feat: OpenAI responses create instrumentation#4474
eternalcuriouslearner wants to merge 22 commits into open-telemetry:main from …es-create-instrumentation-first-part

Conversation
Pull request overview
This PR adds OpenTelemetry instrumentation for the OpenAI Responses API create method (sync + streaming) using the TelemetryHandler inference invocation lifecycle, along with VCR-based tests to validate spans/log behavior and response attribute extraction.
Changes:
- Add patching for `openai.resources.responses.responses.Responses.create` to emit inference spans and capture request/response attributes.
- Extend response extraction to cover request params, finish reasons, tool calls, reasoning parts, and cache token usage.
- Add a comprehensive `test_responses.py` suite plus VCR cassettes and supporting test utilities/fixtures.
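The patching described above follows the TelemetryHandler's start/stop/fail inference-invocation lifecycle. A minimal, dependency-free sketch of that pattern — the `TelemetryHandler`, `Invocation`, and attribute names here are illustrative stand-ins, not the library's actual classes:

```python
from dataclasses import dataclass, field


@dataclass
class Invocation:
    """Stand-in for one inference invocation's recorded state."""
    attributes: dict = field(default_factory=dict)
    status: str = "started"


class TelemetryHandler:
    """Stand-in for the handler that manages invocation lifecycles."""

    def __init__(self):
        self.invocations = []

    def start(self, request_attrs):
        invocation = Invocation(attributes=dict(request_attrs))
        self.invocations.append(invocation)
        return invocation

    def stop(self, invocation, response_attrs):
        invocation.attributes.update(response_attrs)
        invocation.status = "ok"

    def fail(self, invocation, error):
        invocation.attributes["error.type"] = type(error).__name__
        invocation.status = "error"


def wrap_create(wrapped, handler):
    """Wrap a Responses.create-like callable in the invocation lifecycle."""

    def wrapper(*args, **kwargs):
        invocation = handler.start({"gen_ai.request.model": kwargs.get("model")})
        try:
            response = wrapped(*args, **kwargs)
        except Exception as exc:
            handler.fail(invocation, exc)  # record the error, then re-raise
            raise
        handler.stop(
            invocation, {"gen_ai.response.id": getattr(response, "id", None)}
        )
        return response

    return wrapper
```

The real wrapper additionally handles `stream=True` by deferring stop()/fail() to a stream wrapper (see the changed-files table below), but the sync path reduces to this shape.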
Reviewed changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/src/opentelemetry/instrumentation/openai_v2/patch_responses.py | New wrapper for Responses.create that starts/stops/fails inference invocations and wraps streaming responses. |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/src/opentelemetry/instrumentation/openai_v2/\_\_init\_\_.py | Registers/unregisters the Responses create wrapper when latest experimental semconv mode is enabled and the SDK module exists. |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/src/opentelemetry/instrumentation/openai_v2/response_extractors.py | Adds request attribute handling, inference creation kwargs, and expanded output/finish-reason parsing (tool calls + reasoning). |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/src/opentelemetry/instrumentation/openai_v2/response_wrappers.py | Updates stream wrapper lifecycle to use invocation stop()/fail() instead of handler callbacks; adjusts event handling and parse(). |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_responses.py | New test suite validating spans/logs for Responses create across streaming, errors, tool calls, reasoning tokens, and content capture modes. |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_response_extractors.py | Updates extractor tests for new tool-call and reasoning output item mappings and finish-reason aggregation. |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_response_wrappers.py | Updates wrapper tests to reflect invocation stop()/fail() API expectations. |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_utils.py | Adds Responses tool definition helper and cache-token attribute assertions. |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/conftest.py | Adds an instrument_event_only fixture to exercise event-only content capture behavior. |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/cassettes/*.yaml | Adds VCR recordings for the new Responses tests. |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/CHANGELOG.md | Documents the new Responses create instrumentation feature under Unreleased. |
```python
@pytest.mark.vcr()
def test_responses_create_streaming_delegates_response_attribute(
    request, openai_client, instrument_no_content
):
    _skip_if_not_latest()

    stream = openai_client.responses.create(
        model=DEFAULT_MODEL,
        instructions=SYSTEM_INSTRUCTIONS,
        input="Say hi.",
        stream=True,
    )

    assert stream.response is not None
    assert stream.response.status_code == 200
    assert stream.response.headers.get("x-request-id") is not None
    stream.close()
```
The streaming test that closes the stream immediately (test_responses_create_streaming_delegates_response_attribute) doesn’t assert any telemetry outcome. To satisfy the GenAI instrumentation test requirements (stream closed early by the caller), please assert that exactly one span is finished and that it is finalized without being marked as an error (and/or clearly document the expected behavior for early-close).
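One way to express the assertion the review asks for, sketched against a plain-dict stand-in for the suite's in-memory span exporter (the helper name and span shape here are hypothetical):

```python
def assert_single_clean_span(finished_spans):
    """Exactly one span finished, and it was not marked as an error."""
    assert len(finished_spans) == 1, f"expected 1 span, got {len(finished_spans)}"
    span = finished_spans[0]
    # Early close by the caller should finalize the span without an error
    # status or an error.type attribute.
    assert span["status_code"] != "ERROR"
    assert "error.type" not in span["attributes"]
    return span


# Early-close scenario: the single inference span ends with an UNSET status.
spans = [{"name": "chat gpt-4o-mini", "status_code": "UNSET", "attributes": {}}]
span = assert_single_clean_span(spans)
```

In the real test this would run after `stream.close()`, reading finished spans from the exporter fixture instead of the literal list above.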
Description

This PR adds instrumentation around the OpenAI Responses API's `create` method.

Fixes # (issue)
Type of change
How Has This Been Tested?
Unit tests in the new `test_responses.py` suite, backed by VCR cassettes, exercise the `create` function.

Does This PR Require a Core Repo Change?
Checklist:
See contributing.md for styleguide, changelog guidelines, and more.