
feat: OpenAI responses create instrumentation#4474

Open
eternalcuriouslearner wants to merge 22 commits into open-telemetry:main from eternalcuriouslearner:feat/openai-responses-create-instrumentation-first-part

Conversation

@eternalcuriouslearner
Contributor

Description

This PR adds instrumentation around the OpenAI Responses API's create method.

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce them. Please also list any relevant details for your test configuration.

  • Created VCR-based tests to verify span creation for the Responses API's create function.

Does This PR Require a Core Repo Change?

  • Yes. - Link to PR:
  • No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated


Copilot AI left a comment


Pull request overview

This PR adds OpenTelemetry instrumentation for the OpenAI Responses API create method (sync + streaming) using the TelemetryHandler inference invocation lifecycle, along with VCR-based tests to validate spans/log behavior and response attribute extraction.

Changes:

  • Add patching for openai.resources.responses.responses.Responses.create to emit inference spans and capture request/response attributes.
  • Extend response extraction to cover request params, finish reasons, tool calls, reasoning parts, and cache token usage.
  • Add a comprehensive test_responses.py suite plus VCR cassettes and supporting test utilities/fixtures.
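
The expanded output/finish-reason parsing can be sketched roughly as follows. This is an illustrative helper, not the PR's actual extractor: the item types ("message", "function_call", "reasoning") follow the Responses API's output item shapes, but the helper name and the exact reason strings are assumptions.

```python
# Hypothetical sketch of finish-reason aggregation over Responses output
# items; not the PR's response_extractors.py implementation.
def aggregate_finish_reasons(output_items):
    """Map each output item type to a finish reason and collect them in order."""
    mapping = {
        "message": "stop",            # plain assistant message
        "function_call": "tool_calls",  # model requested a tool invocation
        "reasoning": None,            # reasoning parts do not terminate generation
    }
    reasons = []
    for item in output_items:
        reason = mapping.get(item.get("type"))
        if reason is not None:
            reasons.append(reason)
    return reasons
```

For example, a response containing a message, a reasoning part, and a function call would aggregate to ["stop", "tool_calls"], with the reasoning part contributing no finish reason.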

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 1 comment.

Show a summary per file

  • instrumentation-genai/opentelemetry-instrumentation-openai-v2/src/opentelemetry/instrumentation/openai_v2/patch_responses.py: New wrapper for Responses.create that starts/stops/fails inference invocations and wraps streaming responses.
  • instrumentation-genai/opentelemetry-instrumentation-openai-v2/src/opentelemetry/instrumentation/openai_v2/__init__.py: Registers/unregisters the Responses create wrapper when the latest experimental semconv mode is enabled and the SDK module exists.
  • instrumentation-genai/opentelemetry-instrumentation-openai-v2/src/opentelemetry/instrumentation/openai_v2/response_extractors.py: Adds request attribute handling, inference creation kwargs, and expanded output/finish-reason parsing (tool calls + reasoning).
  • instrumentation-genai/opentelemetry-instrumentation-openai-v2/src/opentelemetry/instrumentation/openai_v2/response_wrappers.py: Updates the stream wrapper lifecycle to use invocation stop()/fail() instead of handler callbacks; adjusts event handling and parse().
  • instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_responses.py: New test suite validating spans/logs for Responses create across streaming, errors, tool calls, reasoning tokens, and content capture modes.
  • instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_response_extractors.py: Updates extractor tests for new tool-call and reasoning output item mappings and finish-reason aggregation.
  • instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_response_wrappers.py: Updates wrapper tests to reflect invocation stop()/fail() API expectations.
  • instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_utils.py: Adds a Responses tool definition helper and cache-token attribute assertions.
  • instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/conftest.py: Adds an instrument_event_only fixture to exercise event-only content capture behavior.
  • instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/cassettes/*.yaml: Adds VCR recordings for the new Responses tests.
  • instrumentation-genai/opentelemetry-instrumentation-openai-v2/CHANGELOG.md: Documents the new Responses create instrumentation feature under Unreleased.
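
The start/stop/fail invocation lifecycle that patch_responses.py and response_wrappers.py are described as using can be illustrated with a minimal, self-contained sketch. The names InferenceInvocation and wrap_create are hypothetical stand-ins, and returning the invocation from the wrapper is done here only for illustration; the real wrapper records telemetry through the TelemetryHandler instead.

```python
# Illustrative sketch only: not the opentelemetry-instrumentation-openai-v2 API.
import functools

class InferenceInvocation:
    """Tracks the start/stop/fail lifecycle of one inference call."""
    def __init__(self):
        self.state = "started"

    def stop(self):
        self.state = "stopped"

    def fail(self, error):
        self.state = f"failed: {error}"

def wrap_create(wrapped):
    """Wrap a create() call so the invocation is always finalized."""
    @functools.wraps(wrapped)
    def wrapper(*args, **kwargs):
        invocation = InferenceInvocation()
        try:
            result = wrapped(*args, **kwargs)
        except Exception as exc:
            invocation.fail(exc)  # finalize as an error, then re-raise
            raise
        invocation.stop()  # finalize as a success
        return result, invocation
    return wrapper
```

For a streaming call, the wrapper would instead hand the still-open invocation to a stream wrapper, which calls stop() or fail() when the stream is exhausted, errors, or is closed by the caller.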


Copilot AI left a comment


Pull request overview

Copilot reviewed 34 out of 34 changed files in this pull request and generated no new comments.


Copilot AI left a comment


Pull request overview

Copilot reviewed 34 out of 34 changed files in this pull request and generated 1 comment.

Comment on lines +474 to +491
@pytest.mark.vcr()
def test_responses_create_streaming_delegates_response_attribute(
    request, openai_client, instrument_no_content
):
    _skip_if_not_latest()

    stream = openai_client.responses.create(
        model=DEFAULT_MODEL,
        instructions=SYSTEM_INSTRUCTIONS,
        input="Say hi.",
        stream=True,
    )

    assert stream.response is not None
    assert stream.response.status_code == 200
    assert stream.response.headers.get("x-request-id") is not None
    stream.close()


Copilot AI Apr 24, 2026


The streaming test that closes the stream immediately (test_responses_create_streaming_delegates_response_attribute) doesn’t assert any telemetry outcome. To satisfy the GenAI instrumentation test requirements (stream closed early by the caller), please assert that exactly one span is finished and that it is finalized without being marked as an error (and/or clearly document the expected behavior for early-close).

Copilot generated this review using guidance from repository custom instructions.
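
The assertion the reviewer requests could be sketched as a small helper along these lines. The helper name and the span shape (an object exposing a status attribute) are assumptions for illustration; in the actual test it would run after stream.close(), against the spans returned by an in-memory exporter fixture.

```python
# Hypothetical sketch of the early-close telemetry check; not code from this
# PR or the OpenTelemetry SDK.
from types import SimpleNamespace

def assert_single_non_error_span(finished_spans):
    """An early-closed stream should finalize exactly one span, not as an error."""
    assert len(finished_spans) == 1, "expected exactly one finished span"
    assert finished_spans[0].status != "ERROR", (
        "early close must not mark the span as an error"
    )

# Example: one finished, non-error span passes the check.
assert_single_non_error_span([SimpleNamespace(status="OK")])
```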


3 participants