fix openai-agents tests

41e409d
Draft

feat: Send GenAI spans as V2 envelope items #6079

@sentry/warden / warden: code-review completed Apr 17, 2026 in 19m 49s

7 issues

code-review: Found 7 issues (2 high, 4 medium, 1 low)

High

Wrong event object passed to V2 span conversion causes missing span attributes - `sentry_sdk/client.py:1133-1135`

At line 1134, `event` (the original event parameter) is passed to `_serialized_v1_span_to_serialized_v2_span` instead of `event_opt` (the prepared event). The `_serialized_v1_span_to_serialized_v2_span` function extracts attributes such as `sentry.release`, `sentry.environment`, `sentry.sdk.name`, `sentry.sdk.version`, user info, and segment info from the event object. Because `_prepare_event` enriches the event with SDK info (lines 808-811) and other metadata, passing the original `event` produces V2 GenAI spans that lack these attributes.
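
A minimal sketch of this failure mode follows. `prepare_event` and `to_v2_span` are hypothetical stand-ins, not the SDK's `_prepare_event` or `_serialized_v1_span_to_serialized_v2_span`, and the sketch assumes preparation returns an enriched copy, as the finding implies. All values are placeholders.

```python
from typing import Any, Dict

Event = Dict[str, Any]


def prepare_event(event: Event) -> Event:
    """Hypothetical stand-in for event preparation: returns an enriched copy."""
    prepared = dict(event)
    prepared["sdk"] = {"name": "sentry.python", "version": "0.0.0"}  # placeholder
    prepared["release"] = "my-app@1.0.0"  # placeholder
    return prepared


def to_v2_span(span: Dict[str, Any], event: Event) -> Dict[str, Any]:
    """Hypothetical converter: copies event-level metadata onto span attributes."""
    attributes = dict(span.get("data", {}))
    sdk = event.get("sdk", {})
    if "name" in sdk:
        attributes["sentry.sdk.name"] = sdk["name"]
    if "version" in sdk:
        attributes["sentry.sdk.version"] = sdk["version"]
    if "release" in event:
        attributes["sentry.release"] = event["release"]
    return {"name": span.get("description"), "attributes": attributes}


event = {"type": "transaction"}
event_opt = prepare_event(event)
span = {"description": "chat", "data": {"gen_ai.system": "openai"}}

from_original = to_v2_span(span, event)      # metadata missing
from_prepared = to_v2_span(span, event_opt)  # metadata present
```

Under these assumptions, only `from_prepared` carries the `sentry.sdk.*` and `sentry.release` attributes.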

Also found at:

  • tests/integrations/google_genai/test_google_genai.py:330

Test accesses spans[0]["data"] but capture_items("span") produces "attributes" key - `tests/tracing/test_misc.py:628`

The test was updated to use `capture_items("span")` but still accesses `spans[0]["data"]`. The `capture_items` fixture transforms span items to have an `attributes` key (see conftest.py lines 361-367), not `data`. This will cause a `KeyError` at runtime. Other tests in the codebase that use `capture_items("span")` correctly access `span["attributes"]` (e.g., test_google_genai.py).
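
As an illustration, assume a captured span item shaped roughly like the fixture output described above. The payload below is hypothetical; the real fixture output is not shown in this review.

```python
# Hypothetical payload shaped like what the finding says capture_items("span") yields.
spans = [{"name": "chat", "attributes": {"gen_ai.system": "openai"}}]

try:
    spans[0]["data"]
    raised = False
except KeyError:
    raised = True  # the stale access fails because there is no "data" key

system = spans[0]["attributes"]["gen_ai.system"]  # the corrected access
```

With this shape, the `data` lookup raises `KeyError`, while the `attributes` lookup returns the expected value.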

Also found at:

  • tests/integrations/google_genai/test_google_genai.py:812-814
  • tests/integrations/pydantic_ai/test_pydantic_ai.py:832-833

Medium

Test uses incorrect field name 'attributes' instead of 'data' for inline_data - `tests/integrations/google_genai/test_google_genai.py:2153`

The test change replaces `"data"` with `"attributes"` in the inline_data structure. The Google GenAI SDK uses `data` as the field name (see `genai_types.Blob(data=..., mime_type=...)` at line 1546 and other tests at lines 1765, 1806). The production code in `sentry_sdk/ai/utils.py` line 286 reads `inline_data.get("data", "")`, so this test input does not exercise the data extraction logic: `attributes` is not a valid field.
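
A short sketch of why the renamed field goes unnoticed. `extract_inline_data` is a hypothetical helper that only mirrors the `.get("data", "")` lookup quoted above; the payload values are placeholders.

```python
def extract_inline_data(inline_data: dict) -> str:
    """Hypothetical helper mirroring the quoted lookup: inline_data.get("data", "")."""
    return inline_data.get("data", "")


wrong = {"mime_type": "image/png", "attributes": "aGVsbG8="}  # field name from the test change
right = {"mime_type": "image/png", "data": "aGVsbG8="}        # field name the finding cites

wrong_result = extract_inline_data(wrong)  # falls back to the "" default, no error raised
right_result = extract_inline_data(right)
```

Because `.get` falls back to its default, the wrong key fails silently instead of raising.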

Hardcoded SDK version will cause test failures on version bumps - `tests/integrations/huggingface_hub/test_huggingface_hub.py:523`

The `test_text_generation` function hardcodes `"sentry.sdk.version": "2.58.0"` while all other tests in this file and across the codebase use `mock.ANY` for this field. This will cause test failures when the SDK version is incremented. The inconsistency appears to be an oversight, since the adjacent `test_text_generation_streaming` function correctly uses `mock.ANY`.
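
The difference can be seen in a small sketch using `unittest.mock.ANY`, which compares equal to any value. The attribute dictionaries are illustrative placeholders, not the test's real expectations.

```python
from unittest import mock

# Hypothetical attributes as a test might receive them after a version bump.
actual = {"sentry.sdk.name": "sentry.python", "sentry.sdk.version": "2.59.0"}

brittle_expected = {"sentry.sdk.name": "sentry.python", "sentry.sdk.version": "2.58.0"}
robust_expected = {"sentry.sdk.name": "sentry.python", "sentry.sdk.version": mock.ANY}

brittle_matches = actual == brittle_expected  # breaks as soon as the version changes
robust_matches = actual == robust_expected    # version-independent comparison
```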

Also found at:

  • tests/integrations/langchain/test_langchain.py:1368

Test captures only transactions but asserts on spans, causing false positive - `tests/integrations/litellm/test_litellm.py:945`

The `capture_items("transaction")` call on line 945 only captures transaction-type items, but the test later filters for span-type items on line 1020 (`spans = [item.payload for item in items if item.type == "span"]`). Since no spans are captured, the `spans` list will always be empty, and the `for span in spans` loop will never execute any assertions. The test therefore passes regardless of whether spans actually have the expected `SPANDATA.GEN_AI_SYSTEM` attribute.
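
A sketch of the vacuous loop. `Item` is a hypothetical stand-in for a captured envelope item, and the asserted value is a placeholder.

```python
from collections import namedtuple

Item = namedtuple("Item", ["type", "payload"])  # hypothetical stand-in for a captured item

# Only transaction items were captured, so no item has type "span".
items = [Item("transaction", {"transaction": "chat"})]
spans = [item.payload for item in items if item.type == "span"]

checked = 0
for span in spans:
    assert span["attributes"]["gen_ai.system"] == "litellm"  # placeholder value, never reached
    checked += 1

# A guard such as `assert spans` before the loop would expose the gap.
guard_would_fail = not spans
```

Capturing span items as well, or asserting that `spans` is non-empty before looping, would make the test fail when no spans arrive.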

Also found at:

  • tests/integrations/openai_agents/test_openai_agents.py:3560-3562

Test missing validation when handled_tool_call_exceptions=False - `tests/integrations/pydantic_ai/test_pydantic_ai.py:490-496`

The original test validated the `model_behaviour_error` event when `handled_tool_call_exceptions=False`, but the refactored code removed the `else` branch entirely. As a result, when the test runs with `handled_tool_call_exceptions=False`, no event validation occurs, reducing test coverage for that code path.
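
One possible shape for a check that covers both parametrized cases is sketched below. The helper `check_events`, the event dictionaries, and the specific assertions are assumptions for illustration; the original test's assertions are not shown in this review.

```python
def check_events(events: list, handled_tool_call_exceptions: bool) -> str:
    """Hypothetical helper that asserts something in both parametrized cases."""
    if handled_tool_call_exceptions:
        # Assumed expectation for illustration: no error event is captured.
        assert not any(e.get("level") == "error" for e in events)
        return "handled branch checked"
    else:
        # The branch the finding reports as removed: validate the error event.
        error_events = [e for e in events if e.get("level") == "error"]
        assert len(error_events) == 1
        return "unhandled branch checked"


handled_result = check_events([], handled_tool_call_exceptions=True)
unhandled_result = check_events([{"level": "error"}], handled_tool_call_exceptions=False)
```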

Also found at:

  • tests/integrations/pydantic_ai/test_pydantic_ai.py:493-494

Low

Unused list comprehension result in test_langchain_embeddings_error_handling - `tests/integrations/langchain/test_langchain.py:1840-1844`

The list comprehension at lines 1840-1844 builds a list of error events, but the result is neither assigned to a variable nor used in an assertion. It appears to be dead code left over from an earlier assignment or assertion, so the test no longer validates error-handling behavior in any meaningful way.
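
A sketch of the pattern and one way to restore a check. The event shape is hypothetical; the real test's events are not shown in this review.

```python
# Hypothetical captured events.
events = [{"level": "error"}, {"level": "info"}]

# Dead code: the list is built and immediately discarded.
[e for e in events if e.get("level") == "error"]

# Assigning the result makes an assertion possible.
error_events = [e for e in events if e.get("level") == "error"]
```

With the result bound to `error_events`, the test can assert on its length or contents.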

Also found at:

  • tests/integrations/pydantic_ai/test_pydantic_ai.py:1402

Duration: 19m 33s · Tokens: 14.3M in / 173.8k out · Cost: $20.35 (+extraction: $0.02, +merge: $0.01, +fix_gate: $0.02)

Annotations

Check failure on line 1135 in sentry_sdk/client.py

@sentry-warden sentry-warden / warden: code-review

Wrong event object passed to V2 span conversion causes missing span attributes

At line 1134, `event` (the original event parameter) is passed to `_serialized_v1_span_to_serialized_v2_span` instead of `event_opt` (the prepared event). The `_serialized_v1_span_to_serialized_v2_span` function extracts attributes like `sentry.release`, `sentry.environment`, `sentry.sdk.name`, `sentry.sdk.version`, user info, and segment info from the event object. Since `_prepare_event` enriches the event with SDK info (lines 808-811) and other metadata, using the original `event` will result in V2 GenAI spans missing these critical attributes.

Check failure on line 330 in tests/integrations/google_genai/test_google_genai.py

@sentry-warden sentry-warden / warden: code-review

[6WP-JUA] Wrong event object passed to V2 span conversion causes missing span attributes (additional location)

Check failure on line 628 in tests/tracing/test_misc.py

@sentry-warden sentry-warden / warden: code-review

Test accesses spans[0]["data"] but capture_items("span") produces "attributes" key

The test was updated to use `capture_items("span")` but still accesses `spans[0]["data"]`. The `capture_items` fixture transforms span items to have an `attributes` key (see conftest.py lines 361-367), not `data`. This will cause a KeyError at runtime. Other tests in the codebase using `capture_items("span")` correctly access `span["attributes"]` (e.g., test_google_genai.py).

Check failure on line 814 in tests/integrations/google_genai/test_google_genai.py

@sentry-warden sentry-warden / warden: code-review

[ZQ3-EQ5] Test accesses spans[0]["data"] but capture_items("span") produces "attributes" key (additional location)

Check failure on line 833 in tests/integrations/pydantic_ai/test_pydantic_ai.py

@sentry-warden sentry-warden / warden: code-review

[ZQ3-EQ5] Test accesses spans[0]["data"] but capture_items("span") produces "attributes" key (additional location)

Check warning on line 2153 in tests/integrations/google_genai/test_google_genai.py

@sentry-warden sentry-warden / warden: code-review

Test uses incorrect field name 'attributes' instead of 'data' for inline_data

The test change replaces `"data"` with `"attributes"` in the inline_data structure. The Google GenAI SDK uses `data` as the field name (see `genai_types.Blob(data=..., mime_type=...)` at line 1546 and other tests at lines 1765, 1806). The production code in `sentry_sdk/ai/utils.py` line 286 reads `inline_data.get("data", "")`, so this test input won't properly validate the data extraction logic since `attributes` is not a valid field.

Check warning on line 523 in tests/integrations/huggingface_hub/test_huggingface_hub.py

@sentry-warden sentry-warden / warden: code-review

Hardcoded SDK version will cause test failures on version bumps

The `test_text_generation` function hardcodes `"sentry.sdk.version": "2.58.0"` while all other tests in this file and across the codebase use `mock.ANY` for this field. This will cause test failures when the SDK version is incremented. The inconsistency appears to be an oversight since the adjacent `test_text_generation_streaming` function correctly uses `mock.ANY`.

Check warning on line 1368 in tests/integrations/langchain/test_langchain.py

@sentry-warden sentry-warden / warden: code-review

[5PZ-XT4] Hardcoded SDK version will cause test failures on version bumps (additional location)

Check warning on line 945 in tests/integrations/litellm/test_litellm.py

@sentry-warden sentry-warden / warden: code-review

Test captures only transactions but asserts on spans, causing false positive

The `capture_items("transaction")` call on line 945 only captures transaction-type items, but the test later filters for span-type items on line 1020 (`spans = [item.payload for item in items if item.type == "span"]`). Since no spans are captured, the `spans` list will always be empty, and the `for span in spans` loop will never execute any assertions. This means the test passes regardless of whether spans actually have the expected `SPANDATA.GEN_AI_SYSTEM` attribute.

Check warning on line 3562 in tests/integrations/openai_agents/test_openai_agents.py

@sentry-warden sentry-warden / warden: code-review

[WNA-HYX] Test captures only transactions but asserts on spans, causing false positive (additional location)

Check warning on line 496 in tests/integrations/pydantic_ai/test_pydantic_ai.py

@sentry-warden sentry-warden / warden: code-review

Test missing validation when handled_tool_call_exceptions=False

The original test validated the `model_behaviour_error` event when `handled_tool_call_exceptions=False`, but the refactored code removed the `else` branch entirely. This means when the test runs with `handled_tool_call_exceptions=False`, no event validation occurs, reducing test coverage for that code path.

Check warning on line 494 in tests/integrations/pydantic_ai/test_pydantic_ai.py

@sentry-warden sentry-warden / warden: code-review

[53D-QFE] Test missing validation when handled_tool_call_exceptions=False (additional location)
