feat: Send GenAI spans as V2 envelope items #6079
code-review: Found 7 issues (2 high, 4 medium, 1 low)
High
Wrong event object passed to V2 span conversion causes missing span attributes - `sentry_sdk/client.py:1133-1135`
At line 1134, `event` (the original event parameter) is passed to `_serialized_v1_span_to_serialized_v2_span` instead of `event_opt` (the prepared event). That function reads `sentry.release`, `sentry.environment`, `sentry.sdk.name`, `sentry.sdk.version`, user info, and segment info from the event object. Because `_prepare_event` enriches the event with SDK info (lines 808-811) and other metadata, passing the original `event` leaves V2 GenAI spans without these attributes.
Also found at:
tests/integrations/google_genai/test_google_genai.py:330
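The effect described above can be sketched with toy stand-ins. `prepare_event` and `to_v2_span` are hypothetical helpers, not the SDK's functions, and the sketch assumes preparation returns an enriched copy; whether the real `_prepare_event` copies or mutates in place is not stated in this report.

```python
import copy


def prepare_event(event, release, sdk_info):
    """Hypothetical stand-in for event preparation: returns an enriched copy."""
    prepared = copy.deepcopy(event)
    prepared.setdefault("release", release)
    prepared["sdk"] = dict(sdk_info)
    return prepared


def to_v2_span(span, event):
    """Hypothetical stand-in for the V1-to-V2 conversion: copies metadata from `event`."""
    attributes = dict(span.get("data", {}))
    if "release" in event:
        attributes["sentry.release"] = event["release"]
    sdk = event.get("sdk", {})
    if "name" in sdk:
        attributes["sentry.sdk.name"] = sdk["name"]
    if "version" in sdk:
        attributes["sentry.sdk.version"] = sdk["version"]
    return {"name": span.get("description"), "attributes": attributes}


# Illustrative values only.
event = {"spans": [{"description": "chat", "data": {"gen_ai.system": "example"}}]}
event_opt = prepare_event(
    event,
    release="my-app@1.0.0",
    sdk_info={"name": "sentry.python", "version": "0.0.0"},
)

span = event["spans"][0]
from_original = to_v2_span(span, event)      # pattern flagged by the review
from_prepared = to_v2_span(span, event_opt)  # suggested fix: use the prepared event
```

Under these assumptions, `from_original` carries only the span's own data, while `from_prepared` also carries the release and SDK attributes.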
Test accesses spans[0]["data"] but capture_items("span") produces "attributes" key - `tests/tracing/test_misc.py:628`
The test was updated to use `capture_items("span")` but still accesses `spans[0]["data"]`. The `capture_items` fixture transforms span items so that they carry an `attributes` key (see conftest.py lines 361-367), not `data`, so the access will raise a `KeyError` at runtime. Other tests in the codebase that use `capture_items("span")` correctly access `span["attributes"]` (e.g., test_google_genai.py).
Also found at:
tests/integrations/google_genai/test_google_genai.py:812-814
tests/integrations/pydantic_ai/test_pydantic_ai.py:832-833
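A minimal sketch of the failure mode, assuming a captured span item shaped the way the finding describes (an `attributes` key and no `data` key). The span contents and the `read_system` helper are illustrative, not taken from the test suite.

```python
# Illustrative span item with the shape the finding attributes to the fixture.
span = {"name": "chat", "attributes": {"gen_ai.system": "example"}}


def read_system(span_item):
    """Read the attribute from "attributes", the key the fixture is said to produce."""
    return span_item["attributes"]["gen_ai.system"]


# The flagged access pattern: "data" does not exist on this item.
try:
    span["data"]
    raised_key_error = False
except KeyError:
    raised_key_error = True
```

Switching the affected assertions from `spans[0]["data"]` to `spans[0]["attributes"]` would follow the same pattern as `read_system`.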
Medium
Test uses incorrect field name 'attributes' instead of 'data' for inline_data - `tests/integrations/google_genai/test_google_genai.py:2153`
The test change replaces `"data"` with `"attributes"` in the `inline_data` structure. The Google GenAI SDK uses `data` as the field name (see `genai_types.Blob(data=..., mime_type=...)` at line 1546 and other tests at lines 1765, 1806). The production code in `sentry_sdk/ai/utils.py` line 286 reads `inline_data.get("data", "")`, so this input no longer exercises the data extraction logic: `attributes` is not a valid field, and the lookup falls back to the empty default.
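The fallback behaviour can be sketched as follows. `extract_inline_data` is a hypothetical helper built around the `inline_data.get("data", "")` lookup quoted in the finding; the payload values are illustrative.

```python
def extract_inline_data(part):
    """Hypothetical helper mirroring the quoted lookup: inline_data.get("data", "")."""
    inline_data = part.get("inline_data", {})
    return inline_data.get("data", "")


# Input using the field name the finding says the Google GenAI SDK uses.
valid_part = {"inline_data": {"mime_type": "image/png", "data": "aGVsbG8="}}

# Input as changed in the test: "attributes" is silently ignored by the lookup.
renamed_part = {"inline_data": {"mime_type": "image/png", "attributes": "aGVsbG8="}}

valid_result = extract_inline_data(valid_part)
renamed_result = extract_inline_data(renamed_part)
```

Because `dict.get` returns the default instead of raising, the renamed input does not fail loudly; it just stops testing the extraction path.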
Hardcoded SDK version will cause test failures on version bumps - `tests/integrations/huggingface_hub/test_huggingface_hub.py:523`
The `test_text_generation` function hardcodes `"sentry.sdk.version": "2.58.0"`, while all other tests in this file and across the codebase use `mock.ANY` for this field. The test will fail as soon as the SDK version is incremented. The inconsistency appears to be an oversight, since the adjacent `test_text_generation_streaming` function correctly uses `mock.ANY`.
Also found at:
tests/integrations/langchain/test_langchain.py:1368
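A short sketch of the difference between pinning the version and using `mock.ANY`. The `span_attributes` helper and the attribute set are illustrative stand-ins for whatever the real test compares.

```python
from unittest import mock


def span_attributes(sdk_version):
    """Illustrative stand-in for the attributes a captured span might carry."""
    return {
        "sentry.sdk.name": "sentry.python",
        "sentry.sdk.version": sdk_version,
    }


# Pinned expectation: only matches one exact release.
expected_pinned = {
    "sentry.sdk.name": "sentry.python",
    "sentry.sdk.version": "2.58.0",
}

# Version-agnostic expectation: mock.ANY compares equal to any value.
expected_any = {
    "sentry.sdk.name": "sentry.python",
    "sentry.sdk.version": mock.ANY,
}

pinned_matches_today = span_attributes("2.58.0") == expected_pinned
pinned_matches_after_bump = span_attributes("2.59.0") == expected_pinned
any_matches_after_bump = span_attributes("2.59.0") == expected_any
```

The pinned dictionary stops matching after a hypothetical bump to 2.59.0, while the `mock.ANY` variant keeps matching.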
Test captures only transactions but asserts on spans, causing false positive - `tests/integrations/litellm/test_litellm.py:945`
The `capture_items("transaction")` call on line 945 captures only transaction-type items, but the test later filters for span-type items on line 1020 (`spans = [item.payload for item in items if item.type == "span"]`). Since no spans are captured, `spans` is always empty and the `for span in spans` loop never executes its assertions. The test therefore passes regardless of whether spans carry the expected `SPANDATA.GEN_AI_SYSTEM` attribute.
Also found at:
tests/integrations/openai_agents/test_openai_agents.py:3560-3562
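The vacuous pass can be sketched like this. `Item`, both check functions, and the attribute values are hypothetical; only the filtering expression is taken from the finding.

```python
from collections import namedtuple

# Hypothetical shape for a captured envelope item.
Item = namedtuple("Item", ["type", "payload"])


def check_spans_vacuous(items):
    """Flagged shape: the loop body never runs when no spans were captured."""
    spans = [item.payload for item in items if item.type == "span"]
    for span in spans:
        assert span["attributes"]["gen_ai.system"] == "example"
    return len(spans)


def check_spans_guarded(items):
    """Same checks, but an empty span list now fails instead of passing silently."""
    spans = [item.payload for item in items if item.type == "span"]
    assert spans, "expected at least one captured span"
    for span in spans:
        assert span["attributes"]["gen_ai.system"] == "example"
    return len(spans)


# Only a transaction was captured, as when the fixture is asked for "transaction".
only_transactions = [Item(type="transaction", payload={"spans": []})]
```

One possible remedy is to capture span items as well and keep a guard like the one in `check_spans_guarded`, so a missing capture fails the test instead of passing it.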
Test missing validation when handled_tool_call_exceptions=False - `tests/integrations/pydantic_ai/test_pydantic_ai.py:490-496`
The original test validated the `model_behaviour_error` event when `handled_tool_call_exceptions=False`, but the refactored code removed the `else` branch entirely. As a result, when the test runs with `handled_tool_call_exceptions=False`, no event validation occurs, which reduces test coverage for that code path.
Also found at:
tests/integrations/pydantic_ai/test_pydantic_ai.py:493-494
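A rough sketch of the coverage gap. Both helpers, the event shape, and the specific assertions (event count, `level == "error"`) are assumptions made for illustration; the report does not say what the original `else` branch checked.

```python
def check_without_else(events, handled_tool_call_exceptions):
    """Flagged shape: nothing is validated when the flag is False."""
    checks_run = 0
    if handled_tool_call_exceptions:
        assert len(events) == 1  # assumed check, for illustration only
        checks_run += 1
    return checks_run


def check_with_else(events, handled_tool_call_exceptions):
    """Both values of the flag run at least one assertion."""
    checks_run = 0
    if handled_tool_call_exceptions:
        assert len(events) == 1  # assumed check, for illustration only
        checks_run += 1
    else:
        # Unpacking also asserts that exactly one event was captured.
        (model_behaviour_error,) = events
        assert model_behaviour_error["level"] == "error"  # assumed field
        checks_run += 1
    return checks_run


# Illustrative captured event.
events = [{"level": "error"}]
```

With the flag set to `False`, `check_without_else` returns without asserting anything, whereas `check_with_else` fails if the event is missing or malformed.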
Low
Unused list comprehension result in test_langchain_embeddings_error_handling - `tests/integrations/langchain/test_langchain.py:1840-1844`
The list comprehension at lines 1840-1844 builds a list of error events, but the result is neither assigned to a variable nor used in an assertion. It appears to be left over from a former assignment or assertion, which leaves the test without any meaningful validation of error handling behavior.
Also found at:
tests/integrations/pydantic_ai/test_pydantic_ai.py:1402
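A small sketch of the difference between a discarded comprehension and one that feeds an assertion. The event dictionaries and the filter condition are illustrative, not copied from the test.

```python
# Illustrative captured events.
events = [
    {"level": "error", "message": "embedding failed"},
    {"type": "transaction"},
]

# Flagged shape: the list is built and immediately thrown away, so nothing is checked.
[event for event in events if event.get("level") == "error"]

# Assigning the result makes it available to an assertion.
error_events = [event for event in events if event.get("level") == "error"]
assert len(error_events) == 1
```

The first comprehension is valid Python and runs without error, which is why this kind of dead code tends to go unnoticed.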
Duration: 19m 33s · Tokens: 14.3M in / 173.8k out · Cost: $20.35 (+extraction: $0.02, +merge: $0.01, +fix_gate: $0.02)
Annotations
Check failure on line 1135 in sentry_sdk/client.py
sentry-warden / warden: code-review
Wrong event object passed to V2 span conversion causes missing span attributes
At line 1134, `event` (the original event parameter) is passed to `_serialized_v1_span_to_serialized_v2_span` instead of `event_opt` (the prepared event). The `_serialized_v1_span_to_serialized_v2_span` function extracts attributes like `sentry.release`, `sentry.environment`, `sentry.sdk.name`, `sentry.sdk.version`, user info, and segment info from the event object. Since `_prepare_event` enriches the event with SDK info (lines 808-811) and other metadata, using the original `event` will result in V2 GenAI spans missing these critical attributes.
Check failure on line 330 in tests/integrations/google_genai/test_google_genai.py
sentry-warden / warden: code-review
[6WP-JUA] Wrong event object passed to V2 span conversion causes missing span attributes (additional location)
Check failure on line 628 in tests/tracing/test_misc.py
sentry-warden / warden: code-review
Test accesses spans[0]["data"] but capture_items("span") produces "attributes" key
The test was updated to use `capture_items("span")` but still accesses `spans[0]["data"]`. The `capture_items` fixture transforms span items to have an `attributes` key (see conftest.py lines 361-367), not `data`. This will cause a KeyError at runtime. Other tests in the codebase using `capture_items("span")` correctly access `span["attributes"]` (e.g., test_google_genai.py).
Check failure on line 814 in tests/integrations/google_genai/test_google_genai.py
sentry-warden / warden: code-review
[ZQ3-EQ5] Test accesses spans[0]["data"] but capture_items("span") produces "attributes" key (additional location)
Check failure on line 833 in tests/integrations/pydantic_ai/test_pydantic_ai.py
sentry-warden / warden: code-review
[ZQ3-EQ5] Test accesses spans[0]["data"] but capture_items("span") produces "attributes" key (additional location)
Check warning on line 2153 in tests/integrations/google_genai/test_google_genai.py
sentry-warden / warden: code-review
Test uses incorrect field name 'attributes' instead of 'data' for inline_data
The test change replaces `"data"` with `"attributes"` in the inline_data structure. The Google GenAI SDK uses `data` as the field name (see `genai_types.Blob(data=..., mime_type=...)` at line 1546 and other tests at lines 1765, 1806). The production code in `sentry_sdk/ai/utils.py` line 286 reads `inline_data.get("data", "")`, so this test input won't properly validate the data extraction logic since `attributes` is not a valid field.
Check warning on line 523 in tests/integrations/huggingface_hub/test_huggingface_hub.py
sentry-warden / warden: code-review
Hardcoded SDK version will cause test failures on version bumps
The `test_text_generation` function hardcodes `"sentry.sdk.version": "2.58.0"` while all other tests in this file and across the codebase use `mock.ANY` for this field. This will cause test failures when the SDK version is incremented. The inconsistency appears to be an oversight since the adjacent `test_text_generation_streaming` function correctly uses `mock.ANY`.
Check warning on line 1368 in tests/integrations/langchain/test_langchain.py
sentry-warden / warden: code-review
[5PZ-XT4] Hardcoded SDK version will cause test failures on version bumps (additional location)
Check warning on line 945 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: code-review
Test captures only transactions but asserts on spans, causing false positive
The `capture_items("transaction")` call on line 945 only captures transaction-type items, but the test later filters for span-type items on line 1020 (`spans = [item.payload for item in items if item.type == "span"]`). Since no spans are captured, the `spans` list will always be empty, and the `for span in spans` loop will never execute any assertions. This means the test passes regardless of whether spans actually have the expected `SPANDATA.GEN_AI_SYSTEM` attribute.
Check warning on line 3562 in tests/integrations/openai_agents/test_openai_agents.py
sentry-warden / warden: code-review
[WNA-HYX] Test captures only transactions but asserts on spans, causing false positive (additional location)
Check warning on line 496 in tests/integrations/pydantic_ai/test_pydantic_ai.py
sentry-warden / warden: code-review
Test missing validation when handled_tool_call_exceptions=False
The original test validated the `model_behaviour_error` event when `handled_tool_call_exceptions=False`, but the refactored code removed the `else` branch entirely. This means when the test runs with `handled_tool_call_exceptions=False`, no event validation occurs, reducing test coverage for that code path.
Check warning on line 494 in tests/integrations/pydantic_ai/test_pydantic_ai.py
sentry-warden / warden: code-review
[53D-QFE] Test missing validation when handled_tool_call_exceptions=False (additional location)