feat: Send GenAI spans as V2 envelope items #6079
9 issues
find-bugs: Found 9 issues (1 high, 7 medium, 1 low)
High
GenAI span conversion uses unprocessed `event` instead of `event_opt` - `sentry_sdk/client.py:1130`
On line 1130, `_serialized_v1_span_to_serialized_v2_span(span, event)` passes `event` (the original input parameter) instead of `event_opt` (the processed event). The `_prepare_event` method enriches the event with `release`, `environment`, `sdk` info, user data, and trace context. V2 spans converted from genAI spans will be missing this enriched data, causing inconsistency between the transaction and its extracted spans.
Also found at:
sentry_sdk/client.py:230
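A minimal sketch of the fix under simplified assumptions. `prepare_event` and `v1_span_to_v2` are hypothetical stand-ins for `_prepare_event` and `_serialized_v1_span_to_serialized_v2_span`, and the `sentry.release` / `sentry.environment` attribute names are assumed for illustration. The point is only that the conversion should read from `event_opt`:

```python
from typing import Any, Dict, List, Optional

Event = Dict[str, Any]


def prepare_event(event: Event) -> Optional[Event]:
    """Hypothetical stand-in for Client._prepare_event.

    Returns an enriched copy; the real method may also return None
    when the event is dropped.
    """
    enriched = dict(event)
    enriched.setdefault("release", "my-app@1.0.0")  # hypothetical values
    enriched.setdefault("environment", "production")
    return enriched


def v1_span_to_v2(span: Dict[str, Any], event: Event) -> Dict[str, Any]:
    """Hypothetical stand-in for _serialized_v1_span_to_serialized_v2_span.

    Copies event-level fields onto the V2 span's attributes.
    """
    attributes = dict(span.get("data", {}))
    for key in ("release", "environment"):
        if key in event:
            attributes["sentry." + key] = event[key]  # assumed attribute naming
    return {"span_id": span["span_id"], "attributes": attributes}


def extract_genai_spans(event: Event) -> List[Dict[str, Any]]:
    event_opt = prepare_event(event)
    if event_opt is None:
        return []
    # Convert against event_opt (the processed event), not the raw `event`,
    # so the V2 spans carry the same release/environment as the transaction.
    return [v1_span_to_v2(span, event_opt) for span in event_opt.get("spans", [])]
```

Passing the raw event to `v1_span_to_v2` instead yields attributes without `sentry.release`, which is the inconsistency the finding describes.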
Medium
Test uses incorrect 'attributes' field instead of 'data' for inline_data - `tests/integrations/google_genai/test_google_genai.py:2153`
The test input was changed from `{"inline_data": {"data": b"binary_data", ...}}` to `{"inline_data": {"attributes": b"binary_data", ...}}`. However, the `transform_google_content_part` function in `sentry_sdk/ai/utils.py` (line 286) expects `inline_data.get("data", "")`, not `attributes`. This means the test will no longer properly validate the inline_data handling, as the actual binary data will be ignored and an empty string will be used instead. Other tests in this file (lines 1765, 1806, 1851) correctly use `"data"`.
Also found at:
tests/integrations/google_genai/test_google_genai.py:330
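The effect can be reproduced with a hypothetical, simplified version of the lookup. The output dict shape and the `mime_type` key are assumptions, and the real `transform_google_content_part` may further process the bytes; only the `.get("data", "")` read is taken from the finding:

```python
from typing import Any, Dict, Optional


def transform_inline_part(part: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Hypothetical simplification of the inline_data branch of
    transform_google_content_part: the payload is read from the "data" key."""
    inline_data = part.get("inline_data")
    if inline_data is None:
        return None
    return {
        "mime_type": inline_data.get("mime_type", ""),
        # A part keyed "attributes" falls through to the "" default here.
        "content": inline_data.get("data", ""),
    }


# Correct test input: the binary payload lives under "data".
good_part = {"inline_data": {"data": b"binary_data", "mime_type": "image/png"}}
# Input as changed in the PR: the payload is silently ignored.
bad_part = {"inline_data": {"attributes": b"binary_data", "mime_type": "image/png"}}
```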
Hardcoded SDK version will cause test failures on version bump - `tests/integrations/huggingface_hub/test_huggingface_hub.py:523`
In `test_text_generation`, the `sentry.sdk.version` expected value is hardcoded to `"2.58.0"` instead of using `mock.ANY`. All other tests in this file and similar tests in other integration test files use `mock.ANY` for this field. This test will fail when the SDK version is bumped.
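A short illustration of the `mock.ANY` pattern the other tests use. The `sentry.sdk.name` entry is a hypothetical companion field included only to show that other values stay pinned:

```python
from unittest import mock

# Expected span attributes; keys other than sentry.sdk.version are hypothetical.
EXPECTED_ATTRIBUTES = {
    "sentry.sdk.name": "sentry.python",
    # mock.ANY compares equal to any value, so a version bump cannot break this.
    "sentry.sdk.version": mock.ANY,
}


def attributes_match(actual):
    return actual == EXPECTED_ATTRIBUTES
```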
test_multiple_providers fails to capture spans, making provider detection assertion ineffective - `tests/integrations/litellm/test_litellm.py:945`
On line 945, `capture_items("transaction")` only captures transaction items, but the test later (lines 1020-1023) attempts to filter for spans and assert that each span has `SPANDATA.GEN_AI_SYSTEM` in its attributes. Since spans are not captured, the `spans` list will always be empty, and the `for span in spans:` loop never executes any assertions. This makes the provider detection check ineffective: the test will pass regardless of whether spans are correctly captured.
Also found at:
tests/integrations/litellm/test_litellm.py:1020-1023
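A sketch of why the loop is vacuous and how a non-empty guard exposes it. `Item` and `captured` are hypothetical stand-ins for the test fixture's items and for `capture_items(...)`, and `"gen_ai.system"` is assumed to be the value of `SPANDATA.GEN_AI_SYSTEM`:

```python
from collections import namedtuple

# Hypothetical stand-in for the envelope items returned by capture_items().
Item = namedtuple("Item", ["type", "payload"])


def captured(items, *types):
    """Hypothetical filter mimicking capture_items(*types)."""
    return [item for item in items if item.type in types]


def assert_provider_detected(items):
    spans = [item.payload for item in items if item.type == "span"]
    # Guard against a vacuous loop: fail loudly if no spans were captured.
    assert spans, "no spans captured; pass 'span' to capture_items"
    for span in spans:
        # "gen_ai.system" is assumed to be the value of SPANDATA.GEN_AI_SYSTEM.
        assert "gen_ai.system" in span["attributes"]
```

With only `"transaction"` captured, the guard fails instead of the test silently passing; capturing `"transaction", "span"` lets the per-span assertions actually run.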
Unsafe dict access could cause KeyError in test_no_integration when spans lack sentry.op attribute - `tests/integrations/litellm/test_litellm.py:1283-1284`
The `test_no_integration` and `test_async_no_integration` tests filter spans using direct dict access `x["attributes"]["sentry.op"]`, which will raise a `KeyError` if any captured span doesn't have the `sentry.op` attribute. When the LiteLLM integration is not enabled, other default integrations might produce spans without this attribute. Other tests in this codebase use the safer `.get()` pattern (e.g., `span["attributes"].get("sentry.op")`). The tests would then fail with a `KeyError` instead of asserting that no LiteLLM chat spans exist.
Also found at:
tests/integrations/litellm/test_litellm.py:1330-1331
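A minimal comparison of the two filters. The span dicts and op values below are hypothetical; only the access patterns come from the finding:

```python
from typing import Any, Dict, List

Span = Dict[str, Any]


def chat_spans_unsafe(spans: List[Span], op: str) -> List[Span]:
    # Raises KeyError as soon as one span has no "sentry.op" attribute.
    return [x for x in spans if x["attributes"]["sentry.op"] == op]


def chat_spans(spans: List[Span], op: str) -> List[Span]:
    # .get() treats a missing "sentry.op" as "not a match" instead of crashing.
    return [x for x in spans if x["attributes"].get("sentry.op") == op]


# Hypothetical spans from other integrations; the op values are illustrative.
spans = [
    {"attributes": {"sentry.op": "http.client"}},
    {"attributes": {}},  # no sentry.op at all
]
```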
Test checks wrong key 'attributes' instead of 'data' for transaction trace context - `tests/integrations/openai_agents/test_openai_agents.py:3560-3562`
The assertion at lines 3560-3561 checks `transaction["contexts"]["trace"].get("attributes", {})`, but transactions store span data in `data`, not `attributes`. Other tests in the same file (lines 3359, 3497) correctly use `["data"]` to access `gen_ai.conversation.id` on transactions. As a result, the assertion is vacuous: the test always passes (a false negative) even if `gen_ai.conversation.id` is incorrectly set in the transaction's `data` field.
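A sketch contrasting the two checks on a hypothetical transaction payload that does (incorrectly) carry the conversation ID under `data`, following the finding's description of where transactions store span data:

```python
# Hypothetical transaction payload: trace-context fields live under "data".
transaction = {
    "contexts": {"trace": {"data": {"gen_ai.conversation.id": "conv_123"}}}
}

trace = transaction["contexts"]["trace"]

# Vacuous: "attributes" is absent from this payload, so the check is always True.
wrong_check = "gen_ai.conversation.id" not in trace.get("attributes", {})

# Meaningful: inspects the field where the value would actually be set.
right_check = "gen_ai.conversation.id" not in trace.get("data", {})
```

`wrong_check` passes despite the bad value, while `right_check` correctly reports it.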
Missing else branch drops validation for handled_tool_call_exceptions=False case - `tests/integrations/pydantic_ai/test_pydantic_ai.py:490-496`
In `test_agent_with_tool_validation_error`, the old code had an `else` branch that validated `model_behaviour_error` existed when `handled_tool_call_exceptions=False`. The new code removed this branch entirely, meaning the test no longer validates that the unhandled `UnexpectedModelBehavior` exception is captured when `handled_tool_call_exceptions=False`. This allows the test to pass even if Sentry fails to capture the expected error event.
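A sketch of the restored branch. The helper names are hypothetical, and the `if` branch is a placeholder because the handled-case assertions are not quoted in the finding. Matching on `exception.values[].type` follows the standard Sentry error-event shape:

```python
from typing import Any, Dict, List, Optional

Event = Dict[str, Any]


def find_exception_event(events: List[Event], exc_type: str) -> Optional[Event]:
    """Return the first error event whose exception type matches exc_type."""
    for event in events:
        for exc in event.get("exception", {}).get("values", []):
            if exc.get("type") == exc_type:
                return event
    return None


def check_tool_validation_errors(
    events: List[Event], handled_tool_call_exceptions: bool
) -> None:
    model_behaviour_error = find_exception_event(events, "UnexpectedModelBehavior")
    if handled_tool_call_exceptions:
        # Placeholder for the handled-case assertions already in the test.
        pass
    else:
        # The restored branch: the unhandled exception must reach Sentry.
        assert model_behaviour_error is not None
```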
test_message_history accesses spans from transaction instead of captured items, causing test to fail or pass vacuously - `tests/integrations/pydantic_ai/test_pydantic_ai.py:830-840`
The `test_message_history` function was incompletely migrated to the V2 envelope format. At line 830, it retrieves spans via `second_transaction["spans"]` (old format), but then at line 832 accesses `s["attributes"].get("sentry.op", "")` (V2 format). In V2, spans are sent as separate envelope items and should be accessed via `[item.payload for item in items if item.type == "span"]`. The transaction object may not contain a `"spans"` key at all, causing a `KeyError`, or the nested spans may use the old format (`s["op"]` and `s["data"]` instead of `s["attributes"]`), causing the filter to find no matches. All other tests in this diff correctly use the pattern `spans = [item.payload for item in items if item.type == "span"]`.
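A sketch of reading spans from the captured envelope items rather than from the transaction. `Item` is a hypothetical stand-in for the fixture's items, the `"gen_ai.chat"` op is illustrative, and the equality match is an assumption since the finding only quotes the `.get("sentry.op", "")` access:

```python
from collections import namedtuple

# Hypothetical stand-in for the envelope items captured by the test fixture.
Item = namedtuple("Item", ["type", "payload"])


def spans_with_op(items, op):
    # V2: spans are separate envelope items, not nested under transaction["spans"].
    spans = [item.payload for item in items if item.type == "span"]
    return [s for s in spans if s["attributes"].get("sentry.op", "") == op]


items = [
    Item("transaction", {"type": "transaction"}),  # "spans" key assumed absent
    Item("span", {"attributes": {"sentry.op": "gen_ai.chat"}}),
    Item("span", {"attributes": {}}),
]
```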
Low
Unused list comprehension result - dead code in test - `tests/integrations/langchain/test_langchain.py:1840-1844`
The list comprehension at lines 1840-1844 builds a list of error events, but the result is never assigned to a variable or used. This appears to be leftover code from a refactor. Additionally, the `capture_items("transaction", "span")` call at line 1821 doesn't include the `"event"` type, so this comprehension would always produce an empty list anyway.
Duration: 41m 58s · Tokens: 19.2M in / 211.4k out · Cost: $27.87 (+extraction: $0.02, +merge: $0.00, +fix_gate: $0.02)
Annotations
Check failure on line 1130 in sentry_sdk/client.py
sentry-warden / warden: find-bugs
GenAI span conversion uses unprocessed `event` instead of `event_opt`
On line 1130, `_serialized_v1_span_to_serialized_v2_span(span, event)` passes `event` (the original input parameter) instead of `event_opt` (the processed event). The `_prepare_event` method enriches the event with `release`, `environment`, `sdk` info, user data, and trace context. V2 spans converted from genAI spans will be missing this enriched data, causing inconsistency between the transaction and its extracted spans.
Check failure on line 230 in sentry_sdk/client.py
sentry-warden / warden: find-bugs
[W73-4Y5] GenAI span conversion uses unprocessed `event` instead of `event_opt` (additional location)
Check warning on line 2153 in tests/integrations/google_genai/test_google_genai.py
sentry-warden / warden: find-bugs
Test uses incorrect 'attributes' field instead of 'data' for inline_data
The test input was changed from `{"inline_data": {"data": b"binary_data", ...}}` to `{"inline_data": {"attributes": b"binary_data", ...}}`. However, the `transform_google_content_part` function in `sentry_sdk/ai/utils.py` (line 286) expects `inline_data.get("data", "")`, not `attributes`. This means the test will no longer properly validate the inline_data handling, as the actual binary data will be ignored and an empty string will be used instead. Other tests in this file (lines 1765, 1806, 1851) correctly use `"data"`.
Check warning on line 330 in tests/integrations/google_genai/test_google_genai.py
sentry-warden / warden: find-bugs
[6K2-FC5] Test uses incorrect 'attributes' field instead of 'data' for inline_data (additional location)
Check warning on line 523 in tests/integrations/huggingface_hub/test_huggingface_hub.py
sentry-warden / warden: find-bugs
Hardcoded SDK version will cause test failures on version bump
In `test_text_generation`, the `sentry.sdk.version` expected value is hardcoded to `"2.58.0"` instead of using `mock.ANY`. All other tests in this file and similar tests in other integration test files use `mock.ANY` for this field. This test will fail when the SDK version is bumped.
Check warning on line 945 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: find-bugs
test_multiple_providers fails to capture spans, making provider detection assertion ineffective
On line 945, `capture_items("transaction")` only captures transaction items, but the test later (lines 1020-1023) attempts to filter for spans and assert that each span has `SPANDATA.GEN_AI_SYSTEM` in its attributes. Since spans are not captured, the `spans` list will always be empty, and the `for span in spans:` loop never executes any assertions. This makes the provider detection check ineffective: the test will pass regardless of whether spans are correctly captured.
Check warning on line 1023 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: find-bugs
[HT7-8WC] test_multiple_providers fails to capture spans, making provider detection assertion ineffective (additional location)
Check warning on line 1284 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: find-bugs
Unsafe dict access could cause KeyError in test_no_integration when spans lack sentry.op attribute
The `test_no_integration` and `test_async_no_integration` tests filter spans using direct dict access `x["attributes"]["sentry.op"]`, which will raise a `KeyError` if any captured span doesn't have the `sentry.op` attribute. When the LiteLLM integration is not enabled, other default integrations might produce spans without this attribute. Other tests in this codebase use the safer `.get()` pattern (e.g., `span["attributes"].get("sentry.op")`). The tests would then fail with a `KeyError` instead of asserting that no LiteLLM chat spans exist.
Check warning on line 1331 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: find-bugs
[NPB-C78] Unsafe dict access could cause KeyError in test_no_integration when spans lack sentry.op attribute (additional location)
Check warning on line 3562 in tests/integrations/openai_agents/test_openai_agents.py
sentry-warden / warden: find-bugs
Test checks wrong key 'attributes' instead of 'data' for transaction trace context
The assertion at lines 3560-3561 checks `transaction["contexts"]["trace"].get("attributes", {})`, but transactions store span data in `data`, not `attributes`. Other tests in the same file (lines 3359, 3497) correctly use `["data"]` to access `gen_ai.conversation.id` on transactions. As a result, the assertion is vacuous: the test always passes (a false negative) even if `gen_ai.conversation.id` is incorrectly set in the transaction's `data` field.
Check warning on line 496 in tests/integrations/pydantic_ai/test_pydantic_ai.py
sentry-warden / warden: find-bugs
Missing else branch drops validation for handled_tool_call_exceptions=False case
In `test_agent_with_tool_validation_error`, the old code had an `else` branch that validated `model_behaviour_error` existed when `handled_tool_call_exceptions=False`. The new code removed this branch entirely, meaning the test no longer validates that the unhandled `UnexpectedModelBehavior` exception is captured when `handled_tool_call_exceptions=False`. This allows the test to pass even if Sentry fails to capture the expected error event.
Check warning on line 840 in tests/integrations/pydantic_ai/test_pydantic_ai.py
sentry-warden / warden: find-bugs
test_message_history accesses spans from transaction instead of captured items, causing test to fail or pass vacuously
The test_message_history function was incompletely migrated to V2 envelope format. At line 830, it retrieves spans via `second_transaction["spans"]` (old format), but then at line 832 accesses `s["attributes"].get("sentry.op", "")` (V2 format). In V2, spans are sent as separate envelope items and should be accessed via `[item.payload for item in items if item.type == "span"]`. The transaction object may not contain a "spans" key at all, causing a KeyError, or the nested spans may have the old format (using `s["op"]` and `s["data"]` instead of `s["attributes"]`), causing the filter to find no matches. All other tests in this diff correctly use the pattern `spans = [item.payload for item in items if item.type == "span"]`.