feat: Send GenAI spans as V2 envelope items #6079
7 issues
find-bugs: Found 7 issues (2 high, 4 medium, 1 low)
High
test_multiple_providers captures only transactions but asserts on spans - `tests/integrations/litellm/test_litellm.py:945`
Line 945 calls `capture_items("transaction")`, which captures only transactions. However, lines 1020-1023 (outside the hunk, but operating on the `items` set up in the hunk) filter for `item.type == "span"` and assert on span attributes. Because no spans are captured, the `spans` list is empty and the for-loop assertions pass vacuously, so the test verifies nothing about span attributes.
Also found at:
tests/integrations/litellm/test_litellm.py:1020-1023
tests/integrations/litellm/test_litellm.py:866-868
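The vacuous pass described above can be reproduced in isolation. The following is a minimal sketch, not the test's actual code: `Item` is a hypothetical stand-in for a captured envelope item, and the attribute name is illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    # Hypothetical stand-in for a captured envelope item.
    type: str
    payload: dict = field(default_factory=dict)


# Only transactions were captured, so no item has type "span".
items = [Item(type="transaction", payload={"transaction": "test"})]
spans = [item for item in items if item.type == "span"]

# The loop body never runs, so this assertion cannot fail.
for span in spans:
    assert span.payload["attributes"]["gen_ai.system"] == "openai"


def check_spans(spans):
    # Guarding on non-emptiness makes the test fail loudly instead.
    assert len(spans) > 0, "expected at least one captured span"
    for span in spans:
        assert "attributes" in span.payload
```

A guard such as the one in `check_spans` is one possible way to keep a filter-then-loop assertion from silently passing on an empty list.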
Test expects V2 span envelope for non-gen_ai op span, will fail - `tests/tracing/test_misc.py:618-629`
The test `test_conversation_id_propagates_to_span_with_gen_ai_operation_name` was changed to use `capture_items("span")`, which captures V2 envelope span items. However, the span it creates has `op="http.client"`, and `_split_gen_ai_spans()` in `client.py` only splits out spans whose op starts with `gen_ai.`. This span is therefore not sent as a V2 envelope item and stays in the transaction event. The test will fail because the `spans` list will be empty or will not contain the expected span.
Also found at:
tests/tracing/test_misc.py:636-647
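To illustrate why an `http.client` span would not show up as a V2 item, here is a small sketch of prefix-based splitting. `split_gen_ai_spans` is a hypothetical stand-in written for this example; it is not the SDK's `_split_gen_ai_spans()` and only mirrors the behavior described in the finding.

```python
def split_gen_ai_spans(spans):
    # Hypothetical stand-in: partition serialized spans by their op prefix.
    gen_ai, remaining = [], []
    for span in spans:
        op = span.get("op") or ""
        if op.startswith("gen_ai."):
            gen_ai.append(span)
        else:
            remaining.append(span)
    return gen_ai, remaining


spans = [
    {"op": "http.client", "description": "GET /"},
    {"op": "gen_ai.chat", "description": "chat"},
]

# Only the gen_ai.* span is split out; http.client stays with the transaction.
v2_items, in_transaction = split_gen_ai_spans(spans)
```

Under this assumption, a test that captures only `"span"` items would never see the `http.client` span.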
Medium
V2 GenAI spans may be missing release, environment, and SDK metadata - `sentry_sdk/client.py:1130`
Line 1130 passes `event` instead of `event_opt` to `_serialized_v1_span_to_serialized_v2_span`. The `_prepare_event` function (lines 811-817) populates `release`, `environment`, and `sdk` from options into the event, but those values only exist in the returned `event_opt`. The original `event` parameter may not contain these fields, causing V2 GenAI spans to be missing the `sentry.release`, `sentry.environment`, `sentry.sdk.name`, and `sentry.sdk.version` attributes.
Also found at:
tests/integrations/google_genai/test_google_genai.py:2153
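The difference between the two dictionaries can be sketched as follows. This assumes, as the finding does, that the prepare step returns an enriched copy and leaves the original untouched; `prepare_event` and `span_attributes` are hypothetical helpers for illustration, not the SDK's functions.

```python
import copy


def prepare_event(event, options):
    # Hypothetical stand-in for the prepare step: returns an enriched copy
    # and leaves the caller's `event` untouched (an assumption of this sketch).
    prepared = copy.deepcopy(event)
    for key in ("release", "environment"):
        if prepared.get(key) is None and options.get(key) is not None:
            prepared[key] = options[key]
    return prepared


def span_attributes(source_event):
    # Hypothetical conversion helper: copies event-level metadata onto a span.
    attributes = {}
    if source_event.get("release") is not None:
        attributes["sentry.release"] = source_event["release"]
    if source_event.get("environment") is not None:
        attributes["sentry.environment"] = source_event["environment"]
    return attributes


options = {"release": "my-app@1.0.0", "environment": "production"}
event = {"type": "transaction", "spans": []}
event_opt = prepare_event(event, options)

from_original = span_attributes(event)      # metadata missing
from_prepared = span_attributes(event_opt)  # metadata present
```

If the real prepare step mutates the event in place, both calls would see the same fields and the finding would not apply.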
Hardcoded SDK version '2.58.0' will cause test failure on version change - `tests/integrations/huggingface_hub/test_huggingface_hub.py:523`
In `test_text_generation`, the expected `sentry.sdk.version` attribute is hardcoded as `"2.58.0"` (line 523) instead of using `mock.ANY` like all other similar tests in this file. This test will fail when the SDK version changes, unlike `test_text_generation_streaming`, `test_chat_completion`, and other tests, which correctly use `mock.ANY` for the version comparison.
Also found at:
tests/integrations/openai_agents/test_openai_agents.py:1097
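For comparison, here is a short sketch of how `mock.ANY` keeps an expected-attributes dictionary independent of the SDK version. The attribute values are illustrative only and are not taken from the tests.

```python
from unittest import mock

# Attributes as they might look after a version bump (illustrative values).
attributes = {
    "sentry.sdk.name": "sentry.python",
    "sentry.sdk.version": "2.59.0",
}

# Pinned expectation: breaks as soon as the version changes.
expected_pinned = {
    "sentry.sdk.name": "sentry.python",
    "sentry.sdk.version": "2.58.0",
}

# mock.ANY compares equal to any value, so the version is not pinned.
expected_flexible = {
    "sentry.sdk.name": "sentry.python",
    "sentry.sdk.version": mock.ANY,
}

pinned_matches = attributes == expected_pinned
flexible_matches = attributes == expected_flexible
```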
Test accesses orphaned _meta after gen_ai span is removed from transaction - `tests/integrations/openai/test_openai.py:3758-3760`
After `gen_ai` spans are split from the transaction and sent as V2 envelope items, the transaction's spans list no longer contains the `gen_ai` span. However, the test still accesses `event["_meta"]["spans"]["0"]["data"]` expecting truncation metadata. Since the span at index 0 has been moved to the V2 envelope, `_meta["spans"]["0"]` now references metadata for a span that no longer exists in the transaction's spans array. This test will likely fail or assert against orphaned/stale metadata.
Also found at:
tests/integrations/pydantic_ai/test_pydantic_ai.py:830-833
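The index mismatch can be shown with plain dictionaries. This sketch assumes that `_meta` is keyed by span index and is left untouched when spans are removed; the event shape and metadata contents are simplified for illustration.

```python
event = {
    "spans": [
        {"op": "gen_ai.chat", "data": {"gen_ai.request.messages": "..."}},
        {"op": "http.client", "data": {}},
    ],
    # Truncation metadata keyed by the span's index in event["spans"].
    "_meta": {"spans": {"0": {"data": {"": {"len": 5000}}}}},
}

# Remove gen_ai.* spans from the transaction, as the split step would.
event["spans"] = [
    span for span in event["spans"] if not span["op"].startswith("gen_ai.")
]

# Index "0" in _meta was recorded for the gen_ai span, but index 0 in the
# spans list is now the http.client span.
span_now_at_zero = event["spans"][0]["op"]
meta_still_present = "0" in event["_meta"]["spans"]
```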
Test checks wrong field 'attributes' instead of 'data' for transaction trace context - `tests/integrations/openai_agents/test_openai_agents.py:3560-3562`
At lines 3560-3561, the test checks `transaction["contexts"]["trace"].get("attributes", {})` to verify that `conversation_id` is not set. However, all other tests in this file (lines 3359, 3497) and throughout the test suite access transaction trace data via `transaction["contexts"]["trace"]["data"]`. Because the `attributes` field does not exist, the check always passes, even if `data` still contains the `conversation_id`.
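A small sketch of why the negative assertion cannot fail when it reads a key that is never present. The key name `gen_ai.conversation.id` and the value are assumptions made for this example.

```python
# Trace context where the conversation id leaked into "data".
transaction = {
    "contexts": {
        "trace": {"data": {"gen_ai.conversation.id": "conv-123"}},
    },
}
trace = transaction["contexts"]["trace"]

# "attributes" does not exist, so .get() returns {} and this always passes.
assert "gen_ai.conversation.id" not in trace.get("attributes", {})

# Checking the field that is actually populated exposes the leak.
leaked_in_data = "gen_ai.conversation.id" in trace.get("data", {})
```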
Low
Duplicated sort key uses 'name' twice instead of 'name' and 'description' - `tests/integrations/google_genai/test_google_genai.py:330`
The sorting lambda for tools on line 330 was changed from `(t.get("name", ""), t.get("description", ""))` to `(t.get("name", ""), t.get("name", ""))`. This uses `name` as both the primary and the secondary sort key, making the secondary key redundant. It works for the current test data because tool names are distinct, but it loses the intended secondary sort by description and appears to be a copy-paste error.
Also found at:
tests/integrations/langchain/test_langchain.py:1840-1844
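The effect of the duplicated key only shows up when two tools share a name. A minimal sketch with made-up tool data:

```python
tools = [
    {"name": "search", "description": "b"},
    {"name": "search", "description": "a"},
    {"name": "calc", "description": "z"},
]

# Duplicated key: ties on name fall back to input order (sorted() is stable).
by_name_twice = sorted(
    tools, key=lambda t: (t.get("name", ""), t.get("name", ""))
)

# Intended key: ties on name are broken by description.
by_name_then_description = sorted(
    tools, key=lambda t: (t.get("name", ""), t.get("description", ""))
)

order_twice = [t["description"] for t in by_name_twice]
order_intended = [t["description"] for t in by_name_then_description]
```

With distinct names, as in the current test data, both keys produce the same order.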
Duration: 23m 34s · Tokens: 18.5M in / 212.2k out · Cost: $26.27 (+extraction: $0.03, +merge: $0.01, +fix_gate: $0.02)
Annotations
Check failure on line 945 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: find-bugs
test_multiple_providers captures only transactions but asserts on spans
Line 945 calls `capture_items("transaction")` which only captures transactions. However, lines 1020-1023 (outside the hunk but testing `items` set up in the hunk) filter for `item.type == "span"` and assert on span attributes. Since spans are never captured, the `spans` list will be empty and the for-loop assertion will trivially pass, making the test ineffective at verifying span attributes.
Check failure on line 1023 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: find-bugs
[KB4-XQE] test_multiple_providers captures only transactions but asserts on spans (additional location)
Check failure on line 868 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: find-bugs
[KB4-XQE] test_multiple_providers captures only transactions but asserts on spans (additional location)
Check failure on line 629 in tests/tracing/test_misc.py
sentry-warden / warden: find-bugs
Test expects V2 span envelope for non-gen_ai op span, will fail
The test `test_conversation_id_propagates_to_span_with_gen_ai_operation_name` was changed to use `capture_items("span")`, which captures V2 envelope span items. However, the span it creates has `op="http.client"`, and `_split_gen_ai_spans()` in `client.py` only splits out spans whose op starts with `gen_ai.`. This span is therefore not sent as a V2 envelope item and stays in the transaction event. The test will fail because the `spans` list will be empty or will not contain the expected span.
Check failure on line 647 in tests/tracing/test_misc.py
sentry-warden / warden: find-bugs
[76L-VVC] Test expects V2 span envelope for non-gen_ai op span, will fail (additional location)
Check warning on line 1130 in sentry_sdk/client.py
sentry-warden / warden: find-bugs
V2 GenAI spans may be missing release, environment, and SDK metadata
Line 1130 passes `event` instead of `event_opt` to `_serialized_v1_span_to_serialized_v2_span`. The `_prepare_event` function (lines 811-817) populates `release`, `environment`, and `sdk` from options into the event, but those values only exist in the returned `event_opt`. The original `event` parameter may not contain these fields, causing V2 GenAI spans to be missing `sentry.release`, `sentry.environment`, `sentry.sdk.name`, and `sentry.sdk.version` attributes.
Check warning on line 2153 in tests/integrations/google_genai/test_google_genai.py
sentry-warden / warden: find-bugs
[ZP5-7W4] V2 GenAI spans may be missing release, environment, and SDK metadata (additional location)
Check warning on line 523 in tests/integrations/huggingface_hub/test_huggingface_hub.py
sentry-warden / warden: find-bugs
Hardcoded SDK version '2.58.0' will cause test failure on version change
In `test_text_generation`, the expected `sentry.sdk.version` attribute is hardcoded as `"2.58.0"` (line 523) instead of using `mock.ANY` like all other similar tests in this file. This test will fail when the SDK version changes, unlike `test_text_generation_streaming`, `test_chat_completion`, and other tests which correctly use `mock.ANY` for version comparison.
Check warning on line 1097 in tests/integrations/openai_agents/test_openai_agents.py
sentry-warden / warden: find-bugs
[75Z-DFJ] Hardcoded SDK version '2.58.0' will cause test failure on version change (additional location)
Check warning on line 3760 in tests/integrations/openai/test_openai.py
sentry-warden / warden: find-bugs
Test accesses orphaned _meta after gen_ai span is removed from transaction
After gen_ai spans are split from the transaction and sent as V2 envelope items, the transaction's spans list no longer contains the gen_ai span. However, the test still accesses `event["_meta"]["spans"]["0"]["data"]` expecting truncation metadata. Since the span at index 0 has been moved to the V2 envelope, `_meta["spans"]["0"]` now references metadata for a span that no longer exists in the transaction's spans array. This test will likely fail or assert against orphaned/stale metadata.
Check warning on line 833 in tests/integrations/pydantic_ai/test_pydantic_ai.py
sentry-warden / warden: find-bugs
[AZN-DCX] Test accesses orphaned _meta after gen_ai span is removed from transaction (additional location)
Check warning on line 3562 in tests/integrations/openai_agents/test_openai_agents.py
sentry-warden / warden: find-bugs
Test checks wrong field 'attributes' instead of 'data' for transaction trace context
At lines 3560-3561, the test checks `transaction["contexts"]["trace"].get("attributes", {})` to verify that `conversation_id` is not set. However, all other tests in this file (lines 3359, 3497) and throughout the test suite access transaction trace data via `transaction["contexts"]["trace"]["data"]`. Because the `attributes` field does not exist, the check always passes, even if `data` still contains the `conversation_id`.