feat: Send GenAI spans as V2 envelope items #6079
6 issues
code-review: Found 6 issues (2 high, 3 medium, 1 low)
High
Wrong variable `event` used instead of `event_opt` causes missing attributes in GenAI V2 spans - `sentry_sdk/client.py:1124`
The code passes event (the original input) to _serialized_v1_span_to_serialized_v2_span at line 1124, but should pass event_opt (the prepared event). The _serialized_v1_span_to_serialized_v2_span function extracts user, release, environment, transaction, trace_context, and sdk_info from the event parameter. These attributes are populated by _prepare_event() and exist on event_opt, not the original event. This will cause V2 GenAI spans to be missing user information, release/environment metadata, and SDK info.
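A toy model of the failure mode follows. `prepare_event`, `span_v1_to_v2` and `convert_genai_spans` are hypothetical stand-ins for `_prepare_event()` and `_serialized_v1_span_to_serialized_v2_span`, not the SDK's real code. The attribute keys (`sentry.release`, `user.id`, ...) are assumed for illustration. The sketch shows that enrichment lands on a new dict, so a converter that reads the raw input never sees it.

```python
from typing import Any, Dict, List, Optional

Event = Dict[str, Any]


def prepare_event(event: Event) -> Optional[Event]:
    # Hypothetical stand-in for _prepare_event(): returns a *new* dict
    # enriched with client/scope data; the input dict is left untouched.
    prepared = dict(event)
    prepared.setdefault("release", "my-app@1.0.0")
    prepared.setdefault("environment", "production")
    prepared.setdefault("user", {"id": "42"})
    prepared.setdefault("sdk", {"name": "sentry.python", "version": "x.y.z"})
    return prepared


def span_v1_to_v2(span: Dict[str, Any], event: Event) -> Dict[str, Any]:
    # Hypothetical stand-in for the V1 -> V2 span converter: copies
    # event-level metadata onto the span's attributes (keys are assumed).
    attributes = dict(span.get("data", {}))
    if "release" in event:
        attributes["sentry.release"] = event["release"]
    if "environment" in event:
        attributes["sentry.environment"] = event["environment"]
    user = event.get("user") or {}
    if "id" in user:
        attributes["user.id"] = user["id"]
    sdk = event.get("sdk") or {}
    if "version" in sdk:
        attributes["sentry.sdk.version"] = sdk["version"]
    return {"span_id": span["span_id"], "attributes": attributes}


def convert_genai_spans(event: Event) -> List[Dict[str, Any]]:
    event_opt = prepare_event(event)
    if event_opt is None:
        return []
    # Pass the prepared event (event_opt), not the raw input (event),
    # so the enriched metadata is visible to the converter.
    return [span_v1_to_v2(s, event_opt) for s in event_opt.get("spans", [])]
```

Calling `span_v1_to_v2(span, event)` with the raw input instead reproduces the reported bug: the resulting attributes contain only the span's own data.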
Test never verifies span attributes because spans list is always empty - `tests/integrations/litellm/test_litellm.py:1020-1023`
The code at line 1020 filters for item.type == "span" but items was captured on line 945 with only capture_items("transaction"), not including "span" type. This means spans will always be an empty list, and the for-loop assertion at lines 1021-1023 will never execute, silently passing without verifying any span attributes. The companion test_async_multiple_providers function correctly uses capture_items("transaction", "span") at line 1040.
Also found at:
tests/integrations/litellm/test_litellm.py:945
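The sketch below models why the loop is vacuous and adds a guard against it. `Item` and this `capture_items` are hypothetical simplifications; the real fixture's signature may differ. The point is that filtering for `"span"` on items captured without that type yields an empty list. An explicit `assert spans` turns the silent pass into a failure.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Item:
    type: str
    payload: Dict[str, Any]


def capture_items(sent: List[Item], *types: str) -> List[Item]:
    # Hypothetical stand-in for the capture_items fixture: keeps only
    # envelope items whose type was requested.
    return [item for item in sent if item.type in types]


def span_attributes(items: List[Item]) -> List[Dict[str, Any]]:
    spans = [item.payload for item in items if item.type == "span"]
    # Guard against a vacuous pass: with an empty list, any per-span
    # assertions in a for-loop would never execute.
    assert spans, "no span items captured; was 'span' passed to capture_items?"
    return [span["attributes"] for span in spans]
```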
Medium
Sorting key uses 'name' twice instead of 'name' and 'description' as documented - `tests/integrations/google_genai/test_google_genai.py:330`
The sorting lambda was changed from t.get("description", "") to t.get("name", "") for the second element of the tuple key. This contradicts the comment on line 328 which states "sort by name and description for comparison". Using name twice is redundant and removes the secondary sort by description, which could cause test flakiness if tools have the same name but different descriptions.
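A minimal sketch of the documented key follows. The helper name `sort_tools` is hypothetical. `name` is the primary key and `description` breaks ties, so the result no longer depends on input order. Python's `sorted` is stable, so with `name` used twice, same-named tools would simply keep whatever order they arrived in.

```python
from typing import Any, Dict, List


def sort_tools(tools: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Primary key: name; secondary key: description, matching the
    # "sort by name and description" comment in the test.
    return sorted(tools, key=lambda t: (t.get("name", ""), t.get("description", "")))
```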
Test uses incorrect field name 'attributes' instead of 'data' for inline_data - `tests/integrations/google_genai/test_google_genai.py:2153`
The test was changed to use "attributes" instead of "data" as the field name within inline_data. However, both sentry_sdk/ai/utils.py (line 286) and sentry_sdk/integrations/google_genai/utils.py (line 378) expect the field to be named "data". Because the code reads .get("data", ""), input keyed with "attributes" yields an empty string, so the test does not actually validate the inline_data handling logic.
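The mismatch can be illustrated with a simplified lookup. `extract_inline_data` is a hypothetical helper that only mirrors the `.get("data", "")` access described above, not the SDK's actual function. It shows that a part keyed with `"attributes"` silently produces empty content instead of failing.

```python
from typing import Any, Dict


def extract_inline_data(part: Dict[str, Any]) -> Dict[str, str]:
    # Hypothetical simplification: the payload is read from
    # inline_data["data"], defaulting to "" when the key is absent.
    inline = part.get("inline_data") or {}
    return {
        "mime_type": inline.get("mime_type", ""),
        "content": inline.get("data", ""),
    }
```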
Hardcoded SDK version will break tests on version bump - `tests/integrations/huggingface_hub/test_huggingface_hub.py:523`
The test hardcodes "sentry.sdk.version": "2.58.0" instead of using mock.ANY (like the openai tests in this same PR) or importing VERSION from sentry_sdk.consts. When the SDK version is incremented, these tests will fail. The hardcoded version appears 7 times in this file.
Also found at:
tests/integrations/huggingface_hub/test_huggingface_hub.py:676
tests/integrations/huggingface_hub/test_huggingface_hub.py:942
tests/integrations/huggingface_hub/test_huggingface_hub.py:1038
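A version-independent expectation could look like the sketch below. `expected_span_attributes` is a hypothetical helper, and the `sentry.op` value is illustrative. `unittest.mock.ANY` compares equal to any value, so the dict comparison survives a version bump. Passing `sentry_sdk.consts.VERSION` instead would pin the check to the installed SDK.

```python
from typing import Any, Dict
from unittest import mock


def expected_span_attributes(sdk_version: Any = mock.ANY) -> Dict[str, Any]:
    # mock.ANY matches any value, so this expectation does not need
    # updating when the SDK version changes.
    return {
        "sentry.op": "gen_ai.chat",  # illustrative value
        "sentry.sdk.version": sdk_version,
    }
```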
Low
Direct dictionary access may raise KeyError if span lacks expected attributes - `tests/integrations/litellm/test_litellm.py:1283-1285`
In test_no_integration and test_async_no_integration, the code filters spans using x["attributes"]["sentry.op"] and x["attributes"]["sentry.origin"]. If any span is captured that doesn't have these attributes (e.g., from other instrumentation), this will raise a KeyError. The same file uses the safer .get() pattern at line 1427, suggesting awareness of this issue.
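A sketch of the safer filter, assuming spans are plain dicts with an optional `"attributes"` mapping. `filter_spans` is a hypothetical helper, and the op/origin values in any usage are illustrative. Chained `.get()` calls with an empty-dict fallback skip spans that lack these keys instead of raising `KeyError`.

```python
from typing import Any, Dict, List


def filter_spans(spans: List[Dict[str, Any]], op: str, origin: str) -> List[Dict[str, Any]]:
    # (x.get("attributes") or {}) tolerates a missing or None attributes
    # mapping; .get() on the inner keys tolerates missing sentry.op or
    # sentry.origin, where x["attributes"]["sentry.op"] would raise KeyError.
    return [
        x
        for x in spans
        if (x.get("attributes") or {}).get("sentry.op") == op
        and (x.get("attributes") or {}).get("sentry.origin") == origin
    ]
```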
Duration: 25m 53s · Tokens: 7.0M in / 84.2k out · Cost: $9.66 (+extraction: $0.00, +merge: $0.00, +fix_gate: $0.01)
Annotations
Check failure on line 1124 in sentry_sdk/client.py
sentry-warden / warden: code-review
Wrong variable `event` used instead of `event_opt` causes missing attributes in GenAI V2 spans
The code passes `event` (the original input) to `_serialized_v1_span_to_serialized_v2_span` at line 1124, but should pass `event_opt` (the prepared event). The `_serialized_v1_span_to_serialized_v2_span` function extracts `user`, `release`, `environment`, `transaction`, `trace_context`, and `sdk_info` from the event parameter. These attributes are populated by `_prepare_event()` and exist on `event_opt`, not the original `event`. This will cause V2 GenAI spans to be missing user information, release/environment metadata, and SDK info.
Check failure on line 1023 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: code-review
Test never verifies span attributes because spans list is always empty
The code at line 1020 filters for `item.type == "span"` but `items` was captured on line 945 with only `capture_items("transaction")`, not including "span" type. This means `spans` will always be an empty list, and the for-loop assertion at lines 1021-1023 will never execute, silently passing without verifying any span attributes. The companion `test_async_multiple_providers` function correctly uses `capture_items("transaction", "span")` at line 1040.
Check failure on line 945 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: code-review
[7HE-V7N] Test never verifies span attributes because spans list is always empty (additional location)
Check warning on line 330 in tests/integrations/google_genai/test_google_genai.py
sentry-warden / warden: code-review
Sorting key uses 'name' twice instead of 'name' and 'description' as documented
The sorting lambda was changed from `t.get("description", "")` to `t.get("name", "")` for the second element of the tuple key. This contradicts the comment on line 328 which states "sort by name and description for comparison". Using `name` twice is redundant and removes the secondary sort by description, which could cause test flakiness if tools have the same name but different descriptions.
Check warning on line 2153 in tests/integrations/google_genai/test_google_genai.py
sentry-warden / warden: code-review
Test uses incorrect field name 'attributes' instead of 'data' for inline_data
The test was changed to use `"attributes"` instead of `"data"` as the field name within `inline_data`. However, both `sentry_sdk/ai/utils.py` (line 286) and `sentry_sdk/integrations/google_genai/utils.py` (line 378) expect the field to be named `"data"`. Because the code reads `.get("data", "")`, input keyed with `"attributes"` yields an empty string, so the test does not actually validate the inline_data handling logic.
Check warning on line 523 in tests/integrations/huggingface_hub/test_huggingface_hub.py
sentry-warden / warden: code-review
Hardcoded SDK version will break tests on version bump
The test hardcodes `"sentry.sdk.version": "2.58.0"` instead of using `mock.ANY` (like the openai tests in this same PR) or importing `VERSION` from `sentry_sdk.consts`. When the SDK version is incremented, these tests will fail. The hardcoded version appears 7 times in this file.
Check warning on line 676 in tests/integrations/huggingface_hub/test_huggingface_hub.py
sentry-warden / warden: code-review
[XGB-JDN] Hardcoded SDK version will break tests on version bump (additional location)
Check warning on line 942 in tests/integrations/huggingface_hub/test_huggingface_hub.py
sentry-warden / warden: code-review
[XGB-JDN] Hardcoded SDK version will break tests on version bump (additional location)
Check warning on line 1038 in tests/integrations/huggingface_hub/test_huggingface_hub.py
sentry-warden / warden: code-review
[XGB-JDN] Hardcoded SDK version will break tests on version bump (additional location)