feat: Send GenAI spans as V2 envelope items #6079
7 issues
code-review: Found 7 issues (1 high, 5 medium, 1 low)
High
test_message_history accesses transaction-embedded spans with V2 envelope accessors - `tests/integrations/pydantic_ai/test_pydantic_ai.py:830-832`
At line 830, spans are retrieved from second_transaction["spans"] which returns spans in the transaction-embedded format (with op and data keys). However, at line 832, the code tries to filter these using s["attributes"].get("sentry.op", "") which is the V2 envelope span format. This mismatch will cause a KeyError or fail to find any matching spans since transaction-embedded spans don't have an attributes key. The test should either extract spans from capture_items("span") (like other tests in this file do at lines 83-86) or use the correct accessor s["op"] for transaction-embedded spans.
Medium
Uses raw `event` instead of processed `event_opt` for span conversion - `sentry_sdk/client.py:1130`
On line 1130, the code passes event to _serialized_v1_span_to_serialized_v2_span instead of event_opt. While _prepare_event typically modifies event in-place, using event_opt is semantically correct and consistent with the rest of this function which uses event_opt for the processed/prepared event. The _serialized_v1_span_to_serialized_v2_span function extracts metadata like user, release, environment from the event, and in edge cases where event and event_opt diverge, this could result in missing or incorrect attributes on the converted GenAI spans.
Also found at:
tests/integrations/langchain/test_langchain.py:1368
Test uses incorrect inline_data key 'attributes' instead of 'data' - `tests/integrations/google_genai/test_google_genai.py:2153`
The test changes the inline_data dict key from "data" to "attributes", but the production code in sentry_sdk/ai/utils.py:286 and sentry_sdk/integrations/google_genai/utils.py:378 expects the key to be "data". This test will not properly verify the inline_data handling since inline_data.get("data", "") will return an empty string instead of the binary data.
Also found at:
tests/integrations/openai_agents/test_openai_agents.py:3560-3562
Unused list comprehension makes test ineffective at validating error handling - `tests/integrations/langchain/test_langchain.py:1840-1844`
The list comprehension at lines 1840-1844 creates a list from items filtering for error events, but the result is never assigned to a variable or used in any assertion. Combined with the fact that capture_items("transaction", "span") doesn't capture events (only "transaction" and "span" types), this test named test_langchain_embeddings_error_handling doesn't actually validate that errors are properly captured.
Test does not capture span items but later asserts on them - `tests/integrations/litellm/test_litellm.py:945`
The capture_items("transaction") call at line 945 only captures transaction items, but the test later attempts to filter and assert on spans at lines 1020-1023 (item.type == "span"). Since spans are not being captured, the spans list will always be empty and the for loop assertion will never execute, making this test ineffective at validating span attributes. This was likely an oversight when converting from capture_events to capture_items.
Also found at:
tests/integrations/litellm/test_litellm.py:320tests/integrations/huggingface_hub/test_huggingface_hub.py:523
Test may silently pass without verifying any assertions - `tests/integrations/pydantic_ai/test_pydantic_ai.py:988-995`
The test test_include_prompts_requires_pii iterates over chat_spans to verify that request messages are not captured when PII is disabled, but does not assert that chat_spans contains any elements. If no spans are captured (due to instrumentation bug or test setup issue), the for-loop never executes and the test passes without verifying anything. Other similar tests in this file include assert len(chat_spans) >= 1 before the verification loop.
Also found at:
tests/integrations/litellm/test_litellm.py:1118-1121
Low
Sorting key uses 'name' twice instead of 'name' and 'description' - `tests/integrations/google_genai/test_google_genai.py:330`
The sorting lambda on line 330 was changed to use t.get("name", "") for both tuple elements instead of (name, description). This contradicts the comment on line 328 which states "sort by name and description for comparison". While this may still work for this specific test since the two tools have different names, it introduces inconsistency and could cause non-deterministic ordering if tools with the same name but different descriptions are added.
Also found at:
tests/integrations/langchain/test_langchain.py:939-940
Duration: 38m 21s · Tokens: 13.0M in / 169.5k out · Cost: $18.16 (+extraction: $0.01, +merge: $0.01, +fix_gate: $0.02)
Annotations
Check failure on line 832 in tests/integrations/pydantic_ai/test_pydantic_ai.py
sentry-warden / warden: code-review
test_message_history accesses transaction-embedded spans with V2 envelope accessors
At line 830, spans are retrieved from `second_transaction["spans"]` which returns spans in the transaction-embedded format (with `op` and `data` keys). However, at line 832, the code tries to filter these using `s["attributes"].get("sentry.op", "")` which is the V2 envelope span format. This mismatch will cause a KeyError or fail to find any matching spans since transaction-embedded spans don't have an `attributes` key. The test should either extract spans from `capture_items("span")` (like other tests in this file do at lines 83-86) or use the correct accessor `s["op"]` for transaction-embedded spans.
Check warning on line 1130 in sentry_sdk/client.py
sentry-warden / warden: code-review
Uses raw `event` instead of processed `event_opt` for span conversion
On line 1130, the code passes `event` to `_serialized_v1_span_to_serialized_v2_span` instead of `event_opt`. While `_prepare_event` typically modifies `event` in-place, using `event_opt` is semantically correct and consistent with the rest of this function which uses `event_opt` for the processed/prepared event. The `_serialized_v1_span_to_serialized_v2_span` function extracts metadata like user, release, environment from the event, and in edge cases where `event` and `event_opt` diverge, this could result in missing or incorrect attributes on the converted GenAI spans.
Check warning on line 1368 in tests/integrations/langchain/test_langchain.py
sentry-warden / warden: code-review
[UEW-3BM] Uses raw `event` instead of processed `event_opt` for span conversion (additional location)
On line 1130, the code passes `event` to `_serialized_v1_span_to_serialized_v2_span` instead of `event_opt`. While `_prepare_event` typically modifies `event` in-place, using `event_opt` is semantically correct and consistent with the rest of this function which uses `event_opt` for the processed/prepared event. The `_serialized_v1_span_to_serialized_v2_span` function extracts metadata like user, release, environment from the event, and in edge cases where `event` and `event_opt` diverge, this could result in missing or incorrect attributes on the converted GenAI spans.
Check warning on line 2153 in tests/integrations/google_genai/test_google_genai.py
sentry-warden / warden: code-review
Test uses incorrect inline_data key 'attributes' instead of 'data'
The test changes the `inline_data` dict key from `"data"` to `"attributes"`, but the production code in `sentry_sdk/ai/utils.py:286` and `sentry_sdk/integrations/google_genai/utils.py:378` expects the key to be `"data"`. This test will not properly verify the inline_data handling since `inline_data.get("data", "")` will return an empty string instead of the binary data.
Check warning on line 3562 in tests/integrations/openai_agents/test_openai_agents.py
sentry-warden / warden: code-review
[3PU-7CN] Test uses incorrect inline_data key 'attributes' instead of 'data' (additional location)
The test changes the `inline_data` dict key from `"data"` to `"attributes"`, but the production code in `sentry_sdk/ai/utils.py:286` and `sentry_sdk/integrations/google_genai/utils.py:378` expects the key to be `"data"`. This test will not properly verify the inline_data handling since `inline_data.get("data", "")` will return an empty string instead of the binary data.
Check warning on line 1844 in tests/integrations/langchain/test_langchain.py
sentry-warden / warden: code-review
Unused list comprehension makes test ineffective at validating error handling
The list comprehension at lines 1840-1844 creates a list from `items` filtering for error events, but the result is never assigned to a variable or used in any assertion. Combined with the fact that `capture_items("transaction", "span")` doesn't capture events (only "transaction" and "span" types), this test named `test_langchain_embeddings_error_handling` doesn't actually validate that errors are properly captured.
Check warning on line 945 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: code-review
Test does not capture span items but later asserts on them
The `capture_items("transaction")` call at line 945 only captures transaction items, but the test later attempts to filter and assert on spans at lines 1020-1023 (`item.type == "span"`). Since spans are not being captured, the `spans` list will always be empty and the for loop assertion will never execute, making this test ineffective at validating span attributes. This was likely an oversight when converting from `capture_events` to `capture_items`.
Check warning on line 320 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: code-review
[KXL-VS3] Test does not capture span items but later asserts on them (additional location)
The `capture_items("transaction")` call at line 945 only captures transaction items, but the test later attempts to filter and assert on spans at lines 1020-1023 (`item.type == "span"`). Since spans are not being captured, the `spans` list will always be empty and the for loop assertion will never execute, making this test ineffective at validating span attributes. This was likely an oversight when converting from `capture_events` to `capture_items`.
Check warning on line 523 in tests/integrations/huggingface_hub/test_huggingface_hub.py
sentry-warden / warden: code-review
[KXL-VS3] Test does not capture span items but later asserts on them (additional location)
The `capture_items("transaction")` call at line 945 only captures transaction items, but the test later attempts to filter and assert on spans at lines 1020-1023 (`item.type == "span"`). Since spans are not being captured, the `spans` list will always be empty and the for loop assertion will never execute, making this test ineffective at validating span attributes. This was likely an oversight when converting from `capture_events` to `capture_items`.
Check warning on line 995 in tests/integrations/pydantic_ai/test_pydantic_ai.py
sentry-warden / warden: code-review
Test may silently pass without verifying any assertions
The test `test_include_prompts_requires_pii` iterates over `chat_spans` to verify that request messages are not captured when PII is disabled, but does not assert that `chat_spans` contains any elements. If no spans are captured (due to instrumentation bug or test setup issue), the for-loop never executes and the test passes without verifying anything. Other similar tests in this file include `assert len(chat_spans) >= 1` before the verification loop.
Check warning on line 1121 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: code-review
[848-J2E] Test may silently pass without verifying any assertions (additional location)
The test `test_include_prompts_requires_pii` iterates over `chat_spans` to verify that request messages are not captured when PII is disabled, but does not assert that `chat_spans` contains any elements. If no spans are captured (due to instrumentation bug or test setup issue), the for-loop never executes and the test passes without verifying anything. Other similar tests in this file include `assert len(chat_spans) >= 1` before the verification loop.