feat: Send GenAI spans as V2 envelope items #6079
6 issues
code-review: Found 6 issues (1 high, 4 medium, 1 low)
High
`tx["_meta"]["spans"]` will cause KeyError - spans are no longer nested in transaction - `tests/integrations/langchain/test_langchain.py:1370`
Line 1370 accesses `tx["_meta"]["spans"]["0"]["data"]` to verify message truncation metadata. However, the test was migrated to use V2 envelope items where spans are separate items (accessed via `[item.payload for item in items if item.type == "span"]`) rather than being nested within the transaction. Since spans are no longer embedded in `tx`, the path `tx["_meta"]["spans"]` will raise a KeyError at runtime, causing this test to fail.
Also found at:
tests/integrations/openai_agents/test_openai_agents.py:3039
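A minimal sketch of the fix, using hypothetical item and payload shapes (the real structures come from the test envelope fixtures): with V2 envelopes, spans are separate items, so truncation `_meta` must be read from the span payload rather than from `tx["_meta"]["spans"]`.

```python
# Hypothetical V2 item shapes for illustration only.
from types import SimpleNamespace

items = [
    SimpleNamespace(type="transaction", payload={"_meta": {}}),
    SimpleNamespace(
        type="span",
        payload={
            "data": {"gen_ai.request.messages": "[...]"},
            "_meta": {"data": {"gen_ai.request.messages": {"": {"len": 2}}}},
        },
    ),
]

tx = next(i.payload for i in items if i.type == "transaction")
spans = [i.payload for i in items if i.type == "span"]

# Old path: tx["_meta"]["spans"] raises KeyError, since spans are no longer nested.
assert "spans" not in tx["_meta"]

# New path: read the truncation metadata from the span item itself.
meta = spans[0]["_meta"]["data"]["gen_ai.request.messages"]
```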
Medium
V2 span conversion uses unprocessed event instead of processed event_opt - `sentry_sdk/client.py:1138`
At line 1138, `_serialized_v1_span_to_serialized_v2_span(span, event)` passes the original `event` parameter instead of `event_opt`. The `_serialized_v1_span_to_serialized_v2_span` function reads `user`, `release`, `environment`, `transaction`, trace context, and SDK info from the event (lines 224-251 in client.py). These fields are populated during `_prepare_event` via `scope.apply_to_event`, which can return a different object via event processors (lines 1817-1823 in scope.py). When event processors return a new object, V2 spans will be missing user, release, environment, and other scope-applied attributes.
Also found at:
tests/integrations/litellm/test_litellm.py:945
tests/integrations/huggingface_hub/test_huggingface_hub.py:710
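A simplified sketch of the fix with stand-in functions (the real `_prepare_event` and `_serialized_v1_span_to_serialized_v2_span` live in `sentry_sdk/client.py`; the shapes below are assumptions for illustration): the V2 conversion must consume the processed `event_opt`, not the raw `event`.

```python
# Stand-in for the real converter: reads scope-applied fields from the event.
def _serialized_v1_span_to_serialized_v2_span(span, event):
    return {**span, "release": event.get("release")}

# Stand-in for _prepare_event: event processors may return a *new* object
# with scope data (user, release, environment, ...) applied.
def _prepare_event(event):
    processed = dict(event)
    processed["release"] = "1.2.3"
    return processed

event = {"spans": [{"op": "gen_ai.chat"}]}
event_opt = _prepare_event(event)

# Buggy call passed the raw `event`, dropping scope-applied attributes:
#     _serialized_v1_span_to_serialized_v2_span(span, event)
# Fixed call uses the processed event_opt:
v2_spans = [
    _serialized_v1_span_to_serialized_v2_span(s, event_opt)
    for s in event_opt["spans"]
]
```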
List comprehension result is discarded without any assertion or assignment - `tests/integrations/langchain/test_langchain.py:1842-1846`
The list comprehension at lines 1842-1846 computes error events but does not assign the result to a variable or make any assertions on it. This makes the test ineffective - it doesn't actually verify that errors are captured or handled correctly. Similar tests in litellm/test_litellm.py assign the result to `error_events` and assert on its length.
Also found at:
tests/integrations/pydantic_ai/test_pydantic_ai.py:490-496
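A sketch of the fix on hypothetical event dicts: bind the comprehension to `error_events` and assert on it, as the litellm tests already do, so a regression actually fails the test.

```python
# Hypothetical captured events for illustration only.
events = [
    {"type": "transaction"},
    {"level": "error", "exception": {"values": [{"type": "ValueError"}]}},
]

# Buggy: the comprehension result was computed and immediately discarded.
# Fixed: assign and assert.
error_events = [e for e in events if e.get("level") == "error"]
assert len(error_events) == 1
```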
Test assertion uses wrong key 'attributes' instead of 'data' for transaction contexts - `tests/integrations/openai_agents/test_openai_agents.py:3540-3542`
The test at line 3540 checks for `gen_ai.conversation.id` in `transaction["contexts"]["trace"].get("attributes", {})`, but transactions store this data under `data`, not `attributes`. Looking at the `capture_items` fixture in conftest.py (lines 361-370), only span items have their attributes transformed; transactions use the raw payload which contains `data`. Other assertions in the same file (lines 3341 and 3479) correctly use `data` for transaction context access.
Also found at:
tests/integrations/openai_agents/test_openai_agents.py:2257
tests/integrations/langchain/test_langchain.py:1953-1954
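The corrected assertion, sketched on a hypothetical transaction payload: transaction trace contexts keep their key/value pairs under `data`; only span items get an `attributes` view from the `capture_items` fixture.

```python
# Hypothetical transaction payload for illustration only.
transaction = {
    "contexts": {"trace": {"data": {"gen_ai.conversation.id": "conv-123"}}}
}
trace = transaction["contexts"]["trace"]

# Buggy: "attributes" is always absent on transactions, so this check can
# never find the key.
assert "gen_ai.conversation.id" not in trace.get("attributes", {})

# Fixed: read from "data", as the assertions at lines 3341 and 3479 do.
assert "gen_ai.conversation.id" in trace.get("data", {})
```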
test_message_history accesses transaction-nested spans using wrong attribute format, causing silent test pass - `tests/integrations/pydantic_ai/test_pydantic_ai.py:831-841`
At line 831, spans are extracted from `second_transaction["spans"]` and then filtered using `s["attributes"].get("sentry.op", "")`. However, spans nested inside transaction payloads use the legacy format with `s["op"]`, not `s["attributes"]["sentry.op"]`. This mismatch causes the filter to find zero `chat_spans`, and since the test uses `if chat_spans:` at line 836, it silently passes without verifying the message history feature.
Also found at:
tests/integrations/langchain/test_langchain.py:262
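A sketch of the corrected filter on hypothetical legacy-format spans: spans nested inside a transaction payload carry `op` directly rather than `attributes["sentry.op"]`, and the filter result should be asserted non-empty so the test cannot silently pass.

```python
# Hypothetical transaction-nested spans for illustration only.
second_transaction = {
    "spans": [{"op": "gen_ai.chat"}, {"op": "gen_ai.invoke_agent"}]
}
spans = second_transaction["spans"]

# Buggy: filtering on s["attributes"].get("sentry.op", "") matches nothing,
# and a guard like `if chat_spans:` then skips every assertion.
# Fixed: filter on the legacy "op" key and require a match.
chat_spans = [s for s in spans if s.get("op", "").startswith("gen_ai.chat")]
assert chat_spans, "expected at least one chat span"
```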
Low
Sorting key uses 'name' twice instead of 'name' and 'description' - `tests/integrations/google_genai/test_google_genai.py:330`
The sorting lambda at line 330 uses `t.get("name", "")` twice instead of sorting by name and description as the comment on line 328 states. The original code used `(t.get("name", ""), t.get("description", ""))`. While this doesn't cause a runtime error since the tool names are unique, it makes the second tuple element redundant and contradicts the documented intent.
Also found at:
tests/integrations/pydantic_ai/test_pydantic_ai.py:493
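The corrected sorting key, demonstrated on hypothetical tool dicts: tie-break on description as the comment states, instead of repeating the name.

```python
# Hypothetical tool list for illustration only.
tools = [
    {"name": "calc", "description": "z"},
    {"name": "calc", "description": "a"},
    {"name": "search", "description": "b"},
]

# Buggy: key=lambda t: (t.get("name", ""), t.get("name", "")) -- the second
# tuple element is redundant and never breaks ties.
# Fixed: sort by (name, description) as documented.
tools_sorted = sorted(
    tools, key=lambda t: (t.get("name", ""), t.get("description", ""))
)
```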
Duration: 21m 41s · Tokens: 15.9M in / 195.5k out · Cost: $22.30 (+extraction: $0.03, +merge: $0.01, +fix_gate: $0.01)