feat: Send GenAI spans as V2 envelope items #6079
10 issues
find-bugs: Found 10 issues (6 medium, 4 low)
Medium
GenAI V2 spans missing release, environment, and SDK metadata when event processors return new objects - `sentry_sdk/client.py:243`
At line 1138, _serialized_v1_span_to_serialized_v2_span(span, event) passes the original event object instead of event_opt. The issue is that release, environment, and sdk metadata are added to the LOCAL event variable inside _prepare_event (lines 815-821 in the full file), which may be a different object from the original event if scope.apply_to_event or any event processor returned a new dict. This results in V2 spans missing sentry.release, sentry.environment, sentry.sdk.name, and sentry.sdk.version attributes.
Also found at:
sentry_sdk/client.py:1138
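A minimal sketch of the failure mode, assuming a processor that returns a new dict. `prepare_event` and `span_attributes_from` are hypothetical stand-ins for illustration, not the SDK's actual functions:

```python
def prepare_event(event):
    # Hypothetical stand-in for event preparation: the metadata is added to a
    # NEW dict, so the caller's original `event` is left untouched.
    processed = dict(event)
    processed["release"] = "my-app@1.0.0"
    processed["environment"] = "production"
    return processed


def span_attributes_from(event):
    # Hypothetical helper: copy event-level metadata onto span attributes.
    attrs = {}
    if "release" in event:
        attrs["sentry.release"] = event["release"]
    if "environment" in event:
        attrs["sentry.environment"] = event["environment"]
    return attrs


event = {"type": "transaction"}
event_opt = prepare_event(event)

# Reading from the original object loses the metadata.
attrs_from_original = span_attributes_from(event)
# Reading from the processed object keeps it.
attrs_from_processed = span_attributes_from(event_opt)
```

If the conversion reads from the processed object, the metadata survives regardless of whether a processor mutated the event in place or returned a copy.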
test_multiple_providers silently passes without testing spans due to missing span capture - `tests/integrations/litellm/test_litellm.py:945`
On line 945, capture_items("transaction") only captures transaction items, but lines 1020-1023 try to filter for and iterate over "span" items from items. Since no spans are captured, spans will always be an empty list, causing the for-loop assertions to pass vacuously without actually testing that spans have the expected SPANDATA.GEN_AI_SYSTEM attribute. The async version test_async_multiple_providers correctly uses capture_items("transaction", "span").
Also found at:
tests/integrations/litellm/test_litellm.py:1020-1023
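The vacuous pass can be reproduced in isolation. The sketch below uses plain dicts as hypothetical captured items (the real test helpers may expose a different shape) and adds a guard that fails loudly when nothing was captured:

```python
# Only transactions were captured, mirroring capture_items("transaction").
items = [{"type": "transaction", "payload": {}}]
spans = [item for item in items if item["type"] == "span"]

checked = 0
for span in spans:
    # Never executed: the loop body is skipped when `spans` is empty.
    assert span["payload"]["attributes"]["gen_ai.system"]
    checked += 1


def require_non_empty(values, label):
    # Hypothetical guard: fail explicitly instead of passing vacuously.
    if not values:
        raise AssertionError(f"expected at least one {label}, got none")
    return values
```

Calling `require_non_empty(spans, "span")` before the loop would turn the silent pass into a test failure.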
Direct dictionary access on span attributes may raise KeyError - `tests/integrations/litellm/test_litellm.py:1283-1285`
In test_no_integration, the code filters spans using direct dictionary access x["attributes"]["sentry.op"] instead of .get(). If any captured span lacks the sentry.op attribute, this will raise a KeyError. Other similar tests in the codebase (anthropic, langchain, openai_agents) use .get("sentry.op") for safe access. This could cause the test to fail with an unexpected error rather than correctly asserting there are no litellm chat spans.
Also found at:
tests/integrations/litellm/test_litellm.py:1330-1332
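A small sketch contrasting the two access styles. The span dicts and the op value `gen_ai.chat` are assumptions chosen for illustration:

```python
CHAT_OP = "gen_ai.chat"  # assumed op value, for illustration only

spans = [
    {"attributes": {"sentry.op": "http.client"}},
    {"attributes": {}},  # a span that lacks sentry.op
]


def direct_filter(spans):
    # Raises KeyError as soon as a span has no "sentry.op" attribute.
    return [s for s in spans if s["attributes"]["sentry.op"] == CHAT_OP]


def safe_filter(spans):
    # .get() returns None for a missing key, so the span is simply skipped.
    return [s for s in spans if s["attributes"].get("sentry.op") == CHAT_OP]
```

With `safe_filter`, the assertion that no litellm chat spans exist fails or passes on its own merits rather than on an unrelated `KeyError`.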
Test assertion checks wrong key 'attributes' instead of 'data' for transaction contexts - `tests/integrations/openai_agents/test_openai_agents.py:3540-3542`
The test at lines 3540-3541 checks transaction["contexts"]["trace"].get("attributes", {}), but transactions store trace context data under data, not attributes. Other tests in the same file (lines 3341, 3479) correctly use data. Because attributes does not exist on transactions, the assertion always passes, masking any bug where gen_ai.conversation.id is incorrectly set.
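To illustrate why the wrong key makes the check pass unconditionally, here is a sketch with a hypothetical transaction payload in which the conversation id is (incorrectly) set:

```python
# Hypothetical payload: the id is present under "data", which is the bug the
# test is supposed to catch.
transaction = {
    "contexts": {"trace": {"data": {"gen_ai.conversation.id": "conv-1"}}}
}
trace = transaction["contexts"]["trace"]

# Looks under a key that never exists, so the check is always True.
wrong_key_check = "gen_ai.conversation.id" not in trace.get("attributes", {})

# Looks under the key transactions actually use, so the bug is detected.
right_key_check = "gen_ai.conversation.id" not in trace.get("data", {})
```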
Test does not validate model_behaviour_error event when handled_tool_call_exceptions=False - `tests/integrations/pydantic_ai/test_pydantic_ai.py:490-496`
In test_agent_with_tool_validation_error, the original code extracted model_behaviour_error in both branches: when handled_tool_call_exceptions=True (with error) and when handled_tool_call_exceptions=False (alone). The refactored code only extracts events when handled_tool_call_exceptions=True, completely omitting any event validation for the False case. This means the test no longer verifies that UnexpectedModelBehavior is properly captured as an event when handled_tool_call_exceptions=False, reducing test coverage.
test_message_history uses inconsistent span data format causing test to silently pass without validating anything - `tests/integrations/pydantic_ai/test_pydantic_ai.py:831-833`
In test_message_history (lines 831-833), spans are extracted from the transaction using second_transaction["spans"], which returns spans in the old format (with s["op"] directly). However, the code then filters with s["attributes"].get("sentry.op", ""), expecting the new V2 envelope format. Spans embedded in transactions don't have an attributes dict, so either s["attributes"] raises a KeyError or, if attributes exists but lacks sentry.op, the filter returns no matches. In the latter case the conditional checks (if chat_spans: and if "gen_ai.request.messages" in chat_span["attributes"]:) are skipped, so the test passes without validating anything.
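One way to make the filter independent of the span format is a small accessor. `span_op` is a hypothetical helper, and the two dict shapes below only mirror the formats described above:

```python
def span_op(span):
    # V2 envelope spans keep the op under attributes["sentry.op"];
    # spans embedded in a transaction keep it under the top-level "op" key.
    if "attributes" in span:
        return span["attributes"].get("sentry.op")
    return span.get("op")


v1_span = {"op": "gen_ai.chat", "data": {}}
v2_span = {"attributes": {"sentry.op": "gen_ai.chat"}}
unrelated_span = {"op": "http.client"}

chat_spans = [
    s for s in (v1_span, v2_span, unrelated_span) if span_op(s) == "gen_ai.chat"
]
```

Pairing this with an unconditional assertion that `chat_spans` is non-empty would remove the silent-pass path.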
Low
Sorting key incorrectly uses 'name' twice instead of 'name' and 'description' - `tests/integrations/google_genai/test_google_genai.py:330`
The sorting lambda on line 330 was changed from (t.get("name", ""), t.get("description", "")) to (t.get("name", ""), t.get("name", "")). This introduces a copy-paste error where the second key uses "name" instead of "description". The comment on line 328 states the intent is to "sort by name and description for comparison", contradicting the implementation. While the test still passes because tool names happen to be unique, the logic is inconsistent with stated intent.
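The difference only shows up once two tools share a name. A sketch with made-up tool entries:

```python
tools = [
    {"name": "search", "description": "web"},
    {"name": "search", "description": "docs"},
    {"name": "calc", "description": "math"},
]

# Duplicated key: ties on "name" keep their input order (sorted() is stable).
by_name_twice = sorted(
    tools, key=lambda t: (t.get("name", ""), t.get("name", ""))
)

# Intended key: ties on "name" are broken by "description".
by_name_and_description = sorted(
    tools, key=lambda t: (t.get("name", ""), t.get("description", ""))
)
```

With the duplicated key, the comparison becomes order-dependent for tools that share a name, which is what the description tiebreaker was meant to prevent.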
Hardcoded SDK version will cause test failure when version changes - `tests/integrations/huggingface_hub/test_huggingface_hub.py:523`
In test_text_generation, the expected sentry.sdk.version attribute is hardcoded to "2.58.0" instead of using mock.ANY like all other tests in this file and across the codebase. When the SDK version is bumped, this test will fail with an assertion error comparing the expected hardcoded version against the actual current version.
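A sketch of the version-agnostic comparison using `unittest.mock.ANY`. The attribute values are placeholders:

```python
from unittest import mock

# Placeholder attributes as they might appear on a captured span.
attributes = {
    "sentry.sdk.name": "sentry.python",
    "sentry.sdk.version": "9.9.9",
}

# mock.ANY compares equal to any value, so a version bump cannot break this.
expected = {
    "sentry.sdk.name": "sentry.python",
    "sentry.sdk.version": mock.ANY,
}

matches = attributes == expected
```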
capture_items filter mismatch - events not captured but filtered for - `tests/integrations/langchain/test_langchain.py:1823`
Line 1823 calls capture_items("transaction", "span") which only captures transactions and spans. However, lines 1842-1846 attempt to filter by item.type == "event", which will always produce an empty list since events are not being captured. If the test intends to verify error events, it needs to include "event" in the capture_items call.
Also found at:
tests/integrations/langchain/test_langchain.py:1842-1846
test_langchain_embeddings_span_hierarchy does not verify actual hierarchy relationship - `tests/integrations/langchain/test_langchain.py:1956-1964`
The test docstring states it verifies "embeddings spans are properly nested within parent spans" but only asserts that both spans exist. There is no assertion checking the parent-child relationship (e.g., comparing parent_span_id). The test passes as long as both spans are created, even if the hierarchy is wrong.
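A hierarchy check could look roughly like the following. The span dicts are hypothetical, and the sketch assumes each span carries `span_id` and `parent_span_id` fields:

```python
# Hypothetical captured spans; ids are made up for illustration.
parent = {"span_id": "a1", "parent_span_id": None, "name": "parent"}
embeddings = {"span_id": "b2", "parent_span_id": "a1", "name": "embeddings"}
orphan = {"span_id": "c3", "parent_span_id": "zz", "name": "embeddings"}


def is_child_of(child, parent):
    # True only when the child points at the parent's span id.
    return child.get("parent_span_id") == parent["span_id"]
```

Asserting `is_child_of(embeddings, parent)` in addition to the existence checks would make the test match its docstring.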
Duration: 38m 35s · Tokens: 23.0M in / 228.7k out · Cost: $31.68 (+extraction: $0.03, +merge: $0.01, +fix_gate: $0.02)
Annotations
Check warning on line 243 in sentry_sdk/client.py
sentry-warden / warden: find-bugs
GenAI V2 spans missing release, environment, and SDK metadata when event processors return new objects
At line 1138, `_serialized_v1_span_to_serialized_v2_span(span, event)` passes the original `event` object instead of `event_opt`. The issue is that `release`, `environment`, and `sdk` metadata are added to the LOCAL `event` variable inside `_prepare_event` (lines 815-821 in the full file), which may be a different object from the original `event` if `scope.apply_to_event` or any event processor returned a new dict. This results in V2 spans missing `sentry.release`, `sentry.environment`, `sentry.sdk.name`, and `sentry.sdk.version` attributes.
Check warning on line 1138 in sentry_sdk/client.py
sentry-warden / warden: find-bugs
[K5C-BP3] GenAI V2 spans missing release, environment, and SDK metadata when event processors return new objects (additional location)
Check warning on line 945 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: find-bugs
test_multiple_providers silently passes without testing spans due to missing span capture
On line 945, `capture_items("transaction")` only captures transaction items, but lines 1020-1023 try to filter for and iterate over "span" items from `items`. Since no spans are captured, `spans` will always be an empty list, causing the for-loop assertions to pass vacuously without actually testing that spans have the expected `SPANDATA.GEN_AI_SYSTEM` attribute. The async version `test_async_multiple_providers` correctly uses `capture_items("transaction", "span")`.
Check warning on line 1023 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: find-bugs
[URW-WPE] test_multiple_providers silently passes without testing spans due to missing span capture (additional location)
Check warning on line 1285 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: find-bugs
Direct dictionary access on span attributes may raise KeyError
In `test_no_integration`, the code filters spans using direct dictionary access `x["attributes"]["sentry.op"]` instead of `.get()`. If any captured span lacks the `sentry.op` attribute, this will raise a `KeyError`. Other similar tests in the codebase (anthropic, langchain, openai_agents) use `.get("sentry.op")` for safe access. This could cause the test to fail with an unexpected error rather than correctly asserting there are no litellm chat spans.
Check warning on line 1332 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: find-bugs
[NUB-X4X] Direct dictionary access on span attributes may raise KeyError (additional location)
Check warning on line 3542 in tests/integrations/openai_agents/test_openai_agents.py
sentry-warden / warden: find-bugs
Test assertion checks wrong key 'attributes' instead of 'data' for transaction contexts
The test at lines 3540-3541 checks `transaction["contexts"]["trace"].get("attributes", {})`, but transactions store trace context data under `data`, not `attributes`. Other tests in the same file (lines 3341, 3479) correctly use `data`. Because `attributes` does not exist on transactions, the assertion always passes, masking any bug where `gen_ai.conversation.id` is incorrectly set.
Check warning on line 496 in tests/integrations/pydantic_ai/test_pydantic_ai.py
sentry-warden / warden: find-bugs
Test does not validate model_behaviour_error event when handled_tool_call_exceptions=False
In `test_agent_with_tool_validation_error`, the original code extracted `model_behaviour_error` in both branches: when `handled_tool_call_exceptions=True` (with error) and when `handled_tool_call_exceptions=False` (alone). The refactored code only extracts events when `handled_tool_call_exceptions=True`, completely omitting any event validation for the `False` case. This means the test no longer verifies that `UnexpectedModelBehavior` is properly captured as an event when `handled_tool_call_exceptions=False`, reducing test coverage.
Check warning on line 833 in tests/integrations/pydantic_ai/test_pydantic_ai.py
sentry-warden / warden: find-bugs
test_message_history uses inconsistent span data format causing test to silently pass without validating anything
In `test_message_history` (lines 831-833), spans are extracted from the transaction using `second_transaction["spans"]`, which returns spans in the old format (with `s["op"]` directly). However, the code then filters with `s["attributes"].get("sentry.op", "")`, expecting the new V2 envelope format. Spans embedded in transactions don't have an `attributes` dict, so either `s["attributes"]` raises a `KeyError` or, if `attributes` exists but lacks `sentry.op`, the filter returns no matches. In the latter case the conditional checks (`if chat_spans:` and `if "gen_ai.request.messages" in chat_span["attributes"]:`) are skipped, so the test passes without validating anything.