feat: Send GenAI spans as V2 envelope items #6079

10 issues

find-bugs: Found 10 issues (6 medium, 4 low)

Medium

GenAI V2 spans missing release, environment, and SDK metadata when event processors return new objects - `sentry_sdk/client.py:243`

At line 1138, _serialized_v1_span_to_serialized_v2_span(span, event) receives the original event object instead of event_opt. Release, environment, and SDK metadata are added to the local event variable inside _prepare_event (lines 815-821 in the full file), and that variable may refer to a different object from the original event if scope.apply_to_event or any event processor returned a new dict. In that case the V2 spans are missing the sentry.release, sentry.environment, sentry.sdk.name, and sentry.sdk.version attributes.

Also found at:

sentry_sdk/client.py:1138
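
The failure mode can be reproduced in isolation. The sketch below is a simplified stand-in, not the SDK's code: `prepare_event`, `replacing_processor`, and `to_v2_span` are hypothetical names, and the attribute mapping is assumed from the description above.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


def replacing_processor(event: Event) -> Event:
    # Returns a NEW dict rather than mutating the one it was given.
    return dict(event, tags={"processed": True})


def prepare_event(event: Event, processors: List[Callable[[Event], Event]]) -> Event:
    # Hypothetical stand-in for the preparation step: run the processors, then
    # attach metadata to whatever object the local variable now refers to.
    for processor in processors:
        event = processor(event)
    event["release"] = "1.0.0"
    event["environment"] = "production"
    return event


def to_v2_span(span: Dict[str, Any], event: Event) -> Dict[str, Any]:
    # Hypothetical conversion: copy event-level metadata onto span attributes.
    attributes: Dict[str, Any] = {}
    if "release" in event:
        attributes["sentry.release"] = event["release"]
    if "environment" in event:
        attributes["sentry.environment"] = event["environment"]
    return {"name": span["name"], "attributes": attributes}


original = {"type": "transaction"}
prepared = prepare_event(original, [replacing_processor])
span = {"name": "chat"}

from_original = to_v2_span(span, original)  # built from the stale object
from_prepared = to_v2_span(span, prepared)  # built from the prepared object
```

Because the processor returned a new dict, `original` never receives the metadata, so only the span built from `prepared` carries it.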

test_multiple_providers silently passes without testing spans due to missing span capture - `tests/integrations/litellm/test_litellm.py:945`

On line 945, capture_items("transaction") captures only transaction items, but lines 1020-1023 filter items for "span" entries and iterate over them. Since no spans are captured, spans is always an empty list, so the for-loop assertions pass vacuously without ever checking that spans have the expected SPANDATA.GEN_AI_SYSTEM attribute. The async version, test_async_multiple_providers, correctly uses capture_items("transaction", "span").

Also found at:

tests/integrations/litellm/test_litellm.py:1020-1023
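
A minimal sketch of why the loop passes vacuously, and of one possible guard. The item shape, the helper names, and the `"gen_ai.system"` key and `"openai"` value are assumptions for illustration, not taken from the test file.

```python
from typing import Any, Dict, List

Item = Dict[str, Any]


def spans_from(items: List[Item]) -> List[Dict[str, Any]]:
    # Keep only the payloads of items whose type is "span".
    return [item["payload"] for item in items if item["type"] == "span"]


def check_vacuous(items: List[Item]) -> None:
    # With no span items the loop body never runs, so nothing is asserted.
    for span in spans_from(items):
        assert span["attributes"]["gen_ai.system"] == "openai"


def check_guarded(items: List[Item]) -> None:
    # Asserting non-emptiness first turns the silent pass into a failure.
    spans = spans_from(items)
    assert spans, "expected at least one span item"
    for span in spans:
        assert span["attributes"]["gen_ai.system"] == "openai"


# Only a transaction was captured, mirroring capture_items("transaction").
only_transactions = [{"type": "transaction", "payload": {"spans": []}}]
```

Here `check_vacuous(only_transactions)` returns without error, while `check_guarded(only_transactions)` raises `AssertionError`.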

Direct dictionary access on span attributes may raise KeyError - `tests/integrations/litellm/test_litellm.py:1283-1285`

In test_no_integration, the code filters spans using direct dictionary access x["attributes"]["sentry.op"] instead of .get(). If any captured span lacks the sentry.op attribute, this will raise a KeyError. Other similar tests in the codebase (anthropic, langchain, openai_agents) use .get("sentry.op") for safe access. This could cause the test to fail with an unexpected error rather than correctly asserting there are no litellm chat spans.

Also found at:

tests/integrations/litellm/test_litellm.py:1330-1332
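
The difference between the two access styles can be shown with plain dicts. This is a sketch with made-up span data; the `"gen_ai.chat"` op value and the function names are assumptions.

```python
from typing import Any, Dict, List

Span = Dict[str, Any]

spans: List[Span] = [
    {"attributes": {"sentry.op": "gen_ai.chat"}},
    {"attributes": {}},  # a span without "sentry.op"
]


def chat_spans_strict(spans: List[Span]) -> List[Span]:
    # Direct indexing raises KeyError on the first span lacking "sentry.op".
    return [s for s in spans if s["attributes"]["sentry.op"] == "gen_ai.chat"]


def chat_spans_safe(spans: List[Span]) -> List[Span]:
    # .get() yields None for a missing key, so such spans are simply skipped.
    return [s for s in spans if s["attributes"].get("sentry.op") == "gen_ai.chat"]
```

With the data above, `chat_spans_safe(spans)` returns the single chat span, whereas `chat_spans_strict(spans)` raises `KeyError` before any assertion about the result can run.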

Test assertion checks wrong key 'attributes' instead of 'data' for transaction contexts - `tests/integrations/openai_agents/test_openai_agents.py:3540-3542`

The test at lines 3540-3541 checks transaction["contexts"]["trace"].get("attributes", {}), but transactions store trace context data under data, not attributes. Other tests in the same file (lines 3341, 3479) correctly use data. The assertion always passes because attributes does not exist on transactions, masking any bug where gen_ai.conversation.id is incorrectly set.
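
A small sketch of how the wrong key hides the bug. The transaction dict is a hypothetical example in which `gen_ai.conversation.id` has been set on the trace context when it should not have been.

```python
# Hypothetical transaction where the conversation id was set by mistake.
transaction = {
    "contexts": {
        "trace": {
            "data": {"gen_ai.conversation.id": "conv-1"},
        }
    }
}

trace = transaction["contexts"]["trace"]

# "attributes" is absent, so .get() falls back to {} and this is always True.
wrong_key_check = "gen_ai.conversation.id" not in trace.get("attributes", {})

# Looking under "data" actually observes the unwanted value.
right_key_check = "gen_ai.conversation.id" not in trace.get("data", {})
```

The check against `attributes` reports that the id is absent even though it is present, while the check against `data` correctly evaluates to `False`.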

Test does not validate model_behaviour_error event when handled_tool_call_exceptions=False - `tests/integrations/pydantic_ai/test_pydantic_ai.py:490-496`

In test_agent_with_tool_validation_error, the original code extracted model_behaviour_error in both branches: when handled_tool_call_exceptions=True (with error) and when handled_tool_call_exceptions=False (alone). The refactored code only extracts events when handled_tool_call_exceptions=True, completely omitting any event validation for the False case. This means the test no longer verifies that UnexpectedModelBehavior is properly captured as an event when handled_tool_call_exceptions=False, reducing test coverage.

test_message_history uses inconsistent span data format causing test to silently pass without validating anything - `tests/integrations/pydantic_ai/test_pydantic_ai.py:831-833`

In test_message_history (lines 831-833), spans are extracted from the transaction using second_transaction["spans"], which returns spans in the old format (with s["op"] directly). However, the code then filters with s["attributes"].get("sentry.op", ""), expecting the new V2 envelope format. Spans embedded in transactions don't have an attributes dict, so s["attributes"] will raise a KeyError; if attributes does exist but lacks sentry.op, the filter returns no matches. The test then relies on conditional checks (if chat_spans: and if "gen_ai.request.messages" in chat_span["attributes"]:) that silently pass when no spans match, meaning the test validates nothing.

Low

Sorting key incorrectly uses 'name' twice instead of 'name' and 'description' - `tests/integrations/google_genai/test_google_genai.py:330`

The sorting lambda on line 330 was changed from (t.get("name", ""), t.get("description", "")) to (t.get("name", ""), t.get("name", "")). This introduces a copy-paste error where the second key uses "name" instead of "description". The comment on line 328 states the intent is to "sort by name and description for comparison", contradicting the implementation. While the test still passes because tool names happen to be unique, the logic is inconsistent with stated intent.
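
The two sort keys only diverge once names collide. The sketch below uses made-up tool dicts with a duplicated name to show the difference; it is not the data from the test.

```python
tools = [
    {"name": "search", "description": "web"},
    {"name": "search", "description": "docs"},
    {"name": "calc", "description": "math"},
]

# Key as currently written: the second element repeats the name, so entries
# with equal names keep their input order (sorted() is stable).
by_name_twice = sorted(tools, key=lambda t: (t.get("name", ""), t.get("name", "")))

# Key matching the stated intent: ties on name are broken by description.
by_name_and_description = sorted(
    tools, key=lambda t: (t.get("name", ""), t.get("description", ""))
)

descriptions_name_twice = [t["description"] for t in by_name_twice]
descriptions_intended = [t["description"] for t in by_name_and_description]
```

With unique names both keys give the same order, which is why the existing test still passes.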

Hardcoded SDK version will cause test failure when version changes - `tests/integrations/huggingface_hub/test_huggingface_hub.py:523`

In test_text_generation, the expected sentry.sdk.version attribute is hardcoded to "2.58.0" instead of using mock.ANY like all other tests in this file and across the codebase. When the SDK version is bumped, this test will fail with an assertion error comparing the expected hardcoded version against the actual current version.
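
A sketch of the difference between pinning the version and using `mock.ANY`. The `"2.59.0"` value stands in for some future SDK version and the `"sentry.python"` name is assumed; neither comes from the test file.

```python
from unittest import mock

# Attributes as they might look after a hypothetical version bump.
actual = {"sentry.sdk.name": "sentry.python", "sentry.sdk.version": "2.59.0"}

# Expectation with the version pinned to the value at the time of writing.
pinned = {"sentry.sdk.name": "sentry.python", "sentry.sdk.version": "2.58.0"}

# Expectation that accepts any version; mock.ANY compares equal to everything.
flexible = {"sentry.sdk.name": "sentry.python", "sentry.sdk.version": mock.ANY}

pinned_matches = actual == pinned
flexible_matches = actual == flexible
```

The pinned expectation stops matching as soon as the version changes, while the `mock.ANY` expectation still checks the remaining keys.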

capture_items filter mismatch - events not captured but filtered for - `tests/integrations/langchain/test_langchain.py:1823`

Line 1823 calls capture_items("transaction", "span") which only captures transactions and spans. However, lines 1842-1846 attempt to filter by item.type == "event", which will always produce an empty list since events are not being captured. If the test intends to verify error events, it needs to include "event" in the capture_items call.

Also found at:

tests/integrations/langchain/test_langchain.py:1842-1846

test_langchain_embeddings_span_hierarchy does not verify actual hierarchy relationship - `tests/integrations/langchain/test_langchain.py:1956-1964`

The test docstring states it verifies "embeddings spans are properly nested within parent spans" but only asserts that both spans exist. There is no assertion checking the parent-child relationship (e.g., comparing parent_span_id). The test passes as long as both spans are created, even if the hierarchy is wrong.
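
One way such a hierarchy check could look, as a sketch. The span dicts and the `is_child_of` helper are hypothetical, and the `span_id` field name is an assumption alongside the `parent_span_id` field mentioned above.

```python
from typing import Any, Dict

Span = Dict[str, Any]


def is_child_of(child: Span, parent: Span) -> bool:
    # A span is a direct child when its parent_span_id is the parent's span_id.
    return child.get("parent_span_id") == parent["span_id"]


parent_span = {"span_id": "a1", "parent_span_id": None, "name": "parent"}
nested_span = {"span_id": "b2", "parent_span_id": "a1", "name": "embeddings"}
sibling_span = {"span_id": "c3", "parent_span_id": None, "name": "embeddings"}

# Both embeddings spans "exist", but only one is nested under the parent.
nested_ok = is_child_of(nested_span, parent_span)
sibling_ok = is_child_of(sibling_span, parent_span)
```

An existence-only assertion accepts both `nested_span` and `sibling_span`; comparing ids distinguishes them.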

Duration: 38m 35s · Tokens: 23.0M in / 228.7k out · Cost: $31.68 (+extraction: $0.03, +merge: $0.01, +fix_gate: $0.02)
