
update test with hardcoded version

44b2c2d
Draft

feat: Send GenAI spans as V2 envelope items #6079

@sentry/warden / warden completed Apr 21, 2026 in 49m 49s

11 issues

High

tx["_meta"]["spans"] will cause KeyError - spans are no longer nested in transaction - `tests/integrations/langchain/test_langchain.py:1370`

Line 1370 accesses tx["_meta"]["spans"]["0"]["data"] to verify message truncation metadata. However, the test was migrated to use V2 envelope items where spans are separate items (accessed via [item.payload for item in items if item.type == "span"]) rather than being nested within the transaction. Since spans are no longer embedded in tx, the path tx["_meta"]["spans"] will raise a KeyError at runtime, causing this test to fail.

Also found at:

  • tests/integrations/openai_agents/test_openai_agents.py:3039
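The access pattern the finding describes can be sketched with simplified stand-ins for the envelope items (the `Item` class and payloads below are illustrative, not the real fixtures):

```python
# V2 migration: spans arrive as standalone envelope items, so metadata lives
# on the span payload, not under tx["_meta"]["spans"].
from dataclasses import dataclass

@dataclass
class Item:
    type: str
    payload: dict

items = [
    Item("transaction", {"_meta": {}}),   # spans are no longer nested here
    Item("span", {"data": {"gen_ai.request.messages": "...(truncated)"}}),
]

tx = next(i.payload for i in items if i.type == "transaction")
spans = [i.payload for i in items if i.type == "span"]

# The old access path raises, exactly as the finding predicts:
try:
    tx["_meta"]["spans"]["0"]["data"]
    raise AssertionError("expected KeyError")
except KeyError:
    pass

# V2-style access reads the span item directly:
assert "gen_ai.request.messages" in spans[0]["data"]
```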

GenAI V2 spans missing metadata because wrong event object passed to converter - `sentry_sdk/client.py:1138`

At line 1138, _serialized_v1_span_to_serialized_v2_span is called with event (the original, unprepared event dict) instead of event_opt (the prepared event with release, environment, SDK info, and user data populated by _prepare_event). The converter function reads event.get("release"), event.get("environment"), event.get("sdk"), event.get("user"), and event.get("contexts") to populate V2 span attributes like sentry.release, sentry.environment, sentry.sdk.name, etc. Since the original event may not have these fields populated, the converted GenAI spans will be missing critical tracing metadata.

Also found at:

  • tests/integrations/langchain/test_langchain.py:1370
  • tests/integrations/openai/test_openai.py:3767-3769
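A hedged sketch of the fix this implies: feed the converter the prepared event, not the raw input. `to_v2_span` below is an illustrative stand-in for `_serialized_v1_span_to_serialized_v2_span`, reading the same kind of fields:

```python
# Stand-in converter: reads scope-applied fields from whichever event it is given.
def to_v2_span(span, event):
    return {
        **span,
        "attributes": {
            "sentry.release": event.get("release"),
            "sentry.environment": event.get("environment"),
        },
    }

raw_event = {"spans": [{"op": "gen_ai.chat"}]}                  # before preparation
prepared_event = {**raw_event, "release": "1.2.3", "environment": "production"}

buggy = to_v2_span(raw_event["spans"][0], raw_event)            # metadata missing
fixed = to_v2_span(prepared_event["spans"][0], prepared_event)  # metadata present

assert buggy["attributes"]["sentry.release"] is None
assert fixed["attributes"]["sentry.release"] == "1.2.3"
```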

Medium

V2 span conversion uses unprocessed event instead of processed event_opt - `sentry_sdk/client.py:1138`

At line 1138, _serialized_v1_span_to_serialized_v2_span(span, event) passes the original event parameter instead of event_opt. The _serialized_v1_span_to_serialized_v2_span function reads user, release, environment, transaction, trace context, and SDK info from the event (lines 224-251 in client.py). These fields are populated during _prepare_event via scope.apply_to_event, which can return a different object via event processors (lines 1817-1823 in scope.py). When event processors return a new object, V2 spans will be missing user, release, environment, and other scope-applied attributes.

Also found at:

  • tests/integrations/litellm/test_litellm.py:945
  • tests/integrations/huggingface_hub/test_huggingface_hub.py:710
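Why `event` and `event_opt` can diverge can be sketched as follows; the processor pipeline below is an illustrative stand-in for the SDK's preparation step, not its actual code:

```python
# An event processor may return a brand-new dict, so fields added during
# preparation exist only on the returned object, never on the original.
def prepare_event(event, processors):
    for processor in processors:
        event = processor(event)   # a processor may return a different object
    return event

def add_scope_data(event):
    # returns a new dict; the caller's original event is left untouched
    return {**event, "user": {"id": "u1"}, "release": "1.2.3"}

event = {"spans": [{"op": "gen_ai.chat"}]}
event_opt = prepare_event(event, [add_scope_data])

assert event_opt is not event            # a different object after preparation
assert "user" not in event               # converter fed `event` sees no user
assert event_opt["user"]["id"] == "u1"   # converter fed `event_opt` would
```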

List comprehension result is discarded without any assertion or assignment - `tests/integrations/langchain/test_langchain.py:1842-1846`

The list comprehension at lines 1842-1846 computes error events but does not assign the result to a variable or make any assertions on it. This makes the test ineffective - it doesn't actually verify that errors are captured or handled correctly. Similar tests in litellm/test_litellm.py assign the result to error_events and assert on its length.

Also found at:

  • tests/integrations/pydantic_ai/test_pydantic_ai.py:490-496
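A minimal sketch of the fix, mirroring the litellm pattern the finding cites; `events` is illustrative stand-in data, not the real fixture output:

```python
events = [
    {"type": "transaction"},
    {"type": "error"},
]

# As written in the test: the result is computed and immediately discarded,
# so the test cannot fail even if no error event was captured.
[e for e in events if e["type"] == "error"]

# Intended pattern: keep the result and assert on it.
error_events = [e for e in events if e["type"] == "error"]
assert len(error_events) == 1
```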

Test assertion uses wrong key 'attributes' instead of 'data' for transaction contexts - `tests/integrations/openai_agents/test_openai_agents.py:3540-3542`

The test at line 3540 checks for gen_ai.conversation.id in transaction["contexts"]["trace"].get("attributes", {}), but transactions store this data under data, not attributes. Looking at the capture_items fixture in conftest.py (lines 361-370), only span items have their attributes transformed; transactions use the raw payload which contains data. Other assertions in the same file (lines 3341 and 3479) correctly use data for transaction context access.

Also found at:

  • tests/integrations/openai_agents/test_openai_agents.py:2257
  • tests/integrations/langchain/test_langchain.py:1953-1954
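The vacuous check can be sketched with an illustrative transaction payload (the shape below is a stand-in, not captured data):

```python
# The trace context stores fields under "data", so .get("attributes", {})
# always yields an empty dict and a "not in" check passes vacuously.
transaction = {
    "contexts": {"trace": {"data": {"gen_ai.conversation.id": "conv-123"}}}
}

# As written: an empty dict, so this can never catch a regression.
vacuous = transaction["contexts"]["trace"].get("attributes", {})
assert "gen_ai.conversation.id" not in vacuous

# Intended: inspect the "data" key that transactions actually use.
assert "gen_ai.conversation.id" in transaction["contexts"]["trace"]["data"]
```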

test_message_history accesses transaction-nested spans using wrong attribute format, causing silent test pass - `tests/integrations/pydantic_ai/test_pydantic_ai.py:831-841`

At line 831, spans are extracted from second_transaction["spans"] and then filtered using s["attributes"].get("sentry.op", ""). However, spans nested inside transaction payloads use the legacy format with s["op"], not s["attributes"]["sentry.op"]. This mismatch causes the filter to find zero chat_spans, and since the test uses if chat_spans: at line 836, it silently passes without verifying the message history feature.

Also found at:

  • tests/integrations/langchain/test_langchain.py:262
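The format mismatch can be sketched as below; the payload is an illustrative stand-in for the transaction the test captures:

```python
# Spans embedded in a transaction payload use the legacy top-level "op" key,
# so a filter keyed on V2-style attributes matches nothing and the
# `if chat_spans:` guard skips every assertion.
second_transaction = {"spans": [{"op": "gen_ai.chat", "data": {}}]}
spans = second_transaction["spans"]

# Filter as written (V2-style key): finds zero spans, test passes silently.
chat_spans = [
    s for s in spans
    if s.get("attributes", {}).get("sentry.op", "") == "gen_ai.chat"
]
assert chat_spans == []

# Filter on the legacy "op" field: actually selects the chat span.
chat_spans = [s for s in spans if s.get("op", "") == "gen_ai.chat"]
assert len(chat_spans) == 1
```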

test_async_exception_handling patches embeddings client instead of completions client - `tests/integrations/litellm/test_litellm.py:866-868`

The test test_async_exception_handling patches client.embeddings._client._client.send but calls litellm.acompletion() which uses the completions API. This mismatch means the mock won't intercept the actual request, potentially causing the test to either fail or pass for the wrong reasons. The sync version test_exception_handling correctly patches client.completions._client._client.send.
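The patch-target mismatch can be sketched with `unittest.mock`; the `Client` layout is a simplified stand-in for the litellm test doubles:

```python
# Patching the embeddings transport cannot intercept a request that is
# routed through the completions transport.
from unittest import mock

class Transport:
    def send(self, request):
        return "real response"

class Client:
    def __init__(self):
        self.completions = Transport()
        self.embeddings = Transport()

client = Client()

# Wrong target (mirrors the async test): the completion call bypasses the mock.
with mock.patch.object(client.embeddings, "send", side_effect=RuntimeError):
    assert client.completions.send("req") == "real response"

# Right target (mirrors the sync test): the mock intercepts the request.
with mock.patch.object(client.completions, "send", side_effect=RuntimeError):
    try:
        client.completions.send("req")
        raise AssertionError("expected RuntimeError")
    except RuntimeError:
        pass
```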

test_multiple_providers never validates span attributes due to missing span capture - `tests/integrations/litellm/test_litellm.py:945`

The test_multiple_providers function calls capture_items("transaction") at line 945, but later attempts to assert on span attributes at lines 1020-1023. Since spans are not captured, the spans list will always be empty and the for span in spans: loop will never execute, making the assertion assert SPANDATA.GEN_AI_SYSTEM in span["attributes"] ineffective. This could allow bugs where the GenAI system attribute is missing to go undetected.

Also found at:

  • tests/integrations/litellm/test_litellm.py:1020-1023
  • tests/integrations/google_genai/test_google_genai.py:330-331
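The vacuous-loop hazard can be sketched as follows; `Item` and `capture_items` are simplified stand-ins for the test fixtures:

```python
# When only transaction items are captured, the span list is empty and the
# per-span assertions never run.
from dataclasses import dataclass

@dataclass
class Item:
    type: str
    payload: dict

captured = [
    Item("transaction", {}),
    Item("span", {"attributes": {"gen_ai.system": "openai"}}),
]

def capture_items(*types):
    return [i for i in captured if i.type in types]

items = capture_items("transaction")            # as written in the test
spans = [i.payload for i in items if i.type == "span"]
assert spans == []                              # so this loop is a no-op:
for span in spans:
    assert "gen_ai.system" in span["attributes"]

items = capture_items("transaction", "span")    # also capture span items
spans = [i.payload for i in items if i.type == "span"]
assert len(spans) == 1
```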

Test checks wrong key 'attributes' instead of 'data' for transaction context - `tests/integrations/openai_agents/test_openai_agents.py:3540-3542`

The test for verifying conversation_id absence uses transaction["contexts"]["trace"].get("attributes", {}) at line 3540, but transactions use data not attributes for trace context data. The capture_items fixture only transforms attributes for 'metric', 'log', and 'span' types (conftest.py line 361-367), while transactions are passed through unchanged (line 369-370). Other tests in this file correctly use ["data"] (lines 3341, 3479). This will cause the assertion to always pass regardless of whether conversation_id is incorrectly set, defeating the purpose of the test.

Also found at:

  • tests/integrations/pydantic_ai/test_pydantic_ai.py:831-833
  • tests/integrations/openai_agents/test_openai_agents.py:2257
  • tests/integrations/pydantic_ai/test_pydantic_ai.py:490-496
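The fixture behavior the finding cites can be sketched as below; `normalize` is an illustrative stand-in for the conftest fixture, not its actual code:

```python
# Only metric/log/span items get an "attributes" view; transaction payloads
# pass through with their raw "data" intact.
def normalize(item_type, payload):
    if item_type in ("metric", "log", "span"):
        return {"attributes": payload.get("data", {})}  # transformed view
    return payload                                      # unchanged

span_view = normalize("span", {"data": {"gen_ai.system": "openai"}})
tx_view = normalize(
    "transaction",
    {"contexts": {"trace": {"data": {"gen_ai.conversation.id": "c1"}}}},
)

assert "gen_ai.system" in span_view["attributes"]
assert "attributes" not in tx_view["contexts"]["trace"]   # wrong key is empty
assert "gen_ai.conversation.id" in tx_view["contexts"]["trace"]["data"]
```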

Low

Sorting key uses 'name' twice instead of 'name' and 'description' - `tests/integrations/google_genai/test_google_genai.py:330`

The sorting lambda at line 330 uses t.get("name", "") twice instead of sorting by name and description as the comment on line 328 states. The original code used (t.get("name", ""), t.get("description", "")). While this doesn't cause a runtime error since the tool names are unique, it makes the second tuple element redundant and contradicts the documented intent.

Also found at:

  • tests/integrations/pydantic_ai/test_pydantic_ai.py:493

...and 1 more
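The intended sort key can be sketched directly; the tool dicts are illustrative:

```python
# Sort by (name, description) as the comment states, so description breaks
# ties between same-named tools.
tools = [
    {"name": "search", "description": "web search"},
    {"name": "search", "description": "doc search"},
    {"name": "calc", "description": "math"},
]

# Redundant key as written: the second element can never break a tie,
# so Python's stable sort keeps insertion order among the "search" tools.
buggy = sorted(tools, key=lambda t: (t.get("name", ""), t.get("name", "")))

# Intended key: same-named tools are ordered by description.
fixed = sorted(tools, key=lambda t: (t.get("name", ""), t.get("description", "")))

assert [t["description"] for t in buggy] == ["math", "web search", "doc search"]
assert [t["description"] for t in fixed] == ["math", "doc search", "web search"]
```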

4 skills analyzed
Skill             Findings   Duration   Cost
code-review       6          21m 41s    $22.25
find-bugs         5          45m 18s    $31.93
skill-scanner     0          42m 19s    $7.28
security-review   0          49m 43s    $5.88

Duration: 159m · Tokens: 45.6M in / 524.6k out · Cost: $67.46 (+extraction: $0.07, +merge: $0.01, +fix_gate: $0.03, +dedup: $0.03)