add constant again

7bd12ae

feat: Send GenAI spans as V2 envelope items #6079

@sentry/warden / warden: find-bugs completed Apr 20, 2026 in 25m 18s

13 issues

find-bugs: Found 13 issues (9 medium, 4 low)

Medium

V2 GenAI spans use unprocessed event data instead of prepared event - `sentry_sdk/client.py:1134-1137`

On line 1134, `_serialized_v1_span_to_serialized_v2_span(span, event)` passes the original `event` instead of `event_opt`. The `event_opt` is the result of `_prepare_event()` which includes processing by the `before_send_transaction` callback (lines 897-925). If `before_send_transaction` modifies user data, release, environment, SDK info, or trace context, those changes won't be reflected in the converted V2 GenAI spans, causing data inconsistency between the V1 transaction and V2 spans in the same envelope.

Also found at:

  • sentry_sdk/client.py:1134
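
The discrepancy can be reproduced with toy stand-ins for the SDK internals (the function bodies below are hypothetical simplifications of the helpers named above, not the real implementations):

```python
# Toy sketch of the bug pattern: deriving standalone V2 spans from the
# *original* event misses changes before_send_transaction made to the
# prepared copy (event_opt).

def before_send_transaction(event):
    # User callback: overrides the release.
    modified = dict(event)
    modified["release"] = "scrubbed"
    return modified

def prepare_event(event):
    # Stand-in for _prepare_event(): returns the processed copy.
    return before_send_transaction(event)

def v1_span_to_v2_span(span, event):
    # Stand-in for _serialized_v1_span_to_serialized_v2_span():
    # copies event-level fields onto the standalone span.
    return {**span, "release": event["release"]}

event = {"release": "1.0.0", "spans": [{"op": "gen_ai.chat"}]}
event_opt = prepare_event(event)

buggy_span = v1_span_to_v2_span(event["spans"][0], event)      # original event
fixed_span = v1_span_to_v2_span(event["spans"][0], event_opt)  # prepared event
```

Here `buggy_span` keeps the unscrubbed release `"1.0.0"` while the transaction would ship with `"scrubbed"` - exactly the inconsistency described.
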

Sorting key uses 'name' twice instead of 'name' and 'description' - `tests/integrations/google_genai/test_google_genai.py:330`

The sorting lambda in test_generate_content_with_tools was changed from key=lambda t: (t.get("name", ""), t.get("description", "")) to key=lambda t: (t.get("name", ""), t.get("name", "")). This appears to be a copy-paste error where the second element of the tuple should be "description" to maintain the original sorting behavior. The test still passes because the two tools have different names ("get_weather" and "get_weather_tool"), but the sorting logic is now incorrect and could fail to properly order tools if they had the same name but different descriptions.
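
A minimal illustration of why the duplicated key only matters on name collisions (toy tool dicts with assumed shapes):

```python
# Two tools with the same name but different descriptions.
tools = [
    {"name": "get_weather", "description": "by coordinates"},
    {"name": "get_weather", "description": "by city"},
]

# Broken key: both tuple elements are the name, so the keys tie and
# Python's stable sort keeps the (arbitrary) input order.
broken = sorted(tools, key=lambda t: (t.get("name", ""), t.get("name", "")))

# Intended key: ties on name are broken by description.
correct = sorted(tools, key=lambda t: (t.get("name", ""), t.get("description", "")))
```
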

Test accesses spans[0] without filtering, unlike all other tests in the file - `tests/integrations/langchain/test_langchain.py:940`

In test_span_status_error, `spans[0]` is accessed directly without filtering by operation type (sentry.op). All other tests in this file filter spans before accessing them (e.g., `chat_spans = list(x for x in spans if x['attributes']['sentry.op'] == 'gen_ai.chat')`). This makes the test fragile - if span ordering changes or additional spans are emitted, the test may pass/fail unexpectedly. The assertion `spans[0]['status'] == 'error'` may be checking an unrelated span.
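
With toy span dicts (shapes assumed from the file's conventions), the fragility looks like this:

```python
# If an extra span happens to be emitted first, spans[0] is not the chat span.
spans = [
    {"attributes": {"sentry.op": "gen_ai.pipeline"}, "status": "ok"},
    {"attributes": {"sentry.op": "gen_ai.chat"}, "status": "error"},
]

# Fragile: asserts on whatever span happens to come first.
unfiltered_status = spans[0]["status"]

# Robust, matching the file's other tests: filter by operation first.
chat_spans = [s for s in spans if s["attributes"]["sentry.op"] == "gen_ai.chat"]
```
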

List comprehension result is unused - test doesn't verify error capture - `tests/integrations/langchain/test_langchain.py:1842-1846`

In test_langchain_embeddings_error_handling, a list comprehension filtering error events (lines 1842-1846) is computed but never assigned to a variable or used in an assertion. The comment says 'errors might not be auto-captured' but the test neither validates when errors ARE captured nor makes any assertion about the result. This makes the test ineffective at verifying error handling behavior.
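
A bare list comprehension is evaluated and immediately discarded; binding the result and asserting on it is what gives the test teeth (toy event dicts):

```python
events = [
    {"type": "transaction"},
    {"level": "error", "exception": {"values": [{"type": "ValueError"}]}},
]

# Evaluated, then thrown away -- verifies nothing:
[e for e in events if "exception" in e]

# Effective version: keep the result and assert on it.
error_events = [e for e in events if "exception" in e]
```
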

test_langchain_embeddings_span_hierarchy mixes V1 and V2 span retrieval methods - `tests/integrations/langchain/test_langchain.py:1953-1954`

In test_langchain_embeddings_span_hierarchy, embeddings_spans are retrieved from V2 spans (items with type 'span') while custom_spans are retrieved from V1 transaction spans (tx.get('spans', [])). This inconsistency means the test compares spans from different data structures, which could lead to false positives/negatives or broken parent-child relationship verification when the feature flag changes span storage behavior.

Test assertion references transaction _meta for span that is now sent as V2 envelope item - `tests/integrations/langgraph/test_langgraph.py:1411`

The test was migrated to use V2 envelope items (`capture_items`) where GenAI spans are sent separately from the transaction. However, the assertion at line 1411 still checks `tx["_meta"]["spans"]["0"]["data"]["gen_ai.request.messages"]` on the transaction object. Since GenAI spans are extracted from the transaction and sent as separate V2 envelope items (per the `_split_gen_ai_spans` logic in client.py), the transaction's `_meta["spans"]` will not contain metadata for the GenAI span. This assertion will likely fail or check the wrong span.
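
The failure mode can be sketched with a toy transaction from which the GenAI span and its metadata have already been split out (the dict shapes are assumed from the description above):

```python
# After the GenAI span is extracted into a V2 envelope item, the
# transaction's _meta no longer describes it.
tx = {"spans": [], "_meta": {"spans": {}}}

try:
    tx["_meta"]["spans"]["0"]["data"]["gen_ai.request.messages"]
    meta_present = True
except KeyError:
    meta_present = False
```
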

test_multiple_providers never validates span assertions due to missing 'span' in capture_items - `tests/integrations/litellm/test_litellm.py:945`

The test calls `capture_items("transaction")` at line 945 which only captures items of type 'transaction'. However, later at line 1020, the test tries to filter for spans with `if item.type == "span"`. Since spans are never captured, the `spans` list will always be empty, and the for loop assertion at lines 1021-1023 never executes. This makes the test appear to pass while not actually validating that `SPANDATA.GEN_AI_SYSTEM` is present in span attributes.

Also found at:

  • tests/integrations/litellm/test_litellm.py:1020-1023
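
The vacuous loop can be demonstrated with a toy item list matching what capturing only transactions would produce (attribute names assumed):

```python
# Only transaction items were captured; no "span" items exist.
items = [{"type": "transaction", "attributes": {}}]

spans = [item for item in items if item["type"] == "span"]

checked = 0
for span in spans:  # zero iterations: the assertion below never runs
    assert "gen_ai.system" in span["attributes"]
    checked += 1
```

The test "passes" with `checked == 0`; capturing span items as well (or asserting that `spans` is non-empty) would make the loop meaningful.
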

Inconsistent attribute key lookup for transaction trace context - `tests/integrations/openai_agents/test_openai_agents.py:3540-3542`

The test at line 3540-3541 checks transaction["contexts"]["trace"].get("attributes", {}) but the related positive assertion at line 3479-3480 uses transaction["contexts"]["trace"]["data"]. This inconsistency means the negative assertion at line 3540-3541 may not be testing the correct location. If the conversation_id would actually be stored under ["data"] (as the positive test expects), this test would pass even if conversation_id IS incorrectly present, because it's checking the wrong key.

Test accesses V2 span format attributes on old-format embedded spans - `tests/integrations/pydantic_ai/test_pydantic_ai.py:832-834`

In `test_message_history`, the code retrieves spans from `second_transaction["spans"]` (line 831) which returns spans in the old format with `op` at the top level and `data` for additional data. However, the filtering at line 833 uses `s["attributes"].get("sentry.op", "")` which is the new V2 envelope span format. Since embedded spans don't have an `attributes` key, this will raise a `KeyError` or fail to find any matching spans.
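
The shape mismatch in miniature (a toy old-format span):

```python
# Old-format embedded span: "op" at the top level, no "attributes" key.
v1_span = {"op": "gen_ai.invoke_agent", "description": "agent run", "data": {}}

try:
    v1_span["attributes"].get("sentry.op", "")
    key_error = False
except KeyError:
    key_error = True

# Old-format lookup that works on embedded spans:
op = v1_span.get("op", "")
```
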

Low

Internal flag _has_gen_ai_span leaks to server in transaction payload - `sentry_sdk/tracing.py:1087`

The `_has_gen_ai_span` flag set at line 1087 will be included in the serialized transaction sent to Sentry's server. In client.py, `event.pop('_has_gen_ai_span', False)` at line 1119 removes it from the original event dict, but `event_opt` (the serialized copy) already contains this field since serialization happens before the pop. Compare with `_dropped_spans`, which is correctly popped inside `_prepare_event` before serialization. This wastes bandwidth and exposes internal SDK implementation details to the server.
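
The pop-after-copy ordering problem, reduced to plain dicts (`deepcopy` stands in for serialization here):

```python
import copy

event = {"type": "transaction", "_has_gen_ai_span": True}

# Serialization produces an independent copy first...
event_opt = copy.deepcopy(event)

# ...so popping the flag from the original afterwards leaves the copy intact.
event.pop("_has_gen_ai_span", False)
```

Popping before the copy is made - the way `_dropped_spans` is handled inside `_prepare_event` - keeps the flag out of the serialized payload.
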

Hardcoded SDK version in test assertion will cause test failures on version bumps - `tests/integrations/huggingface_hub/test_huggingface_hub.py:523`

Line 523 uses a hardcoded version string "2.58.0" for sentry.sdk.version assertion, while all other tests in the same file (lines 599, 676, 753, 828, 942, 1038) correctly use mock.ANY. This test will fail when the SDK version is updated.
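
`unittest.mock.ANY` compares equal to any value, which is what makes the other tests version-proof (toy SDK payload; the version strings are illustrative):

```python
from unittest import mock

sdk_info = {"name": "sentry.python", "version": "2.59.0"}  # next release

# Hardcoded version breaks on every release:
brittle = sdk_info == {"name": "sentry.python", "version": "2.58.0"}

# mock.ANY matches whatever version string is current:
stable = sdk_info == {"name": "sentry.python", "version": mock.ANY}
```
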

test_langchain_embeddings_error_handling is missing gen_ai_as_v2_spans experiment flag - `tests/integrations/langchain/test_langchain.py:1823`

The test test_langchain_embeddings_error_handling uses capture_items('transaction', 'span') and accesses span['attributes'] (V2 format), but doesn't enable the _experiments={'gen_ai_as_v2_spans': True} flag in sentry_init. Other tests that don't use the integration (line 1728) set this flag. Without the flag, the V2 span format may not be generated, causing the test to behave unexpectedly.

Ineffective test assertion: V2 spans have no 'tags' field - `tests/integrations/openai_agents/test_openai_agents.py:1966`

Line 1966 asserts `mcp_tool_span.get("tags", {}).get("status") != "error"` but V2 spans don't have a `tags` field - tags are merged into `attributes` during V1-to-V2 conversion (see `_serialized_v1_span_to_serialized_v2_span` in client.py lines 216-218). This assertion always passes because `get("tags", {})` returns `{}` and `{}.get("status")` returns `None`, which is never equal to `"error"`. The comment's claim to 'Verify no error status' is therefore only partially fulfilled, by line 1965.
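
The always-true check, reduced to a toy V2 span whose error status lives in `attributes`:

```python
# Tags were merged into attributes during V1-to-V2 conversion.
mcp_tool_span = {"attributes": {"status": "error"}}

# .get("tags", {}) is {}, {}.get("status") is None, None != "error" -- always True:
vacuous = mcp_tool_span.get("tags", {}).get("status") != "error"

# Checking attributes actually detects the error status:
effective = mcp_tool_span["attributes"].get("status") != "error"
```
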


Duration: 24m 59s · Tokens: 20.3M in / 225.6k out · Cost: $28.34 (+extraction: $0.03, +merge: $0.01, +fix_gate: $0.02)
