feat: Send GenAI spans as V2 envelope items #6079

7 issues

code-review: Found 7 issues (2 high, 4 medium, 1 low)

High

Wrong event object passed to V2 span conversion causes missing span attributes - `sentry_sdk/client.py:1133-1135`

At line 1134, event (the original event parameter) is passed to _serialized_v1_span_to_serialized_v2_span instead of event_opt (the prepared event). The _serialized_v1_span_to_serialized_v2_span function extracts attributes like sentry.release, sentry.environment, sentry.sdk.name, sentry.sdk.version, user info, and segment info from the event object. Since _prepare_event enriches the event with SDK info (lines 808-811) and other metadata, using the original event will result in V2 GenAI spans missing these critical attributes.

Also found at:

tests/integrations/google_genai/test_google_genai.py:330
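
The effect described above can be sketched in isolation. Everything here is a hypothetical stand-in: `prepare_event` and `span_to_v2` are not the SDK's functions, the SDK name and environment are placeholder values, and the sketch assumes preparation returns an enriched copy rather than mutating the original event in place.

```python
from typing import Any, Dict


def prepare_event(event: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for _prepare_event: returns an enriched copy."""
    prepared = dict(event)
    # Placeholder values, not the SDK's real metadata.
    prepared["sdk"] = {"name": "sentry.python", "version": "0.0.0"}
    prepared.setdefault("environment", "production")
    return prepared


def span_to_v2(span: Dict[str, Any], event: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for the V1-to-V2 span conversion."""
    attributes = dict(span.get("data", {}))
    sdk = event.get("sdk") or {}
    if "name" in sdk:
        attributes["sentry.sdk.name"] = sdk["name"]
    if "version" in sdk:
        attributes["sentry.sdk.version"] = sdk["version"]
    if "environment" in event:
        attributes["sentry.environment"] = event["environment"]
    return {"span_id": span["span_id"], "attributes": attributes}


event = {"type": "transaction"}      # original, not yet enriched
event_opt = prepare_event(event)     # prepared event
span = {"span_id": "a1b2c3d4e5f60718", "data": {"gen_ai.system": "openai"}}

from_original = span_to_v2(span, event)      # SDK attributes are absent
from_prepared = span_to_v2(span, event_opt)  # SDK attributes are present
```

Under these assumptions, only the conversion that receives the prepared event carries the `sentry.sdk.*` and `sentry.environment` attributes.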

Test accesses spans[0]["data"] but capture_items("span") produces "attributes" key - `tests/tracing/test_misc.py:628`

The test was updated to use capture_items("span") but still accesses spans[0]["data"]. The capture_items fixture transforms span items to have an attributes key (see conftest.py lines 361-367), not data. This will cause a KeyError at runtime. Other tests in the codebase using capture_items("span") correctly access span["attributes"] (e.g., test_google_genai.py).

Also found at:

tests/integrations/google_genai/test_google_genai.py:812-814
tests/integrations/pydantic_ai/test_pydantic_ai.py:832-833
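
A minimal sketch of the failure mode, assuming a captured span item is a plain dict shaped the way the finding describes (an `attributes` key and no `data` key). The item contents are illustrative, not taken from the test suite.

```python
# Hypothetical shape of one item yielded by capture_items("span").
spans = [{"name": "chat", "attributes": {"gen_ai.system": "openai"}}]

try:
    spans[0]["data"]  # the key the test still uses
    lookup_failed = False
except KeyError:
    lookup_failed = True

# The lookup that matches the captured shape.
system = spans[0]["attributes"]["gen_ai.system"]
```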

Medium

Test uses incorrect field name 'attributes' instead of 'data' for inline_data - `tests/integrations/google_genai/test_google_genai.py:2153`

The test change replaces "data" with "attributes" in the inline_data structure. The Google GenAI SDK uses data as the field name (see genai_types.Blob(data=..., mime_type=...) at line 1546 and other tests at lines 1765, 1806). The production code in sentry_sdk/ai/utils.py line 286 reads inline_data.get("data", ""), so this test input won't properly validate the data extraction logic since attributes is not a valid field.
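
To illustrate why the renamed key weakens the test, here is a sketch that reduces the extraction to the single lookup quoted above. `extract_inline_data` is a hypothetical helper, not the function in `sentry_sdk/ai/utils.py`, and the payload values are made up.

```python
from typing import Any, Dict


def extract_inline_data(part: Dict[str, Any]) -> str:
    """Hypothetical reduction of the extraction to inline_data.get("data", "")."""
    inline_data = part.get("inline_data") or {}
    return inline_data.get("data", "")


valid_part = {"inline_data": {"data": "aGVsbG8=", "mime_type": "image/png"}}
renamed_part = {"inline_data": {"attributes": "aGVsbG8=", "mime_type": "image/png"}}

extracted = extract_inline_data(valid_part)      # payload is found
fallback = extract_inline_data(renamed_part)     # falls back to the default
```

With the renamed key, the lookup silently returns the empty default, so the test no longer exercises the path that handles real inline data.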

Hardcoded SDK version will cause test failures on version bumps - `tests/integrations/huggingface_hub/test_huggingface_hub.py:523`

The test_text_generation function hardcodes "sentry.sdk.version": "2.58.0" while all other tests in this file and across the codebase use mock.ANY for this field. This will cause test failures when the SDK version is incremented. The inconsistency appears to be an oversight since the adjacent test_text_generation_streaming function correctly uses mock.ANY.

Also found at:

tests/integrations/langchain/test_langchain.py:1368
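
The difference between the two expectations can be shown with `unittest.mock.ANY`, which compares equal to any value. The `span_attributes` helper and the bumped version string below are hypothetical.

```python
from unittest import mock


def span_attributes(sdk_version: str) -> dict:
    """Hypothetical helper returning the attributes a span would carry."""
    return {"sentry.sdk.name": "sentry.python", "sentry.sdk.version": sdk_version}


pinned = {"sentry.sdk.name": "sentry.python", "sentry.sdk.version": "2.58.0"}
flexible = {"sentry.sdk.name": "sentry.python", "sentry.sdk.version": mock.ANY}

after_bump = span_attributes("2.59.0")  # hypothetical next release

pinned_matches = after_bump == pinned      # breaks once the version changes
flexible_matches = after_bump == flexible  # still matches after the bump
```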

Test captures only transactions but asserts on spans, causing false positive - `tests/integrations/litellm/test_litellm.py:945`

The capture_items("transaction") call on line 945 only captures transaction-type items, but the test later filters for span-type items on line 1020 (spans = [item.payload for item in items if item.type == "span"]). Since no spans are captured, the spans list will always be empty, and the for span in spans loop will never execute any assertions. This means the test passes regardless of whether spans actually have the expected SPANDATA.GEN_AI_SYSTEM attribute.

Also found at:

tests/integrations/openai_agents/test_openai_agents.py:3560-3562
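
One way to keep such a loop from passing vacuously is to assert that the filtered list is non-empty before iterating. The sketch below uses a hypothetical `Item` type and assumes `SPANDATA.GEN_AI_SYSTEM` corresponds to the `gen_ai.system` key; it does not reproduce the actual test.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Item:
    """Hypothetical captured envelope item."""
    type: str
    payload: Dict[str, Any]


def check_spans(items: List[Item]) -> int:
    spans = [item.payload for item in items if item.type == "span"]
    # Guard: without this, an empty list makes the loop below a no-op.
    assert spans, "no span items were captured"
    for span in spans:
        assert "gen_ai.system" in span["attributes"]
    return len(spans)


only_transactions = [Item("transaction", {"spans": []})]
try:
    check_spans(only_transactions)
    guard_tripped = False
except AssertionError:
    guard_tripped = True

with_span = only_transactions + [
    Item("span", {"attributes": {"gen_ai.system": "litellm"}})
]
checked = check_spans(with_span)
```

Capturing both item types in the fixture call is still needed for the assertions to see any spans; the guard only makes the omission fail loudly.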

Test missing validation when handled_tool_call_exceptions=False - `tests/integrations/pydantic_ai/test_pydantic_ai.py:490-496`

The original test validated the model_behaviour_error event when handled_tool_call_exceptions=False, but the refactored code removed the else branch entirely. This means when the test runs with handled_tool_call_exceptions=False, no event validation occurs, reducing test coverage for that code path.

Also found at:

tests/integrations/pydantic_ai/test_pydantic_ai.py:493-494

Low

Unused list comprehension result in test_langchain_embeddings_error_handling - `tests/integrations/langchain/test_langchain.py:1840-1844`

The list comprehension at lines 1840-1844 builds a list of error events, but the result is neither assigned to a variable nor used in an assertion. It appears to be the remnant of an earlier assignment or assertion, and as written it leaves the test without any meaningful validation of error handling behavior.

Also found at:

tests/integrations/pydantic_ai/test_pydantic_ai.py:1402
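
A small sketch of the pattern and one possible repair, using made-up event dicts rather than the events the test actually captures:

```python
events = [
    {"level": "info"},
    {"level": "error", "message": "embedding failed"},
]

# Dead code: the list is built and immediately discarded.
[e for e in events if e.get("level") == "error"]

# Possible repair: keep the result and assert on it.
error_events = [e for e in events if e.get("level") == "error"]
assert len(error_events) == 1
```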

Duration: 19m 33s · Tokens: 14.3M in / 173.8k out · Cost: $20.35 (+extraction: $0.02, +merge: $0.01, +fix_gate: $0.02)
