feat: Send GenAI spans as V2 envelope items #6079

9 issues

code-review: Found 9 issues (1 high, 5 medium, 3 low)

High

Test assertion references old transaction span structure after migration to V2 envelope items - `tests/integrations/langchain/test_langchain.py:1370`

Line 1370 asserts tx["_meta"]["spans"]["0"]["data"] but the test was migrated to use capture_items() where spans are now separate envelope items rather than embedded in the transaction. With V2 envelope items, the transaction's _meta.spans structure may not exist or be populated differently, causing this assertion to fail with a KeyError at runtime.

Also found at:

tests/integrations/pydantic_ai/test_pydantic_ai.py:831-834

Medium

Using wrong event variable causes missing attributes in V2 span conversion - `sentry_sdk/client.py:1138`

Line 1138 passes event to _serialized_v1_span_to_serialized_v2_span, but should use event_opt. The _prepare_event function enriches the event with release, environment, and other fields (lines 815-817 in client.py), and these enriched values exist only in event_opt. The V2 span conversion function reads these fields to populate attributes like sentry.release, sentry.environment, and sentry.segment.name. Using event instead of event_opt will result in missing or incorrect attributes in converted GenAI spans.

Also found at:

tests/integrations/openai_agents/test_openai_agents.py:3540-3541

Hardcoded SDK version in test will break on version bump - `tests/integrations/huggingface_hub/test_huggingface_hub.py:523`

The test uses a hardcoded SDK version "sentry.sdk.version": "2.58.0" in the expected_data dictionary for test_text_generation, while all other tests in this file (e.g., test_text_generation_streaming, test_chat_completion) and across the codebase use mock.ANY for this field. This will cause the test to fail when the SDK version is updated.

Test captures only 'transaction' but asserts on 'span' items, causing test to pass vacuously - `tests/integrations/litellm/test_litellm.py:945`

The test_multiple_providers function calls capture_items("transaction") at line 945, which filters to only capture transaction items. However, lines 1020-1023 (outside the hunk but dependent on this change) attempt to iterate over span items with [item.payload for item in items if item.type == "span"]. Since spans were never captured, this list will always be empty, causing the for-loop assertion to pass vacuously without testing anything. The async version test_async_multiple_providers correctly uses capture_items("transaction", "span").

Also found at:

tests/integrations/litellm/test_litellm.py:1020-1023

Test coverage regression: model_behaviour_error event not verified when handled_tool_call_exceptions=False - `tests/integrations/pydantic_ai/test_pydantic_ai.py:490-496`

The refactored test_agent_with_tool_validation_error test no longer verifies the model_behaviour_error event when handled_tool_call_exceptions=False. The original code unpacked both model_behaviour_error and transaction from events in the else branch, but the new code completely skips event validation in that path. This means the test no longer verifies that the UnexpectedModelBehavior exception is properly captured as an event when tool call exceptions are not handled.

Test can silently pass without verifying binary content capture - `tests/integrations/pydantic_ai/test_pydantic_ai.py:2889-2891`

The test test_binary_content_in_agent_run has a conditional if "gen_ai.request.messages" in chat_span["attributes"] around its key assertion (line 2889-2891). If the attribute is missing, the test passes without verifying that binary content is properly captured in spans, which defeats the purpose stated in the docstring. The assertion should be unconditional, or the test should explicitly verify that the attribute exists first.

Low

Incorrect sort key: uses name twice instead of name and description - `tests/integrations/google_genai/test_google_genai.py:330`

The sort key lambda uses t.get("name", "") twice instead of sorting by name and description as the comment indicates. This appears to be a copy-paste error introduced in the diff - the second key should be t.get("description", ""). The test may still pass since names happen to be unique, but the code doesn't match its documented intent.

Unused list comprehension for error event filtering - `tests/integrations/langchain/test_langchain.py:1842-1846`

The list comprehension at lines 1842-1846 filters for item.type == "event" but capture_items is called with only "transaction", "span" types at line 1823, so no events will ever be captured. The result of this comprehension is also not assigned to any variable or used in assertions, making it dead code.

Removed assertion reduces test failure diagnostics - `tests/integrations/openai_agents/test_openai_agents.py:2257`

The original code asserted assert len(events) == 3 before unpacking into txn1, txn2, txn3, but this assertion was removed. While Python will still raise a ValueError on unpacking mismatch, the explicit assertion provided clearer test failure messaging indicating exactly how many transactions were received. This makes debugging test failures harder when the number of transactions doesn't match expectations.

Duration: 18m 54s · Tokens: 14.4M in / 183.1k out · Cost: $20.42 (+extraction: $0.02, +merge: $0.00, +fix_gate: $0.02)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Send GenAI spans as V2 envelope items #6079

Uh oh!

Uh oh!

feat: Send GenAI spans as V2 envelope items #6079

Uh oh!

9 issues

High

Medium

Low

Annotations

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

Re-running checks...

feat: Send GenAI spans as V2 envelope items #6079

Are you sure you want to change the base?

Uh oh!

add name fallback

Uh oh!

feat: Send GenAI spans as V2 envelope items #6079

Uh oh!

9 issues

High

Medium

Low

Annotations

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

Re-running checks...