feat: Send GenAI spans as V2 envelope items #6079
code-review: Found 7 issues (2 high, 4 medium, 1 low)
High
Span assertions never execute in test_multiple_providers due to missing span capture - `tests/integrations/litellm/test_litellm.py:1020-1023`
The sync `test_multiple_providers` function calls `capture_items("transaction")` at line 945, but the new code at line 1020 filters for `item.type == "span"`. Since span items are never captured, the `spans` list is always empty, so the for-loop never executes and the `SPANDATA.GEN_AI_SYSTEM` assertion is silently skipped. The async version was correctly updated to `capture_items("transaction", "span")` at line 1040, but the sync version was missed.
Also found at:
tests/integrations/litellm/test_litellm.py:945
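A minimal sketch of the failure mode, using a hypothetical `Item` stand-in rather than the real test fixture: filtering for an item type that was never captured yields an empty list, and a loop over an empty list passes vacuously.

```python
from collections import namedtuple

# Hypothetical stand-in for captured envelope items (not the real fixture type).
Item = namedtuple("Item", ["type"])

# capture_items("transaction") records only transaction items...
captured = [Item(type="transaction")]

# ...so filtering for spans yields an empty list,
spans = [item for item in captured if item.type == "span"]
assert spans == []

# and the loop body, with all its assertions, is silently skipped.
for span in spans:
    raise AssertionError("never reached")

# Capturing both item types, as the async variant does, restores coverage:
captured = [Item(type="transaction"), Item(type="span")]
spans = [item for item in captured if item.type == "span"]
assert len(spans) == 1
```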
test_message_history mixes incompatible span formats causing test to silently pass without validation - `tests/integrations/pydantic_ai/test_pydantic_ai.py:830-840`
In `test_message_history`, spans are extracted from `second_transaction["spans"]` (line 830, old format with an `op` field) but then filtered using `s["attributes"].get("sentry.op", "")` (lines 831-833, new V2 format). Since spans from `transaction["spans"]` carry `span["op"]`, not `span["attributes"]["sentry.op"]`, the filter never matches and `chat_spans` stays empty. The subsequent assertions are wrapped in `if chat_spans:` and `if "gen_ai.request.messages" in chat_span["attributes"]:`, so the test silently passes without verifying anything.
Also found at:
tests/integrations/openai_agents/test_openai_agents.py:3560-3562
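For illustration, a sketch of the format mismatch with hypothetical span payloads (the exact dict shapes are assumptions, not the real transaction structure):

```python
# Hypothetical span payloads illustrating the two formats.
old_format_span = {"op": "gen_ai.chat", "attributes": {}}       # V1: op at top level
new_format_span = {"attributes": {"sentry.op": "gen_ai.chat"}}  # V2: op inside attributes

# What second_transaction["spans"] actually contains is the old format.
spans = [old_format_span]

# The V2-style filter never matches V1 spans...
chat_spans = [s for s in spans if s["attributes"].get("sentry.op", "") == "gen_ai.chat"]
assert chat_spans == []  # ...so every assertion guarded by `if chat_spans:` is skipped.

# Filtering on the field the payload actually uses does match:
chat_spans = [s for s in spans if s.get("op", "") == "gen_ai.chat"]
assert len(chat_spans) == 1
```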
Medium
New span conversion functions lack test coverage - `sentry_sdk/client.py:102-298`
The new functions `_serialized_v1_attribute_to_serialized_v2_attribute`, `_serialized_v1_span_to_serialized_v2_span`, and `_split_gen_ai_spans` are not covered by dedicated unit tests. While the integration tests in the PR may exercise these code paths indirectly, direct unit tests would help verify edge cases such as `None` values, empty lists, heterogeneous arrays, and invalid timestamp formats.
Also found at:
sentry_sdk/client.py:1130
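A sketch of the kind of direct unit test the review suggests. The converter below is a simplified stand-in, not the real `_serialized_v1_attribute_to_serialized_v2_attribute`; the typed-attribute shape and the `None` handling are assumptions for illustration only.

```python
def to_v2_attribute(value):
    """Toy V1 -> V2 attribute converter; the typed-envelope shape is assumed."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return {"type": "boolean", "value": value}
    if isinstance(value, int):
        return {"type": "integer", "value": value}
    if isinstance(value, float):
        return {"type": "double", "value": value}
    return {"type": "string", "value": str(value)}  # toy fallback, incl. None

# Edge cases a dedicated unit test could pin down explicitly:
assert to_v2_attribute(True) == {"type": "boolean", "value": True}
assert to_v2_attribute(3) == {"type": "integer", "value": 3}
assert to_v2_attribute(None) == {"type": "string", "value": "None"}
```

The value of such tests is that each edge case (bool-vs-int ordering, `None`, mixed-type lists, bad timestamps) fails loudly on its own line instead of hiding behind an integration test.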
Test uses incorrect key 'attributes' instead of 'data' for inline_data - `tests/integrations/google_genai/test_google_genai.py:2153`
The test data structure uses `{"inline_data": {"attributes": b"binary_data", ...}}`, but Google GenAI's `inline_data` format uses `data`, not `attributes`. The production code in `sentry_sdk/ai/utils.py:286` retrieves `inline_data.get("data", "")`. Other tests in this same file (lines 1765, 1806) correctly use `{"inline_data": {"data": ..., "mime_type": ...}}`. As written, this test never exercises the code path for extracting blob content.
Also found at:
tests/integrations/google_genai/test_google_genai.py:330
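A sketch of why the misspelled key goes unnoticed; `extract_blob` below is a hypothetical helper mirroring the `inline_data.get("data", "")` lookup, not the actual function in `sentry_sdk/ai/utils.py`:

```python
# Hypothetical helper mirroring the production lookup.
def extract_blob(part):
    inline_data = part.get("inline_data", {})
    return inline_data.get("data", "")  # same lookup as in sentry_sdk/ai/utils.py

# The misspelled key silently falls back to the default...
assert extract_blob({"inline_data": {"attributes": b"binary_data"}}) == ""

# ...while the correct key actually exercises the extraction path.
part = {"inline_data": {"data": b"binary_data", "mime_type": "image/png"}}
assert extract_blob(part) == b"binary_data"
```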
Hardcoded SDK version will cause test failure when version changes - `tests/integrations/huggingface_hub/test_huggingface_hub.py:523`
The test `test_text_generation` hardcodes `"sentry.sdk.version": "2.58.0"` in `expected_data`, while the other tests in this file (`test_text_generation_streaming`, `test_chat_completion`) correctly use `mock.ANY`. The hardcoded value will break the test on the next SDK version bump, introducing test fragility.
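The `mock.ANY` sentinel compares equal to any value, which is why the sibling tests survive version bumps:

```python
from unittest import mock

# mock.ANY matches any value in an equality comparison, so the same
# expectation keeps passing whatever version the SDK reports:
assert {"sentry.sdk.version": "2.58.0"} == {"sentry.sdk.version": mock.ANY}
assert {"sentry.sdk.version": "2.99.0"} == {"sentry.sdk.version": mock.ANY}
```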
Unused list comprehension results in no actual test validation - `tests/integrations/langchain/test_langchain.py:1840-1844`
The list comprehension on lines 1840-1844 builds a list of error events but never assigns it to a variable, making it dead code with no effect. The original code had the same flaw (a bare `[e for e in events...]` whose result was never used), and this refactoring carries the bug forward. As a result, `test_langchain_embeddings_error_handling` does not actually validate that errors are captured.
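A sketch of the fix, with hypothetical event dicts standing in for the real fixture output:

```python
# Hypothetical captured events; the real list comes from the test fixture.
events = [
    {"type": "transaction"},
    {"level": "error", "message": "embeddings failed"},
]

# A bare list comprehension computes a value and discards it:
[e for e in events if e.get("level") == "error"]  # no effect, nothing validated

# Assigning the result and asserting on it makes the test actually check capture:
error_events = [e for e in events if e.get("level") == "error"]
assert len(error_events) == 1
```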
Low
Test assertions may be silently skipped if no chat_spans are found - `tests/integrations/pydantic_ai/test_pydantic_ai.py:993-995`
In `test_include_prompts_requires_pii`, the test loops over `chat_spans` to verify assertions, but doesn't verify that `chat_spans` has at least one element. If no chat spans are generated due to a bug, the test would pass without actually testing anything. Consider adding `assert len(chat_spans) >= 1` before the loop to ensure the test is actually validating the expected behavior.
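As a sketch, a guard assertion before the loop turns a vacuous pass into a hard failure; the span shape here is hypothetical:

```python
def run_assertions(chat_spans):
    # Guard first so an empty list cannot silently skip the loop body.
    assert len(chat_spans) >= 1, "expected at least one chat span"
    for span in chat_spans:
        assert "sentry.op" in span["attributes"]

# With spans present, the per-span assertions run as intended:
run_assertions([{"attributes": {"sentry.op": "gen_ai.chat"}}])
```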
Duration: 20m 35s · Tokens: 13.5M in / 167.6k out · Cost: $20.22 (+extraction: $0.01, +merge: $0.00, +fix_gate: $0.01)