feat: Send GenAI spans as V2 envelope items #6079

7 issues

code-review: Found 7 issues (1 high, 5 medium, 1 low)

High

test_message_history accesses transaction-embedded spans with V2 envelope accessors - `tests/integrations/pydantic_ai/test_pydantic_ai.py:830-832`

At line 830, spans are retrieved from second_transaction["spans"] which returns spans in the transaction-embedded format (with op and data keys). However, at line 832, the code tries to filter these using s["attributes"].get("sentry.op", "") which is the V2 envelope span format. This mismatch will cause a KeyError or fail to find any matching spans since transaction-embedded spans don't have an attributes key. The test should either extract spans from capture_items("span") (like other tests in this file do at lines 83-86) or use the correct accessor s["op"] for transaction-embedded spans.

Medium

Uses raw `event` instead of processed `event_opt` for span conversion - `sentry_sdk/client.py:1130`

On line 1130, the code passes event to _serialized_v1_span_to_serialized_v2_span instead of event_opt. While _prepare_event typically modifies event in-place, using event_opt is semantically correct and consistent with the rest of this function which uses event_opt for the processed/prepared event. The _serialized_v1_span_to_serialized_v2_span function extracts metadata like user, release, environment from the event, and in edge cases where event and event_opt diverge, this could result in missing or incorrect attributes on the converted GenAI spans.

Also found at:

tests/integrations/langchain/test_langchain.py:1368

Test uses incorrect inline_data key 'attributes' instead of 'data' - `tests/integrations/google_genai/test_google_genai.py:2153`

The test changes the inline_data dict key from "data" to "attributes", but the production code in sentry_sdk/ai/utils.py:286 and sentry_sdk/integrations/google_genai/utils.py:378 expects the key to be "data". This test will not properly verify the inline_data handling since inline_data.get("data", "") will return an empty string instead of the binary data.

Also found at:

tests/integrations/openai_agents/test_openai_agents.py:3560-3562

Unused list comprehension makes test ineffective at validating error handling - `tests/integrations/langchain/test_langchain.py:1840-1844`

The list comprehension at lines 1840-1844 creates a list from items filtering for error events, but the result is never assigned to a variable or used in any assertion. Combined with the fact that capture_items("transaction", "span") doesn't capture events (only "transaction" and "span" types), this test named test_langchain_embeddings_error_handling doesn't actually validate that errors are properly captured.

Test does not capture span items but later asserts on them - `tests/integrations/litellm/test_litellm.py:945`

The capture_items("transaction") call at line 945 only captures transaction items, but the test later attempts to filter and assert on spans at lines 1020-1023 (item.type == "span"). Since spans are not being captured, the spans list will always be empty and the for loop assertion will never execute, making this test ineffective at validating span attributes. This was likely an oversight when converting from capture_events to capture_items.

Also found at:

tests/integrations/litellm/test_litellm.py:320
tests/integrations/huggingface_hub/test_huggingface_hub.py:523

Test may silently pass without verifying any assertions - `tests/integrations/pydantic_ai/test_pydantic_ai.py:988-995`

The test test_include_prompts_requires_pii iterates over chat_spans to verify that request messages are not captured when PII is disabled, but does not assert that chat_spans contains any elements. If no spans are captured (due to instrumentation bug or test setup issue), the for-loop never executes and the test passes without verifying anything. Other similar tests in this file include assert len(chat_spans) >= 1 before the verification loop.

Also found at:

tests/integrations/litellm/test_litellm.py:1118-1121

Low

Sorting key uses 'name' twice instead of 'name' and 'description' - `tests/integrations/google_genai/test_google_genai.py:330`

The sorting lambda on line 330 was changed to use t.get("name", "") for both tuple elements instead of (name, description). This contradicts the comment on line 328 which states "sort by name and description for comparison". While this may still work for this specific test since the two tools have different names, it introduces inconsistency and could cause non-deterministic ordering if tools with the same name but different descriptions are added.

Also found at:

tests/integrations/langchain/test_langchain.py:939-940

Duration: 38m 21s · Tokens: 13.0M in / 169.5k out · Cost: $18.16 (+extraction: $0.01, +merge: $0.01, +fix_gate: $0.02)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Send GenAI spans as V2 envelope items #6079

Uh oh!

Uh oh!

feat: Send GenAI spans as V2 envelope items #6079

Uh oh!

7 issues

High

Medium

Low

Annotations

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

Re-running checks...

feat: Send GenAI spans as V2 envelope items #6079

Are you sure you want to change the base?

Uh oh!

fix common tests

Uh oh!

feat: Send GenAI spans as V2 envelope items #6079

Uh oh!

7 issues

High

Medium

Low

Annotations

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

sentry-warden / warden: code-review

Re-running checks...