feat: Send GenAI spans as V2 envelope items #6079
9 issues
code-review: Found 9 issues (1 high, 5 medium, 3 low)
High
Test assertion references old transaction span structure after migration to V2 envelope items - `tests/integrations/langchain/test_langchain.py:1370`
Line 1370 asserts tx["_meta"]["spans"]["0"]["data"] but the test was migrated to use capture_items() where spans are now separate envelope items rather than embedded in the transaction. With V2 envelope items, the transaction's _meta.spans structure may not exist or be populated differently, causing this assertion to fail with a KeyError at runtime.
Also found at:
tests/integrations/pydantic_ai/test_pydantic_ai.py:831-834
Medium
Using wrong event variable causes missing attributes in V2 span conversion - `sentry_sdk/client.py:1138`
Line 1138 passes event to _serialized_v1_span_to_serialized_v2_span, but should use event_opt. The _prepare_event function enriches the event with release, environment, and other fields (lines 815-817 in client.py), and these enriched values exist only in event_opt. The V2 span conversion function reads these fields to populate attributes like sentry.release, sentry.environment, and sentry.segment.name. Using event instead of event_opt will result in missing or incorrect attributes in converted GenAI spans.
Also found at:
tests/integrations/openai_agents/test_openai_agents.py:3540-3541
Hardcoded SDK version in test will break on version bump - `tests/integrations/huggingface_hub/test_huggingface_hub.py:523`
The test uses a hardcoded SDK version "sentry.sdk.version": "2.58.0" in the expected_data dictionary for test_text_generation, while all other tests in this file (e.g., test_text_generation_streaming, test_chat_completion) and across the codebase use mock.ANY for this field. This will cause the test to fail when the SDK version is updated.
Test captures only 'transaction' but asserts on 'span' items, causing test to pass vacuously - `tests/integrations/litellm/test_litellm.py:945`
The test_multiple_providers function calls capture_items("transaction") at line 945, which filters to only capture transaction items. However, lines 1020-1023 (outside the hunk but dependent on this change) attempt to iterate over span items with [item.payload for item in items if item.type == "span"]. Since spans were never captured, this list will always be empty, causing the for-loop assertion to pass vacuously without testing anything. The async version test_async_multiple_providers correctly uses capture_items("transaction", "span").
Also found at:
tests/integrations/litellm/test_litellm.py:1020-1023
Test coverage regression: model_behaviour_error event not verified when handled_tool_call_exceptions=False - `tests/integrations/pydantic_ai/test_pydantic_ai.py:490-496`
The refactored test_agent_with_tool_validation_error test no longer verifies the model_behaviour_error event when handled_tool_call_exceptions=False. The original code unpacked both model_behaviour_error and transaction from events in the else branch, but the new code completely skips event validation in that path. This means the test no longer verifies that the UnexpectedModelBehavior exception is properly captured as an event when tool call exceptions are not handled.
Test can silently pass without verifying binary content capture - `tests/integrations/pydantic_ai/test_pydantic_ai.py:2889-2891`
The test test_binary_content_in_agent_run has a conditional if "gen_ai.request.messages" in chat_span["attributes"] around its key assertion (line 2889-2891). If the attribute is missing, the test passes without verifying that binary content is properly captured in spans, which defeats the purpose stated in the docstring. The assertion should be unconditional, or the test should explicitly verify that the attribute exists first.
Low
Incorrect sort key: uses name twice instead of name and description - `tests/integrations/google_genai/test_google_genai.py:330`
The sort key lambda uses t.get("name", "") twice instead of sorting by name and description as the comment indicates. This appears to be a copy-paste error introduced in the diff - the second key should be t.get("description", ""). The test may still pass since names happen to be unique, but the code doesn't match its documented intent.
Unused list comprehension for error event filtering - `tests/integrations/langchain/test_langchain.py:1842-1846`
The list comprehension at lines 1842-1846 filters for item.type == "event" but capture_items is called with only "transaction", "span" types at line 1823, so no events will ever be captured. The result of this comprehension is also not assigned to any variable or used in assertions, making it dead code.
Removed assertion reduces test failure diagnostics - `tests/integrations/openai_agents/test_openai_agents.py:2257`
The original code asserted assert len(events) == 3 before unpacking into txn1, txn2, txn3, but this assertion was removed. While Python will still raise a ValueError on unpacking mismatch, the explicit assertion provided clearer test failure messaging indicating exactly how many transactions were received. This makes debugging test failures harder when the number of transactions doesn't match expectations.
Duration: 18m 54s · Tokens: 14.4M in / 183.1k out · Cost: $20.42 (+extraction: $0.02, +merge: $0.00, +fix_gate: $0.02)
Annotations
Check failure on line 1370 in tests/integrations/langchain/test_langchain.py
sentry-warden / warden: code-review
Test assertion references old transaction span structure after migration to V2 envelope items
Line 1370 asserts `tx["_meta"]["spans"]["0"]["data"]` but the test was migrated to use `capture_items()` where spans are now separate envelope items rather than embedded in the transaction. With V2 envelope items, the transaction's `_meta.spans` structure may not exist or be populated differently, causing this assertion to fail with a KeyError at runtime.
Check failure on line 834 in tests/integrations/pydantic_ai/test_pydantic_ai.py
sentry-warden / warden: code-review
[6UU-6VK] Test assertion references old transaction span structure after migration to V2 envelope items (additional location)
Line 1370 asserts `tx["_meta"]["spans"]["0"]["data"]` but the test was migrated to use `capture_items()` where spans are now separate envelope items rather than embedded in the transaction. With V2 envelope items, the transaction's `_meta.spans` structure may not exist or be populated differently, causing this assertion to fail with a KeyError at runtime.
Check warning on line 1138 in sentry_sdk/client.py
sentry-warden / warden: code-review
Using wrong event variable causes missing attributes in V2 span conversion
Line 1138 passes `event` to `_serialized_v1_span_to_serialized_v2_span`, but should use `event_opt`. The `_prepare_event` function enriches the event with `release`, `environment`, and other fields (lines 815-817 in client.py), and these enriched values exist only in `event_opt`. The V2 span conversion function reads these fields to populate attributes like `sentry.release`, `sentry.environment`, and `sentry.segment.name`. Using `event` instead of `event_opt` will result in missing or incorrect attributes in converted GenAI spans.
Check warning on line 3541 in tests/integrations/openai_agents/test_openai_agents.py
sentry-warden / warden: code-review
[GBE-J9T] Using wrong event variable causes missing attributes in V2 span conversion (additional location)
Line 1138 passes `event` to `_serialized_v1_span_to_serialized_v2_span`, but should use `event_opt`. The `_prepare_event` function enriches the event with `release`, `environment`, and other fields (lines 815-817 in client.py), and these enriched values exist only in `event_opt`. The V2 span conversion function reads these fields to populate attributes like `sentry.release`, `sentry.environment`, and `sentry.segment.name`. Using `event` instead of `event_opt` will result in missing or incorrect attributes in converted GenAI spans.
Check warning on line 523 in tests/integrations/huggingface_hub/test_huggingface_hub.py
sentry-warden / warden: code-review
Hardcoded SDK version in test will break on version bump
The test uses a hardcoded SDK version `"sentry.sdk.version": "2.58.0"` in the `expected_data` dictionary for `test_text_generation`, while all other tests in this file (e.g., `test_text_generation_streaming`, `test_chat_completion`) and across the codebase use `mock.ANY` for this field. This will cause the test to fail when the SDK version is updated.
Check warning on line 945 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: code-review
Test captures only 'transaction' but asserts on 'span' items, causing test to pass vacuously
The `test_multiple_providers` function calls `capture_items("transaction")` at line 945, which filters to only capture transaction items. However, lines 1020-1023 (outside the hunk but dependent on this change) attempt to iterate over span items with `[item.payload for item in items if item.type == "span"]`. Since spans were never captured, this list will always be empty, causing the for-loop assertion to pass vacuously without testing anything. The async version `test_async_multiple_providers` correctly uses `capture_items("transaction", "span")`.
Check warning on line 1023 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: code-review
[8AZ-P52] Test captures only 'transaction' but asserts on 'span' items, causing test to pass vacuously (additional location)
The `test_multiple_providers` function calls `capture_items("transaction")` at line 945, which filters to only capture transaction items. However, lines 1020-1023 (outside the hunk but dependent on this change) attempt to iterate over span items with `[item.payload for item in items if item.type == "span"]`. Since spans were never captured, this list will always be empty, causing the for-loop assertion to pass vacuously without testing anything. The async version `test_async_multiple_providers` correctly uses `capture_items("transaction", "span")`.
Check warning on line 496 in tests/integrations/pydantic_ai/test_pydantic_ai.py
sentry-warden / warden: code-review
Test coverage regression: model_behaviour_error event not verified when handled_tool_call_exceptions=False
The refactored `test_agent_with_tool_validation_error` test no longer verifies the `model_behaviour_error` event when `handled_tool_call_exceptions=False`. The original code unpacked both `model_behaviour_error` and `transaction` from events in the else branch, but the new code completely skips event validation in that path. This means the test no longer verifies that the `UnexpectedModelBehavior` exception is properly captured as an event when tool call exceptions are not handled.
Check warning on line 2891 in tests/integrations/pydantic_ai/test_pydantic_ai.py
sentry-warden / warden: code-review
Test can silently pass without verifying binary content capture
The test `test_binary_content_in_agent_run` has a conditional `if "gen_ai.request.messages" in chat_span["attributes"]` around its key assertion (line 2889-2891). If the attribute is missing, the test passes without verifying that binary content is properly captured in spans, which defeats the purpose stated in the docstring. The assertion should be unconditional, or the test should explicitly verify that the attribute exists first.