feat: Send GenAI spans as V2 envelope items #6079
6 issues
code-review: Found 6 issues (6 medium)
Medium
GenAI spans lose release, environment, and SDK metadata due to passing unprepared event - `sentry_sdk/client.py:1134`
On line 1134, `event` (the original input) is passed to `_serialized_v1_span_to_serialized_v2_span` instead of `event_opt` (the prepared event). The `_prepare_event` method enriches the event with `release`, `environment`, `server_name`, `dist`, and `sdk` info (lines 811-817), but `_serialized_v1_span_to_serialized_v2_span` reads these fields (lines 231-247) from the passed event to populate attributes like `sentry.release`, `sentry.environment`, `sentry.sdk.name`, and `sentry.sdk.version`. As a result, GenAI v2 spans will be missing these critical metadata attributes.
Also found at:
tests/integrations/huggingface_hub/test_huggingface_hub.py:521
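The effect described in this finding can be sketched in isolation. The two functions below are hypothetical stand-ins, not the SDK's `_prepare_event` or `_serialized_v1_span_to_serialized_v2_span`, and the sketch assumes the preparation step returns an enriched copy rather than mutating its input. If the real method mutates the event in place, `event` and `event_opt` would carry the same metadata and the difference shown here would not appear.

```python
def prepare_event(event, options):
    """Hypothetical stand-in for the preparation step: returns an enriched copy."""
    prepared = dict(event)
    for key in ("release", "environment"):
        if prepared.get(key) is None and options.get(key) is not None:
            prepared[key] = options[key]
    return prepared


def span_to_v2(span, event):
    """Hypothetical stand-in for the v1-to-v2 conversion: copies event metadata into attributes."""
    attributes = dict(span.get("data", {}))
    for key in ("release", "environment"):
        if event.get(key) is not None:
            attributes["sentry." + key] = event[key]
    return {"name": span["description"], "attributes": attributes}


options = {"release": "1.2.3", "environment": "prod"}
event = {"type": "transaction"}            # original input, no metadata yet
event_opt = prepare_event(event, options)  # prepared event, metadata filled in
span = {"description": "chat", "data": {"gen_ai.system": "openai"}}

from_raw = span_to_v2(span, event)           # metadata attributes are absent
from_prepared = span_to_v2(span, event_opt)  # metadata attributes are present
```

Under these assumptions, only `from_prepared` carries `sentry.release` and `sentry.environment`.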
Sort key uses 'name' twice instead of 'name' and 'description' - `tests/integrations/google_genai/test_google_genai.py:330`
The sorting lambda was changed from sorting by `(name, description)` to `(name, name)`, but the comment still says 'sort by name and description for comparison'. The sort key is now effectively single-field, so tools that share a name but differ in description keep whatever order they arrived in. If that input order varies between runs, the comparison can fail intermittently.
Also found at:
tests/integrations/litellm/test_litellm.py:945
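A small illustration of why the duplicated key matters, using made-up tool entries. Python's `sorted` is stable, so entries that tie on the key stay in their input order, and a `(name, name)` key ties whenever the names match.

```python
tools = [
    {"name": "search", "description": "web search"},
    {"name": "search", "description": "code search"},
]
reversed_tools = list(reversed(tools))

def key_name_twice(tool):
    # Same field twice: ties on name are never broken.
    return (tool["name"], tool["name"])

def key_name_and_description(tool):
    # Second field breaks ties between tools that share a name.
    return (tool["name"], tool["description"])

# With the duplicated key, the result depends on the input order.
twice_a = sorted(tools, key=key_name_twice)
twice_b = sorted(reversed_tools, key=key_name_twice)

# With both fields, the result is the same for either input order.
both_a = sorted(tools, key=key_name_and_description)
both_b = sorted(reversed_tools, key=key_name_and_description)
```

Here `twice_a` and `twice_b` differ, while `both_a` and `both_b` are identical.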
Mock patches wrong client attribute in test_async_exception_handling - `tests/integrations/litellm/test_litellm.py:866-868`
The test mocks `client.embeddings._client._client.send` but calls `litellm.acompletion`, which uses the completions endpoint. The synchronous `test_exception_handling` correctly mocks `client.completions._client._client.send`. Because of this mismatch, the mock may not intercept the request, so the test may not actually verify exception handling.
Also found at:
tests/integrations/openai_agents/test_openai_agents.py:1966
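One way to make such a mismatch visible is to check whether the mock was called at all. The sketch below uses a hypothetical client in which each resource owns a separate transport; that separation is an assumption for illustration only. The review does not say whether the real client's resources share one underlying `send`, and if they do, patching either path would intercept the request.

```python
from unittest import mock


class Transport:
    def send(self, request):
        return "real response"


class Resource:
    def __init__(self, transport):
        self._transport = transport

    def create(self, request):
        return self._transport.send(request)


class Client:
    def __init__(self):
        # Assumption: separate transports per resource (hypothetical layout).
        self.completions = Resource(Transport())
        self.embeddings = Resource(Transport())


client = Client()

# Patch the embeddings transport, then exercise the completions path.
with mock.patch.object(
    client.embeddings._transport, "send", side_effect=RuntimeError("boom")
) as fake_send:
    result = client.completions.create("hi")
    intercepted = fake_send.called
```

In this layout the call goes through unpatched and `intercepted` is `False`, so an `assert fake_send.called` in the test would flag the wrong patch target.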
Test assertions never execute because spans list is always empty - `tests/integrations/litellm/test_litellm.py:1020-1023`
The `test_multiple_providers` function calls `capture_items("transaction")` on line 945 which only captures transaction items, not span items. The newly added code on lines 1020-1023 tries to iterate over spans with `spans = [item.payload for item in items if item.type == "span"]`, but this will always be empty since span items are not being captured. The for loop never executes, making the `SPANDATA.GEN_AI_SYSTEM` assertion dead code that doesn't validate anything.
Also found at:
tests/integrations/litellm/test_litellm.py:1279
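The dead-assertion pattern can be reproduced without the SDK. In this sketch, `Item` and the captured list are hypothetical stand-ins for what the test's capture helper returns, and `assert_spans_present` is an illustrative guard, not an existing helper.

```python
from collections import namedtuple

Item = namedtuple("Item", ["type", "payload"])

# Assumption: only transaction items were captured, as the finding describes.
items = [Item("transaction", {"transaction": "chat"})]
spans = [item.payload for item in items if item.type == "span"]

checked = 0
for span in spans:
    # Never reached: the list is empty, so nothing is validated.
    assert span["attributes"]["gen_ai.system"] == "openai"
    checked += 1


def assert_spans_present(spans):
    """Illustrative guard: fail loudly when no span items were captured."""
    assert len(spans) > 0, "no span items were captured"
```

The loop body runs zero times, so the test passes without checking anything. Calling a guard like `assert_spans_present(spans)` before the loop would turn that silent pass into a failure.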
Test checks wrong field 'attributes' instead of 'data' for transaction context - `tests/integrations/openai_agents/test_openai_agents.py:3540-3542`
The test checks `transaction["contexts"]["trace"].get("attributes", {})` but should check `transaction["contexts"]["trace"].get("data", {})`. Earlier in the same file (lines 3341, 3479), the transaction context uses `data` to store attributes like `gen_ai.conversation.id`. This inconsistency means the test may pass regardless of whether `gen_ai.conversation.id` is actually absent from the transaction, since it's checking the wrong field.
Also found at:
tests/integrations/pydantic_ai/test_pydantic_ai.py:831-834
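A minimal sketch of why the absence check is vacuous, using a made-up transaction payload that stores the ID under `data`, as the finding says the same file does elsewhere:

```python
# Hypothetical payload: the conversation ID IS present, under "data".
transaction = {
    "contexts": {"trace": {"data": {"gen_ai.conversation.id": "conv-1"}}}
}
trace = transaction["contexts"]["trace"]

# "attributes" does not exist, so .get() falls back to {} and the
# absence check succeeds no matter what the transaction contains.
absent_per_attributes = "gen_ai.conversation.id" not in trace.get("attributes", {})

# Checking "data" reflects the actual payload.
absent_per_data = "gen_ai.conversation.id" not in trace.get("data", {})
```

The first check reports the ID as absent even though it is present; the second correctly reports that it is not absent.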
Missing event validation when handled_tool_call_exceptions is False - `tests/integrations/pydantic_ai/test_pydantic_ai.py:490-496`
In `test_agent_with_tool_validation_error`, when `handled_tool_call_exceptions=False`, the test no longer validates that a `model_behaviour_error` event is captured. The original code had an `else` branch that unpacked `(model_behaviour_error, transaction) = events` to verify the expected events were present. Without this validation, if the integration fails to emit the expected error event when the flag is False, the test won't catch it.
Also found at:
tests/integrations/langchain/test_langchain.py:1842-1846
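The value of the removed `else` branch comes from tuple unpacking, which raises `ValueError` when the number of events is not what the test expects. The helper below is a hypothetical sketch of that shape. The handled-case branch (a single transaction and no error event) is an assumption, since the finding only quotes the `else` branch.

```python
def split_events(events, handled_tool_call_exceptions):
    """Hypothetical helper: unpack captured events and fail on a count mismatch."""
    if handled_tool_call_exceptions:
        # Assumption: only the transaction is expected in the handled case.
        (transaction,) = events
        return None, transaction
    # Unhandled case: an error event must precede the transaction.
    (model_behaviour_error, transaction) = events
    return model_behaviour_error, transaction


error_event = {"level": "error"}
transaction_event = {"type": "transaction"}

# Passes: both expected events are present.
error, transaction = split_events([error_event, transaction_event], False)

# A missing error event is caught by the unpacking itself.
try:
    split_events([transaction_event], False)
    missing_error_detected = False
except ValueError:
    missing_error_detected = True
```

With the unpacking in place, a run that emits no error event fails immediately instead of passing silently.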
Duration: 40m 58s · Tokens: 16.2M in / 192.4k out · Cost: $20.99 (+extraction: $0.02, +merge: $0.00, +fix_gate: $0.02)
Annotations
Check warning on line 1134 in sentry_sdk/client.py
sentry-warden / warden: code-review
GenAI spans lose release, environment, and SDK metadata due to passing unprepared event
On line 1134, `event` (the original input) is passed to `_serialized_v1_span_to_serialized_v2_span` instead of `event_opt` (the prepared event). The `_prepare_event` method enriches the event with `release`, `environment`, `server_name`, `dist`, and `sdk` info (lines 811-817), but `_serialized_v1_span_to_serialized_v2_span` reads these fields (lines 231-247) from the passed event to populate attributes like `sentry.release`, `sentry.environment`, `sentry.sdk.name`, and `sentry.sdk.version`. As a result, GenAI v2 spans will be missing these critical metadata attributes.
Check warning on line 521 in tests/integrations/huggingface_hub/test_huggingface_hub.py
sentry-warden / warden: code-review
[YSC-XNK] GenAI spans lose release, environment, and SDK metadata due to passing unprepared event (additional location)
Check warning on line 330 in tests/integrations/google_genai/test_google_genai.py
sentry-warden / warden: code-review
Sort key uses 'name' twice instead of 'name' and 'description'
The sorting lambda was changed from sorting by `(name, description)` to `(name, name)`, but the comment still says 'sort by name and description for comparison'. The sort key is now effectively single-field, so tools that share a name but differ in description keep whatever order they arrived in. If that input order varies between runs, the comparison can fail intermittently.
Check warning on line 945 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: code-review
[3VH-9Y9] Sort key uses 'name' twice instead of 'name' and 'description' (additional location)
Check warning on line 868 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: code-review
Mock patches wrong client attribute in test_async_exception_handling
The test mocks `client.embeddings._client._client.send` but calls `litellm.acompletion`, which uses the completions endpoint. The synchronous `test_exception_handling` correctly mocks `client.completions._client._client.send`. Because of this mismatch, the mock may not intercept the request, so the test may not actually verify exception handling.
Check warning on line 1966 in tests/integrations/openai_agents/test_openai_agents.py
sentry-warden / warden: code-review
[N8C-36U] Mock patches wrong client attribute in test_async_exception_handling (additional location)
Check warning on line 1023 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: code-review
Test assertions never execute because spans list is always empty
The `test_multiple_providers` function calls `capture_items("transaction")` on line 945 which only captures transaction items, not span items. The newly added code on lines 1020-1023 tries to iterate over spans with `spans = [item.payload for item in items if item.type == "span"]`, but this will always be empty since span items are not being captured. The for loop never executes, making the `SPANDATA.GEN_AI_SYSTEM` assertion dead code that doesn't validate anything.
Check warning on line 1279 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: code-review
[824-3SV] Test assertions never execute because spans list is always empty (additional location)
Check warning on line 3542 in tests/integrations/openai_agents/test_openai_agents.py
sentry-warden / warden: code-review
Test checks wrong field 'attributes' instead of 'data' for transaction context
The test checks `transaction["contexts"]["trace"].get("attributes", {})` but should check `transaction["contexts"]["trace"].get("data", {})`. Earlier in the same file (lines 3341, 3479), the transaction context uses `data` to store attributes like `gen_ai.conversation.id`. This inconsistency means the test may pass regardless of whether `gen_ai.conversation.id` is actually absent from the transaction, since it's checking the wrong field.
Check warning on line 834 in tests/integrations/pydantic_ai/test_pydantic_ai.py
sentry-warden / warden: code-review
[VV8-US7] Test checks wrong field 'attributes' instead of 'data' for transaction context (additional location)
Check warning on line 496 in tests/integrations/pydantic_ai/test_pydantic_ai.py
sentry-warden / warden: code-review
Missing event validation when handled_tool_call_exceptions is False
In `test_agent_with_tool_validation_error`, when `handled_tool_call_exceptions=False`, the test no longer validates that a `model_behaviour_error` event is captured. The original code had an `else` branch that unpacked `(model_behaviour_error, transaction) = events` to verify the expected events were present. Without this validation, if the integration fails to emit the expected error event when the flag is False, the test won't catch it.
Check warning on line 1846 in tests/integrations/langchain/test_langchain.py
sentry-warden / warden: code-review
[8RN-KST] Missing event validation when handled_tool_call_exceptions is False (additional location)