feat: Send GenAI spans as V2 envelope items #6079
10 issues
code-review: Found 10 issues (4 high, 5 medium, 1 low)
High
Test uses invalid 'attributes' key instead of 'data' for inline_data - `tests/integrations/google_genai/test_google_genai.py:2190`
The test input was changed from `"data": b"binary_data"` to `"attributes": b"binary_data"`. The Google GenAI API and the `transform_google_content_part` function expect `inline_data` to have a `"data"` key (line 286 of ai/utils.py: `inline_data.get("data", "")`). With `"attributes"`, the transform will return an empty string for content instead of the binary data, making this test ineffective at validating the intended functionality.
Also found at:
tests/integrations/openai_agents/test_openai_agents.py:3528-3530
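The effect of the renamed key can be reproduced in isolation. The sketch below is not the SDK's `transform_google_content_part`; `transform_inline_data` is a hypothetical stand-in that only mirrors the `inline_data.get("data", "")` lookup quoted above, and the returned dict shape is an assumption.

```python
def transform_inline_data(part):
    """Hypothetical stand-in that mirrors only the quoted .get("data", "") lookup."""
    inline_data = part.get("inline_data", {})
    return {
        "mime_type": inline_data.get("mime_type", ""),
        # A missing "data" key silently falls back to an empty string.
        "content": inline_data.get("data", ""),
    }


good = transform_inline_data(
    {"inline_data": {"mime_type": "image/png", "data": b"binary_data"}}
)
bad = transform_inline_data(
    {"inline_data": {"mime_type": "image/png", "attributes": b"binary_data"}}
)

print(good["content"])  # the binary payload
print(repr(bad["content"]))  # empty string: the payload never reaches the transform
```

Because the fallback is silent, a test built on the `"attributes"` input can only assert on the empty default, not on the binary content.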
_experiments variable is assigned but never passed to sentry_init(), test won't enable V2 spans - `tests/integrations/huggingface_hub/test_huggingface_hub.py:787-788`
In `test_chat_completion_api_error`, the `_experiments` variable is assigned on line 787 but never passed to `sentry_init()` on line 786. This means the `gen_ai_as_v2_spans: True` experiment won't be enabled. Later in the test (lines 805, 815-817), the code accesses `span["attributes"]` which is the V2 span format - this will fail at runtime because V2 spans won't be generated. Additionally, the value is incorrectly wrapped in a tuple `(dict,)` instead of just a dict like in other test functions.
Also found at:
tests/integrations/huggingface_hub/test_huggingface_hub.py:847
tests/integrations/langchain/test_langchain.py:997
tests/integrations/langchain/test_langchain.py:1758
tests/integrations/openai/test_openai.py:1434-1435
tests/integrations/openai/test_openai.py:1450-1451
tests/integrations/openai/test_openai.py:1474-1475
tests/integrations/pydantic_ai/test_pydantic_ai.py:2978
tests/integrations/langchain/test_langchain.py:1745
Incomplete V2 span migration leaves assertion accessing non-existent transaction meta structure - `tests/integrations/langchain/test_langchain.py:1381`
The test `test_langchain_message_truncation` was partially migrated to V2 span format: spans are now extracted as separate envelope items via `items` and accessed via `span["attributes"]`. However, line 1381 still asserts `tx["_meta"]["spans"]["0"]["data"]["gen_ai.request.messages"][""]["len"] == 5`, which references the old structure where spans are nested within the transaction. With V2 spans sent as separate envelope items, the `tx["_meta"]["spans"]` structure may not exist or contain different data, causing a KeyError or incorrect test behavior.
Also found at:
tests/integrations/pydantic_ai/test_pydantic_ai.py:500-506
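The failure mode can be illustrated with plain dictionaries. Both payloads below are hypothetical and heavily simplified; in particular, the review does not say what a transaction looks like once spans are sent separately, so the "migrated" shape is only an assumption.

```python
def meta_len(tx):
    """Read the truncation length the old way; raises KeyError if any level is missing."""
    return tx["_meta"]["spans"]["0"]["data"]["gen_ai.request.messages"][""]["len"]


# Hypothetical legacy payload: span metadata nested inside the transaction.
legacy_tx = {
    "_meta": {"spans": {"0": {"data": {"gen_ai.request.messages": {"": {"len": 5}}}}}}
}

# Hypothetical migrated payload: no nested span metadata on the transaction.
migrated_tx = {"type": "transaction", "spans": []}

legacy_len = meta_len(legacy_tx)

try:
    meta_len(migrated_tx)
    outcome = "found"
except KeyError as exc:
    outcome = f"KeyError: {exc}"

print(legacy_len)
print(outcome)
```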
test_multiple_providers captures no spans, making assertions vacuously pass - `tests/integrations/litellm/test_litellm.py:1034-1037`
In `test_multiple_providers`, `capture_items("transaction")` (line 959) only captures transaction items, but the test at lines 1034-1037 filters for `item.type == "span"` and iterates over the result. Since no span items are captured, the `spans` list is always empty, and the for-loop never executes any assertions. This means the test silently passes without verifying that `SPANDATA.GEN_AI_SYSTEM` is set on any span. The async version correctly uses `capture_items("transaction", "span")`.
Also found at:
tests/integrations/litellm/test_litellm.py:959
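A rough sketch of why the loop never runs. `capture` is a hypothetical stand-in for a `capture_items`-style fixture that keeps only the requested item types; the item objects and the `GEN_AI_SYSTEM` key value are assumptions for illustration.

```python
from types import SimpleNamespace

GEN_AI_SYSTEM = "gen_ai.system"  # assumed attribute key, for illustration only


def capture(sent_items, *types):
    """Hypothetical capture_items-style filter: keep only the requested item types."""
    return [item for item in sent_items if item.type in types]


def count_checked_spans(items):
    """Run the per-span assertion and report how many spans were actually checked."""
    spans = [item for item in items if item.type == "span"]
    checked = 0
    for span in spans:
        assert GEN_AI_SYSTEM in span.payload["attributes"]
        checked += 1
    return checked


sent = [
    SimpleNamespace(type="transaction", payload={}),
    SimpleNamespace(type="span", payload={"attributes": {GEN_AI_SYSTEM: "openai"}}),
]

vacuous = count_checked_spans(capture(sent, "transaction"))
real = count_checked_spans(capture(sent, "transaction", "span"))
print(vacuous, real)
```

Asserting that the checked count is at least one before or after the loop would make the sync test fail loudly instead of passing with zero iterations.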
Medium
GenAI spans converted with unprepared event, missing release/environment/SDK attributes - `sentry_sdk/client.py:1130`
Line 1130 passes `event` (the original input) to `_serialized_v1_span_to_serialized_v2_span`, but should pass `event_opt` (the prepared event). The `_prepare_event` method populates `release`, `environment`, and `sdk` fields (lines 811-817) which `_serialized_v1_span_to_serialized_v2_span` relies on to populate span attributes. Using the unprepared `event` results in converted V2 GenAI spans missing these attributes.
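The difference between the two arguments can be sketched as follows. Everything here is hypothetical: `prepare_event` and `span_attributes_from` stand in for `_prepare_event` and `_serialized_v1_span_to_serialized_v2_span`, the `sentry.*` attribute names are invented for illustration, and the sketch assumes preparation returns a new dict rather than mutating the input, which the review does not confirm.

```python
def prepare_event(event, options):
    """Hypothetical stand-in for _prepare_event: returns a copy with defaults filled in."""
    prepared = dict(event)
    for key in ("release", "environment"):
        if prepared.get(key) is None and options.get(key) is not None:
            prepared[key] = options[key]
    return prepared


def span_attributes_from(event):
    """Hypothetical stand-in for the V1-to-V2 conversion: copies event-level fields."""
    return {
        f"sentry.{key}": event[key]
        for key in ("release", "environment")
        if key in event
    }


options = {"release": "1.2.3", "environment": "production"}
event = {"type": "transaction"}
event_opt = prepare_event(event, options)

from_unprepared = span_attributes_from(event)
from_prepared = span_attributes_from(event_opt)
print(from_unprepared)
print(from_prepared)
```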
Sorting key uses 'name' twice instead of 'name' and 'description' - `tests/integrations/google_genai/test_google_genai.py:333`
The sorting lambda was changed from `(t.get("name", ""), t.get("description", ""))` to `(t.get("name", ""), t.get("name", ""))`, duplicating the name field. The comment on line 331 explicitly states 'sort by name and description for comparison', but the code now sorts by name twice, which makes the secondary sort criterion useless and could produce incorrect ordering when tools have the same name but different descriptions.
Also found at:
tests/integrations/litellm/test_litellm.py:1301-1302
tests/integrations/litellm/test_litellm.py:1348-1350
tests/integrations/langchain/test_langchain.py:264
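The two sort keys can be compared directly. The tool dictionaries below are made-up sample data; only the two lambdas come from the finding.

```python
tools = [
    {"name": "search", "description": "web"},
    {"name": "search", "description": "docs"},
    {"name": "calc", "description": "math"},
]


def by_name_twice(items):
    """Sort with the duplicated key; ties keep their input order (sorted is stable)."""
    return sorted(items, key=lambda t: (t.get("name", ""), t.get("name", "")))


def by_name_and_description(items):
    """Sort with the intended key; ties on name are broken by description."""
    return sorted(items, key=lambda t: (t.get("name", ""), t.get("description", "")))


def descriptions(items):
    return [t["description"] for t in items]


# With the duplicated key, the result depends on the input order.
print(descriptions(by_name_twice(tools)))
print(descriptions(by_name_twice(list(reversed(tools)))))

# With the intended key, both input orders give the same result.
print(descriptions(by_name_and_description(tools)))
print(descriptions(by_name_and_description(list(reversed(tools)))))
```

With the duplicated key, an equality comparison between an expected and an actual tool list can pass or fail depending on the order in which the tools happened to be produced.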
Unused list comprehension performs no validation in error handling test - `tests/integrations/langchain/test_langchain.py:1861-1865`
The list comprehension on lines 1861-1865 computes a list of error events but the result is not assigned to any variable or used in any assertion. The previous code had the same issue (`[e for e in events if ...]`), but this was an opportunity to fix it. As written, the test for error handling doesn't actually verify anything about captured errors.
Also found at:
tests/integrations/pydantic_ai/test_pydantic_ai.py:800-802
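A small sketch of the difference between a discarded comprehension and one that feeds an assertion. The event dictionaries and the `level` filter are assumptions; the review elides the real filter condition.

```python
# Made-up captured events; the real test's filter condition is not shown in the review.
events = [
    {"level": "error", "message": "API rate limit exceeded"},
    {"level": "info", "message": "request started"},
]

# Discarded expression: the list is built and immediately thrown away,
# so nothing about the captured errors is verified.
[e for e in events if e.get("level") == "error"]

# Assigned and asserted: the test now fails if no error event was captured.
error_events = [e for e in events if e.get("level") == "error"]
assert len(error_events) == 1
print(error_events[0]["message"])
```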
ai_client_span1 is unpacked but never tested - `tests/integrations/openai_agents/test_openai_agents.py:1712-1713`
The variable `ai_client_span1` is unpacked from the spans list but no assertions are made against it. The previous test version verified `ai_client_span` properties including description, origin, status, and tags. This reduces test coverage and may miss regressions in the AI client span behavior.
Also found at:
tests/integrations/huggingface_hub/test_huggingface_hub.py:524
Test may pass vacuously without verifying any spans - `tests/integrations/pydantic_ai/test_pydantic_ai.py:1011-1018`
In `test_include_prompts_requires_pii`, the test iterates over `chat_spans` to verify messages aren't captured, but there's no assertion ensuring `chat_spans` is non-empty. If no chat spans are produced (due to a bug or configuration issue), the for-loop executes zero iterations and the test passes without verifying anything. Other similar tests in this file use `assert len(chat_spans) >= 1` before the verification loop.
Also found at:
tests/integrations/pydantic_ai/test_pydantic_ai.py:848-851
tests/integrations/langchain/test_langchain.py:946
Low
Unused capture_items fixture parameter in test function - `tests/integrations/openai_agents/test_openai_agents.py:3083`
The test function `test_openai_agents_message_truncation` accepts `capture_items` as a parameter (changed from `capture_events`), but the fixture is never actually called or used in the test body. The test directly accesses span data via `span._data` without needing to capture any items. This unused parameter should either be removed or the test should be updated to actually use the fixture to verify captured items.
Duration: 47m 14s · Tokens: 15.0M in / 197.7k out · Cost: $19.51 (+extraction: $0.04, +merge: $0.01, +fix_gate: $0.02)
Annotations
Check failure on line 2190 in tests/integrations/google_genai/test_google_genai.py
sentry-warden / warden: code-review
Test uses invalid 'attributes' key instead of 'data' for inline_data
The test input was changed from `"data": b"binary_data"` to `"attributes": b"binary_data"`. The Google GenAI API and the `transform_google_content_part` function expect `inline_data` to have a `"data"` key (line 286 of ai/utils.py: `inline_data.get("data", "")`). With `"attributes"`, the transform will return an empty string for content instead of the binary data, making this test ineffective at validating the intended functionality.
Check failure on line 3530 in tests/integrations/openai_agents/test_openai_agents.py
sentry-warden / warden: code-review
[JNW-NFC] Test uses invalid 'attributes' key instead of 'data' for inline_data (additional location)
Check failure on line 788 in tests/integrations/huggingface_hub/test_huggingface_hub.py
sentry-warden / warden: code-review
_experiments variable is assigned but never passed to sentry_init(), test won't enable V2 spans
In `test_chat_completion_api_error`, the `_experiments` variable is assigned on line 787 but never passed to `sentry_init()` on line 786. This means the `gen_ai_as_v2_spans: True` experiment won't be enabled. Later in the test (lines 805, 815-817), the code accesses `span["attributes"]` which is the V2 span format - this will fail at runtime because V2 spans won't be generated. Additionally, the value is incorrectly wrapped in a tuple `(dict,)` instead of just a dict like in other test functions.
Check failure on line 847 in tests/integrations/huggingface_hub/test_huggingface_hub.py
sentry-warden / warden: code-review
[HZF-SZK] _experiments variable is assigned but never passed to sentry_init(), test won't enable V2 spans (additional location)
Check failure on line 997 in tests/integrations/langchain/test_langchain.py
sentry-warden / warden: code-review
[HZF-SZK] _experiments variable is assigned but never passed to sentry_init(), test won't enable V2 spans (additional location)
Check failure on line 1758 in tests/integrations/langchain/test_langchain.py
sentry-warden / warden: code-review
[HZF-SZK] _experiments variable is assigned but never passed to sentry_init(), test won't enable V2 spans (additional location)
Check failure on line 1435 in tests/integrations/openai/test_openai.py
sentry-warden / warden: code-review
[HZF-SZK] _experiments variable is assigned but never passed to sentry_init(), test won't enable V2 spans (additional location)
Check failure on line 1451 in tests/integrations/openai/test_openai.py
sentry-warden / warden: code-review
[HZF-SZK] _experiments variable is assigned but never passed to sentry_init(), test won't enable V2 spans (additional location)
Check failure on line 1475 in tests/integrations/openai/test_openai.py
sentry-warden / warden: code-review
[HZF-SZK] _experiments variable is assigned but never passed to sentry_init(), test won't enable V2 spans (additional location)
Check failure on line 2978 in tests/integrations/pydantic_ai/test_pydantic_ai.py
sentry-warden / warden: code-review
[HZF-SZK] _experiments variable is assigned but never passed to sentry_init(), test won't enable V2 spans (additional location)
Check failure on line 1745 in tests/integrations/langchain/test_langchain.py
sentry-warden / warden: code-review
[HZF-SZK] _experiments variable is assigned but never passed to sentry_init(), test won't enable V2 spans (additional location)
Check failure on line 1381 in tests/integrations/langchain/test_langchain.py
sentry-warden / warden: code-review
Incomplete V2 span migration leaves assertion accessing non-existent transaction meta structure
The test `test_langchain_message_truncation` was partially migrated to V2 span format: spans are now extracted as separate envelope items via `items` and accessed via `span["attributes"]`. However, line 1381 still asserts `tx["_meta"]["spans"]["0"]["data"]["gen_ai.request.messages"][""]["len"] == 5`, which references the old structure where spans are nested within the transaction. With V2 spans sent as separate envelope items, the `tx["_meta"]["spans"]` structure may not exist or contain different data, causing a KeyError or incorrect test behavior.
Check failure on line 506 in tests/integrations/pydantic_ai/test_pydantic_ai.py
sentry-warden / warden: code-review
[NP4-GVD] Incomplete V2 span migration leaves assertion accessing non-existent transaction meta structure (additional location)
Check failure on line 1037 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: code-review
test_multiple_providers captures no spans, making assertions vacuously pass
In `test_multiple_providers`, `capture_items("transaction")` (line 959) only captures transaction items, but the test at lines 1034-1037 filters for `item.type == "span"` and iterates over the result. Since no span items are captured, the `spans` list is always empty, and the for-loop never executes any assertions. This means the test silently passes without verifying that `SPANDATA.GEN_AI_SYSTEM` is set on any span. The async version correctly uses `capture_items("transaction", "span")`.
Check failure on line 959 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: code-review
[4C9-8ZA] test_multiple_providers captures no spans, making assertions vacuously pass (additional location)
Check warning on line 1130 in sentry_sdk/client.py
sentry-warden / warden: code-review
GenAI spans converted with unprepared event, missing release/environment/SDK attributes
Line 1130 passes `event` (the original input) to `_serialized_v1_span_to_serialized_v2_span`, but should pass `event_opt` (the prepared event). The `_prepare_event` method populates `release`, `environment`, and `sdk` fields (lines 811-817) which `_serialized_v1_span_to_serialized_v2_span` relies on to populate span attributes. Using the unprepared `event` results in converted V2 GenAI spans missing these attributes.
Check warning on line 333 in tests/integrations/google_genai/test_google_genai.py
sentry-warden / warden: code-review
Sorting key uses 'name' twice instead of 'name' and 'description'
The sorting lambda was changed from `(t.get("name", ""), t.get("description", ""))` to `(t.get("name", ""), t.get("name", ""))`, duplicating the name field. The comment on line 331 explicitly states 'sort by name and description for comparison', but the code now sorts by name twice, which makes the secondary sort criterion useless and could produce incorrect ordering when tools have the same name but different descriptions.
Check warning on line 1302 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: code-review
[794-TNZ] Sorting key uses 'name' twice instead of 'name' and 'description' (additional location)
Check warning on line 1350 in tests/integrations/litellm/test_litellm.py
sentry-warden / warden: code-review
[794-TNZ] Sorting key uses 'name' twice instead of 'name' and 'description' (additional location)
Check warning on line 264 in tests/integrations/langchain/test_langchain.py
sentry-warden / warden: code-review
[794-TNZ] Sorting key uses 'name' twice instead of 'name' and 'description' (additional location)
Check warning on line 1865 in tests/integrations/langchain/test_langchain.py
sentry-warden / warden: code-review
Unused list comprehension performs no validation in error handling test
The list comprehension on lines 1861-1865 computes a list of error events but the result is not assigned to any variable or used in any assertion. The previous code had the same issue (`[e for e in events if ...]`), but this was an opportunity to fix it. As written, the test for error handling doesn't actually verify anything about captured errors.
Check warning on line 802 in tests/integrations/pydantic_ai/test_pydantic_ai.py
sentry-warden / warden: code-review
[ED7-4JC] Unused list comprehension performs no validation in error handling test (additional location)
Check warning on line 1713 in tests/integrations/openai_agents/test_openai_agents.py
sentry-warden / warden: code-review
ai_client_span1 is unpacked but never tested
The variable `ai_client_span1` is unpacked from the spans list but no assertions are made against it. The previous test version verified `ai_client_span` properties including description, origin, status, and tags. This reduces test coverage and may miss regressions in the AI client span behavior.
Check warning on line 524 in tests/integrations/huggingface_hub/test_huggingface_hub.py
sentry-warden / warden: code-review
[MHJ-M4G] ai_client_span1 is unpacked but never tested (additional location)
Check warning on line 1018 in tests/integrations/pydantic_ai/test_pydantic_ai.py
sentry-warden / warden: code-review
Test may pass vacuously without verifying any spans
In `test_include_prompts_requires_pii`, the test iterates over `chat_spans` to verify messages aren't captured, but there's no assertion ensuring `chat_spans` is non-empty. If no chat spans are produced (due to a bug or configuration issue), the for-loop executes zero iterations and the test passes without verifying anything. Other similar tests in this file use `assert len(chat_spans) >= 1` before the verification loop.