add constant again

7bd12ae

feat: Send GenAI spans as V2 envelope items #6079

@sentry/warden / warden: find-bugs completed Apr 20, 2026 in 25m 18s

13 issues

find-bugs: Found 13 issues (9 medium, 4 low)

Medium

V2 GenAI spans use unprocessed event data instead of prepared event - `sentry_sdk/client.py:1134-1137`

On line 1134, `_serialized_v1_span_to_serialized_v2_span(span, event)` passes the original `event` instead of `event_opt`. The `event_opt` is the result of `_prepare_event()` which includes processing by the `before_send_transaction` callback (lines 897-925). If `before_send_transaction` modifies user data, release, environment, SDK info, or trace context, those changes won't be reflected in the converted V2 GenAI spans, causing data inconsistency between the V1 transaction and V2 spans in the same envelope.

Also found at:

  • sentry_sdk/client.py:1134
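
The discrepancy can be reproduced with toy stand-ins for the SDK internals (the function bodies below are hypothetical simplifications of the helpers named above, not the real implementations):

```python
# Toy sketch of the bug pattern: deriving standalone V2 spans from the
# *original* event misses changes before_send_transaction made to the
# prepared copy (event_opt).

def before_send_transaction(event):
    # User callback: overrides the release.
    modified = dict(event)
    modified["release"] = "scrubbed"
    return modified

def prepare_event(event):
    # Stand-in for _prepare_event(): returns the processed copy.
    return before_send_transaction(event)

def v1_span_to_v2_span(span, event):
    # Stand-in for _serialized_v1_span_to_serialized_v2_span():
    # copies event-level fields onto the standalone span.
    return {**span, "release": event["release"]}

event = {"release": "1.0.0", "spans": [{"op": "gen_ai.chat"}]}
event_opt = prepare_event(event)

buggy_span = v1_span_to_v2_span(event["spans"][0], event)      # original event
fixed_span = v1_span_to_v2_span(event["spans"][0], event_opt)  # prepared event
```

Here `buggy_span` keeps the unscrubbed release `"1.0.0"` while the transaction would ship with `"scrubbed"` - exactly the inconsistency described.
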

Sorting key uses 'name' twice instead of 'name' and 'description' - `tests/integrations/google_genai/test_google_genai.py:330`

The sorting lambda in test_generate_content_with_tools was changed from key=lambda t: (t.get("name", ""), t.get("description", "")) to key=lambda t: (t.get("name", ""), t.get("name", "")). This appears to be a copy-paste error where the second element of the tuple should be "description" to maintain the original sorting behavior. The test still passes because the two tools have different names ("get_weather" and "get_weather_tool"), but the sorting logic is now incorrect and could fail to properly order tools if they had the same name but different descriptions.
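
A minimal illustration of why the duplicated key only matters on name collisions (toy tool dicts with assumed shapes):

```python
# Two tools with the same name but different descriptions.
tools = [
    {"name": "get_weather", "description": "by coordinates"},
    {"name": "get_weather", "description": "by city"},
]

# Broken key: both tuple elements are the name, so the keys tie and
# Python's stable sort keeps the (arbitrary) input order.
broken = sorted(tools, key=lambda t: (t.get("name", ""), t.get("name", "")))

# Intended key: ties on name are broken by description.
correct = sorted(tools, key=lambda t: (t.get("name", ""), t.get("description", "")))
```
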

Test accesses spans[0] without filtering, unlike all other tests in the file - `tests/integrations/langchain/test_langchain.py:940`

In test_span_status_error, `spans[0]` is accessed directly without filtering by operation type (sentry.op). All other tests in this file filter spans before accessing them (e.g., `chat_spans = list(x for x in spans if x['attributes']['sentry.op'] == 'gen_ai.chat')`). This makes the test fragile - if span ordering changes or additional spans are emitted, the test may pass/fail unexpectedly. The assertion `spans[0]['status'] == 'error'` may be checking an unrelated span.
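
With toy span dicts (shapes assumed from the file's conventions), the fragility looks like this:

```python
# If an extra span happens to be emitted first, spans[0] is not the chat span.
spans = [
    {"attributes": {"sentry.op": "gen_ai.pipeline"}, "status": "ok"},
    {"attributes": {"sentry.op": "gen_ai.chat"}, "status": "error"},
]

# Fragile: asserts on whatever span happens to come first.
unfiltered_status = spans[0]["status"]

# Robust, matching the file's other tests: filter by operation first.
chat_spans = [s for s in spans if s["attributes"]["sentry.op"] == "gen_ai.chat"]
```
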

List comprehension result is unused - test doesn't verify error capture - `tests/integrations/langchain/test_langchain.py:1842-1846`

In test_langchain_embeddings_error_handling, a list comprehension filtering error events (lines 1842-1846) is computed but never assigned to a variable or used in an assertion. The comment says 'errors might not be auto-captured' but the test neither validates when errors ARE captured nor makes any assertion about the result. This makes the test ineffective at verifying error handling behavior.
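
A bare list comprehension is evaluated and immediately discarded; binding the result and asserting on it is what gives the test teeth (toy event dicts):

```python
events = [
    {"type": "transaction"},
    {"level": "error", "exception": {"values": [{"type": "ValueError"}]}},
]

# Evaluated, then thrown away -- verifies nothing:
[e for e in events if "exception" in e]

# Effective version: keep the result and assert on it.
error_events = [e for e in events if "exception" in e]
```
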

test_langchain_embeddings_span_hierarchy mixes V1 and V2 span retrieval methods - `tests/integrations/langchain/test_langchain.py:1953-1954`

In test_langchain_embeddings_span_hierarchy, embeddings_spans are retrieved from V2 spans (items with type 'span') while custom_spans are retrieved from V1 transaction spans (tx.get('spans', [])). This inconsistency means the test compares spans from different data structures, which could lead to false positives/negatives or broken parent-child relationship verification when the feature flag changes span storage behavior.

Test assertion references transaction _meta for span that is now sent as V2 envelope item - `tests/integrations/langgraph/test_langgraph.py:1411`

The test was migrated to use V2 envelope items (`capture_items`) where GenAI spans are sent separately from the transaction. However, the assertion at line 1411 still checks `tx["_meta"]["spans"]["0"]["data"]["gen_ai.request.messages"]` on the transaction object. Since GenAI spans are extracted from the transaction and sent as separate V2 envelope items (per the `_split_gen_ai_spans` logic in client.py), the transaction's `_meta["spans"]` will not contain metadata for the GenAI span. This assertion will likely fail or check the wrong span.
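
The failure mode can be sketched with a toy transaction from which the GenAI span and its metadata have already been split out (the dict shapes are assumed from the description above):

```python
# After the GenAI span is extracted into a V2 envelope item, the
# transaction's _meta no longer describes it.
tx = {"spans": [], "_meta": {"spans": {}}}

try:
    tx["_meta"]["spans"]["0"]["data"]["gen_ai.request.messages"]
    meta_present = True
except KeyError:
    meta_present = False
```
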

test_multiple_providers never validates span assertions due to missing 'span' in capture_items - `tests/integrations/litellm/test_litellm.py:945`

The test calls `capture_items("transaction")` at line 945 which only captures items of type 'transaction'. However, later at line 1020, the test tries to filter for spans with `if item.type == "span"`. Since spans are never captured, the `spans` list will always be empty, and the for loop assertion at lines 1021-1023 never executes. This makes the test appear to pass while not actually validating that `SPANDATA.GEN_AI_SYSTEM` is present in span attributes.

Also found at:

  • tests/integrations/litellm/test_litellm.py:1020-1023
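
The vacuous loop can be demonstrated with a toy item list matching what capturing only transactions would produce (attribute names assumed):

```python
# Only transaction items were captured; no "span" items exist.
items = [{"type": "transaction", "attributes": {}}]

spans = [item for item in items if item["type"] == "span"]

checked = 0
for span in spans:  # zero iterations: the assertion below never runs
    assert "gen_ai.system" in span["attributes"]
    checked += 1
```

The test "passes" with `checked == 0`; capturing span items as well (or asserting that `spans` is non-empty) would make the loop meaningful.
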

Inconsistent attribute key lookup for transaction trace context - `tests/integrations/openai_agents/test_openai_agents.py:3540-3542`

The test at line 3540-3541 checks transaction["contexts"]["trace"].get("attributes", {}) but the related positive assertion at line 3479-3480 uses transaction["contexts"]["trace"]["data"]. This inconsistency means the negative assertion at line 3540-3541 may not be testing the correct location. If the conversation_id would actually be stored under ["data"] (as the positive test expects), this test would pass even if conversation_id IS incorrectly present, because it's checking the wrong key.

Test accesses V2 span format attributes on old-format embedded spans - `tests/integrations/pydantic_ai/test_pydantic_ai.py:832-834`

In `test_message_history`, the code retrieves spans from `second_transaction["spans"]` (line 831) which returns spans in the old format with `op` at the top level and `data` for additional data. However, the filtering at line 833 uses `s["attributes"].get("sentry.op", "")` which is the new V2 envelope span format. Since embedded spans don't have an `attributes` key, this will raise a `KeyError` or fail to find any matching spans.
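
The shape mismatch in miniature (a toy old-format span):

```python
# Old-format embedded span: "op" at the top level, no "attributes" key.
v1_span = {"op": "gen_ai.invoke_agent", "description": "agent run", "data": {}}

try:
    v1_span["attributes"].get("sentry.op", "")
    key_error = False
except KeyError:
    key_error = True

# Old-format lookup that works on embedded spans:
op = v1_span.get("op", "")
```
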

Low

Internal flag _has_gen_ai_span leaks to server in transaction payload - `sentry_sdk/tracing.py:1087`

The `_has_gen_ai_span` flag set at line 1087 will be included in the serialized transaction sent to Sentry's server. In client.py, `event.pop('_has_gen_ai_span', False)` at line 1119 removes it from the original event dict, but `event_opt` (the serialized copy) already contains this field since serialization happens before the pop. Compare with `_dropped_spans`, which is correctly popped inside `_prepare_event` before serialization. This wastes bandwidth and exposes internal SDK implementation details to the server.
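
The pop-after-copy ordering problem, reduced to plain dicts (`deepcopy` stands in for serialization here):

```python
import copy

event = {"type": "transaction", "_has_gen_ai_span": True}

# Serialization produces an independent copy first...
event_opt = copy.deepcopy(event)

# ...so popping the flag from the original afterwards leaves the copy intact.
event.pop("_has_gen_ai_span", False)
```

Popping before the copy is made - the way `_dropped_spans` is handled inside `_prepare_event` - keeps the flag out of the serialized payload.
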

Hardcoded SDK version in test assertion will cause test failures on version bumps - `tests/integrations/huggingface_hub/test_huggingface_hub.py:523`

Line 523 uses a hardcoded version string "2.58.0" for sentry.sdk.version assertion, while all other tests in the same file (lines 599, 676, 753, 828, 942, 1038) correctly use mock.ANY. This test will fail when the SDK version is updated.
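
`unittest.mock.ANY` compares equal to any value, which is what makes the other tests version-proof (toy SDK payload; the version strings are illustrative):

```python
from unittest import mock

sdk_info = {"name": "sentry.python", "version": "2.59.0"}  # next release

# Hardcoded version breaks on every release:
brittle = sdk_info == {"name": "sentry.python", "version": "2.58.0"}

# mock.ANY matches whatever version string is current:
stable = sdk_info == {"name": "sentry.python", "version": mock.ANY}
```
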

test_langchain_embeddings_error_handling is missing gen_ai_as_v2_spans experiment flag - `tests/integrations/langchain/test_langchain.py:1823`

The test test_langchain_embeddings_error_handling uses capture_items('transaction', 'span') and accesses span['attributes'] (V2 format), but doesn't enable the _experiments={'gen_ai_as_v2_spans': True} flag in sentry_init. Other tests that don't use the integration (line 1728) set this flag. Without the flag, the V2 span format may not be generated, causing the test to behave unexpectedly.

Ineffective test assertion: V2 spans have no 'tags' field - `tests/integrations/openai_agents/test_openai_agents.py:1966`

Line 1966 asserts `mcp_tool_span.get("tags", {}).get("status") != "error"` but V2 spans don't have a `tags` field - tags are merged into `attributes` during V1-to-V2 conversion (see `_serialized_v1_span_to_serialized_v2_span` in client.py lines 216-218). This assertion always passes because `get("tags", {})` returns `{}` and `{}.get("status")` returns `None`, which is never equal to `"error"`. The comment's claim to 'Verify no error status' is therefore only partially fulfilled, by line 1965.
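
The always-true check, reduced to a toy V2 span whose error status lives in `attributes`:

```python
# Tags were merged into attributes during V1-to-V2 conversion.
mcp_tool_span = {"attributes": {"status": "error"}}

# .get("tags", {}) is {}, {}.get("status") is None, None != "error" -- always True:
vacuous = mcp_tool_span.get("tags", {}).get("status") != "error"

# Checking attributes actually detects the error status:
effective = mcp_tool_span["attributes"].get("status") != "error"
```
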


Duration: 24m 59s · Tokens: 20.3M in / 225.6k out · Cost: $28.34 (+extraction: $0.03, +merge: $0.01, +fix_gate: $0.02)
