fix tests

bab7567
Draft

feat: Send GenAI spans as V2 envelope items #6079

@sentry/warden / warden: find-bugs completed Apr 17, 2026 in 25m 28s

8 issues

find-bugs: Found 8 issues (1 high, 5 medium, 2 low)

High

test_chat_completion_api_error does not enable gen_ai_as_v2_spans experiment due to unused variable - `tests/integrations/huggingface_hub/test_huggingface_hub.py:787`

Line 787 assigns _experiments = ({"gen_ai_as_v2_spans": True},) to a local variable instead of passing it to sentry_init(). The sentry_init(traces_sample_rate=1.0) call on line 786 does not include the _experiments parameter, so the test runs without the V2 spans feature enabled. This defeats the purpose of the test, which validates V2 span behavior. Additionally, the trailing comma wraps the dict in a one-element tuple, so the value would not be a valid _experiments dict even if it were passed.

Also found at:

  • tests/integrations/huggingface_hub/test_huggingface_hub.py:847
  • tests/integrations/langchain/test_langchain.py:997
  • tests/integrations/openai/test_openai.py:1433-1434
  • tests/integrations/openai/test_openai.py:1452-1453
  • tests/integrations/openai/test_openai.py:1478-1479
  • tests/integrations/pydantic_ai/test_pydantic_ai.py:2978
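
A minimal sketch of the pitfall and the fix, assuming a hypothetical `sentry_init` stand-in that only records the keyword options it receives (the real fixture in the SDK's test suite does much more). It shows that the buggy shape never delivers `_experiments`, that the trailing comma produces a tuple, and that the fixed call passes the dict through.

```python
# Hypothetical stand-in for the sentry_init test fixture: it only records
# the keyword options it was called with, so the two call shapes can be compared.
recorded_options = {}


def sentry_init(**options):
    recorded_options.clear()
    recorded_options.update(options)


# Buggy shape: the call is closed before _experiments, so the option never
# reaches sentry_init, and the trailing comma turns the dict into a 1-tuple.
sentry_init(traces_sample_rate=1.0)
_experiments = ({"gen_ai_as_v2_spans": True},)
buggy_received = recorded_options.get("_experiments")  # None
buggy_local_type = type(_experiments)  # tuple, not dict

# Fixed shape: pass the dict directly as a keyword argument.
sentry_init(
    traces_sample_rate=1.0,
    _experiments={"gen_ai_as_v2_spans": True},
)
fixed_received = recorded_options.get("_experiments")
```

In the real tests the fix is just the keyword-argument form at the end; the recording stub exists only to make the difference observable.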

Medium

Using pre-processed `event` instead of `event_opt` bypasses event scrubbing for GenAI spans - `sentry_sdk/client.py:130-133`

In capture_event, when converting GenAI spans to V2 format, the code passes the original event parameter to _serialized_v1_span_to_serialized_v2_span instead of event_opt (the prepared, scrubbed version). As a result, attributes that the GenAI span conversion extracts from the event, such as user.id, user.email, and user.name, as well as release, environment, and SDK info, may bypass the event scrubber, potentially leaking PII that was supposed to be scrubbed. The scrubber runs on event_opt, but the GenAI span conversion reads from the unprocessed event.

Also found at:

  • sentry_sdk/client.py:1130
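
The difference can be illustrated with a small sketch. `prepare_event` and `user_span_attributes` are hypothetical helpers, not the SDK's code: the sketch assumes preparation returns a scrubbed copy and that the conversion copies user fields onto span attributes. Reading from the raw `event` keeps the original values, while reading from `event_opt` picks up the scrubbed ones.

```python
import copy
from typing import Any, Dict, Optional

Event = Dict[str, Any]


def prepare_event(event: Event) -> Optional[Event]:
    """Hypothetical stand-in for event preparation: returns a scrubbed copy."""
    prepared = copy.deepcopy(event)
    user = prepared.get("user", {})
    for key in ("email", "name"):
        if key in user:
            user[key] = "[Filtered]"
    return prepared


def user_span_attributes(event: Event) -> Dict[str, Any]:
    """Hypothetical helper: copies user fields onto V2 span attributes."""
    user = event.get("user", {})
    return {f"user.{key}": user[key] for key in ("id", "email", "name") if key in user}


event = {"user": {"id": "42", "email": "jane@example.com", "name": "Jane"}}
event_opt = prepare_event(event)

# Reading from the raw input keeps the unscrubbed values.
leaky_attributes = user_span_attributes(event)
# Reading from the prepared event picks up the scrubbed values.
# event_opt may be None if the event was dropped; then nothing should be sent.
safe_attributes = user_span_attributes(event_opt) if event_opt is not None else {}
```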

Test uses incorrect 'attributes' key instead of 'data' for inline_data - `tests/integrations/google_genai/test_google_genai.py:2190`

The test test_extract_contents_messages_dict_inline_data uses "attributes" as the key inside inline_data, but the Google GenAI API and the implementation in transform_google_content_part expect "data". The test therefore passes vacuously: inline_data.get("data", "") returns an empty string, which is still replaced with BLOB_DATA_SUBSTITUTE, so the assertion never exercises real blob data. Other tests in the same file (lines 1800 and 1842) correctly use "data".

Also found at:

  • tests/integrations/google_genai/test_google_genai.py:333
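
Why the wrong key makes the test vacuous, as a sketch: both functions and the `BLOB_DATA_SUBSTITUTE` value below are hypothetical placeholders, not the integration's real code or constant. With the `"attributes"` key, a correct redaction and a leaking regression produce the same output, so no assertion can tell them apart; with the `"data"` key, the regression is exposed.

```python
# Hypothetical placeholder; the real BLOB_DATA_SUBSTITUTE value may differ.
BLOB_DATA_SUBSTITUTE = "[Blob substitute]"


def redact_inline_data(part):
    """Correct behavior: always replace inline blob data with the substitute."""
    inline_data = part["inline_data"]
    return {"mime_type": inline_data.get("mime_type"), "data": BLOB_DATA_SUBSTITUTE}


def leaky_inline_data(part):
    """Hypothetical regression: leaks the raw data whenever it is non-empty."""
    inline_data = part["inline_data"]
    return {
        "mime_type": inline_data.get("mime_type"),
        "data": inline_data.get("data", "") or BLOB_DATA_SUBSTITUTE,
    }


wrong_key_part = {"inline_data": {"mime_type": "image/png", "attributes": b"\x89PNG"}}
right_key_part = {"inline_data": {"mime_type": "image/png", "data": b"\x89PNG"}}

# Wrong key: both implementations return the substitute, so they look identical.
wrong_key_results = (
    redact_inline_data(wrong_key_part)["data"],
    leaky_inline_data(wrong_key_part)["data"],
)
# Correct key: the regression leaks the raw bytes, which a test can catch.
right_key_results = (
    redact_inline_data(right_key_part)["data"],
    leaky_inline_data(right_key_part)["data"],
)
```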

Test asserts against stale _meta path after GenAI spans are extracted to V2 envelope items - `tests/integrations/langchain/test_langchain.py:1381`

Line 1381 asserts tx["_meta"]["spans"]["0"]["data"]["gen_ai.request.messages"][""]["len"] == 5, but with _experiments={"gen_ai_as_v2_spans": True} enabled (line 1313), the GenAI span is extracted from the transaction and sent as a separate envelope item. After extraction, the GenAI span is no longer at index 0 of the transaction's spans array, so the _meta path is either invalid or points to a different span. This test will fail at runtime because the extraction leaves behind _meta indices that no longer match the remaining spans.

Also found at:

  • tests/integrations/langchain/test_langchain.py:1745
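
A sketch of why the index shifts, using hypothetical extraction helpers rather than the SDK's actual code. Dropping the GenAI span without touching `_meta` leaves index `"0"` carrying the GenAI metadata while `spans[0]` is now a different span; re-keying `_meta` fixes the mismatch but means the old `"0"` path now describes the other span. Either way, the GenAI metadata would need to be asserted on the extracted span item instead.

```python
import copy


def is_gen_ai(span):
    return span.get("op", "").startswith("gen_ai.")


def extract_naively(tx):
    """Hypothetical: removes GenAI spans but leaves _meta keyed by old indices."""
    tx = copy.deepcopy(tx)
    tx["spans"] = [span for span in tx["spans"] if not is_gen_ai(span)]
    return tx


def extract_with_remap(tx):
    """Hypothetical: removes GenAI spans and re-keys _meta to the new indices."""
    tx = copy.deepcopy(tx)
    kept, old_to_new = [], {}
    for old_index, span in enumerate(tx["spans"]):
        if not is_gen_ai(span):
            old_to_new[str(old_index)] = str(len(kept))
            kept.append(span)
    tx["spans"] = kept
    meta_spans = tx.get("_meta", {}).get("spans", {})
    tx.setdefault("_meta", {})["spans"] = {
        old_to_new[key]: value for key, value in meta_spans.items() if key in old_to_new
    }
    return tx


tx = {
    "spans": [{"op": "gen_ai.chat"}, {"op": "http.client"}],
    "_meta": {
        "spans": {
            "0": {"data": {"gen_ai.request.messages": {"": {"len": 5}}}},
            "1": {"data": {"http.url": {"": {"len": 100}}}},
        }
    },
}

naive = extract_naively(tx)
remapped = extract_with_remap(tx)
```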

test_multiple_providers captures only transactions but asserts on spans - `tests/integrations/litellm/test_litellm.py:959`

The test was changed to use capture_items("transaction") at line 959, which captures only transaction items. However, at lines 1034-1037 (outside the hunk but in the same test function), the code collects spans with spans = [item.payload for item in items if item.type == "span"] and asserts SPANDATA.GEN_AI_SYSTEM in span["attributes"] for each one. Since no spans are captured, the list is empty, the loop body never executes, and the assertion passes silently, making the test ineffective.

Also found at:

  • tests/integrations/litellm/test_litellm.py:1034-1037
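
A sketch of the vacuous loop and one way to guard against it. `Item` and `capture_items` are hypothetical stand-ins for the test fixture, whether the real fixture accepts several types at once is an assumption, and `"gen_ai.system"` is assumed to be the value of `SPANDATA.GEN_AI_SYSTEM`. The key defensive step is asserting that the span list is non-empty before looping over it.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Item:
    type: str
    payload: Dict[str, Any] = field(default_factory=dict)


def capture_items(emitted: List[Item], *types: str) -> List[Item]:
    """Hypothetical stand-in for the fixture's type filter."""
    return [item for item in emitted if item.type in types]


emitted = [
    Item("transaction", {"spans": []}),
    Item("span", {"attributes": {"gen_ai.system": "openai"}}),
]

# Buggy: only transactions are captured, so the span list is empty and
# a `for span in spans: assert ...` loop never runs.
items = capture_items(emitted, "transaction")
spans = [item.payload for item in items if item.type == "span"]

# Fixed: capture spans too, and assert the list is non-empty before looping.
items = capture_items(emitted, "transaction", "span")
fixed_spans = [item.payload for item in items if item.type == "span"]
assert fixed_spans, "expected at least one GenAI span"
for span in fixed_spans:
    assert "gen_ai.system" in span["attributes"]
```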

Test assertion checks wrong key 'attributes' instead of 'data' for transaction contexts - `tests/integrations/openai_agents/test_openai_agents.py:3592-3594`

Lines 3592-3594 check transaction["contexts"]["trace"].get("attributes", {}), but transactions captured via capture_items keep the original payload format, which stores span data under the data key, not attributes. Line 3528 in the same PR confirms this by correctly using transaction["contexts"]["trace"]["data"]. The capture_items fixture in conftest.py transforms attributes only for items of type 'span' (lines 361-367), while transactions retain their original structure. As a result, the assertion always passes (it checks the empty dict returned for the missing key), even if gen_ai.conversation.id is incorrectly present.
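
A minimal illustration with a hypothetical transaction payload: the `.get("attributes", {})` form passes regardless of the data, while indexing the `data` key the transaction actually uses makes the check meaningful.

```python
# Hypothetical transaction payload where the ID is (incorrectly) present.
transaction = {
    "type": "transaction",
    "contexts": {"trace": {"data": {"gen_ai.conversation.id": "conv_123"}}},
}
trace_context = transaction["contexts"]["trace"]

# Buggy check: "attributes" is missing, so .get() returns {} and the
# "not in" assertion passes even though the ID is present under "data".
buggy_passes = "gen_ai.conversation.id" not in trace_context.get("attributes", {})

# Strict check: index the key the transaction actually uses; a KeyError here
# would also surface a payload-shape mismatch instead of hiding it.
strict_passes = "gen_ai.conversation.id" not in trace_context["data"]
```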

Low

Hardcoded SDK version will cause test failure after version bump - `tests/integrations/huggingface_hub/test_huggingface_hub.py:524`

Line 524 uses a hardcoded string "2.58.0" for sentry.sdk.version, while all other similar assertions in this file (lines 601, 679, 757, 833, 949, 1046) correctly use mock.ANY. When the SDK version is bumped, test_text_generation will fail on this assertion. This appears to be an oversight during the test conversion.
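
A short sketch of why `mock.ANY` survives a version bump while a hardcoded string does not. The attribute dict is hypothetical; `unittest.mock.ANY` compares equal to any value, including inside a dict comparison.

```python
from unittest import mock

# Hypothetical span attributes emitted after a version bump.
attributes = {"sentry.sdk.version": "2.59.0"}

hardcoded_matches = attributes == {"sentry.sdk.version": "2.58.0"}
any_matches = attributes == {"sentry.sdk.version": mock.ANY}
```

If the test should still pin the exact value, comparing against `sentry_sdk.consts.VERSION` is a stricter alternative that also tracks version bumps.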

Removed assertion checking expected transaction count in test_multiple_agents_asyncio - `tests/integrations/openai_agents/test_openai_agents.py:2293`

The refactored test at line 2293 unpacks directly from a generator expression without first verifying that exactly 3 transactions were captured. The original code had assert len(events) == 3 before unpacking. If the count is wrong, the test now fails with a ValueError raised by the tuple unpacking rather than an assertion error on the transaction count. This makes failures harder to debug and removes the explicit check that the correct number of transactions is generated.
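
A sketch of restoring the explicit count check, using a hypothetical helper and plain dicts instead of the fixture's captured items. The assertion names the expected count and prints the captured transactions, whereas unpacking straight from the generator surfaces only as a `ValueError` from the unpacking.

```python
def unpack_three_transactions(events):
    """Hypothetical helper: assert the count first so a mismatch is explicit."""
    transactions = [event for event in events if event.get("type") == "transaction"]
    assert len(transactions) == 3, (
        f"expected 3 transactions, got {len(transactions)}: {transactions!r}"
    )
    first, second, third = transactions
    return first, second, third


events = [{"type": "transaction", "transaction": name} for name in ("a", "b")]

try:
    unpack_three_transactions(events)
    outcome = "no error"
except AssertionError as exc:
    outcome = str(exc)

try:
    first, second, third = (e for e in events if e.get("type") == "transaction")
    direct_outcome = "no error"
except ValueError as exc:
    direct_outcome = type(exc).__name__
```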


Duration: 25m 14s · Tokens: 20.4M in / 226.5k out · Cost: $30.61 (+extraction: $0.05, +merge: $0.01, +fix_gate: $0.01)

Annotations

Check failure on line 787 in tests/integrations/huggingface_hub/test_huggingface_hub.py

@sentry-warden sentry-warden / warden: find-bugs

test_chat_completion_api_error does not enable gen_ai_as_v2_spans experiment due to unused variable

Line 787 assigns `_experiments = ({"gen_ai_as_v2_spans": True},)` to a local variable instead of passing it to `sentry_init()`. The `sentry_init(traces_sample_rate=1.0)` call on line 786 does not include the `_experiments` parameter, so the test runs without the V2 spans feature enabled. This defeats the purpose of the test, which validates V2 span behavior. Additionally, the trailing comma wraps the dict in a one-element tuple, so the value would not be a valid `_experiments` dict even if it were passed.

Check failure on line 847 in tests/integrations/huggingface_hub/test_huggingface_hub.py

[MMT-ELL] test_chat_completion_api_error does not enable gen_ai_as_v2_spans experiment due to unused variable (additional location)

Check failure on line 997 in tests/integrations/langchain/test_langchain.py

[MMT-ELL] test_chat_completion_api_error does not enable gen_ai_as_v2_spans experiment due to unused variable (additional location)

Check failure on line 1434 in tests/integrations/openai/test_openai.py

[MMT-ELL] test_chat_completion_api_error does not enable gen_ai_as_v2_spans experiment due to unused variable (additional location)

Check failure on line 1453 in tests/integrations/openai/test_openai.py

[MMT-ELL] test_chat_completion_api_error does not enable gen_ai_as_v2_spans experiment due to unused variable (additional location)

Check failure on line 1479 in tests/integrations/openai/test_openai.py

[MMT-ELL] test_chat_completion_api_error does not enable gen_ai_as_v2_spans experiment due to unused variable (additional location)

Check failure on line 2978 in tests/integrations/pydantic_ai/test_pydantic_ai.py

[MMT-ELL] test_chat_completion_api_error does not enable gen_ai_as_v2_spans experiment due to unused variable (additional location)

Check warning on line 133 in sentry_sdk/client.py

Using pre-processed `event` instead of `event_opt` bypasses event scrubbing for GenAI spans

In `capture_event`, when converting GenAI spans to V2 format, the code passes the original `event` parameter to `_serialized_v1_span_to_serialized_v2_span` instead of `event_opt` (the prepared, scrubbed version). As a result, attributes that the GenAI span conversion extracts from the event, such as `user.id`, `user.email`, and `user.name`, as well as release, environment, and SDK info, may bypass the event scrubber, potentially leaking PII that was supposed to be scrubbed. The scrubber runs on `event_opt`, but the GenAI span conversion reads from the unprocessed `event`.

Check warning on line 1130 in sentry_sdk/client.py

[HNQ-DEC] Using pre-processed `event` instead of `event_opt` bypasses event scrubbing for GenAI spans (additional location)

Check warning on line 2190 in tests/integrations/google_genai/test_google_genai.py

Test uses incorrect 'attributes' key instead of 'data' for inline_data

The test `test_extract_contents_messages_dict_inline_data` uses `"attributes"` as the key inside `inline_data`, but the Google GenAI API and the implementation in `transform_google_content_part` expect `"data"`. The test therefore passes vacuously: `inline_data.get("data", "")` returns an empty string, which is still replaced with `BLOB_DATA_SUBSTITUTE`, so the assertion never exercises real blob data. Other tests in the same file (lines 1800 and 1842) correctly use `"data"`.

Check warning on line 333 in tests/integrations/google_genai/test_google_genai.py

[SEF-S7T] Test uses incorrect 'attributes' key instead of 'data' for inline_data (additional location)

Check warning on line 1381 in tests/integrations/langchain/test_langchain.py

Test asserts against stale _meta path after GenAI spans are extracted to V2 envelope items

Line 1381 asserts `tx["_meta"]["spans"]["0"]["data"]["gen_ai.request.messages"][""]["len"] == 5`, but with `_experiments={"gen_ai_as_v2_spans": True}` enabled (line 1313), the GenAI span is extracted from the transaction and sent as a separate envelope item. After extraction, the GenAI span is no longer at index 0 of the transaction's `spans` array, so the `_meta` path is either invalid or points to a different span. This test will fail at runtime because the extraction leaves behind `_meta` indices that no longer match the remaining spans.

Check warning on line 1745 in tests/integrations/langchain/test_langchain.py

[5CT-ZAN] Test asserts against stale _meta path after GenAI spans are extracted to V2 envelope items (additional location)

Check warning on line 959 in tests/integrations/litellm/test_litellm.py

test_multiple_providers captures only transactions but asserts on spans

The test was changed to use `capture_items("transaction")` at line 959, which captures only transaction items. However, at lines 1034-1037 (outside the hunk but in the same test function), the code collects spans with `spans = [item.payload for item in items if item.type == "span"]` and asserts `SPANDATA.GEN_AI_SYSTEM in span["attributes"]` for each one. Since no spans are captured, the list is empty, the loop body never executes, and the assertion passes silently, making the test ineffective.

Check warning on line 1037 in tests/integrations/litellm/test_litellm.py

[FYZ-S46] test_multiple_providers captures only transactions but asserts on spans (additional location)

Check warning on line 3594 in tests/integrations/openai_agents/test_openai_agents.py

Test assertion checks wrong key 'attributes' instead of 'data' for transaction contexts

Lines 3592-3594 check `transaction["contexts"]["trace"].get("attributes", {})`, but transactions captured via `capture_items` keep the original payload format, which stores span data under the `data` key, not `attributes`. Line 3528 in the same PR confirms this by correctly using `transaction["contexts"]["trace"]["data"]`. The `capture_items` fixture in conftest.py transforms `attributes` only for items of type 'span' (lines 361-367), while transactions retain their original structure. As a result, the assertion always passes (it checks the empty dict returned for the missing key), even if `gen_ai.conversation.id` is incorrectly present.