fix common tests

204b980
Draft

feat: Send GenAI spans as V2 envelope items #6079

@sentry/warden / warden: find-bugs completed Apr 17, 2026 in 23m 26s

9 issues

find-bugs: Found 9 issues (1 high, 6 medium, 2 low)

High

Test accesses wrong payload key 'data' instead of 'attributes', causing KeyError - `tests/tracing/test_misc.py:628`

The test was refactored to use `capture_items('span')` instead of `capture_events()`, but still reads `spans[0]['data']`. The `capture_items` fixture transforms span payloads so that they carry an `'attributes'` key (see conftest.py lines 361-367), not `'data'`, so the lookup raises a KeyError and the test fails.
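The failure mode can be reproduced in isolation. The payload below is a hypothetical stand-in for what the report says `capture_items` yields (an `attributes` mapping and no `data` key); the span name and attribute values are illustrative assumptions, not taken from the test.

```python
# Hypothetical span payload shaped as the report describes: attributes live
# under "attributes", and there is no "data" key.
span_payload = {
    "name": "chat example-model",
    "attributes": {"gen_ai.operation.name": "chat"},
}


def read_attributes(payload):
    """Read span attributes from the key the report says the fixture provides."""
    return payload["attributes"]


def read_legacy_data(payload):
    """Mimic the stale access pattern; raises KeyError on the payload above."""
    return payload["data"]


try:
    read_legacy_data(span_payload)
    legacy_access_failed = False
except KeyError:
    legacy_access_failed = True
```

If this reading is right, switching the test's lookups from `['data']` to `['attributes']` should be enough to resolve the finding.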

Also found at:

  • tests/integrations/pydantic_ai/test_pydantic_ai.py:830-832

Medium

V2 GenAI spans are missing metadata because `event` is used instead of `event_opt` - `sentry_sdk/client.py:230`

At line 1130, `_serialized_v1_span_to_serialized_v2_span(span, event)` passes the original `event` instead of the processed `event_opt`. The function extracts metadata such as user info, release, environment, SDK info, and transaction name from the event parameter. Since `event` has not been through `_prepare_event()`, these fields may be missing or unpopulated (e.g., `release`, `environment`, and `sdk` are set in `_prepare_event` at lines 811-817). As a result, V2 GenAI span attributes can be incomplete compared with the V1 spans in the same transaction.
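A minimal sketch of why the choice of argument matters. Everything here is a hypothetical stand-in: `prepare_event`, `span_to_v2`, the `sentry.`-prefixed attribute names, and the option values are assumptions for illustration and are not the SDK's actual functions or schema.

```python
# Hypothetical stand-ins; these are NOT the SDK's real functions.
METADATA_KEYS = ("release", "environment")


def prepare_event(event, options):
    """Return an enriched copy of the event, filling defaults from options."""
    prepared = dict(event)
    for key in METADATA_KEYS:
        if prepared.get(key) is None:
            prepared[key] = options.get(key)
    return prepared


def span_to_v2(span, event):
    """Copy span data into attributes and add metadata found on the event."""
    attributes = dict(span.get("data", {}))
    for key in METADATA_KEYS:
        if event.get(key) is not None:
            attributes["sentry." + key] = event[key]
    return {"name": span.get("description"), "attributes": attributes}


options = {"release": "my-app@1.0.0", "environment": "production"}
event = {"type": "transaction"}            # raw event: no release/environment yet
event_opt = prepare_event(event, options)  # processed event: metadata filled in
span = {"description": "chat", "data": {"gen_ai.system": "example"}}

from_raw = span_to_v2(span, event)           # metadata silently missing
from_prepared = span_to_v2(span, event_opt)  # metadata present
```

In this sketch the conversion itself never fails; the only symptom of passing the raw event is that attributes are quietly absent, which matches the "incomplete attributes" behavior the report describes.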

Also found at:

  • sentry_sdk/client.py:1130

Test uses incorrect 'attributes' key instead of 'data' for inline_data format - `tests/integrations/google_genai/test_google_genai.py:2153`

The test `test_extract_contents_messages_dict_inline_data` was changed to use `"attributes"` as the key for binary data, but the Google GenAI API and the implementation in `sentry_sdk/ai/utils.py:286` expect `"data"`. The documented input format is `{"inline_data": {"mime_type": "...", "data": "..."}}`. All other tests in this file and `test_ai_monitoring.py` consistently use `"data"`. The test now passes for the wrong reason: the code defaults to an empty string when `data` is missing, so the test no longer exercises the inline_data parsing.
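To illustrate how a default value can mask the wrong key, here is a small sketch. `extract_inline_data` is a hypothetical parser written for this example, assuming only the behavior the report describes (a missing `data` field falls back to an empty string); it is not the code in `sentry_sdk/ai/utils.py`.

```python
def extract_inline_data(part):
    """Hypothetical parser that defaults missing fields to an empty string."""
    inline = part.get("inline_data", {})
    return {
        "mime_type": inline.get("mime_type", ""),
        "data": inline.get("data", ""),
    }


# Input in the documented format, with the payload under "data".
documented = {"inline_data": {"mime_type": "image/png", "data": "aGVsbG8="}}
# Input with the payload under the wrong key, as in the changed test.
mistyped = {"inline_data": {"mime_type": "image/png", "attributes": "aGVsbG8="}}

parsed_documented = extract_inline_data(documented)
parsed_mistyped = extract_inline_data(mistyped)  # no error, just empty data
```

A test that feeds the mistyped input only fails if it asserts on the parsed `data` value, so an assertion on the extracted content is what would make the regression visible.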

Also found at:

  • tests/integrations/google_genai/test_google_genai.py:330

test_multiple_providers does not capture spans but later asserts on them - `tests/integrations/litellm/test_litellm.py:945`

The `capture_items("transaction")` call on line 945 captures only transaction items, but the test later (lines 1020-1023) filters for span items with `[item.payload for item in items if item.type == "span"]` and asserts on them. Since spans are never captured into the items list, the assertions run over an empty list and validate nothing about span attributes. The async version `test_async_multiple_providers` correctly uses `capture_items("transaction", "span")`.
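The effect of the missing `"span"` argument can be sketched with a toy capture helper. `Item`, `capture`, and the payload contents are hypothetical and assume only what the report states: that capture is filtered by item type.

```python
from collections import namedtuple

# Hypothetical envelope item: a type tag plus a payload dict.
Item = namedtuple("Item", ["type", "payload"])


def capture(sent_items, *types):
    """Keep only the items whose type was requested (toy capture helper)."""
    return [item for item in sent_items if item.type in types]


sent = [
    Item("transaction", {"transaction": "demo"}),
    Item("span", {"attributes": {"provider": "example"}}),
]

only_transactions = capture(sent, "transaction")
both = capture(sent, "transaction", "span")

spans_missing = [i.payload for i in only_transactions if i.type == "span"]
spans_present = [i.payload for i in both if i.type == "span"]

# all() over an empty list is True, so per-span checks pass vacuously.
vacuous_result = all("provider" in s["attributes"] for s in spans_missing)
```

Requesting both item types, as the async variant reportedly does, is what makes the later span filter non-empty.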

Also found at:

  • tests/integrations/langchain/test_langchain.py:1840-1844
  • tests/integrations/litellm/test_litellm.py:1020-1023

Inconsistent transaction assertion uses 'attributes' instead of 'data' - `tests/integrations/openai_agents/test_openai_agents.py:3560-3562`

Lines 3560-3561 check `transaction["contexts"]["trace"].get("attributes", {})`, but all other transaction assertions in this file (lines 3359 and 3497) use `["data"]` instead of `["attributes"]`. This inconsistency means the assertion is checking the wrong field: if transactions use the `data` format, the assertion will always pass (since `attributes` would be empty or missing), making the test ineffective at detecting bugs.
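A short sketch of why the lookup is always satisfied, assuming the assertion in question is a negative check (that some key is absent). The transaction shape and the key name below are illustrative assumptions, not copied from the test.

```python
# Hypothetical transaction whose trace context stores values under "data".
transaction = {
    "contexts": {"trace": {"data": {"gen_ai.request.messages": "hello"}}}
}
trace = transaction["contexts"]["trace"]

via_attributes = trace.get("attributes", {})  # key missing, falls back to {}
via_data = trace["data"]                      # where the values actually are

# A "key must be absent" check against the fallback dict can never fail.
check_on_attributes = "gen_ai.request.messages" not in via_attributes
# The same check against "data" reflects the real contents.
check_on_data = "gen_ai.request.messages" not in via_data
```

Here the first check holds even though the key is present in the transaction, which is the sense in which the report calls the assertion ineffective.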

Missing validation of model_behaviour_error when handled_tool_call_exceptions=False - `tests/integrations/pydantic_ai/test_pydantic_ai.py:490-496`

In test_agent_with_tool_validation_error, the original code validated that a model_behaviour_error event was captured in both the handled_tool_call_exceptions=True and False cases. The new code only unpacks and validates events when handled_tool_call_exceptions=True, completely skipping any verification when False. This means the test no longer verifies that the UnexpectedModelBehavior exception is being captured as an error event when the flag is false, reducing test coverage.

Test can pass vacuously if no tool spans are captured - `tests/integrations/pydantic_ai/test_pydantic_ai.py:964-966`

The test_include_prompts_false_with_tools test iterates over tool_spans with assertions but never validates that tool_spans is non-empty. If no tool spans are captured (e.g., due to a bug in the integration or the tool not being executed), the for loop on line 964 will simply not execute, and the test passes without actually verifying anything. Other similar tests in this file (e.g., lines 351-353, 426-428) include assert len(tool_spans) >= 1 before iterating.
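The guard the report recommends can be sketched as follows. `check_tool_spans`, `raises_assertion`, and the `tool.input` attribute name are hypothetical; only the `assert len(tool_spans) >= 1` line comes from the report.

```python
def check_tool_spans(tool_spans):
    """Fail fast on an empty list so the loop below cannot pass vacuously."""
    assert len(tool_spans) >= 1, "expected at least one tool span"
    for span in tool_spans:
        # Hypothetical per-span check: prompt-like data must not be recorded.
        assert "tool.input" not in span["attributes"]


def raises_assertion(tool_spans):
    """Report whether check_tool_spans rejects the given spans."""
    try:
        check_tool_spans(tool_spans)
    except AssertionError:
        return True
    return False


empty_rejected = raises_assertion([])
clean_accepted = not raises_assertion([{"attributes": {}}])
leak_rejected = raises_assertion([{"attributes": {"tool.input": "secret"}}])
```

Without the length assertion, the empty list would fall straight through the loop and the check would succeed.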

Also found at:

  • tests/integrations/pydantic_ai/test_pydantic_ai.py:993-995

Low

Hardcoded SDK version in test will cause test failure on version bump - `tests/integrations/huggingface_hub/test_huggingface_hub.py:523`

Line 523 hardcodes "sentry.sdk.version": "2.58.0" while all other test functions in this file use mock.ANY for this field (lines 599, 676, 753, 828, 942, 1038). This test will fail whenever the SDK version is incremented, requiring manual updates to the test file.
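The difference between the pinned value and `mock.ANY` can be shown with plain dict comparisons. The `sentry.sdk.name` entry and the `2.59.0` version are assumptions added for illustration; `sentry.sdk.version` and `2.58.0` come from the report.

```python
from unittest import mock

# Attributes as they might look today and after a hypothetical version bump.
current = {"sentry.sdk.name": "sentry.python", "sentry.sdk.version": "2.58.0"}
bumped = {"sentry.sdk.name": "sentry.python", "sentry.sdk.version": "2.59.0"}

# Expectation that pins the version, as on line 523.
pinned = {"sentry.sdk.name": "sentry.python", "sentry.sdk.version": "2.58.0"}
# Expectation that accepts any version, as in the other tests in the file.
flexible = {"sentry.sdk.name": "sentry.python", "sentry.sdk.version": mock.ANY}

pinned_survives_bump = bumped == pinned
flexible_survives_bump = bumped == flexible
```

`mock.ANY` compares equal to every value, so the flexible expectation still checks that the key is present while ignoring the exact version string.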

Unused fixture parameter in test function - `tests/integrations/openai_agents/test_openai_agents.py:3057`

The function signature for test_openai_agents_message_truncation was changed from capture_events to capture_items, but the test body never calls or uses the capture_items fixture. This appears to be a mechanical find-replace during refactoring that left an unused fixture parameter. While not a runtime bug (pytest will simply inject an unused fixture), it adds unnecessary test setup overhead and may confuse future maintainers.


Duration: 23m 6s · Tokens: 17.9M in / 201.5k out · Cost: $26.67 (+extraction: $0.03, +merge: $0.01, +fix_gate: $0.02)

Annotations

Check failure on line 628 in tests/tracing/test_misc.py

@sentry-warden sentry-warden / warden: find-bugs

Test accesses wrong payload key 'data' instead of 'attributes', causing KeyError

The test was refactored to use `capture_items('span')` instead of `capture_events()`, but continues to access `spans[0]['data']`. The `capture_items` fixture transforms span payloads to have an 'attributes' key (see conftest.py lines 361-367), not 'data'. This will cause a KeyError when the test runs, making the test fail.

Check failure on line 832 in tests/integrations/pydantic_ai/test_pydantic_ai.py

@sentry-warden sentry-warden / warden: find-bugs

[3V5-L9T] Test accesses wrong payload key 'data' instead of 'attributes', causing KeyError (additional location)

Check warning on line 230 in sentry_sdk/client.py

@sentry-warden sentry-warden / warden: find-bugs

V2 GenAI spans are missing metadata because `event` is used instead of `event_opt`

At line 1130, `_serialized_v1_span_to_serialized_v2_span(span, event)` passes the original `event` instead of the processed `event_opt`. The function extracts metadata like user info, release, environment, SDK info, and transaction name from the event parameter. Since `event` has not been through `_prepare_event()`, these fields may be missing or unpopulated (e.g., `release`, `environment`, `sdk` are set in `_prepare_event` at lines 811-817). This causes V2 GenAI span attributes to be incomplete compared to what the V1 spans in the same transaction would have.

Check warning on line 1130 in sentry_sdk/client.py

@sentry-warden sentry-warden / warden: find-bugs

[K7L-QRA] V2 GenAI spans are missing metadata because `event` is used instead of `event_opt` (additional location)

Check warning on line 2153 in tests/integrations/google_genai/test_google_genai.py

@sentry-warden sentry-warden / warden: find-bugs

Test uses incorrect 'attributes' key instead of 'data' for inline_data format

The test `test_extract_contents_messages_dict_inline_data` was changed to use `"attributes"` as the key for binary data, but the Google GenAI API and the implementation in `sentry_sdk/ai/utils.py:286` expect `"data"`. The documented input format is `{"inline_data": {"mime_type": "...", "data": "..."}}`. All other tests in this file and `test_ai_monitoring.py` consistently use `"data"`. The test now passes for the wrong reason: the code defaults to an empty string when `data` is missing, so the test no longer exercises the inline_data parsing.

Check warning on line 330 in tests/integrations/google_genai/test_google_genai.py

@sentry-warden sentry-warden / warden: find-bugs

[R23-VTZ] Test uses incorrect 'attributes' key instead of 'data' for inline_data format (additional location)

Check warning on line 945 in tests/integrations/litellm/test_litellm.py

@sentry-warden sentry-warden / warden: find-bugs

test_multiple_providers does not capture spans but later asserts on them

The `capture_items("transaction")` call on line 945 captures only transaction items, but the test later (lines 1020-1023) filters for span items with `[item.payload for item in items if item.type == "span"]` and asserts on them. Since spans are never captured into the items list, the assertions run over an empty list and validate nothing about span attributes. The async version `test_async_multiple_providers` correctly uses `capture_items("transaction", "span")`.

Check warning on line 1844 in tests/integrations/langchain/test_langchain.py

@sentry-warden sentry-warden / warden: find-bugs

[ZRZ-HUZ] test_multiple_providers does not capture spans but later asserts on them (additional location)

Check warning on line 1023 in tests/integrations/litellm/test_litellm.py

@sentry-warden sentry-warden / warden: find-bugs

[ZRZ-HUZ] test_multiple_providers does not capture spans but later asserts on them (additional location)

Check warning on line 3562 in tests/integrations/openai_agents/test_openai_agents.py

@sentry-warden sentry-warden / warden: find-bugs

Inconsistent transaction assertion uses 'attributes' instead of 'data'

Lines 3560-3561 check `transaction["contexts"]["trace"].get("attributes", {})`, but all other transaction assertions in this file (lines 3359 and 3497) use `["data"]` instead of `["attributes"]`. This inconsistency means the assertion is checking the wrong field: if transactions use the `data` format, the assertion will always pass (since `attributes` would be empty or missing), making the test ineffective at detecting bugs.

Check warning on line 496 in tests/integrations/pydantic_ai/test_pydantic_ai.py

@sentry-warden sentry-warden / warden: find-bugs

Missing validation of model_behaviour_error when handled_tool_call_exceptions=False

In `test_agent_with_tool_validation_error`, the original code validated that a `model_behaviour_error` event was captured in both the `handled_tool_call_exceptions=True` and `False` cases. The new code only unpacks and validates events when `handled_tool_call_exceptions=True`, completely skipping any verification when `False`. This means the test no longer verifies that the `UnexpectedModelBehavior` exception is being captured as an error event when the flag is false, reducing test coverage.

Check warning on line 966 in tests/integrations/pydantic_ai/test_pydantic_ai.py

@sentry-warden sentry-warden / warden: find-bugs

Test can pass vacuously if no tool spans are captured

The `test_include_prompts_false_with_tools` test iterates over `tool_spans` with assertions but never validates that `tool_spans` is non-empty. If no tool spans are captured (e.g., due to a bug in the integration or the tool not being executed), the for loop on line 964 will simply not execute, and the test passes without actually verifying anything. Other similar tests in this file (e.g., lines 351-353, 426-428) include `assert len(tool_spans) >= 1` before iterating.

Check warning on line 995 in tests/integrations/pydantic_ai/test_pydantic_ai.py

@sentry-warden sentry-warden / warden: find-bugs

[AM9-8AW] Test can pass vacuously if no tool spans are captured (additional location)
