
common tests

a54cab4
Draft

feat: Send GenAI spans as V2 envelope items #6079

@sentry/warden / warden completed Apr 17, 2026 in 42m 23s

16 issues

High

Span assertions never execute in test_multiple_providers due to missing span capture - `tests/integrations/litellm/test_litellm.py:1020-1023`

The sync test_multiple_providers function calls capture_items("transaction") at line 945, but the new code at line 1020 filters for item.type == "span". Since spans are never captured, the spans list will always be empty, causing the for-loop to never execute and the SPANDATA.GEN_AI_SYSTEM assertion to be silently skipped. The async version was correctly updated to capture_items("transaction", "span") at line 1040, but the sync version was missed.

Also found at:

  • tests/integrations/litellm/test_litellm.py:945
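The failure mode can be sketched in a few lines; the dicts below are illustrative stand-ins for captured envelope items, not the test fixture's real API:

```python
# Only transaction items are captured, so filtering for "span" items
# yields an empty list and the loop's assertions never run.
captured = [{"type": "transaction"}]          # capture_items("transaction")
spans = [item for item in captured if item["type"] == "span"]
assert spans == []                            # loop body below is dead code

for span in spans:                            # never entered
    assert "gen_ai.system" in span["attributes"]

# Capturing both item types, as the async variant already does, fixes it:
captured = [
    {"type": "transaction"},
    {"type": "span", "attributes": {"gen_ai.system": "openai"}},
]
spans = [item for item in captured if item["type"] == "span"]
assert len(spans) == 1
```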

test_message_history mixes incompatible span formats, causing the test to silently pass without validation - `tests/integrations/pydantic_ai/test_pydantic_ai.py:830-840`

In test_message_history, spans are extracted from second_transaction["spans"] (line 830, old format with op field), but then filtered using s["attributes"].get("sentry.op", "") (lines 831-833, new V2 format). Since spans from transaction["spans"] use span["op"] not span["attributes"]["sentry.op"], the filter will never match any spans, making chat_spans empty. The subsequent assertions are wrapped in if chat_spans: and if "gen_ai.request.messages" in chat_span["attributes"]:, so the test silently passes without actually verifying anything.

Also found at:

  • tests/integrations/openai_agents/test_openai_agents.py:3560-3562
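A minimal sketch of the format mismatch, with illustrative span dicts: V1 spans nested under transaction["spans"] expose a top-level "op" key, while the filter reads the V2 attribute "sentry.op", so it can never match.

```python
# V1-format span, as found in transaction["spans"]:
v1_spans = [{"op": "gen_ai.chat", "description": "chat gpt-4o"}]

chat_spans = [
    s for s in v1_spans
    if s.get("attributes", {}).get("sentry.op", "") == "gen_ai.chat"
]
assert chat_spans == []   # never matches a V1 span; guarded checks are skipped

# The same filter does match a V2-format span:
v2_spans = [{"attributes": {"sentry.op": "gen_ai.chat"}}]
chat_spans = [
    s for s in v2_spans
    if s.get("attributes", {}).get("sentry.op", "") == "gen_ai.chat"
]
assert len(chat_spans) == 1
```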

GenAI span conversion uses unprocessed `event` instead of `event_opt` - `sentry_sdk/client.py:1130`

On line 1130, _serialized_v1_span_to_serialized_v2_span(span, event) passes event (the original input parameter) instead of event_opt (the processed event). The _prepare_event method enriches the event with release, environment, SDK info, user data, and trace context. V2 spans converted from GenAI spans will therefore be missing this enriched data, causing inconsistency between the transaction and its extracted spans.

Also found at:

  • sentry_sdk/client.py:230
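The ordering bug can be demonstrated with illustrative stand-ins; prepare_event and v1_span_to_v2_span below only mimic the assumed shapes of the SDK's _prepare_event and converter, they are not the real implementations:

```python
# Assumed behavior: prepare_event enriches the event, and the span
# converter copies shared context from whichever event it is handed.
def prepare_event(event):
    enriched = dict(event)
    enriched.setdefault("release", "my-app@1.0.0")
    enriched.setdefault("environment", "production")
    return enriched

def v1_span_to_v2_span(span, event):
    return {
        "attributes": dict(span.get("data", {})),
        "release": event.get("release"),
        "environment": event.get("environment"),
    }

event = {"type": "transaction"}
event_opt = prepare_event(event)
span = {"data": {"gen_ai.system": "openai"}}

buggy = v1_span_to_v2_span(span, event)      # release/environment are None
fixed = v1_span_to_v2_span(span, event_opt)  # enriched context propagates
assert buggy["release"] is None
assert fixed["release"] == "my-app@1.0.0"
```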

Medium

New span conversion functions lack test coverage - `sentry_sdk/client.py:102-298`

The new functions _serialized_v1_attribute_to_serialized_v2_attribute, _serialized_v1_span_to_serialized_v2_span, and _split_gen_ai_spans are not covered by dedicated unit tests. While the integration tests in the PR may exercise these code paths indirectly, direct unit tests would help verify edge cases like None values, empty lists, heterogeneous arrays, and invalid timestamp formats.

Also found at:

  • sentry_sdk/client.py:1130
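A unit-test sketch of the kind of coverage that would help; the converter here is a hypothetical stand-in for _serialized_v1_attribute_to_serialized_v2_attribute, included only to show the edge cases worth pinning down:

```python
def v1_attr_to_v2_attr(value):
    # bool must be checked before int, since bool is an int subclass
    if isinstance(value, bool):
        return {"type": "boolean", "value": value}
    if isinstance(value, int):
        return {"type": "integer", "value": value}
    if isinstance(value, float):
        return {"type": "double", "value": value}
    if isinstance(value, str):
        return {"type": "string", "value": value}
    # fall back to a string rendering for None, lists, and other types
    return {"type": "string", "value": str(value)}

def test_edge_cases():
    assert v1_attr_to_v2_attr(True) == {"type": "boolean", "value": True}
    assert v1_attr_to_v2_attr(3) == {"type": "integer", "value": 3}
    assert v1_attr_to_v2_attr(None) == {"type": "string", "value": "None"}
    assert v1_attr_to_v2_attr([1, "a"])["type"] == "string"

test_edge_cases()
```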

Test uses incorrect key 'attributes' instead of 'data' for inline_data - `tests/integrations/google_genai/test_google_genai.py:2153`

The test data structure uses {"inline_data": {"attributes": b"binary_data", ...}} but Google GenAI's inline_data format uses data not attributes. The production code in sentry_sdk/ai/utils.py:286 retrieves inline_data.get("data", ""). Other tests in this same file (lines 1765, 1806) correctly use {"inline_data": {"data": ..., "mime_type": ...}}. This test will fail to properly exercise the code path for extracting blob content.

Also found at:

  • tests/integrations/google_genai/test_google_genai.py:330
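Why the wrong key goes unnoticed can be shown with a simplified transformer; this is a sketch of the assumed lookup in transform_google_content_part, not the SDK's actual code:

```python
# The transformer reads inline_data.get("data", ""), so a part keyed
# "attributes" silently falls back to the empty string.
def transform_inline_data(part):
    inline = part.get("inline_data", {})
    return {
        "mime_type": inline.get("mime_type", ""),
        "content": inline.get("data", ""),
    }

wrong = {"inline_data": {"attributes": b"binary_data", "mime_type": "image/png"}}
right = {"inline_data": {"data": b"binary_data", "mime_type": "image/png"}}

assert transform_inline_data(wrong)["content"] == ""          # path not exercised
assert transform_inline_data(right)["content"] == b"binary_data"
```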

Hardcoded SDK version will cause test failure when version changes - `tests/integrations/huggingface_hub/test_huggingface_hub.py:523`

The test test_text_generation hardcodes "sentry.sdk.version": "2.58.0" in expected_data, while the other tests in this file (test_text_generation_streaming, test_chat_completion) correctly use mock.ANY. This will cause the test to fail when the SDK version is bumped, introducing test fragility.
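The fragility and the fix can be sketched directly; the version strings are illustrative, but mock.ANY's comparison semantics are real: it compares equal to any value.

```python
from unittest import mock

actual = {"sentry.sdk.version": "2.59.0"}   # payload after a version bump

fragile = {"sentry.sdk.version": "2.58.0"}  # hardcoded: breaks on bump
robust = {"sentry.sdk.version": mock.ANY}   # matches any version string

assert actual != fragile
assert actual == robust
```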

Unused list comprehension results in no actual test validation - `tests/integrations/langchain/test_langchain.py:1840-1844`

The list comprehension on lines 1840-1844 builds a list of error events but never assigns it to a variable, making it dead code with no effect. The original code had the same flaw (the comprehension [e for e in events...] was evaluated and its result discarded), and the refactoring preserves it. As a result, test_langchain_embeddings_error_handling does not actually validate that errors are captured.
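The fix is to keep the comprehension's result and assert on it; the events list below is an illustrative stand-in for the captured envelopes from the test fixture:

```python
events = [
    {"type": "transaction"},
    {"type": "event", "exception": {"values": [{"type": "ValueError"}]}},
]

# Before: the comprehension's result was discarded, validating nothing.
# After: bind the result and assert an error event was actually captured.
error_events = [e for e in events if e.get("exception")]
assert len(error_events) == 1
```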



...and 6 more

4 skills analyzed
| Skill | Findings | Duration | Cost |
| --- | ---: | --- | ---: |
| code-review | 7 | 20m 35s | $20.19 |
| find-bugs | 9 | 41m 58s | $27.82 |
| skill-scanner | 0 | 40m 41s | $6.80 |
| security-review | 0 | 33m 44s | $5.48 |

Duration: 136m 58s · Tokens: 38.8M in / 461.5k out · Cost: $60.39 (+extraction: $0.03, +merge: $0.01, +fix_gate: $0.03, +dedup: $0.02)