Skip to content

common tests

a54cab4
Select commit
Loading
Failed to load commit list.
Draft

feat: Send GenAI spans as V2 envelope items #6079

common tests
a54cab4
Select commit
Loading
Failed to load commit list.
@sentry/warden / warden: code-review completed Apr 17, 2026 in 20m 48s

7 issues

code-review: Found 7 issues (2 high, 4 medium, 1 low)

High

Span assertions never execute in test_multiple_providers due to missing span capture - `tests/integrations/litellm/test_litellm.py:1020-1023`

The sync test_multiple_providers function calls capture_items("transaction") at line 945, but the new code at line 1020 filters for item.type == "span". Since spans are never captured, the spans list will always be empty, causing the for-loop to never execute and the SPANDATA.GEN_AI_SYSTEM assertion to be silently skipped. The async version was correctly updated to capture_items("transaction", "span") at line 1040, but the sync version was missed.

Also found at:

  • tests/integrations/litellm/test_litellm.py:945
test_message_history mixes incompatible span formats causing test to silently pass without validation - `tests/integrations/pydantic_ai/test_pydantic_ai.py:830-840`

In test_message_history, spans are extracted from second_transaction["spans"] (line 830, old format with op field), but then filtered using s["attributes"].get("sentry.op", "") (lines 831-833, new V2 format). Since spans from transaction["spans"] use span["op"] not span["attributes"]["sentry.op"], the filter will never match any spans, making chat_spans empty. The subsequent assertions are wrapped in if chat_spans: and if "gen_ai.request.messages" in chat_span["attributes"]:, so the test silently passes without actually verifying anything.

Also found at:

  • tests/integrations/openai_agents/test_openai_agents.py:3560-3562

Medium

New span conversion functions lack test coverage - `sentry_sdk/client.py:102-298`

The new functions _serialized_v1_attribute_to_serialized_v2_attribute, _serialized_v1_span_to_serialized_v2_span, and _split_gen_ai_spans are not covered by dedicated unit tests. While the integration tests in the PR may exercise these code paths indirectly, direct unit tests would help verify edge cases like None values, empty lists, heterogeneous arrays, and invalid timestamp formats.

Also found at:

  • sentry_sdk/client.py:1130
Test uses incorrect key 'attributes' instead of 'data' for inline_data - `tests/integrations/google_genai/test_google_genai.py:2153`

The test data structure uses {"inline_data": {"attributes": b"binary_data", ...}} but Google GenAI's inline_data format uses data not attributes. The production code in sentry_sdk/ai/utils.py:286 retrieves inline_data.get("data", ""). Other tests in this same file (lines 1765, 1806) correctly use {"inline_data": {"data": ..., "mime_type": ...}}. This test will fail to properly exercise the code path for extracting blob content.

Also found at:

  • tests/integrations/google_genai/test_google_genai.py:330
Hardcoded SDK version will cause test failure when version changes - `tests/integrations/huggingface_hub/test_huggingface_hub.py:523`

The test test_text_generation hardcodes "sentry.sdk.version": "2.58.0" in expected_data, while the other tests in this file (test_text_generation_streaming, test_chat_completion) correctly use mock.ANY. This will cause the test to fail when the SDK version is bumped, introducing test fragility.

Unused list comprehension results in no actual test validation - `tests/integrations/langchain/test_langchain.py:1840-1844`

The list comprehension on lines 1840-1844 creates a list of error events but doesn't assign it to any variable, making it dead code with no effect. The original code also had this issue (assigning to [e for e in events...] without using e), but this refactoring maintains the bug. This means test_langchain_embeddings_error_handling doesn't actually validate that errors are captured properly.

Low

Test assertions may be silently skipped if no chat_spans are found - `tests/integrations/pydantic_ai/test_pydantic_ai.py:993-995`

In test_include_prompts_requires_pii, the test loops over chat_spans to verify assertions, but doesn't verify that chat_spans has at least one element. If no chat spans are generated due to a bug, the test would pass without actually testing anything. Consider adding assert len(chat_spans) >= 1 before the loop to ensure the test is actually validating the expected behavior.


Duration: 20m 35s · Tokens: 13.5M in / 167.6k out · Cost: $20.22 (+extraction: $0.01, +merge: $0.00, +fix_gate: $0.01)

Annotations

Check failure on line 1023 in tests/integrations/litellm/test_litellm.py

See this annotation in the file changed.

@sentry-warden sentry-warden / warden: code-review

Span assertions never execute in test_multiple_providers due to missing span capture

The sync `test_multiple_providers` function calls `capture_items("transaction")` at line 945, but the new code at line 1020 filters for `item.type == "span"`. Since spans are never captured, the `spans` list will always be empty, causing the for-loop to never execute and the `SPANDATA.GEN_AI_SYSTEM` assertion to be silently skipped. The async version was correctly updated to `capture_items("transaction", "span")` at line 1040, but the sync version was missed.

Check failure on line 945 in tests/integrations/litellm/test_litellm.py

See this annotation in the file changed.

@sentry-warden sentry-warden / warden: code-review

[UEC-9WY] Span assertions never execute in test_multiple_providers due to missing span capture (additional location)

The sync `test_multiple_providers` function calls `capture_items("transaction")` at line 945, but the new code at line 1020 filters for `item.type == "span"`. Since spans are never captured, the `spans` list will always be empty, causing the for-loop to never execute and the `SPANDATA.GEN_AI_SYSTEM` assertion to be silently skipped. The async version was correctly updated to `capture_items("transaction", "span")` at line 1040, but the sync version was missed.

Check failure on line 840 in tests/integrations/pydantic_ai/test_pydantic_ai.py

See this annotation in the file changed.

@sentry-warden sentry-warden / warden: code-review

test_message_history mixes incompatible span formats causing test to silently pass without validation

In test_message_history, spans are extracted from `second_transaction["spans"]` (line 830, old format with `op` field), but then filtered using `s["attributes"].get("sentry.op", "")` (lines 831-833, new V2 format). Since spans from transaction["spans"] use `span["op"]` not `span["attributes"]["sentry.op"]`, the filter will never match any spans, making `chat_spans` empty. The subsequent assertions are wrapped in `if chat_spans:` and `if "gen_ai.request.messages" in chat_span["attributes"]:`, so the test silently passes without actually verifying anything.

Check failure on line 3562 in tests/integrations/openai_agents/test_openai_agents.py

See this annotation in the file changed.

@sentry-warden sentry-warden / warden: code-review

[NK9-2UC] test_message_history mixes incompatible span formats causing test to silently pass without validation (additional location)

In test_message_history, spans are extracted from `second_transaction["spans"]` (line 830, old format with `op` field), but then filtered using `s["attributes"].get("sentry.op", "")` (lines 831-833, new V2 format). Since spans from transaction["spans"] use `span["op"]` not `span["attributes"]["sentry.op"]`, the filter will never match any spans, making `chat_spans` empty. The subsequent assertions are wrapped in `if chat_spans:` and `if "gen_ai.request.messages" in chat_span["attributes"]:`, so the test silently passes without actually verifying anything.

Check warning on line 298 in sentry_sdk/client.py

See this annotation in the file changed.

@sentry-warden sentry-warden / warden: code-review

New span conversion functions lack test coverage

The new functions `_serialized_v1_attribute_to_serialized_v2_attribute`, `_serialized_v1_span_to_serialized_v2_span`, and `_split_gen_ai_spans` are not covered by dedicated unit tests. While the integration tests in the PR may exercise these code paths indirectly, direct unit tests would help verify edge cases like None values, empty lists, heterogeneous arrays, and invalid timestamp formats.

Check warning on line 1130 in sentry_sdk/client.py

See this annotation in the file changed.

@sentry-warden sentry-warden / warden: code-review

[KML-HGS] New span conversion functions lack test coverage (additional location)

The new functions `_serialized_v1_attribute_to_serialized_v2_attribute`, `_serialized_v1_span_to_serialized_v2_span`, and `_split_gen_ai_spans` are not covered by dedicated unit tests. While the integration tests in the PR may exercise these code paths indirectly, direct unit tests would help verify edge cases like None values, empty lists, heterogeneous arrays, and invalid timestamp formats.

Check warning on line 2153 in tests/integrations/google_genai/test_google_genai.py

See this annotation in the file changed.

@sentry-warden sentry-warden / warden: code-review

Test uses incorrect key 'attributes' instead of 'data' for inline_data

The test data structure uses `{"inline_data": {"attributes": b"binary_data", ...}}` but Google GenAI's inline_data format uses `data` not `attributes`. The production code in `sentry_sdk/ai/utils.py:286` retrieves `inline_data.get("data", "")`. Other tests in this same file (lines 1765, 1806) correctly use `{"inline_data": {"data": ..., "mime_type": ...}}`. This test will fail to properly exercise the code path for extracting blob content.

Check warning on line 330 in tests/integrations/google_genai/test_google_genai.py

See this annotation in the file changed.

@sentry-warden sentry-warden / warden: code-review

[G6N-XFX] Test uses incorrect key 'attributes' instead of 'data' for inline_data (additional location)

The test data structure uses `{"inline_data": {"attributes": b"binary_data", ...}}` but Google GenAI's inline_data format uses `data` not `attributes`. The production code in `sentry_sdk/ai/utils.py:286` retrieves `inline_data.get("data", "")`. Other tests in this same file (lines 1765, 1806) correctly use `{"inline_data": {"data": ..., "mime_type": ...}}`. This test will fail to properly exercise the code path for extracting blob content.

Check warning on line 523 in tests/integrations/huggingface_hub/test_huggingface_hub.py

See this annotation in the file changed.

@sentry-warden sentry-warden / warden: code-review

Hardcoded SDK version will cause test failure when version changes

The test `test_text_generation` hardcodes `"sentry.sdk.version": "2.58.0"` in expected_data, while the other tests in this file (`test_text_generation_streaming`, `test_chat_completion`) correctly use `mock.ANY`. This will cause the test to fail when the SDK version is bumped, introducing test fragility.

Check warning on line 1844 in tests/integrations/langchain/test_langchain.py

See this annotation in the file changed.

@sentry-warden sentry-warden / warden: code-review

Unused list comprehension results in no actual test validation

The list comprehension on lines 1840-1844 creates a list of error events but doesn't assign it to any variable, making it dead code with no effect. The original code also had this issue (assigning to `[e for e in events...]` without using `e`), but this refactoring maintains the bug. This means `test_langchain_embeddings_error_handling` doesn't actually validate that errors are captured properly.