feat(data-collection): create DataCollection option in client#6702
feat(data-collection): create DataCollection option in client#6702ericapisani wants to merge 16 commits into
3 issues
code-review: Found 4 issues (2 medium, 2 low)
Medium
`_map_from_send_default_pii` gates `graphql.document` on `send_default_pii`, contradicting its own docstring - `sentry_sdk/client.py:26`
The docstring of _map_from_send_default_pii states that graphql.variables and database.query_params follow send_default_pii while graphql.document should stay True (the document itself is not treated as PII). The implementation instead sets "graphql": {"document": send_default_pii, "variables": send_default_pii}, so with the default send_default_pii=False, graphql.document resolves to False and GraphQL documents are silently suppressed. Consider {"document": True, "variables": send_default_pii} to match the documented behavior.
Also found at:
sentry_sdk/data_collection.py:110tests/test_data_collection.py:131-140
`should_send_default_pii()` diverges from `data_collection["user_info"]` when only `data_collection` is set - `sentry_sdk/client.py:357`
The EventScrubber at line 357 now correctly uses data_collection["user_info"], but _Client.should_send_default_pii() (line 747) still reads directly from options["send_default_pii"]. Any user who sets data_collection={"user_info": True} without also setting send_default_pii=True will get a scrubber that allows PII through, while every integration calling should_send_default_pii() (fastapi, sanic, asgi, strawberry, rq, huey, etc.) will still return False and suppress PII collection — making the new option effectively broken for those integrations.
Also found at:
sentry_sdk/client.py:752-753
Low
`BaseClient.data_collection` returns a shared mutable module-level dict - `sentry_sdk/client.py:440-441`
BaseClient.data_collection (client.py:441) returns the module-level _DISABLED_DATA_COLLECTION_CONFIG dict directly, without copying. This dict contains nested mutable objects (a list for http_bodies, dicts for cookies/http_headers/query_params/etc.). It is exposed publicly via get_data_collection() / Scope.get_client().data_collection. If any future consumer mutates a nested field (e.g. client.data_collection["http_bodies"].append(...)), it would silently corrupt the shared constant for all callers of the non-recording/pre-init client path. No current caller mutates the returned structure, so this is a defensive hardening concern rather than an active bug. Consider returning a shallow/deep copy or an immutable structure.
Deprecation warning stacklevel too low — points to SDK internals, not user code - `sentry_sdk/consts.py:1437`
The DeprecationWarning warning that send_default_pii is deprecated/ignored when data_collection is set uses stacklevel=2 in resolve_data_collection() (sentry_sdk/data_collection.py). Because the warning is emitted directly inside resolve_data_collection, stacklevel=2 attributes the warning to its immediate caller _get_options() in client.py, rather than the user's sentry_sdk.init() call, making the warning harder to act on.
⏱ 9m 12s · 3.5M in / 103.0k out · $4.01
Annotations
Check warning on line 26 in sentry_sdk/client.py
sentry-warden / warden: code-review
`_map_from_send_default_pii` gates `graphql.document` on `send_default_pii`, contradicting its own docstring
The docstring of `_map_from_send_default_pii` states that `graphql.variables` and `database.query_params` follow `send_default_pii` while `graphql.document` should stay `True` (the document itself is not treated as PII). The implementation instead sets `"graphql": {"document": send_default_pii, "variables": send_default_pii}`, so with the default `send_default_pii=False`, `graphql.document` resolves to `False` and GraphQL documents are silently suppressed. Consider `{"document": True, "variables": send_default_pii}` to match the documented behavior.
Check warning on line 110 in sentry_sdk/data_collection.py
sentry-warden / warden: code-review
[AD7-SAE] `_map_from_send_default_pii` gates `graphql.document` on `send_default_pii`, contradicting its own docstring (additional location)
The docstring of `_map_from_send_default_pii` states that `graphql.variables` and `database.query_params` follow `send_default_pii` while `graphql.document` should stay `True` (the document itself is not treated as PII). The implementation instead sets `"graphql": {"document": send_default_pii, "variables": send_default_pii}`, so with the default `send_default_pii=False`, `graphql.document` resolves to `False` and GraphQL documents are silently suppressed. Consider `{"document": True, "variables": send_default_pii}` to match the documented behavior.
Check warning on line 140 in tests/test_data_collection.py
sentry-warden / warden: code-review
[AD7-SAE] `_map_from_send_default_pii` gates `graphql.document` on `send_default_pii`, contradicting its own docstring (additional location)
The docstring of `_map_from_send_default_pii` states that `graphql.variables` and `database.query_params` follow `send_default_pii` while `graphql.document` should stay `True` (the document itself is not treated as PII). The implementation instead sets `"graphql": {"document": send_default_pii, "variables": send_default_pii}`, so with the default `send_default_pii=False`, `graphql.document` resolves to `False` and GraphQL documents are silently suppressed. Consider `{"document": True, "variables": send_default_pii}` to match the documented behavior.
Check warning on line 357 in sentry_sdk/client.py
sentry-warden / warden: code-review
`should_send_default_pii()` diverges from `data_collection["user_info"]` when only `data_collection` is set
The `EventScrubber` at line 357 now correctly uses `data_collection["user_info"]`, but `_Client.should_send_default_pii()` (line 747) still reads directly from `options["send_default_pii"]`. Any user who sets `data_collection={"user_info": True}` without also setting `send_default_pii=True` will get a scrubber that allows PII through, while every integration calling `should_send_default_pii()` (fastapi, sanic, asgi, strawberry, rq, huey, etc.) will still return `False` and suppress PII collection — making the new option effectively broken for those integrations.
Check warning on line 753 in sentry_sdk/client.py
sentry-warden / warden: code-review
[DJC-KKP] `should_send_default_pii()` diverges from `data_collection["user_info"]` when only `data_collection` is set (additional location)
The `EventScrubber` at line 357 now correctly uses `data_collection["user_info"]`, but `_Client.should_send_default_pii()` (line 747) still reads directly from `options["send_default_pii"]`. Any user who sets `data_collection={"user_info": True}` without also setting `send_default_pii=True` will get a scrubber that allows PII through, while every integration calling `should_send_default_pii()` (fastapi, sanic, asgi, strawberry, rq, huey, etc.) will still return `False` and suppress PII collection — making the new option effectively broken for those integrations.