feat(data-collection): create DataCollection option in client#6702
feat(data-collection): create DataCollection option in client#6702ericapisani wants to merge 16 commits into
4 issues
find-bugs: Found 6 issues (6 low)
Low
http_bodies accepts a bare string and silently converts it to individual characters - `sentry_sdk/_types.py:148`
In _resolve_explicit (sentry_sdk/data_collection.py), a user-supplied http_bodies value is passed directly to list(...) without checking it is a list. A string such as "incoming_request" is iterated character-by-character, producing ['i','n','c',...] instead of raising an error. Unlike key-value modes (validated via _kvcb_from_value against a whitelist), body type strings are never validated against ALL_HTTP_BODY_TYPES, so these bogus single-character entries pass through silently, effectively misconfiguring body collection without any diagnostic to the user.
Also found at:
sentry_sdk/data_collection.py:167-168sentry_sdk/data_collection.py:185-194
DeprecationWarning for `send_default_pii` + `data_collection` conflict points to internal SDK code - `sentry_sdk/client.py:29`
In resolve_data_collection, the warnings.warn(..., stacklevel=2) for the send_default_pii/data_collection conflict uses stacklevel=2, which attributes the warning to the immediate caller of resolve_data_collection. That caller is _get_options() in client.py (the rv["data_collection"] = resolve_data_collection(rv) line), i.e. internal SDK code, not the user's sentry_sdk.init() call site. As a result the DeprecationWarning is unhelpfully attributed to SDK internals, making it harder for users to locate the offending configuration in their own code.
Also found at:
sentry_sdk/data_collection.py:249-254
BaseClient.data_collection returns a mutable shared singleton dict - `sentry_sdk/client.py:440-442`
The property returns _DISABLED_DATA_COLLECTION_CONFIG directly by reference; any caller that mutates it (e.g., client.data_collection["http_bodies"].pop()) would corrupt the module-level singleton for all subsequent callers. Return a shallow copy or freeze the structure to prevent accidental mutation.
Also found at:
sentry_sdk/client.py:391-397
Spotlight re-derivation uses `is not False`, storing `stack_frame_variables=True` for a `None`-valued `include_local_variables` (inconsistent with normal resolution) - `sentry_sdk/client.py:640-643`
In DSN-less spotlight mode, _Client._setup_once re-derives data_collection with include_local_variables=self.options["include_local_variables"] is not False (and the same for include_source_context). When the option is explicitly set to None (permitted by the Optional[bool] typing in consts.py:1312-1313), None is not False evaluates to True, so the re-derived dict stores stack_frame_variables=True/frame_context_lines=5. The normal path resolve_data_collection instead passes the stored None straight through (options.get("include_local_variables", True) returns None), yielding stack_frame_variables=None (falsy). This is opposite to how None is treated everywhere else (utils.py:615 uses a truthy if include_local_variables: check, i.e. None disables capture). Impact is limited: the resolved stack_frame_variables/frame_context_lines values are not yet consumed for actual stacktrace capture (client.py:850 and utils.py read include_local_variables directly), so this is currently an internal-consistency defect in the resolved data_collection dict rather than a behavior change.
Direct key access on `options["data_collection"]` can raise KeyError when unpickling cross-version client state - `sentry_sdk/client.py:761`
The data_collection property returns self.options["data_collection"] with no fallback. The "data_collection" key is only injected by resolve_data_collection() inside _get_options(), which runs exclusively during __init__. When a _Client is unpickled, __setstate__ restores self.options = state["options"] and calls _init_impl() — which never calls _get_options(). Normal same-version pickle round-trips are safe because __getstate__ serializes self.options after it already contains the key. However, unpickling state produced by an older SDK version (serialized before this key existed) leaves self.options without "data_collection", causing a KeyError here (and also at line 637 in _init_impl under DSN-less spotlight mode). Prefer self.options.get("data_collection", _DISABLED_DATA_COLLECTION_CONFIG) to match the defensive pattern used by the adjacent should_send_default_pii() (self.options.get("send_default_pii") or False) and NonRecordingClient.data_collection.
Docstring for _map_from_send_default_pii incorrectly states graphql.document stays True - `sentry_sdk/data_collection.py:86-99`
The docstring (line 93) states that graphql.document stays True regardless of send_default_pii, but the implementation (line 110) sets graphql.document to the value of send_default_pii, so it becomes False when PII collection is off. The docstring misdescribes the actual PII-gating behavior.
Also found at:
tests/test_data_collection.py:196-204
⏱ 20m 18s · 6.9M in / 214.6k out · $7.86