Commit 7abc153

rfc(feature): SDK-symbolicated stack-frames
1 parent 1292d6b commit 7abc153

3 files changed

+166
-0
lines changed

@@ -0,0 +1,166 @@
1+
- Start Date: 2026-02-27
2+
- RFC Type: feature
3+
- RFC PR: https://github.com/getsentry/rfcs/pull/XXXX
4+
- RFC Status: draft
5+
- RFC Author: @supervacuus
6+
- RFC Approver:
7+
8+
# Summary
9+
10+
This RFC proposes a mechanism for SDKs to mark stack frames as already symbolicated on the client side, so that the backend (processing/symbolicator) can skip symbolication for those frames. This avoids wasted symbolicator resources, prevents false-positive "missing debug symbols" errors in the UI, and gives SDKs a first-class way to communicate that a frame's symbol and associated module/image should be taken at face value and not be considered missing.
11+
12+
# Motivation
13+
14+
When an SDK symbolicates a native stack frame on the client (e.g. because the debug symbols are available locally but not on the server), the backend currently has no way to know this. Processing and symbolicator still attempt to symbolicate every native frame. This leads to two concrete problems:
15+
16+
1. **Wasted resources:** Symbolicator attempts symbolication for frames that are already fully resolved, consuming resources unnecessarily. This might be an operational non-issue and I only add it for completeness.
17+
18+
2. **Incorrect and misleading UI errors:** When symbolicator cannot find debug symbols for an already-symbolicated frame, it sets `symbolicator_status: "missing"` and surfaces an error telling the user to upload the debug symbols for the associated module. In most cases this is **by design**: users typically do not want to add symbol-tables or debug-information to their deployment artifacts and usually release a stripped artifact and separately upload debug-information once per release to Sentry. In such a setup the client cannot symbolicate anyway and a `"missing"` status is sensible feedback. However, there are situations where the opposite is true: the symbols only exist on the client device (e.g. system libraries on end-user devices) and there is no realistic chance of collecting these upfront. The user sees a broken-looking stack trace and a confusing call-to-action that doesn't apply.
19+
20+
This problem is not theoretical. It already manifests today in at least two concrete scenarios:
21+
22+
- **Tombstone / native crash reporting**: Native SDKs that symbolicate system library frames on-device before sending the event. The backend cannot distinguish these from unsymbolicated frames and flags them as broken since it cannot find any symbol information in its stores. On Android in particular, both situations usually occur in the same event: native user libraries are packaged stripped and `Symbolicator` must resolve the associated frames (i.e., the UI warning is sensible if symbol/debug-info is missing), but system and framework libraries are usually symbolicated on-device and `Symbolicator` won't have access to data to further enrich the stack frame. In that case, the UI error is misleading and not actionable for the user.
23+
- **.NET SDK**: The .NET SDK resolves function names, file paths, and line numbers locally using portable PDB metadata. When these frames arrive at the backend, symbolicator still attempts to process them. Because the user hasn't uploaded PDB files to Sentry (the SDK already did the work), every frame gets `symbolicator_status: "missing"` and the UI shows a misleading symbolication error banner. This is tracked in [getsentry/sentry#97054](https://github.com/getsentry/sentry/issues/97054).
24+
25+
![Screenshot of symbols rendered as missing although symbolicated](./xxxx-missing-symbols-screenshot.png)
26+
![Screenshot of images missing although no action required](./xxxx-missing-modules-screenshot.png)
27+
28+
# Background
29+
30+
## How `symbolicator_status` works today
31+
32+
After symbolicator processes an event, each native frame receives a `symbolicator_status` stored in `frame.data`. The status values include:
33+
34+
- `"symbolicated"`: symbolicator successfully resolved the frame.
35+
- `"missing"`: symbolicator could not find the required debug symbols.
36+
- `"unknown"`: symbolicator could not process the frame for other reasons.
37+
38+
This status is set exclusively by the backend during processing. It currently cannot be influenced by SDKs.
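
As a rough illustration of the data shape described above, the following sketch builds a few made-up frames and counts them by status. Only the `data.symbolicator_status` location and the three status strings come from this RFC; the frame values and the `status_counts` helper are hypothetical.

```python
from collections import Counter
from typing import Any

# Illustrative frames only; the values are made up. The status lives in
# `frame["data"]["symbolicator_status"]`, as described in this section.
frames: list[dict[str, Any]] = [
    {"function": "main", "data": {"symbolicator_status": "symbolicated"}},
    {"function": "pollOnce", "data": {"symbolicator_status": "missing"}},
    {"instruction_addr": "0x1000", "data": {"symbolicator_status": "unknown"}},
    {"function": "no_status_yet"},  # a frame that carries no status
]


def status_counts(frames: list[dict[str, Any]]) -> Counter:
    """Count frames per symbolicator_status; frames without one count as None."""
    return Counter((f.get("data") or {}).get("symbolicator_status") for f in frames)


counts = status_counts(frames)
```

Every frame counted under `"missing"` is a candidate for the "upload debug symbols" call-to-action, which is the behavior this RFC wants SDKs to be able to suppress for frames they resolved themselves.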
39+
40+
Relevant code: [`sentry/lang/native/processing.py`](https://github.com/getsentry/sentry/blob/422487ea4acad23710cd1fe5392a5b684e09c2e4/src/sentry/lang/native/processing.py#L129-L131)
41+
42+
## Frame handling in processing
43+
44+
The native frame handler in `processing.py` iterates over all frames and unconditionally sends them through symbolicator. There is already a code path that exits frame handling early when certain conditions are met:
45+
46+
[`processing.py` L378-L379](https://github.com/getsentry/sentry/blob/422487ea4acad23710cd1fe5392a5b684e09c2e4/src/sentry/lang/native/processing.py#L378-L379): this early-exit is what we want to trigger for SDK-symbolicated frames, accepting them at face value without attempting further symbolication.
47+
48+
## Existing workarounds
49+
50+
### Hard-coded exceptions in the monolith
51+
52+
A `FIXME` in `processing.py` adds a special case for .NET, which never had debug images before but can send fully symbolicated events from the SDK. This was added to prevent false symbolication errors when .NET started sending debug images. The `FIXME` tag indicates this was not considered the right long-term approach.
53+
54+
Notably, the .NET SDK has since evolved to **send debug images** (with `type: "pe_dotnet"`), because this enables symbolicator to fetch source context via the Microsoft symbol server and SourceLink. Once debug images are present, however, the FIXME's original precondition ("no debug images -> skip") no longer triggers. Symbolicator now sees debug images, attempts to resolve symbols for *all* frames, including user-code frames that the SDK already symbolicated locally, and marks them as `"missing"` because the PDB was never uploaded. This is exactly the scenario reported in [getsentry/sentry#97054](https://github.com/getsentry/sentry/issues/97054): the FIXME handled the old .NET world correctly, but the SDK outgrew the workaround.
55+
56+
Relevant code: [`processing.py` L149-L158](https://github.com/getsentry/sentry/blob/422487ea4acad23710cd1fe5392a5b684e09c2e4/src/sentry/lang/native/processing.py#L149-L158)
57+
58+
See also: [getsentry/sentry#46955](https://github.com/getsentry/sentry/issues/46955): "Remove error banner for non-app symbols"
59+
60+
### Third-party library detection
61+
62+
There is an `is_known_third_party()` check that suppresses missing-symbol errors for recognized system libraries (e.g. iOS system frameworks). This is a denylist approach and does not scale to arbitrary platforms or deployments.
63+
64+
### Passing `symbolicator_status` from the SDK
65+
66+
In theory, an SDK could set `symbolicator_status` directly on the frame's `data` dict. Since `data` is part of the stacktrace `Frame` and falls into relay's untyped "other" catch-all ([`relay-event-schema/.../stacktrace.rs` L200-202](https://github.com/getsentry/relay/blob/55c59cf75d3c35bbbb66df14072d147eca056bd7/relay-event-schema/src/protocol/stacktrace.rs#L200-L202)), it would technically be forwarded. However, using untyped catch-all fields for regular SDK usage is explicitly frowned upon and for good reason.
67+
68+
## Scope
69+
70+
This RFC focuses on the **stack trace display and symbolication** aspect of the problem. Specifically: how can an SDK tell the backend "this frame is already symbolicated, don't try again"? It is also assumed that any solution to misattributing "missing symbols" should also rectify attributing associated modules in the debug-meta as missing. The scope intentionally does **not** cover source context fetching: even if a frame is marked as symbolicated, the backend may still want to fetch source files for inline source context display.
71+
72+
## Affected components
73+
74+
This is a cross-cutting concern that touches:
75+
76+
- **SDKs**: All SDKs that handle native code (sentry-native, sentry-java/Android NDK, sentry-dotnet, and downstream dependents, plus any future SDK performing client-side symbolication).
77+
- **Relay / ingestion**: The frame schema needs to either gain a new typed field or explicitly allow SDKs to set existing fields.
78+
- **Processing (monolith)**: `sentry/lang/native/processing.py` needs to honor the new signal and skip symbolication for marked frames.
79+
- **Symbolicator**: May need awareness of the flag if processing delegates the decision.
80+
- **UI**: Should stop showing misleading "missing symbols" errors for frames that are intentionally client-symbolicated.
81+
82+
# Options Considered
83+
84+
## Option A: New typed frame attribute: `symbolicated` (currently preferred)
85+
86+
Add a new boolean field `symbolicated` (or similar; it can be renamed on the server to `client_symbolicated`, analogous to `in_app` -> `client_in_app`) to the frame schema in relay. When set to `true`, processing skips symbolication for that frame and treats it as already resolved. We might also make this an enum to give the client finer-grained control over the level of frame enrichment, but I currently see no immediate scenario.
87+
88+
**Changes required:**
89+
90+
- **relay-event-schema**: Add `symbolicated: bool` to `Frame` (typed, not catch-all).
91+
- **SDKs**: Set `symbolicated: true` on frames the SDK has symbolicated.
92+
- **processing.py**: Check `symbolicated` early in frame handling; if `true`, set `symbolicator_status` to `"symbolicated"` (or a new value like `"client_symbolicated"`) and skip further processing.
93+
- **UI**: No changes needed if we reuse `"symbolicated"` status. If we introduce a new status value, the UI may want to render it distinctly. But we might be able to remove special-cases.
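
The processing-side change listed above could look roughly like the sketch below. It is not the code in `processing.py`: the `symbolicated` field does not exist yet, and `partition_frames` and the sample frame values are hypothetical names used only to illustrate the proposed contract. It assumes the variant that reuses the existing `"symbolicated"` status.

```python
from typing import Any

Frame = dict[str, Any]


def partition_frames(frames: list[Frame]) -> tuple[list[Frame], list[Frame]]:
    """Split frames into (accepted at face value, still to be symbolicated)."""
    accepted: list[Frame] = []
    pending: list[Frame] = []
    for frame in frames:
        if frame.get("symbolicated") is True:  # hypothetical typed field
            # Trust the SDK's result and record a status so the UI does not
            # report missing debug symbols for this frame.
            data = frame.get("data") or {}
            data["symbolicator_status"] = "symbolicated"
            frame["data"] = data
            accepted.append(frame)
        else:
            pending.append(frame)
    return accepted, pending


# What an SDK might send: a system-library frame resolved on-device next to
# a stripped user-library frame that still needs server-side symbols.
event_frames: list[Frame] = [
    {"function": "android::Looper::pollOnce", "symbolicated": True},
    {"instruction_addr": "0x7b2c1a4f10"},
]
accepted, pending = partition_frames(event_frames)
```

Only frames in `pending` would be sent on to symbolicator, so a mixed Android event keeps server-side symbolication (and its warnings) for user libraries while system frames are left alone.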
94+
95+
### Pros
96+
97+
- Clean, explicit, typed and no abuse of catch-all fields.
98+
- Can be adopted incrementally by different SDKs.
99+
- Clear contract between SDK and backend.
100+
- Minimal risk of side effects on existing flows.
101+
- Allows removal of special-case(s) in UI.
102+
103+
### Cons
104+
105+
- Requires a relay schema change.
106+
- New attribute that all parts of the pipeline need to be aware of.
107+
108+
## Option B: Infer from existing frame data (heuristic)
109+
110+
If a frame already has a resolved `function` name (and optionally `filename`, `lineno`) when it arrives from the SDK, processing could skip symbolication. The logic would be: "if the raw frame has a symbol name, don't attempt to re-symbolicate."
111+
112+
**Changes required:**
113+
114+
- **processing.py**: Add a check early in frame handling: if `function` is already present and non-empty, skip symbolication and mark as symbolicated.
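
A minimal sketch of this heuristic, under the assumption that "present and non-empty" is the whole test, is shown below. `looks_symbolicated` and both sample frames are hypothetical; they are meant to show why the check alone cannot separate the cases this RFC cares about.

```python
from typing import Any


def looks_symbolicated(frame: dict[str, Any]) -> bool:
    """Option B heuristic: any non-empty `function` counts as resolved."""
    function = frame.get("function")
    return isinstance(function, str) and function.strip() != ""


# A frame the SDK fully resolved on-device.
resolved = {"function": "android::Looper::pollOnce"}

# A best-guess mangled name; the server could still demangle it and add
# file/line information if it had the debug file.
partial = {
    "function": "_ZN7android6Looper8pollOnceEi",
    "instruction_addr": "0x7b2c1a4f10",
}

both_skipped = looks_symbolicated(resolved) and looks_symbolicated(partial)
```

Both frames pass the check, so the partially resolved one would lose server-side enrichment. That is the ambiguity an explicit SDK signal is meant to remove.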
115+
116+
### Pros
117+
118+
- No SDK changes required. Works immediately with existing data.
119+
- No schema changes in relay.
120+
- Simplest possible implementation.
121+
122+
### Cons
123+
124+
- Fragile heuristic: a frame might have a `function` from partial client symbolication but still benefit from server-side enrichment (source context, inline frames, demangling).
125+
- Risk of breaking existing workflows where the backend intentionally re-symbolicates client-provided function names (e.g. to add source context or correct demangling).
126+
- Doesn't distinguish between "SDK intentionally symbolicated this" and "SDK sent a partial/best-guess symbol name."
127+
- The .NET `FIXME` is a concrete example of this approach's fragility: a server-side heuristic that worked initially but silently broke when the SDK evolved to send debug images (see Background). Any new heuristic is susceptible to the same drift.
128+
129+
## Option C: Allow SDKs to set `symbolicator_status` directly
130+
131+
Let SDKs explicitly set `symbolicator_status: "symbolicated"` (or a new value) in `frame.data`. Relay would need to explicitly allow this field from SDKs rather than treating it as backend-only.
132+
133+
**Testing has shown that SDK-set values in `frame.data.symbolicator_status` are discarded or overwritten by the backend during processing.** This means Option C is not viable without changes to either relay (to stop stripping/ignoring the field) or processing (to check for an existing value before overwriting).
134+
135+
**Changes required (if pursued despite the above):**
136+
137+
- **relay-event-schema**: Make `symbolicator_status` a recognized field that SDKs may set, and ensure it is not stripped during ingestion.
138+
- **processing.py**: Check if `symbolicator_status` is already set before overwriting it with symbolicator's result.
139+
- **SDKs**: Set `data.symbolicator_status` on symbolicated frames.
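
The "check before overwriting" step could be sketched as follows. `apply_symbolicator_status` is a hypothetical helper, not a function in `processing.py`, and the choice to trust only an SDK-provided `"symbolicated"` value (and not, say, `"missing"`) is an assumption made for this illustration.

```python
from typing import Any


def apply_symbolicator_status(frame: dict[str, Any], backend_status: str) -> str:
    """Write the backend's status unless the SDK already claimed 'symbolicated'."""
    data = frame.get("data") or {}
    frame["data"] = data
    if data.get("symbolicator_status") == "symbolicated":
        # Assumption: only this one SDK-set value is honored.
        return "symbolicated"
    data["symbolicator_status"] = backend_status
    return backend_status


sdk_frame = {"function": "Main", "data": {"symbolicator_status": "symbolicated"}}
raw_frame = {"instruction_addr": "0x1000"}

kept = apply_symbolicator_status(sdk_frame, "missing")
written = apply_symbolicator_status(raw_frame, "missing")
```

After this runs, nothing in the frame records whether `"symbolicated"` came from the SDK or from symbolicator, which illustrates the first con listed below.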
140+
141+
### Pros
142+
143+
- Reuses existing infrastructure: no new field needed.
144+
- Processing already checks this field; minimal backend changes.
145+
146+
### Cons
147+
148+
- Blurs the line between SDK-set and backend-set status, making debugging harder.
149+
- Using catch-all fields for regular SDK usage is not desirable.
150+
- If the field is later moved or renamed during a schema cleanup, SDK behavior breaks silently.
151+
- Tested and confirmed not to work today: SDK-set values are discarded/overwritten in the backend.
152+
153+
# Unresolved Questions
154+
155+
- Are there workflow interactions between symbolication, line-number and source lookup that would be prevented by the presented approach?
156+
- What is the interaction with demangling? If an SDK provides a mangled C++ symbol, should the backend still demangle it even if the frame is marked as client-symbolicated?
157+
- Should the UI distinguish between server-symbolicated and client-symbolicated frames (e.g. a subtle indicator), or treat them identically?
158+
- Naming: `symbolicated`, `client_symbolicated`, `sdk_symbolicated`, `symbolication_source`, or something else?
159+
160+
# Related Issues and Prior Art
161+
162+
- [getsentry/sentry#97054](https://github.com/getsentry/sentry/issues/97054): "UI shows symbolication error without indication of what to do about it" (.NET scenario)
163+
- [getsentry/sentry#46955](https://github.com/getsentry/sentry/issues/46955): "Remove error banner for non-app symbols"
164+
- `FIXME(swatinem)` in [`processing.py` L149-158](https://github.com/getsentry/sentry/blob/422487ea4acad23710cd1fe5392a5b684e09c2e4/src/sentry/lang/native/processing.py#L149-L158): Hard-coded .NET exception
165+
- `is_known_third_party()` in [`processing.py`](https://github.com/getsentry/sentry/blob/422487ea4acad23710cd1fe5392a5b684e09c2e4/src/sentry/lang/native/processing.py): Denylist-based workaround
166+
- Relay frame schema catch-all: [`relay-event-schema/.../stacktrace.rs` L200-202](https://github.com/getsentry/relay/blob/55c59cf75d3c35bbbb66df14072d147eca056bd7/relay-event-schema/src/protocol/stacktrace.rs#L200-L202)
