
Commit 0a6d0ca

Reliable JavaScript/SourceMap processing via DebugId
We want to make processing / SourceMap-ing of JavaScript stack traces more reliable. To achieve this, we want to uniquely identify a (minified / deployed) JavaScript file using a DebugId. The same DebugId also uniquely identifies the corresponding SourceMap. That way it should be possible to _reliably_ look up the SourceMap corresponding to a JavaScript file, which is necessary to have reliable SourceMap processing.
1 parent 5fb8214 commit 0a6d0ca

2 files changed (+363 / -0 lines changed)

README.md

Lines changed: 1 addition & 0 deletions
@@ -36,3 +36,4 @@ This repository contains RFCs and DACIs. Lost?
- [0071-continue-trace-over-process-boundaries](text/0071-continue-trace-over-process-boundaries.md): Continue trace over process boundaries
- [0072-kafka-schema-registry](text/0072-kafka-schema-registry.md): Kafka Schema Registry
- [0078-escalating-issues](text/0078-escalating-issues.md): Escalating Issues
- [0081-sourcemap-debugid](text/0081-sourcemap-debugid.md): Reliable JavaScript/SourceMap processing via `DebugId`

text/0081-sourcemap-debugid.md

Lines changed: 362 additions & 0 deletions
@@ -0,0 +1,362 @@
- Start Date: 2023-03-21
- RFC Type: initiative
- RFC PR: https://github.com/getsentry/rfcs/pull/81
- RFC Status: draft

# Summary / Motivation

We want to make processing / SourceMap-ing of JavaScript stack traces more reliable.
To achieve this, we want to uniquely identify a (minified / deployed) JavaScript file using a `DebugId`.
The same `DebugId` also uniquely identifies the corresponding SourceMap.
That way it should be possible to _reliably_ look up the SourceMap corresponding to
a JavaScript file.

# Background

It is currently not possible to _reliably_ find the associated SourceMap for a
JavaScript file.

A JavaScript stack trace only points to the (minified / transformed) source file
by its URL, such as `https://example.com/file.min.js`, or `/path/to/local/file.min.js`.

The corresponding SourceMap is often referenced using a `sourceMappingURL` comment
at the end of that file. It is also possible to have a "hidden" SourceMap that is
not referenced in such a way, but is typically found by its filename `{js_filename}.map`.

However, it is not guaranteed that a SourceMap found this way actually
corresponds to the JavaScript file in which the error happened.

A classic example is caching:

1. An end-user loads version `1` of `https://example.com/file.min.js`.
2. A new app version `2` is deployed.
3. The user experiences an error.
4. The SourceMap at `https://example.com/file.min.js.map` (now version `2`) does not correspond to
   the code the user was running.

This problem is even worse at Sentry scale: at any point in time, errors can come in from arbitrary
versions of the deployed code, sometimes even involving multiple files that are out of sync with each other.

To work around this problem, Sentry has used the combination of `release` and optional `dist` to better associate
JavaScript files from one release with SourceMaps uploaded to Sentry.

However, this solution is still not reliable: as mentioned above, even two files loaded in the end-user's browser can
belong to different releases, due to caching or other reasons.

Using a `DebugId`, which uniquely associates the JavaScript file with its corresponding SourceMap, should make source-mapping
a lot more reliable.

# Supporting Data

TODO: please fill in the gaps here!

Sentry has used the `release + dist` solution for quite some time and found it inadequate.
Many events are not resolved correctly due to these mismatches, and problems with source-mapping are very
common in customer-support interactions.

On the other hand, using a `DebugId` for symbolication of Native crashes and stack traces works reliably, both in
Sentry and in the wider native ecosystem. The Native and C# communities have the concept of _Symbol Servers_, which can
serve any debug file based on its `DebugId`. This allows reliable symbolication for any release, at any point in time.

# Options Considered

To make `DebugId` work, we need to generate one, and associate it with both the JavaScript file and its corresponding
SourceMap.

## The `DebugId` format

The `DebugId` should have the same format as a standard UUID, specifically:
It should be a 128-bit (16-byte) value, formatted as a string using base-16 (hex) encoding like so:

`XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX`
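
For illustration, here is a minimal sketch of producing and validating this string form from 16 raw bytes. The helper names `formatDebugId` and `isDebugId` are hypothetical, and accepting both upper- and lowercase hex digits is an assumption, since the format above does not fix the case.

```js
// Hypothetical helper: format 16 raw bytes as a DebugId (8-4-4-4-12 hex digits).
function formatDebugId(bytes) {
  if (bytes.length !== 16) {
    throw new RangeError(`expected 16 bytes, got ${bytes.length}`);
  }
  // Two lowercase hex digits per byte, 32 digits in total.
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32),
  ].join("-");
}

// Assumption: either hex case is accepted when validating.
const DEBUG_ID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isDebugId(value) {
  return typeof value === "string" && DEBUG_ID_PATTERN.test(value);
}

const example = formatDebugId(Uint8Array.from({ length: 16 }, (_, i) => i));
console.log(example); // 00010203-0405-0607-0809-0a0b0c0d0e0f
```

The placeholder `XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX` itself is rejected by `isDebugId`, since `X` is not a hex digit.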

## How to generate `DebugId`s?

There are two options for choosing a `DebugId`: making it completely random, or making it reproducible by deriving it from
a content hash (of either the JavaScript file or its SourceMap).

### Based on JavaScript Content-hash

This creates a new `DebugId` by hashing the contents of the JavaScript file.

**pros**

- Is fully reproducible. The same JavaScript file will always have the same `DebugId`.
- Works well with existing caching solutions.

**cons**

- Increases overhead in server-side SourceMap processing, as one file can potentially be included in multiple _bundles_.
  See [_What is an `ArtifactBundle`_](#what-is-an-artifactbundle) below.
- A difference in a source file might not be reflected in the JavaScript file, so the same `DebugId` could end up
  referring to different SourceMaps. Examples are changes to whitespace, comments, or code that was
  dead-code-eliminated by bundlers.
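
A content-hash based `DebugId` could be derived roughly as follows. This is a sketch only: the choice of SHA-256, the truncation to the first 16 bytes, and the name `debugIdFromContent` are assumptions, not something this RFC specifies.

```js
const { createHash } = require("node:crypto");

// Hypothetical helper. Assumptions: SHA-256, truncated to 16 bytes (32 hex digits).
function debugIdFromContent(content) {
  const hex = createHash("sha256").update(content).digest("hex").slice(0, 32);
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32),
  ].join("-");
}

const minified = 'function a(){throw new Error("boom")}a();';

// Reproducible: hashing the same bytes again yields the same DebugId.
const firstBuild = debugIdFromContent(minified);
const rebuild = debugIdFromContent(minified);

// Any change to the emitted JavaScript, even a comment, yields a new DebugId.
const changed = debugIdFromContent(minified + "\n// comment");
```

The _Based on SourceMap Content-hash_ option below would work the same way, only feeding the SourceMap's JSON text into the hash instead of the JavaScript file.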

### Based on SourceMap Content-hash

This creates a new `DebugId` by hashing the contents of the SourceMap file.

**pros**

- Generates a new `DebugId` for changes to source files that would otherwise not lead to changes in the JavaScript file.

**cons**

- Leads to slightly more cache invalidation, as the JavaScript file (which embeds the `DebugId`) changes whenever its
  SourceMap changes.

### Random `DebugId`

This option would create a new random `DebugId` for each file, on each build.

**pros**

- Simpler server-side SourceMap processing, as one `DebugId` is only included in a single _bundle_, and that one bundle
  can serve multiple stack frames for multiple files of the same build.

**cons**

- Completely breaks the concept of _caching_, as every file is unique for every build.
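
For comparison, a random `DebugId` needs no hashing at all. This sketch relies on Node's `crypto.randomUUID()`, which returns a version 4 UUID in the same 8-4-4-4-12 hex layout; the wrapper name `randomDebugId` is hypothetical.

```js
const { randomUUID } = require("node:crypto");

// Hypothetical wrapper: a fresh random DebugId for every file on every build.
// randomUUID() returns a version 4 UUID with lowercase hex digits.
function randomDebugId() {
  return randomUUID();
}

// Byte-identical output from two builds still gets two different DebugIds,
// so the injected files differ and cached copies can never be reused.
const buildOne = randomDebugId();
const buildTwo = randomDebugId();
```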

## How to inject the `DebugId` into the JavaScript file?

### `//# debugId` comment

We propose to add a new magic comment to the end of JavaScript files, similar to the existing `//# sourceMappingURL`
comment. It should be at the end of the file, preferably as the line right before the `sourceMappingURL` comment, i.e.
the second line from the bottom.

It should look like this:

```js
someRandomJSCode();
//# debugId=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
//# sourceMappingURL=file.min.js.map
```
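
Injecting the comment is a small text transformation. The following sketch, with a hypothetical `injectDebugIdComment` helper, inserts the comment directly before a trailing `//# sourceMappingURL` comment when one exists and appends it to the end of the file otherwise. It only handles the `//#` comment form and assumes `\n` line endings.

```js
// Hypothetical helper; not part of any existing tool.
function injectDebugIdComment(code, debugId) {
  const comment = `//# debugId=${debugId}`;
  const lines = code.split("\n");
  // Skip trailing empty lines so files ending in "\n" are handled.
  let last = lines.length - 1;
  while (last >= 0 && lines[last].trim() === "") last--;
  if (last >= 0 && lines[last].startsWith("//# sourceMappingURL=")) {
    // Place the comment on the line right before sourceMappingURL.
    lines.splice(last, 0, comment);
    return lines.join("\n");
  }
  // No sourceMappingURL comment: append at the very end.
  const needsNewline = code.length > 0 && !code.endsWith("\n");
  return `${code}${needsNewline ? "\n" : ""}${comment}\n`;
}
```

Applied to `"someRandomJSCode();\n//# sourceMappingURL=file.min.js.map"`, it produces the three-line layout proposed above.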

### Runtime Detection / Resolution of `DebugId`

In a shiny utopian future, browsers would directly expose builtin APIs to programmatically access each frame of an `Error`'s stack.
This might include the absolute path, the line and column number, and the `DebugId`.
The reality today, though, is that each browser has its own text-based `Error.stack` format, which might even give
completely different line and column numbers across browsers.
No such programmatic API exists today, and one might never exist. At the very least, widespread support for it is years away.

It is therefore necessary to extract this `DebugId` through other means.

#### Reading the `//# debugId` comment when capturing Errors

Current JavaScript stack traces include the absolute path (called `abs_path`) of each stack frame. It should be possible
to load and inspect that file at runtime whenever an error happens.

**pros**

- Does not require injecting any _code_ into the JavaScript files.

**cons**

- Might incur some async fetching / IO when capturing an Error, though any file referenced by an `abs_path` in the stack
  trace should already be cached.
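
Reading the comment back could look like the sketch below. `parseDebugIdComment` scans only the last few lines of the file (five here, an arbitrary assumption), and `debugIdForAbsPath` shows the hypothetical async fetch step this option implies; neither is an existing SDK function.

```js
// Matches a whole `//# debugId=<uuid>` line (either hex case).
const DEBUG_ID_COMMENT =
  /^\/\/# debugId=([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})\s*$/;

// Hypothetical helper: look for the comment in the last few lines only,
// since it is expected next to `//# sourceMappingURL` at the end of the file.
function parseDebugIdComment(source, tailLines = 5) {
  const lines = source.split("\n").slice(-tailLines);
  for (let i = lines.length - 1; i >= 0; i--) {
    const match = DEBUG_ID_COMMENT.exec(lines[i]);
    if (match) return match[1];
  }
  return undefined;
}

// Hypothetical capture-time helper: fetch the file behind a frame's abs_path
// (ideally served from the browser cache) and read its DebugId.
async function debugIdForAbsPath(absPath) {
  const response = await fetch(absPath);
  if (!response.ok) return undefined;
  return parseDebugIdComment(await response.text());
}
```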

#### Add the `DebugId` to a global at load time

One solution here is to inject a small snippet of JS which is executed when the JavaScript file is loaded, and adds
the `DebugId` to a global map.

An example snippet is here:

```js
!function(){try{var e="undefined"!=typeof window?window:"undefined"!=typeof global?global:"undefined"!=typeof self?self:{},n=(new Error).stack;n&&(e._sentryDebugIds=e._sentryDebugIds||{},e._sentryDebugIds[n]="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX")}catch(e){}}()
```

This snippet stores the `DebugId` in a global map called `_sentryDebugIds`, keyed by the complete `Error.stack` captured
while the file is loading.
Further post-processing at the time of capturing an `Error` is required to extract the `abs_path` from that captured stack.

**pros**

- Does not require any async fetching at the time of capturing an `Error`.

**cons**

- It does, however, require parsing the `Error.stack` at the time of capturing the `Error`.
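
The post-processing step, turning stack-keyed entries into `abs_path`-keyed ones, might look like this. The regular expression only covers the common V8 (`at ...`) and Firefox/Safari (`...@...`) frame shapes and is an assumption rather than a complete stack parser; the function names are hypothetical.

```js
// Common frame shapes (assumption, not exhaustive):
//   V8:             "    at https://x/app.js:1:23" or "    at f (https://x/app.js:1:23)"
//   Firefox/Safari: "f@https://x/app.js:1:23" or "@https://x/app.js:1:23"
const FRAME_LOCATION = /(?:\(|@|at\s+)([^()\s@]+?):\d+:\d+\)?\s*$/;

function absPathFromStack(stack) {
  // The snippet calls `new Error()` inside the injected file, so the first
  // frame that has a location is taken to belong to that file.
  for (const line of stack.split("\n")) {
    const match = FRAME_LOCATION.exec(line);
    if (match) return match[1];
  }
  return undefined;
}

// Hypothetical helper: convert { [Error.stack]: debugId } into { [abs_path]: debugId }.
function debugIdsByAbsPath(stackKeyedIds) {
  const result = {};
  for (const [stack, debugId] of Object.entries(stackKeyedIds)) {
    const absPath = absPathFromStack(stack);
    if (absPath) result[absPath] = debugId;
  }
  return result;
}
```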

An alternative implementation might use the [`import.meta.url`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/import.meta)
property. This would avoid capturing and post-processing an `Error.stack`, but requires the use of ECMAScript Modules.

```js
((globalThis._sentryDebugIds=globalThis._sentryDebugIds||{})[import.meta.url]="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX");
```

**pros**

- More compact snippet.
- No post-processing required.

**cons**

- Depends on usage of ECMAScript Modules.
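
With the `import.meta.url` variant, capture-time work reduces to a plain lookup. In this sketch the frame shape (`abs_path`, `lineno`, `colno`), the `debug_id` output field, and the `attachDebugIds` name are illustrative assumptions, not a defined event schema.

```js
// Hypothetical helper: attach a DebugId to every frame whose abs_path was
// registered by the ESM snippet in globalThis._sentryDebugIds.
function attachDebugIds(frames, debugIdMap = globalThis._sentryDebugIds || {}) {
  return frames.map((frame) => {
    const debugId = debugIdMap[frame.abs_path];
    return debugId ? { ...frame, debug_id: debugId } : frame;
  });
}

// What the snippet would have registered for one ES module (made-up values).
globalThis._sentryDebugIds = {
  "https://example.com/app.mjs": "00010203-0405-0607-0809-0a0b0c0d0e0f",
};

const frames = attachDebugIds([
  { abs_path: "https://example.com/app.mjs", lineno: 1, colno: 42 },
  { abs_path: "https://cdn.example.com/untracked.js", lineno: 3, colno: 7 },
]);
```

Frames from files without an injected snippet are passed through unchanged, so they would fall back to whatever SourceMap lookup exists today.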

## When to inject the `DebugId` into the JavaScript file?

Deploying JavaScript applications can range from a simple _copy files via ftp_
to a complex workflow like the following:

```mermaid
graph TD
    transpile[Transpile source files] --> bundle[Bundle source files]
    bundle --> minify[Minify bundled chunk]
    minify --> fingerprint[Fingerprint minified chunks]
    minify --> sentry[Upload release to Sentry]
    fingerprint --> upload[Upload assets to CDN]
    upload --> propagate[Wait for CDN assets to propagate]
    fingerprint --> deploy[Deploy updated asset references]
    propagate --> deploy
```

In this example, assets are _fingerprinted_, and only after being fully propagated
through a global CDN do they start being referenced from the backend
service via HTML.

Such a pipeline may use unique content-hash based filenames, and even _fingerprinting_ and
[Subresource Integrity (SRI)](https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity).

An example may look like this, for a CDN-deployed and fingerprinted reference
to [katex](https://katex.org/docs/browser.html#starter-template):

```html
<script
    defer
    src="https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.js"
    integrity="sha384-PwRUT/YqbnEjkZO0zZxNqcxACrXe+j766U2amXcgMg5457rve2Y7I6ZJSm2A0mS4"
    crossorigin="anonymous"
></script>
```

Not only is the deployment pipeline very complex, it can also involve a variety of tools with varying degrees of
integration between them.
The example `<script>` tag shown above might be generated as part of one integrated JS bundler tool, or it might be
generated by a Rust or Python backend, based on a supplied JSON file.

The checksums themselves might be directly output by a JS bundler tool, or they might be generated by a completely
different tool at another stage of the build pipeline.

Each application and build pipeline is unique, and there is an ever-growing multitude of tools.
_Insert joke about a new JS bundler being created each week here._

It is therefore important that whatever comments and/or code we end up injecting into the final JavaScript assets is
injected at the right point in this pipeline. Ideally it would be injected **before** fingerprinting happens, and
**before** any content-hash based naming happens.
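
The ordering constraint can be made concrete with SRI: the `integrity` value is a hash of the exact bytes served, so editing a file after that hash is recorded breaks it. This sketch computes a `sha384-` SRI value in the same form as the `integrity` attribute in the katex example; the file contents and the injected `DebugId` are made up.

```js
const { createHash } = require("node:crypto");

// SRI value for a script: "sha384-" followed by the base64-encoded SHA-384 digest.
function sriSha384(content) {
  return `sha384-${createHash("sha384").update(content).digest("base64")}`;
}

const built = "someRandomJSCode();\n//# sourceMappingURL=file.min.js.map\n";
// Recorded in the HTML as <script integrity="..."> during deployment.
const integrity = sriSha384(built);

// Injecting the DebugId after that point changes the served bytes...
const injectedLate = built.replace(
  "//# sourceMappingURL=",
  "//# debugId=00010203-0405-0607-0809-0a0b0c0d0e0f\n//# sourceMappingURL=",
);

// ...so the recorded integrity no longer matches and the browser would refuse the script.
const stillValid = sriSha384(injectedLate) === integrity; // false
```

Injecting the comment before the SRI value is computed avoids the mismatch, because the recorded hash then already covers the `//# debugId` line.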

As most JavaScript bundlers support automatic bundle-splitting, and insert dynamic `import` or `require` statements
referencing those chunks by (fingerprinted) filename, a deep integration into those various bundlers might be needed.

### Injection via `sentry-cli inject`

With this option, injection would happen with a new command, `sentry-cli inject`. It will be the responsibility of the developer
to call this at the appropriate time depending on their unique build pipeline.

**pros**

- Gives full control for build pipelines that involve a heterogeneous set of tools and stages.

**cons**

- Requires manually using this command.
- Does not work with bundlers that integrate fingerprinting.

### Injection at `sentry-cli upload` time

In this scenario, injection happens at the time of `sentry-cli upload`, which also modifies the local files at that point.

**pros**

- Makes sure that assets uploaded to Sentry have a `DebugId`.
- No additional command and invocation needed.

**cons**

- Does not work with bundlers that integrate fingerprinting.
- Does not work in build pipelines where `sentry-cli upload` is not in the main deployment path.

### Injection via bundler plugins

Here, we would build `DebugId` injection right into the various JavaScript bundlers. This can happen with a third-party
plugin at first, and might move into the core bundler packages once there is enough community buy-in for `DebugId`s.

Each bundler is unique, though, and has different hooks at different stages of its internal pipeline. Some bundlers
might not have the necessary hooks at the necessary stage at all.

#### Rollup

Rollup has a very comprehensive plugin system, with good documentation about the various hooks and the internal pipeline:
https://rollupjs.org/plugin-development/#output-generation-hooks

According to the diagram on that page, the appropriate plugin hook to use might be the
[`renderChunk`](https://rollupjs.org/plugin-development/#renderchunk) hook, which allows
access to and modification of a chunk's `code` and `map` (SourceMap) output.
This hook runs before the `augmentChunkHash` and `generateBundle` hooks, which are responsible for fingerprinting and
generating the _final_ output for each chunk.
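
To make the `renderChunk` idea more concrete, here is an untested sketch of such a Rollup plugin. The plugin name, the content-hash scheme, and the helper `debugIdFromContent` are assumptions. It relies on Rollup's convention, as we understand it, that returning `map: null` keeps the existing mappings, which should be safe here because only a line is appended after the existing code. It does not yet write the `debugId` field into the emitted SourceMap.

```js
const { createHash } = require("node:crypto");

// Hypothetical helper; assumed scheme: SHA-256 of the chunk code, truncated to 16 bytes.
function debugIdFromContent(content) {
  const hex = createHash("sha256").update(content).digest("hex").slice(0, 32);
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20, 32)].join("-");
}

// Hypothetical plugin factory, not a published package.
function debugIdRollupPluginSketch() {
  return {
    name: "debug-id-sketch",
    // Rollup calls renderChunk(code, chunk, outputOptions, meta) once per output chunk.
    renderChunk(code) {
      const debugId = debugIdFromContent(code);
      return {
        // Appending a line after the existing code does not move any of it...
        code: `${code}\n//# debugId=${debugId}`,
        // ...so `map: null` asks Rollup to keep the existing mappings.
        map: null,
      };
    },
  };
}

// Illustrative usage in rollup.config.js:
// export default { input: "src/main.js", output: { dir: "dist", sourcemap: true },
//   plugins: [debugIdRollupPluginSketch()] };
```

Since Rollup appends its own `//# sourceMappingURL` comment when writing the output, the `//# debugId` comment should end up on the second-to-last line as proposed, though this still needs to be verified.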

TODO: further investigation and experimentation for this is needed

#### Webpack

Webpack documentation for plugin hooks is not as extensive, and there is no broad overview of the internal pipeline and
phases. There is a general overview of all the `Compilation` hooks, though:
https://webpack.js.org/api/compilation-hooks/

It might be possible to use the [`processAssets`](https://webpack.js.org/api/compilation-hooks/#processassets) hook
for this purpose. The documentation mentions the `PROCESS_ASSETS_STAGE_DEV_TOOLING` stage, which is responsible for
extracting SourceMaps, and the `PROCESS_ASSETS_STAGE_OPTIMIZE_HASH` stage, which appears to be responsible for generating
the final fingerprint of an asset.

TODO: further investigation and experimentation for this is needed

#### TODO: other popular bundlers and build tools

## Injecting the `DebugId` into the SourceMap

This is a less controversial part, as SourceMaps are in general not distributed to production, and are less likely to
be fingerprinted or integrity-checked. They are also plain JSON, making it trivial to inject additional fields.
We propose to add a new JSON field called `debugId` to the root of the SourceMap object.
This new field should encode the `DebugId` as a plain string.
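
Injecting the field is a plain JSON edit. A minimal sketch with a hypothetical `injectDebugIdIntoSourceMap` helper; the check for SourceMap `version` 3 is an added safeguard, not a requirement of this RFC.

```js
// Hypothetical helper: add the root-level `debugId` field to a SourceMap's JSON text.
function injectDebugIdIntoSourceMap(sourceMapJson, debugId) {
  const map = JSON.parse(sourceMapJson);
  if (map.version !== 3) {
    throw new Error(`unsupported SourceMap version: ${map.version}`);
  }
  map.debugId = debugId;
  return JSON.stringify(map);
}

const original = '{"version":3,"file":"file.min.js","sources":["file.ts"],"names":[],"mappings":"AAAA"}';
const withDebugId = injectDebugIdIntoSourceMap(original, "00010203-0405-0607-0809-0a0b0c0d0e0f");
// {"version":3,"file":"file.min.js","sources":["file.ts"],"names":[],"mappings":"AAAA","debugId":"00010203-0405-0607-0809-0a0b0c0d0e0f"}
```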

# Drawbacks

The main drawback is that this might feel like an invasive change to the JavaScript ecosystem. It is a huge implementation
burden, and might not be received positively by either customers or the wider JS tools ecosystem.

Injecting a piece of JavaScript into every production asset, in particular, might alienate some users.

The effectiveness and success of this initiative need to be proven first, and are not guaranteed.

# Unresolved questions

- ~~Why do we call the new SourceMap field `debug_id` and not `debugId`?
  All existing fields in SourceMaps are camelCase, and so is the general convention in the JS ecosystem.~~

# Implementation

- TODO: link to some implementation breadcrumbs and PRs
- TODO: change the existing SourceMap implementation to use a camelCased `debugId` instead of the snake_cased `debug_id` field.

---

# Appendix

## What is an `ArtifactBundle`

Sentry bundles up all the assets of one release / build into a so-called `ArtifactBundle` (also called `SourceBundle`, or `ReleaseBundle`).

This is a special ZIP file which includes all the minified / production JavaScript files, their corresponding SourceMaps,
and the original source files as referenced by the SourceMaps, in whatever format (TypeScript or other).

It also has a `manifest.json`, which has more metadata per file, like the type of a file, its `DebugId`, and an optional
`SourceMap` reference from minified files to their SourceMap.
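
To illustrate how a `DebugId` enables a direct lookup inside such a bundle, here is a sketch over a manifest. The manifest shape and field names (`files`, `type`, `debugId`, `sourcemap`) are hypothetical placeholders for the metadata described above, not the actual `manifest.json` format.

```js
// Hypothetical manifest shape, for illustration only.
const manifest = {
  files: {
    "file.min.js": {
      type: "minified_source",
      debugId: "00010203-0405-0607-0809-0a0b0c0d0e0f",
      sourcemap: "file.min.js.map",
    },
    "file.min.js.map": {
      type: "source_map",
      debugId: "00010203-0405-0607-0809-0a0b0c0d0e0f",
    },
    "src/file.ts": { type: "source" },
  },
};

// Hypothetical helper: return the path of the first file with the given type
// and DebugId (compared case-insensitively), or undefined if none matches.
function findByDebugId(manifest, debugId, type) {
  const wanted = debugId.toLowerCase();
  for (const [path, entry] of Object.entries(manifest.files)) {
    if (entry.type === type && entry.debugId && entry.debugId.toLowerCase() === wanted) {
      return path;
    }
  }
  return undefined;
}

// A stack frame's DebugId leads straight to its SourceMap, without guessing from URLs.
const mapPath = findByDebugId(manifest, "00010203-0405-0607-0809-0A0B0C0D0E0F", "source_map");
```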

**pros**

- Customers naturally think in _releases_, so having one archive per release is good.
- Only needing to download / cache / process a single file for one release can be more efficient.

**cons**

- Does not work well with content-hash based `DebugId`s, as one `DebugId` can appear in a multitude of archives.
- Feels like a workaround for inefficiencies in other parts of the processing pipeline when dealing with many smaller files.
