|
| 1 | +- Start Date: 2023-03-21 |
| 2 | +- RFC Type: initiative |
| 3 | +- RFC PR: https://github.com/getsentry/rfcs/pull/81 |
| 4 | +- RFC Status: draft |
| 5 | + |
| 6 | +# Summary / Motivation |
| 7 | + |
| 8 | +We want to make processing / SourceMap-ing of JavaScript stack traces more reliable. |
| 9 | +To achieve this, we want to uniquely identify a (minified / deployed) JavaScript file using a `DebugId`. |
| 10 | +The same `DebugId` also uniquely identifies the corresponding SourceMap. |
| 11 | +That way it should be possible to _reliably_ look up the SourceMap corresponding to |
| 12 | +a JavaScript file. |
| 13 | + |
| 14 | +# Background |
| 15 | + |
| 16 | +It is currently not possible to _reliably_ find the associated SourceMap for a |
| 17 | +JavaScript file. |
| 18 | + |
| 19 | +A JavaScript stack trace only points to the (minified / transformed) source file |
| 20 | +by its URL, such as `https://example.com/file.min.js`, or `/path/to/local/file.min.js`. |
| 21 | + |
| 22 | +The corresponding SourceMap is often referenced using a `sourceMappingURL` comment |
| 23 | +at the end of that file. It is also possible to have a "hidden" SourceMap that is |
| 24 | +not referenced in such a way, but is typically found by its filename `{js_filename}.map`. |
| 25 | + |
| 26 | +However it is not guaranteed that the SourceMap found in such a way actually |
| 27 | +corresponds to the JavaScript file in which the error happened. |
| 28 | + |
| 29 | +A classic example is caching: |
| 30 | + |
| 31 | +1. An end-user is loading version `1` of `https://example.com/file.min.js`. |
| 32 | +2. A new app version `2` is deployed. |
| 33 | +3. The user experiences an error. |
| 34 | +4. The SourceMap at `https://example.com/file.min.js.map` (version `2`) at this point in time does not correspond to |
| 35 | + the code the user was running. |
| 36 | + |
| 37 | +This problem is even worse at Sentry scale, as at any point in time, errors can come in that happened with arbitrary |
| 38 | +versions of the deployed code, sometimes even involving multiple files which might be out-of-sync with each other. |
| 39 | + |
| 40 | +To work around this problem, Sentry has used the combination of `release` and optional `dist` to better associate |
| 41 | +JavaScript files from one release with SourceMaps uploaded to Sentry. |
| 42 | + |
| 43 | +However, this solution is still not reliable. As mentioned above, even two files loaded in the end user's browser can |
| 44 | +belong to different releases, due to caching or other reasons. |
| 45 | + |
| 46 | +Using a `DebugId`, which uniquely associates the JavaScript file and its corresponding SourceMap, should make source-mapping |
| 47 | +a lot more reliable. |
| 48 | + |
| 49 | +# Supporting Data |
| 50 | + |
| 51 | +TODO: please fill in the gaps here! |
| 52 | + |
| 53 | +Sentry has used the `release + dist` solution for quite some time and found it inadequate. |
| 54 | +A lot of events are not being resolved correctly due to these mismatches, and problems with source-mapping are very |
| 55 | +common in customer-support interactions. |
| 56 | + |
| 57 | +On the other hand, using a `DebugId` for symbolication of Native crashes and stack traces is working reliably both in |
| 58 | +Sentry and in the wider native ecosystem. The native and C# communities have the concept of _Symbol Servers_, which can |
| 59 | +serve any debug file based on its `DebugId`, which allows reliable symbolication for any release, at any point in time. |
| 60 | + |
| 61 | +# Options Considered |
| 62 | + |
| 63 | +To make `DebugId` work, we need to generate one, and associate it to both the JavaScript file, and its corresponding |
| 64 | +SourceMap. |
| 65 | + |
| 66 | +## The `DebugId` format |
| 67 | + |
| 68 | +The `DebugId` should have the same format as a standard UUID. Specifically, |
| 69 | +it should be a 128-bit (16-byte) value, formatted as a string using base-16 (hex) encoding like so: |
| 70 | + |
| 71 | +`XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX` |
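
As a minimal sketch of this format, the helpers below turn 16 raw bytes into the `8-4-4-4-12` hex layout and check whether a string has that layout. The names `formatDebugId` and `isDebugId` are hypothetical and only serve as illustration; they are not part of any existing Sentry API.

```js
// Hypothetical helper: format exactly 16 raw bytes as a DebugId string.
function formatDebugId(bytes) {
  if (bytes.length !== 16) {
    throw new RangeError("a DebugId must be exactly 16 bytes");
  }
  // Hex-encode every byte as two lowercase hex digits.
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  // Split the 32 hex digits into the 8-4-4-4-12 groups.
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32),
  ].join("-");
}

// Hypothetical helper: check that a value has the 8-4-4-4-12 hex layout.
const DEBUG_ID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isDebugId(value) {
  return typeof value === "string" && DEBUG_ID_PATTERN.test(value);
}
```

The check only covers the textual layout; it does not inspect UUID version or variant bits, which this RFC does not prescribe.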
| 72 | + |
| 73 | +## How to generate `DebugId`s? |
| 74 | + |
| 75 | +There are two options for choosing a `DebugId`: making it completely random, or making it reproducible by deriving it from |
| 76 | +a content hash. |
| 77 | + |
| 78 | +### Based on JavaScript Content-hash |
| 79 | + |
| 80 | +This creates a new `DebugId` by hashing the contents of the JavaScript file. |
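
A content-hash based `DebugId` could be derived roughly as follows. This is only a sketch: the RFC does not specify a hash algorithm, so the choice of SHA-256 truncated to 128 bits is an assumption, as is the function name `debugIdFromContent`. It uses Node's built-in `crypto` module.

```js
const crypto = require("crypto");

// Hypothetical helper: derive a reproducible DebugId from file contents.
// Assumption: SHA-256, truncated to the first 16 bytes (32 hex digits).
function debugIdFromContent(content) {
  const hex = crypto
    .createHash("sha256")
    .update(content)
    .digest("hex")
    .slice(0, 32);
  // Format the 32 hex digits as 8-4-4-4-12.
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32),
  ].join("-");
}
```

The same input always yields the same `DebugId`, which is the reproducibility property discussed here. The function works equally on the contents of a JavaScript file or a SourceMap.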
| 81 | + |
| 82 | +**pros** |
| 83 | + |
| 84 | +- Is fully reproducible. The same JavaScript file will always have the same `DebugId`. |
| 85 | +- Works well with existing caching solutions. |
| 86 | + |
| 87 | +**cons** |
| 88 | + |
| 89 | +- Increases overhead in server-side SourceMap processing, as one file can potentially be included in multiple _bundles_. |
| 90 | + See [_What is an `ArtifactBundle`_](#what-is-an-artifactbundle) below. |
| 91 | +- A difference in a source file might not be reflected in the JavaScript file. An example of this might be changes to |
| 92 | + whitespace, comments, or code that was dead-code-eliminated by bundlers. |
| 93 | + |
| 94 | +### Based on SourceMap Content-hash |
| 95 | + |
| 96 | +This creates a new `DebugId` by hashing the contents of the SourceMap file. |
| 97 | + |
| 98 | +**pros** |
| 99 | + |
| 100 | +- Generates a new `DebugId` for changes to source files that would otherwise not lead to changes in the JavaScript file. |
| 101 | + |
| 102 | +**cons** |
| 103 | + |
| 104 | +- Does lead to slightly more cache invalidation. |
| 105 | + |
| 106 | +### Random `DebugId` |
| 107 | + |
| 108 | +This option would create a new random `DebugId` for each file, on each build. |
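
For illustration, a random `DebugId` can be produced with the `randomUUID` function from Node's built-in `crypto` module, which returns a random (version 4) UUID string. The wrapper name `randomDebugId` is hypothetical.

```js
const crypto = require("crypto");

// Hypothetical helper: a fresh random DebugId for each file, on each build.
function randomDebugId() {
  return crypto.randomUUID();
}
```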
| 109 | + |
| 110 | +**pros** |
| 111 | + |
| 112 | +- Simpler server-side SourceMap processing, as one `DebugId` is only included in a single _bundle_, and that one bundle |
| 113 | + can serve multiple stack frames for multiple files of the same build. |
| 114 | + |
| 115 | +**cons** |
| 116 | + |
| 117 | +- Completely breaks the concept of _caching_, as every file is unique for every build. |
| 118 | + |
| 119 | +## How to inject the `DebugId` into the JavaScript file? |
| 120 | + |
| 121 | +### `//# debugId` comment |
| 122 | + |
| 123 | +We propose to add a new magic comment to the end of JavaScript files, similar to the existing `//# sourceMappingURL` |
| 124 | +comment. It should be at the end of the file, preferably on the line right before the `sourceMappingURL` comment, as the |
| 125 | +second line from the bottom. |
| 126 | + |
| 127 | +It should look like this: |
| 128 | + |
| 129 | +```js |
| 130 | +someRandomJSCode(); |
| 131 | +//# debugId=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX |
| 132 | +//# sourceMappingURL=file.min.js.map |
| 133 | +``` |
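
A tool consuming such a file could read the comment back with a small scan over the trailing lines. The sketch below is an assumption-laden illustration: the function name `extractDebugId` is hypothetical, and the limit of inspecting only the last five lines is an arbitrary choice, not something this RFC prescribes.

```js
// Hypothetical helper: find a `//# debugId=...` comment near the end of a file.
function extractDebugId(source) {
  const lines = source.trimEnd().split(/\r?\n/);
  // Assumption: the comment sits within the last few lines of the file.
  for (const line of lines.slice(-5).reverse()) {
    const match = /^\/\/# debugId=([0-9a-fA-F-]{36})\s*$/.exec(line);
    if (match) {
      return match[1];
    }
  }
  return null;
}
```

Calling `extractDebugId` on a file without the comment returns `null`, so callers can fall back to other lookup strategies.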
| 134 | + |
| 135 | +### Runtime Detection / Resolution of `DebugId` |
| 136 | + |
| 137 | +In a shiny utopian future, browsers would directly expose built-in APIs to programmatically access each frame of an `Error`'s stack. |
| 138 | +This might include the absolute path, the line and column number, and the `DebugId`. |
| 139 | +Though the reality of today is that each browser has its own text-based `Error.stack` format, which might even give |
| 140 | +completely different line and column numbers across the different browsers. |
| 141 | +No programmatic API exists today, and might never exist. At the very least, widespread support for this is years away. |
| 142 | + |
| 143 | +It is therefore necessary to extract this `DebugId` through other means. |
| 144 | + |
| 145 | +#### Reading the `//# debugId` comment when capturing Errors |
| 146 | + |
| 147 | +Current JavaScript stack traces include the absolute path (called `abs_path`) of each stack frame. It should be possible |
| 148 | +to load and inspect that file at runtime whenever an error happens. |
| 149 | + |
| 150 | +**pros** |
| 151 | + |
| 152 | +- Does not require injecting any _code_ into the JavaScript files. |
| 153 | + |
| 154 | +**cons** |
| 155 | + |
| 156 | +- Might incur some async fetching / IO when capturing an Error. Though any `abs_path` in the stack trace should be cached already. |
| 157 | + |
| 158 | +#### Add the `DebugId` to a global at load time |
| 159 | + |
| 160 | +One solution here is to inject a small snippet of JS which will be executed when the JavaScript file is loaded, and adds |
| 161 | +the `DebugId` to a global map. |
| 162 | + |
| 163 | +An example snippet is here: |
| 164 | + |
| 165 | +```js |
| 166 | +!function(){try{var e="undefined"!=typeof window?window:"undefined"!=typeof global?global:"undefined"!=typeof self?self:{},n=(new Error).stack;n&&(e._sentryDebugIds=e._sentryDebugIds||{},e._sentryDebugIds[n]="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX")}catch(e){}}() |
| 167 | +``` |
| 168 | + |
| 169 | +This snippet uses the complete `Error.stack` captured at load time as the key into a global map called `_sentryDebugIds`, with the `DebugId` as the value. |
| 170 | +Further post-processing at time of capturing an `Error` is required to extract the `abs_path` from that captured stack. |
| 171 | + |
| 172 | +**pros** |
| 173 | + |
| 174 | +- Does not require any async fetching at time of capturing an `Error`. |
| 175 | + |
| 176 | +**cons** |
| 177 | + |
| 178 | +- It does however require parsing of the `Error.stack` at time of capturing the `Error`. |
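
The post-processing step for the `Error.stack` based variant might look roughly like the sketch below. It assumes that every relevant frame line ends in `<location>:<line>:<column>`, optionally wrapped in parentheses, which holds for common Chrome-style and Firefox-style stacks but is not guaranteed for every engine (Windows-style paths, for instance, are not handled). The names `absPathFromStack` and `debugIdsByAbsPath` are hypothetical.

```js
// Assumption: a frame ends in `<url-or-absolute-path>:<line>:<column>`,
// optionally followed by a closing parenthesis.
const FRAME_LOCATION = /((?:\w+:\/\/|\/)[^\s()]*?):\d+:\d+\)?\s*$/;

// Hypothetical helper: return the location of the first parsable frame.
function absPathFromStack(stack) {
  for (const line of stack.split("\n")) {
    const match = FRAME_LOCATION.exec(line);
    if (match) {
      return match[1];
    }
  }
  return null;
}

// Hypothetical helper: turn the `{ [stack]: debugId }` global into a
// Map keyed by `abs_path`.
function debugIdsByAbsPath(stackToDebugId) {
  const result = new Map();
  for (const [stack, debugId] of Object.entries(stackToDebugId)) {
    const absPath = absPathFromStack(stack);
    if (absPath !== null) {
      result.set(absPath, debugId);
    }
  }
  return result;
}
```

Because the injected snippet runs at the top level of its file, the first frame of the captured stack is expected to point at that file, which is why only the first parsable frame is used.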
| 179 | + |
| 180 | +An alternative implementation might use the [`import.meta.url`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/import.meta) |
| 181 | +property. This would avoid capturing and post-processing an `Error.stack`, but does require usage of ECMAScript Modules. |
| 182 | + |
| 183 | +```js |
| 184 | +((globalThis._sentryDebugIds=globalThis._sentryDebugIds||{})[import.meta.url]="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"); |
| 185 | +``` |
| 186 | + |
| 187 | +**pros** |
| 188 | + |
| 189 | +- More compact snippet. |
| 190 | +- No post-processing required. |
| 191 | + |
| 192 | +**cons** |
| 193 | + |
| 194 | +- Depends on usage of ECMAScript Modules. |
| 195 | + |
| 196 | +## When to inject the `DebugId` into the JavaScript file? |
| 197 | + |
| 198 | +Deploying JavaScript applications can range from a simple _copy files via ftp_ |
| 199 | +to a complex workflow like the following: |
| 200 | + |
| 201 | +```mermaid |
| 202 | +graph TD |
| 203 | + transpile[Transpile source files] --> bundle[Bundle source files] |
| 204 | + bundle --> minify[Minify bundled chunk] |
| 205 | + minify --> fingerprint[Fingerprint minified chunks] |
| 206 | + minify --> sentry[Upload release to Sentry] |
| 207 | + fingerprint --> upload[Upload assets to CDN] |
| 208 | + upload --> propagate[Wait for CDN assets to propagate] |
| 209 | + fingerprint --> deploy[Deploy updated asset references] |
| 210 | + propagate --> deploy |
| 211 | +``` |
| 212 | + |
| 213 | +In this example, assets are _fingerprinted_, and only after being fully propagated |
| 214 | +through a global CDN do they start being referenced from the backend |
| 215 | +service via HTML. |
| 216 | + |
| 217 | +This may work with unique content-hash based filenames, and even use _fingerprinting_ and |
| 218 | +[Subresource Integrity (SRI)](https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity). |
| 219 | + |
| 220 | +An example may look like this, for a CDN-deployed and fingerprinted reference |
| 221 | +to [katex](https://katex.org/docs/browser.html#starter-template): |
| 222 | + |
| 223 | +```html |
| 224 | +<script |
| 225 | + defer |
| 226 | + src="https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.js" |
| 227 | + integrity="sha384-PwRUT/YqbnEjkZO0zZxNqcxACrXe+j766U2amXcgMg5457rve2Y7I6ZJSm2A0mS4" |
| 228 | + crossorigin="anonymous" |
| 229 | +></script> |
| 230 | +``` |
| 231 | + |
| 232 | +Not only is the deployment pipeline very complex, it can also involve a variety of tools with varying degrees of |
| 233 | +integration between them. |
| 234 | +The example `<script>` tag shown above might be generated as part of one integrated JS bundler tool, or it might be |
| 235 | +generated by a Rust or Python backend, based on a supplied JSON file. |
| 236 | + |
| 237 | +The checksums themselves might be directly output by a JS bundler tool, or they might be generated by a completely |
| 238 | +different tool at another stage of the build pipeline. |
| 239 | + |
| 240 | +Each application and build pipeline is unique, and there is an ever growing multitude of tools. |
| 241 | +_Insert joke about a new JS bundler being created each week here._ |
| 242 | + |
| 243 | +It is therefore important that whatever comments and/or code we end up injecting into the final JavaScript assets is |
| 244 | +being injected at the right point in this pipeline. Ideally it would be injected **before** fingerprinting happens, and |
| 245 | +**before** any content-hash based naming happens. |
| 246 | + |
| 247 | +As most JavaScript bundlers support automatic bundle-splitting, and will insert dynamic `import` or `require` statements |
| 248 | +referencing those chunks by (fingerprinted) filename, a deep integration into those various bundlers might be needed. |
| 249 | + |
| 250 | +### Injection via `sentry-cli inject` |
| 251 | + |
| 252 | +With this, injection would happen with a new command, `sentry-cli inject`. It will be the responsibility of the developer |
| 253 | +to call this at the appropriate time depending on their unique build pipeline. |
| 254 | + |
| 255 | +**pros** |
| 256 | + |
| 257 | +- Gives full control for build pipelines that involve a heterogeneous set of tools and stages. |
| 258 | + |
| 259 | +**cons** |
| 260 | + |
| 261 | +- Requires manually using this command. |
| 262 | +- Does not work with bundlers that integrate fingerprinting. |
| 263 | + |
| 264 | +### Injection at `sentry-cli upload` time |
| 265 | + |
| 266 | +In this scenario, injection happens at the time of `sentry-cli upload`, and will also modify the files at that time. |
| 267 | + |
| 268 | +**pros** |
| 269 | + |
| 270 | +- Makes sure that assets uploaded to Sentry have a `DebugId`. |
| 271 | +- No additional command and invocation needed. |
| 272 | + |
| 273 | +**cons** |
| 274 | + |
| 275 | +- Does not work with bundlers that integrate fingerprinting. |
| 276 | +- Does not work in build pipelines where `sentry-cli upload` is not in the main deployment path. |
| 277 | + |
| 278 | +### Injection via bundler plugins |
| 279 | + |
| 280 | +Here, we would build `DebugId` injection right into the various JavaScript bundlers. This can happen with a third-party |
| 281 | +plugin at first, and might move into the core bundler packages once there is enough community buy-in for `DebugId`s. |
| 282 | + |
| 283 | +Each bundler is unique though, and has different hooks at different stages of its internal pipeline. Some bundlers |
| 284 | +might not have the necessary hooks at the necessary stage at all. |
| 285 | + |
| 286 | +#### Rollup |
| 287 | + |
| 288 | +Rollup has a very comprehensive plugin system, with good documentation about the various hooks and the internal pipeline: |
| 289 | +https://rollupjs.org/plugin-development/#output-generation-hooks |
| 290 | + |
| 291 | +According to the diagram in that documentation, the appropriate plugin hook to use might be the |
| 292 | +[`renderChunk`](https://rollupjs.org/plugin-development/#renderchunk) hook, which allows |
| 293 | +access and modification of a chunk's `code` and `map` (SourceMap) output. |
| 294 | +This hook runs before the `augmentChunkHash` and `generateBundle` hooks which are responsible for fingerprinting and |
| 295 | +generating the _final_ output for each chunk. |
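
To make the idea concrete, a plugin using `renderChunk` might be shaped like the sketch below. This is untested against a real Rollup build and only illustrates the hook. The plugin name `debug-id-sketch` and the factory `debugIdPlugin` are hypothetical, a random `DebugId` is used purely for brevity, and injecting the same `DebugId` into the SourceMap is not covered here.

```js
const crypto = require("crypto");

// Hypothetical plugin factory; only the `renderChunk` hook is Rollup API.
function debugIdPlugin() {
  return {
    name: "debug-id-sketch",
    renderChunk(code) {
      // Assumption: a random DebugId per chunk; a content hash would also work.
      const debugId = crypto.randomUUID();
      return {
        // Append the magic comment as a new trailing line.
        code: `${code}\n//# debugId=${debugId}\n`,
        // Only a trailing line is appended and no code is moved, so the
        // existing SourceMap is kept by returning `map: null`.
        map: null,
      };
    },
  };
}
```

Whether this hook runs at exactly the right point relative to fingerprinting in every Rollup configuration still needs the investigation mentioned in this section.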
| 296 | + |
| 297 | +TODO: further investigation and experimentation for this is needed |
| 298 | + |
| 299 | +#### Webpack |
| 300 | + |
| 301 | +Webpack documentation for plugin hooks is not as extensive, and there is no broad overview of the internal pipeline and |
| 302 | +phases. There is a general overview of all the `Compilation` hooks though: |
| 303 | +https://webpack.js.org/api/compilation-hooks/ |
| 304 | + |
| 305 | +It might be possible to use the [`processAssets`](https://webpack.js.org/api/compilation-hooks/#processassets) hook |
| 306 | +for this purpose. The documentation mentions the `PROCESS_ASSETS_STAGE_DEV_TOOLING` stage, which is responsible for |
| 307 | +extracting SourceMaps, and the `PROCESS_ASSETS_STAGE_OPTIMIZE_HASH` stage, which appears to be responsible for generating the |
| 308 | +final fingerprint of an asset. |
| 309 | + |
| 310 | +TODO: further investigation and experimentation for this is needed |
| 311 | + |
| 312 | +#### TODO: other popular bundlers and build tools |
| 313 | + |
| 314 | +## Injecting the `DebugId` into the SourceMap |
| 315 | + |
| 316 | +This is a less controversial part, as SourceMaps are in general not distributed to production, and are less likely to |
| 317 | +be fingerprinted or integrity-checked. They are also plain JSON, making it trivial to inject additional fields. |
| 318 | +We propose to add a new JSON field to the root of the SourceMap object called `debugId`. |
| 319 | +This new field should encode the `DebugId` as a plain string. |
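
Since SourceMaps are plain JSON, the injection can be sketched in a few lines. The function name `injectDebugIdIntoSourceMap` is hypothetical; the field name `debugId` is the one proposed in this section.

```js
// Hypothetical helper: add a root-level `debugId` field to a SourceMap.
function injectDebugIdIntoSourceMap(sourceMapJson, debugId) {
  const sourceMap = JSON.parse(sourceMapJson);
  // The DebugId is stored as a plain string at the root of the object.
  sourceMap.debugId = debugId;
  return JSON.stringify(sourceMap);
}
```

All existing fields of the SourceMap are left untouched; only the new root field is added or overwritten.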
| 320 | + |
| 321 | +# Drawbacks |
| 322 | + |
| 323 | +The main drawback is that this might feel like an invasive change to the JavaScript ecosystem. It is a huge implementation |
| 324 | +burden, and might not be received positively by either customers or the wider JS tools ecosystem. |
| 325 | + |
| 326 | +In particular, injecting a piece of JavaScript into every production asset might alienate some users. |
| 327 | + |
| 328 | +The effectiveness and success of this initiative needs to be proved out first, and is not guaranteed. |
| 329 | + |
| 330 | +# Unresolved questions |
| 331 | + |
| 332 | +- ~~Why do we call the new SourceMap field `debug_id` and not `debugId`? |
| 333 | + All existing fields in SourceMaps are camelCase, and so is the general convention in the JS ecosystem.~~ |
| 334 | + |
| 335 | +# Implementation |
| 336 | + |
| 337 | +- TODO: link to some implementation breadcrumbs and PRs |
| 338 | +- TODO: change the existing SourceMap implementation to use a camelCased `debugId` instead of the snake_cased `debug_id` field. |
| 339 | + |
| 340 | +--- |
| 341 | + |
| 342 | +# Appendix |
| 343 | + |
| 344 | +## What is an `ArtifactBundle` |
| 345 | + |
| 346 | +Sentry bundles up all the assets of one release / build into a so-called `ArtifactBundle` (also called `SourceBundle`, or `ReleaseBundle`). |
| 347 | + |
| 348 | +This is a special ZIP file which includes all the minified / production JavaScript files, their corresponding SourceMaps, |
| 349 | +and the original source files as referenced by the SourceMaps in whatever format (TypeScript or other). |
| 350 | + |
| 351 | +It also has a `manifest.json`, which has more metadata per file, like the type of a file, its `DebugId`, and an optional |
| 352 | +`SourceMap` reference from minified files to their SourceMap. |
| 353 | + |
| 354 | +**pros** |
| 355 | + |
| 356 | +- Customers naturally think in _releases_, so having one archive per release is good. |
| 357 | +- Only needing to download / cache / process a single file for one release can be more efficient. |
| 358 | + |
| 359 | +**cons** |
| 360 | + |
| 361 | +- Does not work well with content-hash based `DebugId`s, as one `DebugId` can appear in a multitude of archives. |
| 362 | +- Feels like a workaround for inefficiencies in other parts of the processing pipeline when dealing with many smaller files. |