Derive source-map tuples from Babel's decoded map (#1741)

robhogan · meta-codesync[bot] · commit b658e36e3539 · 2026-06-25T07:46:33.000-07:00
Summary:
Pull Request resolved: #1741

The transform worker built its source-map tuples via `result.rawMappings.map(toSegmentTuple)`. Accessing `result.rawMappings` forces `babel/generator` to run a second decode (`allMappings`) that allocates a flat array of ~4-5 objects per segment, even though Babel *already* computed an equivalent decoded map (`result.decodedMap`, the jridgewell/gen-mapping decoded format) eagerly during generation and Metro was discarding it.

This swaps the source to `result.decodedMap` via a new `tuplesFromBabelDecodedMap` (decoded source lines are 0-based -> +1, name indices resolved against `decodedMap.names`). Output is byte-identical to `result.rawMappings.map(toSegmentTuple)`, and it eliminates the redundant `allMappings` decode for *every* build (not just compact source maps). This is a standalone, unconditional improvement, so it sits first in the stack ahead of the compact-source-map work, which builds on it.

- `metro-source-map`: add `BabelDecodedMap` type + `tuplesFromBabelDecodedMap`.
- `metro-transform-worker`: source tuples from `result.decodedMap`.
- `babel_v7.x.x` libdef: add `decodedMap` to `GeneratorResult`.

Microbenchmark (real `babel/generator` 7.29.1, 133 modules / ~30.6K segments, `--expose-gc`, median of 11):

- `generate()` alone: 20.2 ms
- `generate()` + access `decodedMap`: 19.2 ms (~0 delta; it's a sunk, eager cost)
- `generate()` + access `rawMappings`: 28.8 ms (+8.6 ms), with ~40% more heap (19.5 vs 13.9 MB)

So consuming `decodedMap` drops the `rawMappings`/`allMappings` decode entirely. (`decodedMap` is eager in 7.29.1; even if a future Babel makes it lazy, it allocates arrays of numbers where `rawMappings` allocates nested objects, so its cost should stay at or below that of `rawMappings`.)

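As a rough illustration of the index conversion described above (0-based decoded lines shifted to 1-based, name indices resolved against `names`), the following is a minimal, self-contained sketch. The function name `decodedMapToTuples` and the sample map are hypothetical and exist only for this example; the committed implementation is `tuplesFromBabelDecodedMap` in the diff below.

```javascript
// Illustrative restatement of the conversion. `decodedMapToTuples` and
// `sampleDecodedMap` are hypothetical names, not part of Metro's API.
function decodedMapToTuples(decodedMap) {
  const {mappings, names} = decodedMap;
  const tuples = [];
  mappings.forEach((segments, lineIndex) => {
    // Decoded maps group segments by 0-based generated line;
    // Metro tuples carry a 1-based generated line.
    const generatedLine = lineIndex + 1;
    for (const segment of segments) {
      if (segment.length === 1) {
        // Generated-only segment: [generatedColumn]
        tuples.push([generatedLine, segment[0]]);
      } else if (segment.length === 4) {
        // [generatedColumn, sourceIndex, sourceLine, sourceColumn]
        // The source line is shifted to 1-based; sourceIndex is dropped.
        tuples.push([generatedLine, segment[0], segment[2] + 1, segment[3]]);
      } else if (segment.length === 5) {
        // As above, plus a name index resolved against `names`.
        tuples.push([
          generatedLine,
          segment[0],
          segment[2] + 1,
          segment[3],
          names[segment[4]],
        ]);
      }
    }
  });
  return tuples;
}

// Hand-written sample: two segments on generated line 0, an empty line 1,
// and one generated-only segment on line 2.
const sampleDecodedMap = {
  names: ['foo'],
  mappings: [[[0, 0, 0, 0], [9, 0, 0, 9, 0]], [], [[2]]],
};

const sampleTuples = decodedMapToTuples(sampleDecodedMap);
console.log(JSON.stringify(sampleTuples));
// → [[1,0,1,0],[1,9,1,9,"foo"],[3,2]]
```

Note that the generated column and source column pass through unchanged; only the two line numbers are shifted.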
## E2E benchmark: cold WildeBundle (this diff vs baseline = parent)

Interleaved, paired A/B: each of 12 rounds runs one cold build per cell ({baseline, this diff} x {child-process workers, worker threads}), so slow machine drift is shared within each round and cancels in the per-round delta. Fresh Metro per build, transform cache wiped (cold), `maxWorkers=16`, default path (no compact source maps).

- "Transform CPU" = total user+sys CPU across the whole worker process tree.
- "tree RSS" = whole-tree resident set (captures workers in both modes).
- "graph heap" = main-isolate heapUsed post-build (the retained module graph).
- base/this-diff columns are medians; Δ is the paired mean with a 95% CI (Student-t, 11 df); "n.s." = CI includes 0.

Child-process workers (Metro default; 12 paired rounds):

| metric | baseline | this diff | Δ (95% CI) |
|---|---|---|---|
| transform CPU (s) | 625 | 612 | **-16.6 (-2.6%) [-24.7, -8.5]** |
| build wall (s) | 65.9 | 65.6 | -0.5 (-0.7%) n.s. |
| transient tree RSS (GB) | 15.8 | 16.0 | +0.06, n.s. |
| post-build tree RSS (GB) | 15.1 | 15.1 | +0.08, n.s. |
| graph heap, main isolate (GB) | 1.59 | 1.59 | ~0, n.s. |

Worker threads (`unstable_workerThreads`; 12 paired rounds):

| metric | baseline | this diff | Δ (95% CI) |
|---|---|---|---|
| transform CPU (s) | 664 | 653 | -18.6 (-2.8%) [-37.5, +0.3] |
| build wall (s) | 59.8 | 59.5 | -1.2 (-1.9%) n.s. |
| transient RSS (GB) | 13.2 | 12.7 | -0.46 (-3.5%) [-0.81, -0.11] |
| post-build RSS (GB) | 12.3 | 11.9 | -0.45 (-3.7%) [-0.80, -0.10] |
| graph heap, main isolate (GB) | 1.60 | 1.60 | ~0, n.s. |

Takeaways:

- **Transform CPU drops ~2.6-2.8%, equally in both worker modes.** The point estimates (-16.6 s child-process, -18.6 s threads) agree to within 2 s and their CIs overlap almost entirely, so there is no real asymmetry. This is exactly what the mechanism predicts: the optimization runs *inside* the worker (consume `decodedMap` instead of forcing the `rawMappings`/`allMappings` decode), so the saving is identical whether the worker is a child process or a thread. (An earlier small-n pass suggested a child-process-only win; that was sampling noise. Threads-mode CPU is just noisier, SD 30 s vs 13 s, which only widens its CI without moving the point estimate.)
- Build wall time is ~1-2% lower in both modes but within noise: the CPU saving is spread across 16 workers, so it moves the critical path little.
- Main-isolate post-build heap (the retained graph of stored tuples) is unchanged in every config: no memory regression, byte-identical output.
- Transient/post tree RSS shows a ~0.5 GB (~3.5%) reduction that is resolvable only in the lower-variance threads configuration; the noisier child-process configuration (RSS ~16 GB, CI half-width ~0.3 GB) cannot corroborate it, so treat it as suggestive, not established.

Harness: `memory-investigation/run-worker-bench-ab.sh` (interleaved A/B) + `worker-bench-measure.js` + `worker-bench-stats.js` (paired CIs), in the base diff of this stack. Worker-threads mode under `js1 run` is GK-gated (`metro_worker_threads`); benched via a local `FORCE_WORKER_THREADS` override (not committed).

Reviewed By: huntie, GijsWeterings

Differential Revision: D108506323

fbshipit-source-id: 52c05932382b48aeed2b05ca9110d5908ea6ffeb

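For readers unfamiliar with the statistic quoted in the benchmark tables (paired mean delta with a 95% Student-t CI over 12 rounds, hence 11 degrees of freedom), here is a minimal sketch of how such an interval can be computed. This is an assumption about the method, not the contents of `worker-bench-stats.js`, which is not shown here; the function name `pairedDeltaCI` and all sample numbers are hypothetical.

```javascript
// Two-sided 95% critical value of Student's t with 11 degrees of freedom
// (12 paired rounds minus 1).
const T_CRIT_11_DF = 2.201;

// Computes the mean of the per-round deltas (candidate - baseline) and a
// confidence interval around it. `tCrit` must match the degrees of freedom
// (number of pairs minus 1). Hypothetical helper for illustration only.
function pairedDeltaCI(baseline, candidate, tCrit) {
  if (baseline.length !== candidate.length || baseline.length < 2) {
    throw new Error('Need at least two paired observations');
  }
  const deltas = candidate.map((value, i) => value - baseline[i]);
  const n = deltas.length;
  const mean = deltas.reduce((sum, d) => sum + d, 0) / n;
  // Sample variance of the deltas (n - 1 in the denominator).
  const variance =
    deltas.reduce((sum, d) => sum + (d - mean) ** 2, 0) / (n - 1);
  const halfWidth = tCrit * Math.sqrt(variance / n);
  const low = mean - halfWidth;
  const high = mean + halfWidth;
  // "n.s." in the tables corresponds to an interval that includes 0.
  return {mean, low, high, includesZero: low <= 0 && high >= 0};
}

// Made-up transform-CPU seconds for 12 interleaved rounds (not real data).
const baselineCpu = [630, 622, 641, 618, 627, 633, 620, 625, 638, 619, 624, 629];
const candidateCpu = [612, 608, 622, 605, 611, 615, 603, 610, 619, 601, 609, 614];

console.log(pairedDeltaCI(baselineCpu, candidateCpu, T_CRIT_11_DF));
```

Pairing within a round is what lets shared machine drift cancel: the variance is taken over the per-round deltas, not over the raw measurements.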
diff --git a/packages/metro-source-map/src/source-map.js b/packages/metro-source-map/src/source-map.js
@@ -35,6 +35,24 @@ export type MetroSourceMapSegmentTuple =
   | SourceMapping
   | GeneratedCodeMapping;
 
+// A single segment of a standard "decoded" source map (as produced by
+// `@babel/generator`'s `result.decodedMap` / `@jridgewell/gen-mapping`),
+// grouped by generated line. All fields are 0-based, including the source line
+// (unlike Metro's `MetroSourceMapSegmentTuple`, whose source line is 1-based):
+//   [generatedColumn]
+//   [generatedColumn, sourceIndex, sourceLine, sourceColumn]
+//   [generatedColumn, sourceIndex, sourceLine, sourceColumn, nameIndex]
+type BabelDecodedMapSegment =
+  | [number]
+  | [number, number, number, number]
+  | [number, number, number, number, number];
+
+export type BabelDecodedMap = {
+  readonly mappings: ReadonlyArray<ReadonlyArray<BabelDecodedMapSegment>>,
+  readonly names: ReadonlyArray<string>,
+  ...
+};
+
 export type HermesFunctionOffsets = {[number]: ReadonlyArray<number>, ...};
 
 export type FBSourcesArray = ReadonlyArray<?FBSourceMetadata>;
@@ -279,6 +297,51 @@ function toSegmentTuple(
   return [line, column, original.line, original.column, name];
 }
 
+/**
+ * Converts a Babel/gen-mapping "decoded" source map (`result.decodedMap` from
+ * `@babel/generator`) into raw mapping tuples, byte-identical to
+ * `result.rawMappings.map(toSegmentTuple)`.
+ *
+ * Preferred over `result.rawMappings` because `decodedMap` is computed eagerly
+ * during generation, whereas accessing `rawMappings` triggers a second decode
+ * (`allMappings`) that allocates ~4-5 objects per segment. No terminating
+ * mapping is appended (callers that need one use `countLinesAndTerminateMap`).
+ */
+function tuplesFromBabelDecodedMap(
+  decodedMap: BabelDecodedMap,
+): Array<MetroSourceMapSegmentTuple> {
+  const {mappings, names} = decodedMap;
+  const tuples: Array<MetroSourceMapSegmentTuple> = [];
+  for (let line = 0, n = mappings.length; line < n; ++line) {
+    // Decoded mappings are grouped by generated line (0-based); tuples use
+    // 1-based generated lines.
+    const generatedLine = line + 1;
+    const segments = mappings[line];
+    for (let i = 0, m = segments.length; i < m; ++i) {
+      const segment = segments[i];
+      switch (segment.length) {
+        case 1:
+          tuples.push([generatedLine, segment[0]]);
+          break;
+        case 4:
+          // Decoded source lines are 0-based; tuples use 1-based source lines.
+          tuples.push([generatedLine, segment[0], segment[2] + 1, segment[3]]);
+          break;
+        case 5:
+          tuples.push([
+            generatedLine,
+            segment[0],
+            segment[2] + 1,
+            segment[3],
+            names[segment[4]],
+          ]);
+          break;
+      }
+    }
+  }
+  return tuples;
+}
+
 function addMappingsForFile(
   generator: Generator,
   mappings: Array<MetroSourceMapSegmentTuple>,
@@ -349,6 +412,7 @@ export {
   normalizeSourcePath,
   toBabelSegments,
   toSegmentTuple,
+  tuplesFromBabelDecodedMap,
 };
 
 /**
diff --git a/packages/metro-source-map/types/source-map.d.ts b/packages/metro-source-map/types/source-map.d.ts
@@ -6,7 +6,7 @@
  *
  * @noformat
  * @oncall react_native
- * @generated SignedSource<<7303fe7149cb12d764c6106cdf4f49ee>>
+ * @generated SignedSource<<c2fb54d8a5eb6212af899a87f3fa4852>>
  *
  * This file was translated from Flow by scripts/generateTypeScriptDefinitions.js
  * Original file: packages/metro-source-map/src/source-map.js
@@ -35,6 +35,14 @@ export type MetroSourceMapSegmentTuple =
   | SourceMappingWithName
   | SourceMapping
   | GeneratedCodeMapping;
+type BabelDecodedMapSegment =
+  | [number]
+  | [number, number, number, number]
+  | [number, number, number, number, number];
+export type BabelDecodedMap = {
+  readonly mappings: ReadonlyArray<ReadonlyArray<BabelDecodedMapSegment>>;
+  readonly names: ReadonlyArray<string>;
+};
 export type HermesFunctionOffsets = {
   [$$Key$$: number]: ReadonlyArray<number>;
 };
@@ -125,6 +133,19 @@ declare function toBabelSegments(
 declare function toSegmentTuple(
   mapping: BabelSourceMapSegment,
 ): MetroSourceMapSegmentTuple;
+/**
+ * Converts a Babel/gen-mapping "decoded" source map (`result.decodedMap` from
+ * `@babel/generator`) into raw mapping tuples, byte-identical to
+ * `result.rawMappings.map(toSegmentTuple)`.
+ *
+ * Preferred over `result.rawMappings` because `decodedMap` is computed eagerly
+ * during generation, whereas accessing `rawMappings` triggers a second decode
+ * (`allMappings`) that allocates ~4-5 objects per segment. No terminating
+ * mapping is appended (callers that need one use `countLinesAndTerminateMap`).
+ */
+declare function tuplesFromBabelDecodedMap(
+  decodedMap: BabelDecodedMap,
+): Array<MetroSourceMapSegmentTuple>;
 export {
   BundleBuilder,
   composeSourceMaps,
@@ -137,6 +158,7 @@ export {
   normalizeSourcePath,
   toBabelSegments,
   toSegmentTuple,
+  tuplesFromBabelDecodedMap,
 };
 /**
  * Backwards-compatibility with CommonJS consumers using interopRequireDefault.
diff --git a/packages/metro-transform-worker/src/__tests__/tuplesFromBabelDecodedMap-test.js b/packages/metro-transform-worker/src/__tests__/tuplesFromBabelDecodedMap-test.js
@@ -0,0 +1,68 @@
+/**
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
+ *
+ * This source code is licensed under the MIT license found in the
+ * LICENSE file in the root directory of this source tree.
+ *
+ * @flow strict-local
+ * @format
+ * @oncall react_native
+ */
+
+'use strict';
+
+import generate from '@babel/generator';
+import * as babylon from '@babel/parser';
+import {toSegmentTuple, tuplesFromBabelDecodedMap} from 'metro-source-map';
+
+// The transform worker derives source-map tuples from Babel's eagerly-computed
+// `result.decodedMap` instead of triggering the more expensive `rawMappings`
+// (`allMappings`) decode. This must be byte-identical to the previous
+// `result.rawMappings.map(toSegmentTuple)`.
+const SAMPLES = [
+  `function foo(aaa, bbb) {
+  const ccc = aaa + bbb;
+  return ccc * 2;
+}
+class Bar extends Foo {
+  method(xxx) {
+    return this.value + xxx;
+  }
+}
+export default function entry(items) {
+  const obj = {a: 1, b: 2, c: [1, 2, 3]};
+  return items.map(x => x.value).filter(Boolean);
+}
+`,
+  `const x = require('foo');\nmodule.exports = (a, b) => { let s = 0; for (let i = 0; i < a.length; i++) { s += a[i] * b; } return s; };\n`,
+  `// header\nconst y = 1;\n\n\nfunction z() { return y; }\n`,
+  `const w = 42; const v = w + 1; export {w, v};`,
+  `1 + 1;\n`,
+];
+
+describe('tuplesFromBabelDecodedMap', () => {
+  test.each(SAMPLES.map((code, i) => [i, code]))(
+    'is byte-identical to rawMappings.map(toSegmentTuple) [sample %i]',
+    (_i, code) => {
+      const ast = babylon.parse(code, {sourceType: 'unambiguous'});
+      const result = generate(
+        ast,
+        {sourceMaps: true, sourceFileName: 'file.js'},
+        code,
+      );
+      const fromRaw = (result.rawMappings ?? []).map(toSegmentTuple);
+      const fromDecoded = tuplesFromBabelDecodedMap(
+        nullthrowsLocal(result.decodedMap),
+      );
+      expect(fromDecoded).toEqual(fromRaw);
+      expect(fromDecoded.length).toBeGreaterThan(0);
+    },
+  );
+});
+
+function nullthrowsLocal<T>(x: ?T): T {
+  if (x == null) {
+    throw new Error('Expected decodedMap to be present');
+  }
+  return x;
+}
diff --git a/packages/metro-transform-worker/src/index.js b/packages/metro-transform-worker/src/index.js
@@ -46,6 +46,7 @@ import {
   functionMapBabelPlugin,
   toBabelSegments,
   toSegmentTuple,
+  tuplesFromBabelDecodedMap,
 } from 'metro-source-map';
 import metroTransformPlugins from 'metro-transform-plugins';
 import collectDependencies from 'metro/private/ModuleGraph/worker/collectDependencies';
@@ -471,7 +472,12 @@ async function transformJS(
     file.code,
   );
 
-  let map = result.rawMappings ? result.rawMappings.map(toSegmentTuple) : [];
+  // Derive tuples from Babel's eagerly-computed decoded map rather than
+  // `result.rawMappings`, which would trigger a second, more expensive decode
+  // (`allMappings`). Byte-identical to `result.rawMappings.map(toSegmentTuple)`.
+  let map = result.decodedMap
+    ? tuplesFromBabelDecodedMap(result.decodedMap)
+    : [];
   let code = result.code;
 
   if (minify) {