turbo-tasks: shard task_cache the same as storage.map

lukesandberg · lukesandberg · commit dbced223a31f · 2026-05-10T09:45:05.000-07:00
`task_cache: FxDashMap::default()` falls through to dashmap's default shard
amount of `num_cpus * 4` rounded up to a power of two (e.g. 64 shards on a
14-core machine), while `storage.map` uses our `compute_shard_amount`
heuristic, which is quadratic in worker count for a target ~3% collision
probability (e.g. 4096 shards on the same machine).
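
For reference, the two shard counts quoted above can be reproduced with the
sketch below. It assumes dashmap's default is `(num_cpus * 4)` rounded up to
the next power of two, which is consistent with the 64 figure. The
`quadratic_shards` function is a hypothetical stand-in, not the real
`compute_shard_amount`: the factor of 16 is an assumption picked only because
it yields 4096 for 14 workers.

```rust
use std::thread::available_parallelism;

/// Assumed dashmap default: 4 shards per CPU, rounded up to a power of two.
fn dashmap_default_shards(num_cpus: usize) -> usize {
    (num_cpus * 4).next_power_of_two()
}

/// HYPOTHETICAL stand-in for `compute_shard_amount`: quadratic in worker
/// count. The constant 16 is an assumption, not taken from the source.
fn quadratic_shards(num_workers: usize) -> usize {
    (num_workers * num_workers * 16).next_power_of_two()
}

fn main() {
    // Fall back to 1 if the parallelism cannot be queried.
    let cpus = available_parallelism().map_or(1, usize::from);
    println!(
        "{cpus} cpus: default = {}, quadratic = {}",
        dashmap_default_shards(cpus),
        quadratic_shards(cpus)
    );
}
```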

The 64× mismatch made `task_cache` lookups self-contend on every cache hit
even when `storage.map` accesses were uncontended. Profiles attributed
~10% of overhead samples to `dashmap::lock_exclusive_slow` on
`task_cache`'s shards, which is implausible for a properly sharded map at
this thread count.
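
A rough birthday-problem estimate illustrates the gap. The sketch below
assumes a simplified model, not measured behavior: 14 threads each touch one
uniformly random shard at the same instant, and any two landing on the same
shard count as a collision.

```rust
/// Probability that at least two of `threads` uniformly random shard picks
/// land on the same shard (the birthday problem).
fn collision_probability(threads: u32, shards: u32) -> f64 {
    let mut all_distinct = 1.0_f64;
    for i in 0..threads {
        // The i-th pick must avoid the i shards already taken.
        all_distinct *= 1.0 - f64::from(i) / f64::from(shards);
    }
    1.0 - all_distinct
}

fn main() {
    for shards in [64, 4096] {
        println!(
            "{shards} shards: {:.1}% chance of a collision",
            collision_probability(14, shards) * 100.0
        );
    }
}
```

Under this model, 64 shards give a collision more often than not, while 4096
shards stay below the ~3% target.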

Use `with_capacity_and_hasher_and_shard_amount` on `task_cache` with the
same `shard_amount` we already pass to `Storage::new`.
diff --git a/turbopack/crates/turbo-tasks-backend/src/backend/mod.rs b/turbopack/crates/turbo-tasks-backend/src/backend/mod.rs
@@ -239,7 +239,15 @@ impl<B: BackingStorage> TurboTasksBackendInner<B> {
                 TaskId::try_from(TRANSIENT_TASK_BIT).unwrap(),
                 TaskId::MAX,
             ),
-            task_cache: FxDashMap::default(),
+            // Match `storage.map`'s shard count instead of falling through to dashmap's
+            // default (`num_cpus * 4`). On a 14-core machine that default is 64 shards
+            // versus our heuristic's 4096; the cache lookup path was contending with
+            // itself on what should be cheap reads.
+            task_cache: FxDashMap::with_capacity_and_hasher_and_shard_amount(
+                0,
+                Default::default(),
+                shard_amount,
+            ),
             storage: Storage::new(shard_amount, small_preallocation),
             snapshot_coord: SnapshotCoordinator::new(),
             snapshot_in_progress: Mutex::new(()),