Commit d6cdf48

chore(haste-map): replace walker and hand-rolled traversal with fdir (#16187)
1 parent e115155 commit d6cdf48

18 files changed

Lines changed: 595 additions & 148 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@
1515
### Chore & Maintenance
1616

1717
- `[jest-haste-map]` Refactor massive class into multiple files ([#16180](https://github.com/jestjs/jest/pull/16180))
18+
- `[jest-haste-map]` Drop `walker` dependency; replace hand-rolled directory recursion in the JS crawler and watcher startup with `fdir` ([#16187](https://github.com/jestjs/jest/pull/16187))
1819
- `[jest-runtime]` Avoid magical `null` value in ESM loader ([#16160](https://github.com/jestjs/jest/pull/16160))
1920

2021
## 30.4.2

docs/Configuration.md

Lines changed: 13 additions & 2 deletions
@@ -913,11 +913,22 @@ type HasteConfig = {
913913
computeSha1?: boolean;
914914
/** The platform to use as the default, e.g. 'ios'. */
915915
defaultPlatform?: string | null;
916-
/** Force use of Node's `fs` APIs rather than shelling out to `find` */
916+
/**
917+
* Force use of Node's `fs` APIs (via `fdir`) rather than shelling out to
918+
* the system `find` binary. Defaults to `false` on Linux/macOS.
919+
*
920+
* **Consider setting this to `true`**: `find(1)` receives no ignore
921+
* predicate, so it traverses ignored directories (e.g. `node_modules`,
922+
* `.git`) in full and discards the results afterward. The Node `fs` crawler
923+
* prunes those subtrees at `readdir` time. For most projects where
924+
* `node_modules` dwarfs source files, this makes the Node crawler faster
925+
* in practice despite `find(1)` being native code. A future Jest version
926+
* may flip this default.
927+
*/
917928
forceNodeFilesystemAPI?: boolean;
918929
/**
919930
* Whether to follow symlinks when crawling for files.
920-
* This options cannot be used in projects which use watchman.
931+
* This option cannot be used in projects which use watchman.
921932
* Projects with `watchman` set to true will error if this option is set to true.
922933
*/
923934
enableSymlinks?: boolean;
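
For readers who want to try the opt-in that the new doc comment recommends, here is a minimal sketch of the relevant part of a Jest configuration. Only `haste.forceNodeFilesystemAPI` comes from the diff above; the variable name `config` is illustrative, and whether the setting is actually faster depends on the project layout.

```typescript
// Sketch of a Jest config object that opts in to the Node `fs` crawler.
// `haste.forceNodeFilesystemAPI` is the option documented in the hunk above;
// the variable name `config` is an illustrative assumption.
const config = {
  haste: {
    // Crawl with Node's `fs` APIs (via `fdir`) instead of shelling out to
    // find(1), so ignored directories are pruned while walking.
    forceNodeFilesystemAPI: true,
  },
};
```

In a real `jest.config.ts` this object would be the file's default export.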

packages/jest-haste-map/CLAUDE.md

Lines changed: 6 additions & 3 deletions
@@ -8,10 +8,11 @@ Key files to know:
88

99
- `lib/FileProcessor.ts` — `processFile` (worker dispatch, haste-id extraction, duplicate tracking) and `buildHasteMap` (initial build loop).
1010
- `lib/CacheManager.ts` — v8 serialize/deserialize for the on-disk cache. **Sync I/O is intentional** — at haste-map's scale, async overhead adds no value and switching to `fs.promises` is not a free win.
11+
- `lib/walk.ts` — shared `fdir`-backed directory walker used by the node crawler (`crawlers/node.ts` `find`), `FSEventsWatcher.ts` startup, and `NodeWatcher.js` startup (via `watchers/common.js` `recReaddir`). Exports `walk(opts, done): void` (callback-based) and `WalkOptions`, `WalkEntryKind`. Uses `fdir`'s `.withCallback()` API with a bounded callback-driven `lstat` pool (default `Math.max(os.availableParallelism() * 4, 32)` inflight). `fdir` is constructed with `new Fdir({fs})` so it uses `graceful-fs` for `readdir` — mock `graceful-fs.readdir` in tests to control directory traversal. `WalkOptions.statCache` is an optional `Map<string, Stats>` passed by callers that invoke `walk()` multiple times (e.g. once per root): a path already in the map skips its `lstat` call. **The cache must not be stored on watcher instances or passed to runtime (post-startup) walks** — stale stats would be returned for files changed after startup. `find()` and `WatcherDriver.start()` each create a fresh cache, use it for all per-root startup walks, then discard it.
1112
- `watchers/ChangeQueue.ts` — 30 ms debounce, O(1) mtime-dedup via `Set<string>`, copy-on-write for the live map, file-processing dispatch.
12-
- `crawlers/watchman.ts` — fb-watchman with clock-based incremental updates. `crawlers/node.ts` — pure Node.js fallback.
13-
- `watchers/types.ts` — `IWatcher`, `WatcherOptions`, `WatcherCtor`, `WatcherBackend`. New backends must implement `IWatcher` and accept `(root: string, opts: WatcherOptions)`.
14-
- `watchers/WatchmanWatcher.js` — macOS/Linux watchman. `FSEventsWatcher.ts` — macOS native. `NodeWatcher.js` — cross-platform fallback.
13+
- `crawlers/watchman.ts` — fb-watchman with clock-based incremental updates. `crawlers/node.ts` — `findNative` (`find(1)` shell-out) + `find` (`fdir` via `lib/walk`); `forceNodeFilesystemAPI` gates shell-out vs `fdir`.
14+
- `watchers/types.ts` — `IWatcher`, `WatcherOptions`, `WatcherCtor`. New backends must implement `IWatcher` and accept `(root: string, opts: WatcherOptions)`.
15+
- `watchers/WatchmanWatcher.js` — macOS/Linux watchman. `FSEventsWatcher.ts` — macOS native (startup walk via `lib/walk`). `NodeWatcher.js` — cross-platform fallback.
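
The `statCache` behaviour described in the `lib/walk.ts` entry can be illustrated with a small in-memory sketch. Nothing here is the real `walk()` implementation: `fakeFs`, `fakeLstat`, `statWithCache`, and `walkRoot` are hypothetical names, and the "filesystem" is a plain `Map`. It only shows why a cache shared across overlapping roots saves `lstat` calls.

```typescript
// Hypothetical stand-in for fs.Stats; only the fields the crawler reports.
type FakeStats = {mtimeMs: number; size: number};

// Hypothetical in-memory "filesystem": file path -> stats.
const fakeFs = new Map<string, FakeStats>([
  ['/project/fruits/tomato.js', {mtimeMs: 32, size: 42}],
  ['/project/fruits/directory/strawberry.js', {mtimeMs: 33, size: 43}],
]);

let lstatCalls = 0;

// Counts every simulated lstat so the saving is observable.
function fakeLstat(filePath: string): FakeStats {
  lstatCalls++;
  const stats = fakeFs.get(filePath);
  if (stats === undefined) {
    throw new Error(`ENOENT: ${filePath}`);
  }
  return stats;
}

// A path already in the cache skips its lstat call.
function statWithCache(
  filePath: string,
  cache?: Map<string, FakeStats>,
): FakeStats {
  const hit = cache?.get(filePath);
  if (hit !== undefined) {
    return hit;
  }
  const stats = fakeLstat(filePath);
  cache?.set(filePath, stats);
  return stats;
}

// Visits every file under `root`, stat-ing each one.
function walkRoot(root: string, cache?: Map<string, FakeStats>): string[] {
  const prefix = root.endsWith('/') ? root : root + '/';
  const found: string[] = [];
  fakeFs.forEach((_stats, filePath) => {
    if (filePath.startsWith(prefix)) {
      statWithCache(filePath, cache);
      found.push(filePath);
    }
  });
  return found;
}

// Overlapping roots: strawberry.js is reachable from both.
const overlappingRoots = ['/project/fruits', '/project/fruits/directory'];

// One fresh cache for all startup walks, discarded afterwards.
const startupCache = new Map<string, FakeStats>();
for (const root of overlappingRoots) {
  walkRoot(root, startupCache);
}
const callsWithCache = lstatCalls;

lstatCalls = 0;
for (const root of overlappingRoots) {
  walkRoot(root);
}
const callsWithoutCache = lstatCalls;
```

With the shared cache the second root reuses the stats recorded by the first, so `callsWithCache` is 2 against 3 without it. The same mechanism is why the cache must be discarded after startup: a later walk would get the recorded stats back instead of fresh ones.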
1516

1617
## Data model
1718

@@ -40,6 +41,8 @@ Key files to know:
4041

4142
**`enableSymlinks` guard** fires when `enableSymlinks && useWatchman`.
4243

44+
**`enableSymlinks` symlink semantics.** When `false` (default), symlinks are excluded by all walkers. When `true`: `find` (fdir path) passes `enableSymlinks` to `walk()`, which sets `excludeSymlinks: false` and keeps `resolveSymlinks: false` (fdir mode 2) — symlinks are included at their **original link path** (not realpath); `walk()` then calls `fs.stat` (follows the link) to get target stats (mtime/size). This preserves the path Jest uses to `require` the file while reporting the target's metadata. `findNative` (shell `find` path) uses `( -type f -o -type l )` when `enableSymlinks` is true to include symlinks, then `fs.stat`s each result. Do NOT set fdir's `resolveSymlinks: true` — it calls `realpath` and returns the resolved path, which would cause haste-map to track files under the wrong path.
45+
4346
**Config wiring from jest-config to jest-haste-map:** `HasteMap.Options` fields come from two places in `ProjectConfig`: `haste.enableSymlinks` → `enableSymlinks`, `haste.forceNodeFilesystemAPI` → `forceNodeFilesystemAPI`. The `useWatchman` field comes from the caller (e.g. `jest-runtime` passes `options?.watchman`; `jest-core` passes `globalConfig.watchman`). If you add a new `haste.*` config key that needs to reach `HasteMap`, add it to `HasteConfig` in `jest-types/src/Config.ts`, `HasteConfig` schema in `jest-schemas/src/raw-types.ts`, `Defaults.ts` (if it has a default), `ValidConfig.ts` (both `initialOptions.haste` and `initialProjectOptions.haste`), and the `HasteMap.create(...)` call in `jest-runtime/src/index.ts`.
4447

4548
## Tests

packages/jest-haste-map/README.md

Lines changed: 5 additions & 5 deletions
@@ -73,9 +73,9 @@ console.log(files);
7373
| dependencyExtractor | string \| null | No | `null` |
7474
| enableSymlinks | boolean | No | `false` |
7575
| extensions | Array&lt;string&gt; | Yes | - |
76-
| forceNodeFilesystemAPI | boolean | Yes | - |
77-
| hasteImplModulePath | string | Yes | - |
78-
| hasteMapModulePath | string | Yes | - |
76+
| forceNodeFilesystemAPI | boolean | No | `false` |
77+
| hasteImplModulePath | string | No | - |
78+
| hasteMapModulePath | string | No | - |
7979
| id | string | Yes | - |
8080
| ignorePattern | HasteRegExp | No | - |
8181
| maxWorkers | number | Yes | - |
@@ -85,8 +85,8 @@ console.log(files);
8585
| retainAllFiles | boolean | Yes | - |
8686
| rootDir | string | Yes | - |
8787
| roots | Array&lt;string&gt; | Yes | - |
88-
| skipPackageJson | boolean | Yes | - |
89-
| throwOnModuleCollision | boolean | Yes | - |
88+
| skipPackageJson | boolean | No | `false` |
89+
| throwOnModuleCollision | boolean | No | `false` |
9090
| useWatchman | boolean | No | `true` |
9191
9292
For more, you can check [github](https://github.com/jestjs/jest/tree/main/packages/jest-haste-map)

packages/jest-haste-map/package.json

Lines changed: 2 additions & 2 deletions
@@ -23,12 +23,12 @@
2323
"@types/node": "*",
2424
"anymatch": "^3.1.3",
2525
"fb-watchman": "^2.0.2",
26+
"fdir": "^6.5.0",
2627
"graceful-fs": "^4.2.11",
2728
"jest-regex-util": "workspace:*",
2829
"jest-util": "workspace:*",
2930
"jest-worker": "workspace:*",
30-
"picomatch": "^4.0.3",
31-
"walker": "^1.0.8"
31+
"picomatch": "^4.0.3"
3232
},
3333
"devDependencies": {
3434
"@types/fb-watchman": "^2.0.5",

packages/jest-haste-map/src/crawlers/__tests__/node.test.js

Lines changed: 78 additions & 3 deletions
@@ -71,44 +71,52 @@ jest.mock('graceful-fs', () => {
7171
throw new Error('readdir: callback is not a function!');
7272
}
7373

74-
if (slash(dir) === '/project/fruits') {
74+
const normalizedDir = slash(dir).replace(/\/+$/, '');
75+
if (normalizedDir === '/project/fruits') {
7576
setTimeout(
7677
() =>
7778
callback(null, [
7879
{
7980
isDirectory: () => true,
81+
isFile: () => false,
8082
isSymbolicLink: () => false,
8183
name: 'directory',
8284
},
8385
{
8486
isDirectory: () => false,
87+
isFile: () => true,
8588
isSymbolicLink: () => false,
8689
name: 'tomato.js',
8790
},
8891
{
8992
isDirectory: () => false,
93+
isFile: () => false,
9094
isSymbolicLink: () => true,
9195
name: 'symlink',
9296
},
9397
]),
9498
0,
9599
);
96-
} else if (slash(dir) === '/project/fruits/directory') {
100+
} else if (normalizedDir === '/project/fruits/directory') {
97101
setTimeout(
98102
() =>
99103
callback(null, [
100104
{
101105
isDirectory: () => false,
106+
isFile: () => true,
102107
isSymbolicLink: () => false,
103108
name: 'strawberry.js',
104109
},
105110
]),
106111
0,
107112
);
108-
} else if (slash(dir) === '/error') {
113+
} else if (normalizedDir === '/error') {
109114
setTimeout(() => callback({code: 'ENOTDIR'}, undefined), 0);
110115
}
111116
}),
117+
readdirSync: jest.fn(() => []),
118+
realpath: jest.fn(),
119+
realpathSync: jest.fn(),
112120
stat: jest.fn(stat),
113121
};
114122
});
@@ -364,6 +372,73 @@ describe('node crawler', () => {
364372
expect(removedFiles).toEqual(new Map());
365373
});
366374

375+
it('deduplicates results when roots overlap', async () => {
376+
nodeCrawl = require('../node').nodeCrawl;
377+
378+
const {hasteMap} = await nodeCrawl({
379+
data: {files: new Map()},
380+
extensions: ['js'],
381+
forceNodeFilesystemAPI: true,
382+
ignore: pearMatcher,
383+
rootDir,
384+
// /project/fruits/directory is a subdirectory of /project/fruits, so
385+
// strawberry.js is reachable from both roots.
386+
roots: ['/project/fruits', '/project/fruits/directory'],
387+
});
388+
389+
expect(hasteMap.files).toEqual(
390+
createMap({
391+
'fruits/directory/strawberry.js': expect.any(Array),
392+
'fruits/tomato.js': expect.any(Array),
393+
}),
394+
);
395+
});
396+
397+
it('passes symlink args to find when enableSymlinks is true', async () => {
398+
childProcess = require('child_process');
399+
nodeCrawl = require('../node').nodeCrawl;
400+
401+
await nodeCrawl({
402+
data: {files: new Map()},
403+
enableSymlinks: true,
404+
extensions: ['js'],
405+
ignore: pearMatcher,
406+
rootDir,
407+
roots: ['/project/fruits'],
408+
});
409+
410+
expect(childProcess.spawn).toHaveBeenLastCalledWith('find', [
411+
'/project/fruits',
412+
'(',
413+
'-type',
414+
'f',
415+
'-o',
416+
'-type',
417+
'l',
418+
')',
419+
'(',
420+
'-iname',
421+
'*.js',
422+
')',
423+
]);
424+
});
425+
426+
it('handles empty results from find after filtering', async () => {
427+
nodeCrawl = require('../node').nodeCrawl;
428+
// All paths match the ignore pattern, so the filtered list is empty.
429+
mockResponse = '/project/fruits/pear.js';
430+
431+
const {hasteMap} = await nodeCrawl({
432+
data: {files: new Map()},
433+
extensions: ['js'],
434+
ignore: pearMatcher,
435+
rootDir,
436+
roots: ['/project/fruits'],
437+
});
438+
439+
expect(hasteMap.files).toEqual(new Map());
440+
});
441+
367442
it('avoids calling lstat for directories and symlinks', async () => {
368443
nodeCrawl = require('../node').nodeCrawl;
369444
const fs = require('graceful-fs');

packages/jest-haste-map/src/crawlers/node.ts

Lines changed: 39 additions & 65 deletions
@@ -10,6 +10,7 @@ import * as path from 'node:path';
1010
import * as fs from 'graceful-fs';
1111
import H from '../constants';
1212
import * as fastPath from '../lib/fast_path';
13+
import {walk} from '../lib/walk';
1314
import type {
1415
CrawlerOptions,
1516
FileData,
@@ -63,70 +64,36 @@ function find(
6364
enableSymlinks: boolean,
6465
callback: Callback,
6566
): void {
67+
const extSet = new Set(extensions);
6668
const result: Result = [];
67-
let activeCalls = 0;
68-
69-
function search(directory: string): void {
70-
activeCalls++;
71-
fs.readdir(directory, {withFileTypes: true}, (err, entries) => {
72-
activeCalls--;
73-
if (err) {
74-
if (activeCalls === 0) {
75-
callback(result);
76-
}
77-
return;
78-
}
79-
for (const entry of entries) {
80-
const file = path.join(directory, entry.name);
81-
82-
if (ignore(file)) {
83-
continue;
84-
}
85-
86-
if (entry.isSymbolicLink()) {
87-
continue;
88-
}
89-
if (entry.isDirectory()) {
90-
search(file);
91-
continue;
92-
}
69+
const statCache = new Map<string, fs.Stats>();
70+
let remaining = roots.length;
9371

94-
activeCalls++;
95-
96-
const stat = enableSymlinks ? fs.stat : fs.lstat;
97-
98-
stat(file, (err, stat) => {
99-
activeCalls--;
100-
101-
// This logic is unnecessary for node > v10.10, but leaving it in
102-
// since we need it for backwards-compatibility still.
103-
if (!err && stat && !stat.isSymbolicLink()) {
104-
if (stat.isDirectory()) {
105-
search(file);
106-
} else {
107-
const ext = path.extname(file).slice(1);
108-
if (extensions.includes(ext)) {
109-
result.push([file, stat.mtime.getTime(), stat.size]);
110-
}
111-
}
112-
}
113-
114-
if (activeCalls === 0) {
115-
callback(result);
116-
}
117-
});
118-
}
119-
120-
if (activeCalls === 0) {
121-
callback(result);
122-
}
123-
});
72+
if (remaining === 0) {
73+
callback(result);
74+
return;
12475
}
12576

126-
if (roots.length > 0) {
127-
for (const root of roots) search(root);
128-
} else {
129-
callback(result);
77+
for (const root of roots) {
78+
walk(
79+
{
80+
enableSymlinks,
81+
exclude: ignore,
82+
onEntry: (kind, filePath, stats) => {
83+
if (kind === 'file' && extSet.has(path.extname(filePath).slice(1))) {
84+
result.push([filePath, stats.mtime.getTime(), stats.size]);
85+
}
86+
},
87+
root,
88+
statCache,
89+
},
90+
() => {
91+
remaining--;
92+
if (remaining === 0) {
93+
callback(result);
94+
}
95+
},
96+
);
13097
}
13198
}
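
The completion logic in the new `find` (one `walk()` per root, a `remaining` counter, a single final callback) can be read in isolation as the sketch below. It assumes a simplified walker signature: `WalkOne`, `findAcrossRoots`, `fakeTree`, and `fakeWalk` are hypothetical names, and the fake walker finishes synchronously, whereas the real `walk()` completes asynchronously.

```typescript
// [filePath, mtime, size], mirroring the shape pushed into `result` above.
type Entry = [string, number, number];

// Hypothetical simplified walker: reports files under one root, then calls done.
type WalkOne = (
  root: string,
  onFile: (entry: Entry) => void,
  done: () => void,
) => void;

function findAcrossRoots(
  roots: readonly string[],
  walkOne: WalkOne,
  callback: (result: Entry[]) => void,
): void {
  const result: Entry[] = [];
  let remaining = roots.length;

  // With no roots nothing would ever decrement the counter, so finish now.
  if (remaining === 0) {
    callback(result);
    return;
  }

  for (const root of roots) {
    walkOne(
      root,
      entry => {
        result.push(entry);
      },
      () => {
        // Each root reports completion exactly once; the last one to
        // finish delivers the combined result.
        remaining--;
        if (remaining === 0) {
          callback(result);
        }
      },
    );
  }
}

// Hypothetical fixture: files per root.
const fakeTree: Record<string, Entry[]> = {
  '/a': [['/a/x.js', 1, 10]],
  '/b': [['/b/y.js', 2, 20]],
};

const fakeWalk: WalkOne = (root, onFile, done) => {
  for (const entry of fakeTree[root] ?? []) {
    onFile(entry);
  }
  done();
};

let callbackCount = 0;
let collected: Entry[] = [];
findAcrossRoots(['/a', '/b'], fakeWalk, result => {
  callbackCount++;
  collected = result;
});

let emptyRootsCallbackCount = 0;
findAcrossRoots([], fakeWalk, () => {
  emptyRootsCallbackCount++;
});
```

The counter is initialised before any walk starts, so the pattern stays correct whether a walker completes synchronously or on a later tick.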
13299

@@ -158,20 +125,21 @@ function findNative(
158125
}
159126

160127
const child = spawn('find', args);
161-
let stdout = '';
162128
if (child.stdout === null) {
163129
throw new Error(
164130
'stdout is null - this should never happen. Please open up an issue at https://github.com/jestjs/jest',
165131
);
166132
}
167133
child.stdout.setEncoding('utf8');
168-
child.stdout.on('data', data => (stdout += data));
134+
const chunks: Array<string> = [];
135+
child.stdout.on('data', data => chunks.push(data));
169136

170137
child.stdout.on('close', () => {
171-
const lines = stdout
138+
const lines = chunks
139+
.join('')
172140
.trim()
173141
.split('\n')
174-
.filter(x => !ignore(x));
142+
.filter(x => x && !ignore(x));
175143
const result: Result = [];
176144
let count = lines.length;
177145
if (count) {
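
The added `x &&` guard matters because splitting an empty string still yields one element. The sketch below extracts the parsing step into a hypothetical helper, `parseFindOutput` (not a function in the commit), to show both the chunk joining and the empty-line filtering.

```typescript
// Hypothetical helper mirroring the stdout handling in `findNative`:
// join the buffered chunks, split into lines, drop blanks and ignored paths.
function parseFindOutput(
  chunks: readonly string[],
  ignore: (filePath: string) => boolean,
): string[] {
  return chunks
    .join('')
    .trim()
    .split('\n')
    .filter(line => line !== '' && !ignore(line));
}

// Hypothetical ignore predicate for the demo.
const ignorePears = (filePath: string): boolean => /pear/.test(filePath);

// find(1) printed nothing: ''.split('\n') is [''], which the guard removes.
const noOutput = parseFindOutput([], ignorePears);

// A path split across two 'data' events is reassembled by the join.
const splitAcrossChunks = parseFindOutput(
  ['/project/fruits/tom', 'ato.js\n/project/fruits/pear.js\n'],
  ignorePears,
);
```

Without the blank-line check, the empty string from the first case would be passed on to the stat step as if it were a path, unless the ignore predicate happened to reject it.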
@@ -198,11 +166,11 @@ export async function nodeCrawl(options: CrawlerOptions): Promise<{
198166
}> {
199167
const {
200168
data,
169+
enableSymlinks,
201170
extensions,
202171
forceNodeFilesystemAPI,
203172
ignore,
204173
rootDir,
205-
enableSymlinks,
206174
roots,
207175
} = options;
208176

@@ -233,6 +201,12 @@ export async function nodeCrawl(options: CrawlerOptions): Promise<{
233201
};
234202

235203
if (useNativeFind) {
204+
// TODO: consider making forceNodeFilesystemAPI the default. find(1) does
205+
// not receive the ignore predicate, so it traverses ignored directories
206+
// (e.g. node_modules, .git) in full and discards results afterward.
207+
// find() via fdir prunes those subtrees at readdir time. For a typical
208+
// project where node_modules dwarfs source files, the wasted traversal
209+
// likely outweighs find(1)'s native speed advantage.
236210
findNative(roots, extensions, ignore, enableSymlinks, callback);
237211
} else {
238212
find(roots, extensions, ignore, enableSymlinks, callback);
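
The TODO's argument is about how many entries each strategy touches, which the following sketch makes concrete with a toy in-memory tree. It measures visited entries only, not wall-clock time, and it does not model `find(1)` or `fdir` themselves; `TreeNode`, `project`, `ignoreNodeModules`, and `crawl` are hypothetical names.

```typescript
// A node without `children` is a file; with `children` it is a directory.
type TreeNode = {name: string; children?: TreeNode[]};

// Hypothetical project: two source files, three files inside node_modules.
const project: TreeNode[] = [
  {name: 'src', children: [{name: 'index.js'}, {name: 'util.js'}]},
  {
    name: 'node_modules',
    children: [
      {
        name: 'dep',
        children: [{name: 'a.js'}, {name: 'b.js'}, {name: 'c.js'}],
      },
    ],
  },
];

const ignoreNodeModules = (filePath: string): boolean =>
  /\/node_modules(\/|$)/.test(filePath);

// pruneWhileWalking = true: skip ignored entries before descending.
// pruneWhileWalking = false: walk everything, then filter the results.
function crawl(
  nodes: TreeNode[],
  rootPath: string,
  pruneWhileWalking: boolean,
): {files: string[]; visited: number} {
  let visited = 0;
  const files: string[] = [];

  const visit = (list: TreeNode[], dirPath: string): void => {
    for (const node of list) {
      const entryPath = `${dirPath}/${node.name}`;
      if (pruneWhileWalking && ignoreNodeModules(entryPath)) {
        continue; // the whole subtree is never read
      }
      visited++;
      if (node.children) {
        visit(node.children, entryPath);
      } else {
        files.push(entryPath);
      }
    }
  };

  visit(nodes, rootPath);
  return {
    files: pruneWhileWalking
      ? files
      : files.filter(filePath => !ignoreNodeModules(filePath)),
    visited,
  };
}

const pruned = crawl(project, '/project', true);
const postFiltered = crawl(project, '/project', false);
```

Both strategies return the same two source files, but the pruning walk visits 3 entries while the walk-then-filter variant visits 8. On a real project the gap scales with the size of the ignored subtrees, which is the effect the TODO expects to outweigh the native speed of `find(1)`.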
