[RFC]: Bringing glob Dependency In-House #11234
Description
Bringing the glob Dependency In-House
Replacing the external glob npm package with stdlib-native equivalents to eliminate a supply-chain risk and align all production and tooling code with stdlib's quality standards (docs, tests, examples, benchmarks, backward compatibility).
Background
What glob Does
A robust glob implementation in JavaScript that matches files using patterns the shell uses (like stars and question marks). It works by walking the filesystem and applying regex-based matching to paths.
More here: https://github.com/isaacs/node-glob
Its core dependencies
- minimatch: Converts the glob strings into regular expressions and performs the actual string matching.
- fs.realpath/inflight/once: Various utilities used under the hood to manage concurrent filesystem calls and path resolutions.
How stdlib uses it
- 50+ files across: @stdlib/_tools/pkgs/*, @stdlib/_tools/lint/*, @stdlib/_tools/static-analysis/*, and various internal bundling or testing scripts.
- Only the core discovery functions are used: glob( pattern, opts, clbk ) and glob.sync( pattern, opts ).
- Primary use cases: Finding package.json files, source .js files, or .c files, while explicitly passing an ignore option array (e.g., ['node_modules/**', '.git/**']) to prevent scanning massive dependency directories.
- No advanced API usage: None of the files instantiate the Glob class directly as an event emitter or use advanced bash features like brace expansion ({a,b}).
Proposed Changes
The plan creates 2 new packages mirroring stdlib's decomposable architecture, with each package being independently consumable.
Component 1: @stdlib/utils/regexp-from-glob
Replaces the minimatch npm dependency (transitive via glob).
A utility to convert standard shell wildcard patterns into native JavaScript RegExp objects.
Scope: Strictly scoped to the wildcards stdlib actually uses (*, **, ?). It will not support brace expansions or extglobs to remain intentionally minimal and maintain high execution speed.
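As an illustration of the scope described above, the conversion could be sketched roughly as follows. This is a minimal sketch, not the proposed implementation: the name `globToRegExp` is hypothetical, and the sketch assumes `/`-separated paths, anchors the pattern at both ends, and ignores the dotfile rules that minimatch applies by default.

```javascript
// Hypothetical sketch only; the real package may differ in naming and behavior.

// Characters that must be escaped to be matched literally in a RegExp:
var SPECIAL = /[.+^${}()|[\]\\]/;

function globToRegExp( pattern ) {
	var out = '';
	var ch;
	var i;
	for ( i = 0; i < pattern.length; i++ ) {
		ch = pattern[ i ];
		if ( ch === '*' ) {
			if ( pattern[ i+1 ] === '*' ) {
				i += 1;
				if ( pattern[ i+1 ] === '/' ) {
					// `**/` matches zero or more complete path segments:
					i += 1;
					out += '(?:[^/]+/)*';
				} else {
					// A trailing or bare `**` matches anything, including `/`:
					out += '.*';
				}
			} else {
				// `*` matches any run of characters within one path segment:
				out += '[^/]*';
			}
		} else if ( ch === '?' ) {
			// `?` matches exactly one character other than `/`:
			out += '[^/]';
		} else if ( SPECIAL.test( ch ) ) {
			// Escape regex metacharacters so they match literally:
			out += '\\' + ch;
		} else {
			out += ch;
		}
	}
	return new RegExp( '^' + out + '$' );
}
```

Because braces are escaped rather than expanded, a pattern such as `{a,b}.js` would only match a file literally named `{a,b}.js` under this sketch, which is consistent with the decision to leave brace expansion out of scope.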
@stdlib/utils/regexp-from-glob/
├── lib/
│   ├── index.js        # re-export main
│   └── main.js         # regexpFromGlob(str) → RegExp
├── test/
│   └── test.js
├── benchmark/
│   └── benchmark.js
├── docs/
│   ├── repl.txt
│   └── types/
│       ├── index.d.ts
│       └── test.ts
├── examples/
│   └── index.js
├── README.md
└── package.json
API:
var regexpFromGlob = require( '@stdlib/utils/regexp-from-glob' );
var re = regexpFromGlob( '**/*.js' );
// returns RegExp
re.test( 'lib/index.js' ); // => true
re.test( 'lib/index.json' ); // => false
Component 2: @stdlib/fs/glob
The core package: the direct replacement for require( 'glob' ).
A filesystem traversal utility that applies the generated glob regexes to discover files. This is the main package all _tools files will switch to.
@stdlib/fs/glob/
├── lib/
│   ├── index.js            # re-export main
│   ├── main.js             # async implementation
│   ├── sync.js             # sync implementation
│   ├── walk.js             # internal DFS/BFS directory walker
│   └── validate.js         # options validation
├── test/
│   ├── test.js             # async tests
│   ├── test.sync.js        # sync tests
│   └── test.walk.js        # directory walker tree-pruning tests
├── benchmark/
│   ├── benchmark.js        # benchmark: async traversal
│   └── benchmark.sync.js   # benchmark: sync traversal
├── docs/
│   ├── repl.txt
│   └── types/
│       ├── index.d.ts
│       └── test.ts
├── examples/
│   └── index.js
├── README.md
└── package.json
API (drop-in compatible with current usage):
var glob = require( '@stdlib/fs/glob' );
var opts = {
    'cwd': __dirname,
    'ignore': [ 'node_modules/**' ],
    'realpath': true
};
// Async usage
glob( '**/*.js', opts, function onGlob( error, matches ) {
    if ( error ) {
        console.error( error );
        return;
    }
    console.dir( matches );
});
// Sync usage
var matches = glob.sync( '**/*.js', opts );
Key implementation details:
- Uses @stdlib/fs/read-dir and standard fs.stat under the hood.
- Tree-Pruning (Crucial for perf): The internal walk.js algorithm must evaluate the ignore option before descending into a directory, so that ignored subtrees (e.g., node_modules) are never read.
- Options support: cwd (defaults to process.cwd()), ignore (array of globs), and realpath (boolean).
- Returns an array of normalized paths with duplicates removed.
- No external dependencies (uses existing stdlib utilities).
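The tree-pruning requirement above can be illustrated with a small synchronous walker. This is a sketch under stated assumptions rather than the proposed walk.js: `walkSync` and the `isIgnored` predicate are hypothetical names, and the sketch uses Node's built-in `fs.readdirSync` with `withFileTypes` (instead of @stdlib/fs/read-dir plus fs.stat) only to stay self-contained.

```javascript
var fs = require( 'fs' );
var path = require( 'path' );

// Hypothetical helper: walks `cwd` depth-first and returns sorted, `/`-separated
// file paths relative to `cwd`. `isIgnored` is an assumed predicate which receives
// a relative path and returns `true` if the entry should be skipped.
function walkSync( cwd, isIgnored ) {
	var out = [];
	visit( '' );
	return out.sort();

	function visit( rel ) {
		var entries = fs.readdirSync( path.join( cwd, rel ), {
			'withFileTypes': true
		});
		var child;
		var i;
		for ( i = 0; i < entries.length; i++ ) {
			child = ( rel ) ? rel + '/' + entries[ i ].name : entries[ i ].name;
			if ( isIgnored( child ) ) {
				// Prune: the check happens BEFORE descending, so an ignored
				// directory is never read.
				continue;
			}
			if ( entries[ i ].isDirectory() ) {
				visit( child );
			} else {
				out.push( child );
			}
		}
	}
}
```

In a real implementation the predicate would presumably be derived from the ignore globs; the important property is that an ignored directory costs one check rather than a full traversal of its contents.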
Component 3: Migration — Updating All Consumers
This is the mechanical bulk change, done after the new packages are created and verified.
[MODIFY] All 50+ files using require( 'glob' )
The change is a single-line replacement per file:
-var glob = require( 'glob' );
+var glob = require( '@stdlib/fs/glob' );
Migration strategy (phased to reduce risk):
| Phase | Scope | Files |
|---|---|---|
| Phase 1 | @stdlib/_tools/pkgs/* (find, deps, clis, etc.) | ~15 files |
| Phase 2 | @stdlib/_tools/static-analysis/* (sloc-glob, etc.) | ~10 files |
| Phase 3 | @stdlib/_tools/lint/* (filenames, pkg-json, etc.) | ~20 files |
| Phase 4 | Remaining bundles and scripts | ~10 files |
Each phase follows the same process:
- Run find + sed to replace require( 'glob' ) → require( '@stdlib/fs/glob' ).
- Run existing internal tool tests for the affected script area.
- Verify output matches the original glob dependency.
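The final step, checking that output matches the original dependency, could be automated with a small comparison helper. The following is only a sketch: `diffMatches` is a hypothetical name, and it assumes both implementations return arrays of path strings in the same form (for example, both relative to the same cwd), compared without regard to order.

```javascript
// Hypothetical helper: compares the match list from the original `glob` package
// (`expected`) against the list from its replacement (`actual`), ignoring order.
function diffMatches( expected, actual ) {
	var inExpected = new Set( expected );
	var inActual = new Set( actual );
	return {
		// Paths the original found but the replacement did not:
		'missing': expected.filter( function notInActual( p ) {
			return !inActual.has( p );
		}),
		// Paths the replacement found but the original did not:
		'extra': actual.filter( function notInExpected( p ) {
			return !inExpected.has( p );
		})
	};
}
```

A phase would be considered equivalent when both `missing` and `extra` are empty for every pattern and options combination used by the migrated tools.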
Uninstall the dependency
After all phases are complete:
npm uninstall glob
Verification Plan
Automated Tests
1. Unit tests for @stdlib/utils/regexp-from-glob
make TESTS_FILTER=".*/utils/regexp-from-glob/.*" test
Tests should cover:
- Core wildcards (*, **, ?).
- Escaping behavior for dot ., plus +, parentheses (), and brackets [].
- Prefix, suffix, and exact match conditions.
2. Unit tests for @stdlib/fs/glob
make TESTS_FILTER=".*/fs/glob/.*" test
Tests should cover:
- Standard sync and async pattern matching against a mock filesystem.
- ignore arrays effectively pruning traversal.
- cwd option shifts the search base.
- realpath correctly resolves to absolute paths.
- Error handling (e.g., gracefully handling directories that cannot be read due to restricted permissions).
3. Regression tests for migrated modules
Run the full test suite for each phase to ensure the tools operate as expected:
# Phase 1:
make TESTS_FILTER=".*/_tools/pkgs/.*" test
# Phase 2:
make TESTS_FILTER=".*/_tools/static-analysis/.*" test
# Phase 3:
make TESTS_FILTER=".*/_tools/lint/.*" test
# Phase 4: Full internal tools suite
make test
Manual Verification
After migration, verify the tools perform correctly on the monorepo:
- Verify Package Discovery:
  node ./lib/node_modules/@stdlib/_tools/pkgs/find/bin/cli
  → Verify: Outputs all valid stdlib sub-packages identically to previous runs, without diving into node_modules.
- Verify SLOC execution:
  node ./lib/node_modules/@stdlib/_tools/static-analysis/js/sloc-glob/bin/cli
  → Verify: Accurately computes lines of code for the workspace.
- Performance Check: Time the execution before and after the swap.
  time node ./lib/node_modules/@stdlib/_tools/pkgs/find/bin/cli
  → Verify: Our native, tree-pruned walker executes in equivalent or faster time than the external dependency.
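For a comparison less sensitive to process start-up noise than a single `time` invocation, the two implementations could also be timed in-process. The helper below is a rough sketch: `medianMs` is a hypothetical name, and it assumes a synchronous function (such as a call to glob.sync) and Node's `process.hrtime.bigint()`.

```javascript
// Hypothetical helper: runs `fn` a number of times and returns the median
// wall-clock duration in milliseconds.
function medianMs( fn, runs ) {
	var times = [];
	var start;
	var i;
	for ( i = 0; i < runs; i++ ) {
		start = process.hrtime.bigint();
		fn();
		// Convert elapsed nanoseconds to milliseconds:
		times.push( Number( process.hrtime.bigint() - start ) / 1e6 );
	}
	times.sort( function ascending( a, b ) {
		return a - b;
	});
	return times[ Math.floor( times.length / 2 ) ];
}
```

Running this once with the original dependency and once with the replacement, using the same pattern and options, gives two medians that can be compared directly.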