
[RFC]: Bringing glob Dependency In-House #11234

@therealharshit


Bringing the glob Dependency In-House

Replacing the external glob npm package with stdlib-native equivalents to eliminate a supply-chain risk and align all production and tooling code with stdlib's quality standards (docs, tests, examples, benchmarks, backward compatibility).

Background

What glob Does

A robust JavaScript glob implementation that matches files against shell-style patterns (e.g., the * and ? wildcards). It walks the filesystem and applies regex-based matching to each path.
More here: https://github.com/isaacs/node-glob

Its core dependencies

  • minimatch: Converts the glob strings into regular expressions and performs the actual string matching.
  • fs.realpath / inflight / once: Various utilities used under the hood to manage concurrent filesystem calls and path resolutions.

How stdlib uses it

  • 50+ files across: @stdlib/_tools/pkgs/*, @stdlib/_tools/lint/*, @stdlib/_tools/static-analysis/*, and various internal bundling or testing scripts.
  • Only the core discovery functions are used: glob( pattern, opts, clbk ) and glob.sync( pattern, opts ).
  • Primary use cases: Finding package.json files, source .js files, or .c files, while explicitly passing an ignore option array (e.g., ['node_modules/**', '.git/**']) to prevent scanning massive dependency directories.
  • No advanced API usage: None of the files instantiate the Glob class directly as an event emitter or use advanced bash features like brace expansion ({a,b}).
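The claim that no consumer relies on brace expansion can be spot-checked before migrating. Below is a minimal audit sketch. It assumes the pattern strings have already been collected from the call sites. The helper name usesBraceExpansion is hypothetical, and its regex only detects comma-separated lists such as {a,b}, not numeric ranges like {1..3}.

```javascript
'use strict';

// Matches a brace group containing at least one comma, e.g. `{js,json}`:
var RE_BRACE_LIST = /\{[^{}]*,[^{}]*\}/;

/**
* Tests whether a glob pattern relies on brace expansion (hypothetical audit helper).
*
* @param {string} pattern - glob pattern
* @returns {boolean} boolean indicating whether the pattern contains a brace list
*/
function usesBraceExpansion( pattern ) {
	return RE_BRACE_LIST.test( pattern );
}

// Illustrative patterns only (not taken from the stdlib codebase):
var patterns = [ '**/package.json', '**/*.{js,json}', 'src/*.c' ];
var flagged = patterns.filter( usesBraceExpansion );
console.log( flagged );
// => [ '**/*.{js,json}' ]
```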

Proposed Changes

The plan creates two new packages that mirror stdlib's decomposable architecture. Each package can be consumed independently.


Component 1: @stdlib/utils/regexp-from-glob

Replaces the minimatch npm dependency (transitive via glob).

A utility to convert standard shell wildcard patterns into native JavaScript RegExp objects.

Scope: Strictly scoped to the wildcards stdlib actually uses (*, **, ?). It will not support brace expansions or extglobs to remain intentionally minimal and maintain high execution speed.

@stdlib/utils/regexp-from-glob/
├── lib/
│   ├── index.js          # re-export main
│   └── main.js           # regexpFromGlob(str) → RegExp
├── test/
│   └── test.js
├── benchmark/
│   └── benchmark.js
├── docs/
│   ├── repl.txt
│   └── types/
│       ├── index.d.ts
│       └── test.ts
├── examples/
│   └── index.js
├── README.md
└── package.json

API:

var regexpFromGlob = require( '@stdlib/utils/regexp-from-glob' );

var re = regexpFromGlob( '**/*.js' );
// returns RegExp

re.test( 'lib/index.js' ); // => true
re.test( 'lib/index.json' ); // => false
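Below is a minimal sketch of how lib/main.js could compile the three supported wildcards into a RegExp. It is not a definitive implementation. It assumes / is the only path separator and does not replicate minimatch's dotfile rule (by default minimatch does not let * match a leading dot). Every other metacharacter, including the ., +, (), and [] cases listed under the unit tests, is escaped so that it matches literally.

```javascript
'use strict';

// RegExp metacharacters that must be escaped to match literally (`*` and `?` are handled separately):
var RE_SPECIAL = /[.+^${}()|[\]\\]/;

/**
* Converts a glob pattern containing `*`, `**`, and `?` to a regular expression (hypothetical sketch).
*
* @param {string} str - glob pattern
* @returns {RegExp} regular expression
*/
function regexpFromGlob( str ) {
	var out;
	var ch;
	var i;

	out = '^';
	i = 0;
	while ( i < str.length ) {
		ch = str[ i ];
		if ( ch === '*' && str[ i+1 ] === '*' ) {
			if ( str[ i+2 ] === '/' ) {
				// `**/` matches zero or more directory segments:
				out += '(?:.*/)?';
				i += 3;
			} else {
				// A bare `**` matches any sequence of characters, including `/`:
				out += '.*';
				i += 2;
			}
		} else if ( ch === '*' ) {
			// `*` matches any sequence of characters within a single path segment:
			out += '[^/]*';
			i += 1;
		} else if ( ch === '?' ) {
			// `?` matches exactly one character other than `/`:
			out += '[^/]';
			i += 1;
		} else {
			out += ( RE_SPECIAL.test( ch ) ) ? '\\' + ch : ch;
			i += 1;
		}
	}
	return new RegExp( out + '$' );
}

var re = regexpFromGlob( '**/*.js' );
console.log( re.test( 'lib/index.js' ) );
// => true
console.log( re.test( 'lib/index.json' ) );
// => false
console.log( regexpFromGlob( 'lib/?.c' ).test( 'lib/a.c' ) );
// => true
```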

Component 2: @stdlib/fs/glob

The core package — the direct replacement for require('glob').

A filesystem traversal utility that applies the generated glob regexes to discover files. This is the main package all _tools files will switch to.

@stdlib/fs/glob/
├── lib/
│   ├── index.js          # re-export main
│   ├── main.js           # async implementation
│   ├── sync.js           # sync implementation
│   ├── walk.js           # internal DFS/BFS directory walker
│   └── validate.js       # options validation
├── test/
│   ├── test.js           # async tests
│   ├── test.sync.js      # sync tests
│   └── test.walk.js      # directory walker tree-pruning tests
├── benchmark/
│   ├── benchmark.js      # benchmark: async traversal
│   └── benchmark.sync.js # benchmark: sync traversal
├── docs/
│   ├── repl.txt
│   └── types/
│       ├── index.d.ts
│       └── test.ts
├── examples/
│   └── index.js
├── README.md
└── package.json

API (drop-in compatible with current usage):

var glob = require( '@stdlib/fs/glob' );

var opts = {
    'cwd': __dirname,
    'ignore': [ 'node_modules/**' ],
    'realpath': true
};

// Async usage
glob( '**/*.js', opts, function onGlob( error, matches ) {
    if ( error ) {
        console.error( error );
        return;
    }
    console.dir( matches );
});

// Sync usage
var matches = glob.sync( '**/*.js', opts );
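For the package to be a drop-in replacement, it must also accept the shorter call shape glob( pattern, clbk ), because the callback-based glob API treats the options argument as optional. Below is a minimal sketch of the argument handling main.js might perform. The helper names normalizeArgs and defaults are hypothetical. Only the three options listed below (cwd, ignore, realpath) are recognized.

```javascript
'use strict';

var hasOwn = Object.prototype.hasOwnProperty;

// Returns default option values (assumed from the options listed in this RFC):
function defaults() {
	return {
		'cwd': process.cwd(),
		'ignore': [],
		'realpath': false
	};
}

/**
* Normalizes `( pattern, [opts], clbk )` arguments (hypothetical helper).
*
* @param {string} pattern - glob pattern
* @param {(Object|Function)} [opts] - options or callback
* @param {Function} [clbk] - callback
* @throws {TypeError} first argument must be a string
* @throws {TypeError} last argument must be a function
* @returns {Object} normalized arguments
*/
function normalizeArgs( pattern, opts, clbk ) {
	var out;
	var k;
	if ( typeof pattern !== 'string' ) {
		throw new TypeError( 'invalid argument. First argument must be a string.' );
	}
	// Support `glob( pattern, clbk )` by shifting arguments:
	if ( typeof opts === 'function' && clbk === void 0 ) {
		clbk = opts;
		opts = {};
	}
	if ( typeof clbk !== 'function' ) {
		throw new TypeError( 'invalid argument. Last argument must be a function.' );
	}
	out = defaults();
	for ( k in opts ) {
		// Only recognized options are copied; unknown options are ignored in this sketch:
		if ( hasOwn.call( opts, k ) && hasOwn.call( out, k ) ) {
			out[ k ] = opts[ k ];
		}
	}
	return {
		'pattern': pattern,
		'options': out,
		'clbk': clbk
	};
}

var args = normalizeArgs( '**/*.js', function onGlob() {} );
console.log( args.options.realpath );
// => false
```

This sketch silently drops unrecognized options. The real validate.js could instead throw on them, so that an unsupported glob option in a migrated file surfaces immediately.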

Key implementation details:

  • Uses @stdlib/fs/read-dir and standard fs.stat under the hood.
  • Tree pruning (crucial for performance): the internal walk.js algorithm must evaluate the ignore option before descending into a directory, so that ignored subtrees such as node_modules/ are never read.
  • Options support: cwd (defaults to process.cwd()), ignore (array of globs), and realpath (boolean).
  • Returns an array of normalized paths with duplicates removed.
  • No external dependencies (uses existing stdlib utilities).
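Below is a minimal synchronous sketch of the tree-pruning walk, under several assumptions:

  • It uses Node's built-in fs.readdirSync (with withFileTypes) rather than @stdlib/fs/read-dir.
  • It inlines a simplified version of the Component 1 conversion so that it runs on its own.
  • Only ignore patterns ending in /** are used to prune directories, because those are the only ones guaranteed to cover a whole subtree. All ignore patterns still filter files.
  • Only files are returned, and symbolic links to directories are not followed.
  • With realpath set, an unresolvable symlink makes fs.realpathSync throw.
  • globSync is a hypothetical name.

```javascript
'use strict';

var fs = require( 'fs' );
var path = require( 'path' );

// Simplified glob-to-RegExp conversion (`*`, `**`, `?`), inlined so this block runs on its own:
function toRegExp( pattern ) {
	var s = pattern.replace( /[.+^${}()|[\]\\]/g, '\\$&' );
	s = s.replace( /\*\*\/|\*\*|\*|\?/g, function replacer( m ) {
		if ( m === '**/' ) {
			return '(?:.*/)?';
		}
		if ( m === '**' ) {
			return '.*';
		}
		return ( m === '*' ) ? '[^/]*' : '[^/]';
	});
	return new RegExp( '^' + s + '$' );
}

// Tests whether any RegExp in a list matches a string:
function anyMatch( list, str ) {
	var i;
	for ( i = 0; i < list.length; i++ ) {
		if ( list[ i ].test( str ) ) {
			return true;
		}
	}
	return false;
}

/**
* Synchronously finds files matching a glob pattern, pruning ignored directories (hypothetical sketch).
*
* @param {string} pattern - glob pattern
* @param {Object} [opts] - options (`cwd`, `ignore`, `realpath`)
* @returns {StringArray} sorted list of unique matches
*/
function globSync( pattern, opts ) {
	var ignore;
	var prune;
	var seen;
	var cwd;
	var out;
	var re;
	var i;

	opts = opts || {};
	cwd = opts.cwd || process.cwd();
	ignore = opts.ignore || [];
	re = toRegExp( pattern );
	prune = [];
	for ( i = 0; i < ignore.length; i++ ) {
		// A `dir/**` ignore pattern covers the whole subtree, so `dir` itself can be skipped:
		if ( /\/\*\*$/.test( ignore[ i ] ) ) {
			prune.push( toRegExp( ignore[ i ].slice( 0, -3 ) ) );
		}
	}
	ignore = ignore.map( toRegExp );
	seen = Object.create( null );
	out = [];
	walk( '' );
	return out.sort();

	function walk( dir ) {
		var entries;
		var rel;
		var p;
		var j;
		entries = fs.readdirSync( path.join( cwd, dir ), { 'withFileTypes': true } );
		for ( j = 0; j < entries.length; j++ ) {
			rel = ( dir ) ? dir + '/' + entries[ j ].name : entries[ j ].name;
			if ( entries[ j ].isDirectory() ) {
				// Tree pruning: test the directory before reading its contents:
				if ( !anyMatch( prune, rel ) ) {
					walk( rel );
				}
			} else if ( re.test( rel ) && !anyMatch( ignore, rel ) ) {
				p = ( opts.realpath ) ? fs.realpathSync( path.join( cwd, rel ) ) : rel;
				if ( !seen[ p ] ) {
					seen[ p ] = true;
					out.push( p );
				}
			}
		}
	}
}
```

The walker checks the prune list before recursing rather than filtering results afterwards. This is what keeps a large ignored tree such as node_modules/ from being read at all.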

Component 3: Migration — Updating All Consumers

This is the mechanical bulk change, done after the new packages are created and verified.

[MODIFY] All 50+ files using require( 'glob' )

The change is a single-line replacement per file:

-var glob = require( 'glob' );
+var glob = require( '@stdlib/fs/glob' );
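Because the change is mechanical, it can also be scripted in Node.js instead of with find + sed. Below is a minimal sketch; migrateSource and migrateFile are hypothetical names. Unlike a literal sed substitution, the regex also tolerates double quotes and differing whitespace inside require(). It deliberately does not touch look-alike modules such as require( 'globby' ).

```javascript
'use strict';

var fs = require( 'fs' );

// Matches `require( 'glob' )` with optional inner whitespace and either quote style:
var RE_REQUIRE_GLOB = /require\(\s*(['"])glob\1\s*\)/g;

/**
* Rewrites `require( 'glob' )` calls in a source string (hypothetical codemod).
*
* @param {string} src - source code
* @returns {string} updated source code
*/
function migrateSource( src ) {
	return src.replace( RE_REQUIRE_GLOB, 'require( \'@stdlib/fs/glob\' )' );
}

/**
* Rewrites a file in place and returns a boolean indicating whether it changed.
*
* @param {string} file - file path
* @returns {boolean} boolean indicating whether the file was modified
*/
function migrateFile( file ) {
	var src = fs.readFileSync( file, 'utf8' );
	var out = migrateSource( src );
	if ( out !== src ) {
		fs.writeFileSync( file, out );
		return true;
	}
	return false;
}

console.log( migrateSource( 'var glob = require( \'glob\' );' ) );
// => var glob = require( '@stdlib/fs/glob' );
```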

Migration strategy (phased to reduce risk):

Phase     Scope                                                 Files
Phase 1   @stdlib/_tools/pkgs/* (find, deps, clis, etc.)        ~15
Phase 2   @stdlib/_tools/static-analysis/* (sloc-glob, etc.)    ~10
Phase 3   @stdlib/_tools/lint/* (filenames, pkg-json, etc.)     ~20
Phase 4   Remaining bundles and scripts                         ~10

Each phase follows the same process:

  1. Run find + sed to replace require( 'glob' ) with require( '@stdlib/fs/glob' ).
  2. Run existing internal tool tests for the affected script area.
  3. Verify that the output matches what the original glob dependency produced.
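Step 3 is easiest to check by comparing the two result lists as sets, so that ordering differences neither hide nor fake a regression. Below is a minimal sketch. diffMatches is a hypothetical helper, and the commented calls assume both packages are installed side by side during the migration.

```javascript
'use strict';

// Builds a prototype-less lookup table from a list of strings:
function toSet( list ) {
	var set = Object.create( null );
	var i;
	for ( i = 0; i < list.length; i++ ) {
		set[ list[ i ] ] = true;
	}
	return set;
}

/**
* Compares two lists of matched paths, ignoring order (hypothetical helper).
*
* @param {StringArray} expected - matches from the original dependency
* @param {StringArray} actual - matches from the replacement
* @returns {Object} paths missing from `actual` and extra paths in `actual`
*/
function diffMatches( expected, actual ) {
	var e = toSet( expected );
	var a = toSet( actual );
	return {
		'missing': expected.filter( function isMissing( p ) {
			return !a[ p ];
		}),
		'extra': actual.filter( function isExtra( p ) {
			return !e[ p ];
		})
	};
}

// Hypothetical usage once both implementations are installed:
// var expected = require( 'glob' ).sync( '**/package.json', opts );
// var actual = require( '@stdlib/fs/glob' ).sync( '**/package.json', opts );
var d = diffMatches( [ 'a/package.json', 'b/package.json' ], [ 'b/package.json', 'c/package.json' ] );
console.log( d );
// => { missing: [ 'a/package.json' ], extra: [ 'c/package.json' ] }
```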

Uninstall the Dependency

After all phases are complete:

npm uninstall glob

Verification Plan

Automated Tests

1. Unit tests for @stdlib/utils/regexp-from-glob

make TESTS_FILTER=".*/utils/regexp-from-glob/.*" test

Tests should cover:

  • Core wildcards (*, **, ?).
  • Escaping behavior for dot ., plus +, parentheses (), and brackets [].
  • Prefix, suffix, and exact match conditions.

2. Unit tests for @stdlib/fs/glob

make TESTS_FILTER=".*/fs/glob/.*" test

Tests should cover:

  • Standard sync and async pattern matching against a mock filesystem.
  • ignore arrays effectively pruning traversal.
  • cwd option shifts the search base.
  • realpath correctly resolves to absolute paths.
  • Error handling (e.g., directories that cannot be read due to permissions are handled gracefully).
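One way to provide the "mock filesystem" for these tests is a real temporary directory built from a nested object literal. This exercises the actual fs calls without touching the repository. Below is a minimal sketch. createFixture is a hypothetical helper name, string values become file contents, and object values become subdirectories. The cleanup call mentioned in the comments requires Node.js 14.14 or later.

```javascript
'use strict';

var fs = require( 'fs' );
var os = require( 'os' );
var path = require( 'path' );

/**
* Materializes a nested object as a directory tree (hypothetical test helper).
*
* @param {Object} tree - directory description
* @param {string} [root] - target directory (defaults to a new temporary directory)
* @returns {string} root directory
*/
function createFixture( tree, root ) {
	var keys;
	var p;
	var i;
	if ( root === void 0 ) {
		root = fs.mkdtempSync( path.join( os.tmpdir(), 'stdlib-glob-' ) );
	}
	keys = Object.keys( tree );
	for ( i = 0; i < keys.length; i++ ) {
		p = path.join( root, keys[ i ] );
		if ( typeof tree[ keys[ i ] ] === 'string' ) {
			fs.writeFileSync( p, tree[ keys[ i ] ] );
		} else {
			fs.mkdirSync( p );
			createFixture( tree[ keys[ i ] ], p );
		}
	}
	return root;
}

var root = createFixture({
	'index.js': '',
	'lib': {
		'main.js': '',
		'data.json': '{}'
	},
	'node_modules': {
		'dep': {
			'index.js': ''
		}
	}
});
// Tests could then run, e.g., glob( '**/*.js', { 'cwd': root, 'ignore': [ 'node_modules/**' ] }, clbk )
// and clean up afterwards with fs.rmSync( root, { 'recursive': true, 'force': true } ).
```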

3. Regression tests for migrated modules

Run the relevant test suite after each phase to confirm that the tools still behave as expected:

# Phase 1:
make TESTS_FILTER=".*/_tools/pkgs/.*" test

# Phase 2:
make TESTS_FILTER=".*/_tools/static-analysis/.*" test

# Phase 3:
make TESTS_FILTER=".*/_tools/lint/.*" test

# Phase 4: Full internal tools suite
make test

Manual Verification

After migration, verify the tools perform correctly on the monorepo:

  1. Verify Package Discovery:

    node ./lib/node_modules/@stdlib/_tools/pkgs/find/bin/cli

    → Verify: Outputs all valid stdlib sub-packages identically to previous runs, without diving into node_modules.

  2. Verify SLOC execution:

    node ./lib/node_modules/@stdlib/_tools/static-analysis/js/sloc-glob/bin/cli

    → Verify: Accurately computes lines of code for the workspace.

  3. Performance Check:
    Time the execution before and after the swap.

    time node ./lib/node_modules/@stdlib/_tools/pkgs/find/bin/cli

    → Verify: Our native, tree-pruned walker runs in the same time as, or faster than, the external dependency.
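A single `time` run can be noisy. One option is to time each implementation over several runs and compare the medians. Below is a minimal sketch. medianTime is a hypothetical helper, and the commented comparison assumes both packages are installed at the same time. The package's own benchmark/ files would more likely use stdlib's benchmark harness instead.

```javascript
'use strict';

/**
* Returns the median wall-clock time (in milliseconds) of `n` calls to `fcn` (hypothetical helper).
*
* @param {Function} fcn - function to time
* @param {PositiveInteger} n - number of runs
* @returns {number} median duration in milliseconds
*/
function medianTime( fcn, n ) {
	var times = [];
	var t0;
	var i;
	for ( i = 0; i < n; i++ ) {
		t0 = process.hrtime.bigint();
		fcn();
		times.push( Number( process.hrtime.bigint() - t0 ) / 1e6 );
	}
	times.sort( function cmp( a, b ) {
		return a - b;
	});
	// Average the two middle values when `n` is even:
	i = Math.floor( n / 2 );
	return ( n % 2 ) ? times[ i ] : ( times[ i-1 ] + times[ i ] ) / 2;
}

// Hypothetical comparison (requires both packages to be installed):
// var oldGlob = require( 'glob' );
// var newGlob = require( '@stdlib/fs/glob' );
// var opts = { 'ignore': [ 'node_modules/**' ] };
// console.log( medianTime( function run() { oldGlob.sync( '**/package.json', opts ); }, 5 ) );
// console.log( medianTime( function run() { newGlob.sync( '**/package.json', opts ); }, 5 ) );
```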
