Skip to content

refactor: refactor utf8 path conversions#219

Merged
stormslowly merged 2 commits into
mainfrom
codex/utf8-path-only
May 18, 2026
Merged

refactor: refactor utf8 path conversions#219
stormslowly merged 2 commits into
mainfrom
codex/utf8-path-only

Conversation

@hardfist
Copy link
Copy Markdown
Contributor

@hardfist hardfist commented May 18, 2026

What changed

Rspack support UTF8 only Path so it's not necessary for rspack-resolver to support non utf8-path

  • Replaced lossy path-to-string conversions in resolver, tsconfig, filesystem, NAPI, and tests with strict UTF-8 conversions.
  • Added a small shared path_to_str helper for core path assertions.
  • Kept path extension and alias string construction on borrowed UTF-8 path strings where possible.

Why

rspack-resolver only needs to support UTF-8 paths, so lossy conversions can silently hide invalid path data and add unnecessary conversion work.

Performance notes

CodSpeed reports this PR improves three resolver benchmarks:

Benchmark Base Head Efficiency
resolver[multi-thread] 65.1 ms 63 ms +3.29%
resolver[resolve from symlinks multi thread] 123.7 ms 115.1 ms +7.5%
resolver[resolve from symlinks] 195.6 ms 186 ms +5.15%

Local callgrind comparison for resolver/resolve from symlinks (8975e38 -> 85e0cec) shows the same direction:

Metric Base Head Delta
Ir 3,073,208,626 3,000,439,756 -72,768,870 / -2.37%
Dr 798,591,590 787,908,689 -1.34%
Dw 594,749,952 592,901,782 -0.31%
accesses 4,466,550,168 4,381,250,227 -1.91%

The main reason is removing the core::str::lossy::Utf8Chunks::next hot path from repeated to_string_lossy() calls. In the base profile it accounts for about 52.5M Ir; after this PR it is replaced by cheaper strict UTF-8 validation (core::str::converts::from_utf8, about 7.7M Ir). The affected hot paths are primarily load_extensions and load_alias_or_file, where resolver candidate paths are converted repeatedly.

Measurement command shape:

RUSTFLAGS='-C debuginfo=1 -C strip=none -g' cargo bench --bench resolver --no-run
valgrind --tool=callgrind --cache-sim=yes \
  --callgrind-out-file=optimization-artifacts/valgrind/callgrind/pr219-symbols/{base,head}/callgrind.out \
  target/release/deps/resolver-* 'resolve from symlinks'

Validation

  • cargo fmt --all --check
  • cargo check --workspace
  • cargo check --workspace --features enable_instrument
  • cargo clippy --all-features -- -D warnings
  • cargo test -p rspack_resolver --lib -- --skip pnp
  • GitHub Actions PR checks are passing, including Clippy, tests, benchmark, coverage, build, doc, wasm, format.

Full cargo test --workspace was also run locally. It still fails on existing environment-sensitive fixtures: PnP tests need fixtures/pnp/.pnp.cjs, and integration resolve tests need local pnpm node_modules content.

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 18, 2026

Merging this PR will improve performance by 5.3%

⚡ 3 improved benchmarks
✅ 9 untouched benchmarks

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation resolver[multi-thread] 65.1 ms 63 ms +3.29%
Simulation resolver[resolve from symlinks multi thread] 123.7 ms 115.1 ms +7.5%
Simulation resolver[resolve from symlinks] 195.6 ms 186 ms +5.15%

Tip

Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.


Comparing codex/utf8-path-only (85e0cec) with main (8975e38)

Open in CodSpeed

@hardfist hardfist changed the title [codex] refactor utf8 path conversions refactor: refactor utf8 path conversions May 18, 2026
@hardfist hardfist marked this pull request as ready for review May 18, 2026 03:25
Copilot AI review requested due to automatic review settings May 18, 2026 03:25
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors path-to-string conversion to use strict UTF-8 handling instead of lossy conversion across resolver internals, tsconfig handling, filesystem code, NAPI output, and tests.

Changes:

  • Adds a shared path_to_str helper for strict UTF-8 path assertions.
  • Replaces to_string_lossy() conversions with to_str().expect(...) or path_to_str(...).
  • Keeps extension and alias path construction on borrowed UTF-8 strings where possible.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/path.rs Adds the shared strict UTF-8 path conversion helper.
src/lib.rs Applies strict path conversion in resolver tracing, alias, extension, and error construction paths.
src/tsconfig.rs Uses strict conversion for tsconfig tracing and ${configDir} substitution.
src/file_system.rs Uses strict UTF-8 conversion when rebuilding wasm symlink paths.
napi/src/lib.rs Converts resolved paths to JS strings with strict UTF-8 handling.
src/tests/alias.rs Updates alias tests to use strict path conversion.
src/tests/browser_field.rs Updates browser field fallback path conversion.
src/tests/memory_fs.rs Adds a local helper and uses strict paths for the in-memory filesystem.
src/tests/missing.rs Updates alias path setup to strict conversion.
src/tests/pnp.rs Updates PnP path assertions and formatting to strict conversion.
src/tests/resolve.rs Updates fixture path conversion to strict UTF-8.
src/tests/roots.rs Updates absolute path requests to strict UTF-8.
src/tests/windows.rs Updates Windows path assertions to strict UTF-8.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@stormslowly stormslowly merged commit 0616bdc into main May 18, 2026
25 checks passed
@stormslowly stormslowly deleted the codex/utf8-path-only branch May 18, 2026 04:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants