Skip to content

[codex] Use camino UTF-8 path API#221

Draft
hardfist wants to merge 2 commits into
mainfrom
codex/camino-utf8-path-api
Draft

[codex] Use camino UTF-8 path API#221
hardfist wants to merge 2 commits into
mainfrom
codex/camino-utf8-path-api

Conversation

@hardfist
Copy link
Copy Markdown
Contributor

Summary

  • Switch the public resolver path API from std::path::{Path, PathBuf} to camino::{Utf8Path, Utf8PathBuf}.
  • Carry UTF-8 path types through resolver results, errors, options, contexts, tsconfig/package metadata, cache, and the FileSystem trait.
  • Replace internal UTF-8 to_str().expect(...) conversions with Utf8Path::as_str() and explicit std-path boundary conversions.
  • Update napi, examples, benches, and tests to use the new UTF-8 path API.

Impact

Callers now pass and receive UTF-8 path types at the Rust API boundary, making the resolver's UTF-8 requirement explicit and avoiding hidden unwraps during internal path-to-string conversion. OS/PnP calls still cross through std paths via as_std_path(), and canonicalized std paths are converted back with an InvalidData error if they are not valid UTF-8.

Validation

  • cargo check --workspace
  • cargo check --benches
  • cargo test --no-run
  • pre-commit: cargo fmt, pnpm exec taplo format, cargo clippy --all-features -- -D warnings

Note: cargo test --lib runs and passes 125 tests locally, but 6 PnP runtime tests fail because the local fixtures/pnp directory does not contain a .pnp.cjs manifest for pnp::find_closest_pnp_manifest_path to discover.

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 18, 2026

Merging this PR will improve performance by 5.92%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 1 improved benchmark
✅ 11 untouched benchmarks

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Memory resolver[multi-thread] 11 MB 10.4 MB +5.92%

Tip

Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.


Comparing codex/camino-utf8-path-api (06d9e46) with main (0616bdc)

Open in CodSpeed

@hardfist
Copy link
Copy Markdown
Contributor Author

Performance follow-up: aligned cache path keys with the previous std path behavior in src/cache.rs while keeping the public API on camino. The cache now hashes and compares Utf8Path through as_std_path(), avoiding camino's component-iterating Utf8Path::hash / equality in the hot cache lookup path.\n\nLocal Callgrind check on resolver[single-thread]:\n- base 0616bdc: 1,886,855,400 Ir\n- initial camino API head 728b3f6: 2,256,664,815 Ir\n- after cache key alignment 06d9e46: 2,055,885,851 Ir\n\nThis recovers about 8.9% instruction count versus the initial head. There is still residual overhead versus base; the remaining hot area is mostly realpath / strip-prefix path component work rather than the cache-key hash itself.

@hardfist
Copy link
Copy Markdown
Contributor Author

@codspeedbot explain why this is faster

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 18, 2026

@codspeedbot explain why this is faster

Based on my analysis of the CodSpeed flamegraphs and the code changes in this PR, here's why the resolver[multi-thread] benchmark improved by 5.92% in memory (11 MB → 10.4 MB):


Root Cause: Eliminated String Conversion Allocations in Hot Paths

The switch from std::path::Path to camino::Utf8Path removes the need for to_str().expect(...) conversions throughout the resolver. With std::path, every time the resolver needed a string view of a path (for comparisons, joining, suffix matching), it had to go through a fallible to_str() conversion. With Utf8Path, .as_str() is a zero-cost view into the underlying String — no allocation or validation needed.

Where the Savings Show Up

Comparing the flamegraph memory breakdowns between BASE and HEAD, the largest reductions are concentrated in two functions:

Function BASE memory HEAD memory Reduction
CachedPathImpl::package_json 688 µs 371 µs −46%
load_as_file_or_directory 445 µs 318 µs −29%

These are the hottest path-manipulation functions in the resolver — they join paths, look up package.json files, and probe file extensions. Replacing PathBuf&str round-trips with direct Utf8Path::as_str() eliminates many small intermediate allocations on each call.

Cache Key Alignment (Commit 06d9e46)

The follow-up change to hash and compare cache keys through as_std_path() instead of camino's default Utf8Path::hash (which iterates component-by-component) also contributes. The component-iterating hash/eq creates small iterator state objects on each lookup; routing through std::path::Path's byte-level hash avoids this overhead entirely on the hot cache lookup path.

Why Multi-Thread Shows It More

In the multi-threaded benchmark, 3 concurrent resolution tasks hit overlapping path hierarchies. Each task independently performs path joins and cache lookups, so per-resolution allocation savings are multiplied across threads. The single-thread benchmark likely benefits similarly per-resolution, but the absolute memory delta is smaller and stays within the noise threshold.

Note on Environment Differences

⚠️ The base and head runs executed on different CPU hardware (AMD EPYC vs Intel Xeon). However, since this is a Memory mode measurement (heap allocation tracking via DHAT), the CPU difference does not affect the result — heap allocations are deterministic regardless of processor.

Summary
  • Warming up the performance wizard...
  • Analyzing performance improvement — Completed flamegraph and code analysis for base and HEAD runs
  • Handling comment — Comment handling finished successfully

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant