🐛 Handle Windows-style extraction paths#2812
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds Windows-aware path rewriting and bsdtar-style absolute-prefix stripping to extraction: new Changes
Sequence Diagram(s)sequenceDiagram
participant CLI as CLI (extract cmd)
participant Editor as PathnameEditor
participant Entry as EntryName/EntryReference
participant Extractor as Extract logic
CLI->>Editor: receive raw archive entry path
Editor->>Editor: rewrite_path_for_extraction(raw) → RewrittenPath{path, had_root}
Editor->>Entry: EntryReference::from_utf8_preserve_root(rewritten.path, had_root)
Editor-->>CLI: return EntryReference
CLI->>Extractor: pass EntryReference
Extractor->>Extractor: is_unsafe_link_path(reference.as_str()) (uses rewrite + parent-dir check)
Extractor-->>CLI: allow or skip extraction
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical bug concerning the extraction of files with Windows-style paths. It introduces a new dependency, Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces handling for Windows-style extraction paths, which is a great improvement for cross-platform compatibility. The changes are well-contained, primarily within cli/src/command/core/path.rs, and the new logic for stripping Windows path prefixes in strip_absolute_path_bsdtar seems correct and is thoroughly tested. The integration into the existing path editing functions is clean. The suggestion for a minor performance improvement to reduce allocations is still valid and included. Overall, this is a solid contribution.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cli/src/command/core/path.rs`:
- Around line 45-47: The Windows absolute-prefix rewrite must run before
stripping components: call rewrite_path_for_extraction on the original path and
use its result as the input to transform_and_strip so drive letters/UNC/\\?\
prefixes aren't counted as leading components; update the three similar sites
that use transform_and_strip then rewrite (the occurrences around
transform_and_strip(...)? / rewrite_path_for_extraction(...) /
EntryName::from_utf8_preserve_root(...) and the other blocks at the later
locations) to perform rewrite_path_for_extraction first, then pass that
rewritten path into transform_and_strip, then continue to create
EntryName::from_utf8_preserve_root(rewritten.path.as_ref()) as before.
In `@cli/src/command/extract.rs`:
- Around line 1735-1737: The guard that enforces --no-allow-unsafe-links for
hardlinks currently only checks has_parent_dir_component on the rewritten path
and misses cases where a Windows root was stripped; modify the condition that
uses strip_absolute_path_bsdtar(...) so it uses the returned had_root flag as
part of the safety check (e.g., require blocking when had_root ||
crate::command::core::path::has_parent_dir_component(rewritten.as_ref())), and
ensure edit_hardlink() uses that combined boolean to reject unsafe hardlink
targets.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3754ce58-332d-45a3-a98a-141220cb8acb
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (4)
cli/Cargo.tomlcli/src/command/core/path.rscli/src/command/extract.rscli/tests/cli/stdio/path_traversal.rs
0924cd8 to
ef27537
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (2)
cli/src/command/extract.rs (1)
1735-1737:⚠️ Potential issue | 🟠 MajorThis still misses Windows roots already stripped in
edit_hardlink().By the time the hardlink path reaches this helper,
EntryReferenceis already the rewritten form (for exampleetc/hosts), sostrip_absolute_path_bsdtar(reference.as_str())can no longer recover the originalc:/...or\\?\UNC\...root.--no-allow-unsafe-linksstill needs to incorporate thehad_rootboolean returned byedit_hardlink().🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cli/src/command/extract.rs` around lines 1735 - 1737, The check here incorrectly recomputes had_root from the already-rewritten EntryReference so Windows roots stripped earlier in edit_hardlink() are lost; modify the helper signature (and its call sites) to accept and use the had_root flag produced by edit_hardlink() instead of calling strip_absolute_path_bsdtar(reference.as_str()) again, then return had_root || crate::command::core::path::has_parent_dir_component(rewritten.as_ref()); update callers (the place that calls edit_hardlink() and this helper) to forward the had_root boolean from edit_hardlink() so --no-allow-unsafe-links correctly respects original Windows roots.cli/src/command/core/path.rs (1)
45-47:⚠️ Potential issue | 🟠 MajorRewrite Windows prefixes before applying
--strip-components.These paths still go through
transform_and_strip()first, so drive letters / UNC prefixes are counted as leading components beforerewrite_path_for_extraction()sees them.c:/a/bwith--strip-components=1will still strip the drive prefix instead of the first real path segment, and//?/UNC/...can lose its recognizable shape before the rewrite logic runs.Also applies to: 65-68
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cli/src/command/core/path.rs` around lines 45 - 47, The Windows path-rewrite must run before any strip logic: currently transform_and_strip(...) is called first so drive letters/UNC prefixes count as components and can be stripped; instead call rewrite_path_for_extraction(...) on the original input first (so UNC/\\?\\/drive prefixes are normalized/preserved), then pass the rewritten path into transform_and_strip(...) (or adjust transform_and_strip to accept a pre-rewritten Path/Stripped flag) and finally construct EntryName::from_utf8_preserve_root(...) from that rewritten+stripped result; apply the same reorder/fix at the other occurrence that uses transform_and_strip -> rewrite_path_for_extraction -> EntryName::from_utf8_preserve_root.
🧹 Nitpick comments (1)
cli/tests/cli/stdio/path_traversal.rs (1)
546-691: Please add a hardlink regression for Windows-rooted targets.These new cases cover entry-name rewriting, but not
c:/...or\\?\UNC\...hardlink targets under--no-allow-unsafe-links. A focused stdio test there would catch the unsafe-link hole incli/src/command/extract.rs.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cli/tests/cli/stdio/path_traversal.rs` around lines 546 - 691, Add focused stdio tests that assert hardlink targets with Windows-rooted names are treated as unsafe and skipped when extraction is run with --no-allow-unsafe-links: create two tests (e.g., stdio_extract_skips_windows_drive_hardlink_target and stdio_extract_skips_windows_unc_hardlink_target) that use a helper to build an archive containing a hardlink entry whose target is "c:/fileX" and "\\\\?\\UNC\\server\\share\\fileY" respectively, run the same cargo_bin_cmd!("pna") extract invocation but include the --no-allow-unsafe-links flag, and assert that no files or links are created in out_dir; this will exercise the code path in cli/src/command/extract.rs that validates link targets and close the unsafe-link regression hole. Ensure the new tests use the same setup()/sorted_entry_names helpers and mirror the style of stdio_extract_rewrites_windows_drive_prefixed_entry_name and stdio_extract_rewrites_windows_unc_prefixed_entry_name.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@cli/src/command/core/path.rs`:
- Around line 45-47: The Windows path-rewrite must run before any strip logic:
currently transform_and_strip(...) is called first so drive letters/UNC prefixes
count as components and can be stripped; instead call
rewrite_path_for_extraction(...) on the original input first (so UNC/\\?\\/drive
prefixes are normalized/preserved), then pass the rewritten path into
transform_and_strip(...) (or adjust transform_and_strip to accept a
pre-rewritten Path/Stripped flag) and finally construct
EntryName::from_utf8_preserve_root(...) from that rewritten+stripped result;
apply the same reorder/fix at the other occurrence that uses transform_and_strip
-> rewrite_path_for_extraction -> EntryName::from_utf8_preserve_root.
In `@cli/src/command/extract.rs`:
- Around line 1735-1737: The check here incorrectly recomputes had_root from the
already-rewritten EntryReference so Windows roots stripped earlier in
edit_hardlink() are lost; modify the helper signature (and its call sites) to
accept and use the had_root flag produced by edit_hardlink() instead of calling
strip_absolute_path_bsdtar(reference.as_str()) again, then return had_root ||
crate::command::core::path::has_parent_dir_component(rewritten.as_ref()); update
callers (the place that calls edit_hardlink() and this helper) to forward the
had_root boolean from edit_hardlink() so --no-allow-unsafe-links correctly
respects original Windows roots.
---
Nitpick comments:
In `@cli/tests/cli/stdio/path_traversal.rs`:
- Around line 546-691: Add focused stdio tests that assert hardlink targets with
Windows-rooted names are treated as unsafe and skipped when extraction is run
with --no-allow-unsafe-links: create two tests (e.g.,
stdio_extract_skips_windows_drive_hardlink_target and
stdio_extract_skips_windows_unc_hardlink_target) that use a helper to build an
archive containing a hardlink entry whose target is "c:/fileX" and
"\\\\?\\UNC\\server\\share\\fileY" respectively, run the same
cargo_bin_cmd!("pna") extract invocation but include the --no-allow-unsafe-links
flag, and assert that no files or links are created in out_dir; this will
exercise the code path in cli/src/command/extract.rs that validates link targets
and close the unsafe-link regression hole. Ensure the new tests use the same
setup()/sorted_entry_names helpers and mirror the style of
stdio_extract_rewrites_windows_drive_prefixed_entry_name and
stdio_extract_rewrites_windows_unc_prefixed_entry_name.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6d247c5b-95aa-4932-a25d-a7ab8b13b887
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (4)
cli/Cargo.tomlcli/src/command/core/path.rscli/src/command/extract.rscli/tests/cli/stdio/path_traversal.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- cli/Cargo.toml
ef27537 to
7beda9b
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
cli/src/command/core/path.rs (1)
125-140: Minor allocation inefficiency.The
into_owned()on line 127 followed bypath.into_owned()on line 136 creates an extra allocation. Consider returning a borrowedCowwhen possible:♻️ Optional: reduce allocations
fn rewrite_path_for_extraction<'a>(&self, path: &'a Path) -> RewrittenPath<'a> { - let raw = path.to_string_lossy().into_owned(); + let raw = path.to_string_lossy(); if self.absolute_paths { RewrittenPath { - path: Cow::Owned(raw), + path: Cow::Owned(raw.into_owned()), had_root: false, } } else { - let (path, had_root) = strip_absolute_path_bsdtar(&raw); + let (stripped, had_root) = strip_absolute_path_bsdtar(&raw); RewrittenPath { - path: Cow::Owned(path.into_owned()), + path: match (raw, stripped) { + (Cow::Borrowed(_), Cow::Borrowed(s)) => Cow::Owned(s.to_owned()), + (Cow::Owned(o), Cow::Borrowed(s)) if s.len() == o.len() => Cow::Owned(o), + (_, stripped) => Cow::Owned(stripped.into_owned()), + }, had_root, } } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cli/src/command/core/path.rs` around lines 125 - 140, The code calls path.to_string_lossy().into_owned() then later path.into_owned(), causing an extra allocation; change rewrite_path_for_extraction to keep the initial Cow from Path::to_string_lossy() (e.g. let raw = path.to_string_lossy();) and return that Cow directly in the absolute_paths branch or pass it into strip_absolute_path_bsdtar(&raw) in the other branch so you avoid calling into_owned twice; update the RewrittenPath construction to use the Cow values returned by to_string_lossy/strip_absolute_path_bsdtar instead of creating intermediate owned strings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@cli/src/command/core/path.rs`:
- Around line 125-140: The code calls path.to_string_lossy().into_owned() then
later path.into_owned(), causing an extra allocation; change
rewrite_path_for_extraction to keep the initial Cow from Path::to_string_lossy()
(e.g. let raw = path.to_string_lossy();) and return that Cow directly in the
absolute_paths branch or pass it into strip_absolute_path_bsdtar(&raw) in the
other branch so you avoid calling into_owned twice; update the RewrittenPath
construction to use the Cow values returned by
to_string_lossy/strip_absolute_path_bsdtar instead of creating intermediate
owned strings.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7be01763-a16f-4624-9231-6ef038b14c11
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (4)
cli/Cargo.tomlcli/src/command/core/path.rscli/src/command/extract.rscli/tests/cli/stdio/path_traversal.rs
✅ Files skipped from review due to trivial changes (1)
- cli/Cargo.toml
There was a problem hiding this comment.
🧹 Nitpick comments (1)
cli/src/command/core/path.rs (1)
1068-1072: Consider using the actualis_unsafe_link_pathfunction in tests.The
is_unsafe_link_equivalenthelper duplicates the logic ofis_unsafe_link_path. Using the actual function would provide better test coverage of the public API.♻️ Suggested change
- /// Equivalent to is_unsafe_link logic for various link reference patterns. - fn is_unsafe_link_equivalent(s: &str) -> bool { - let (rewritten, had_root) = strip_absolute_path_bsdtar(s); - had_root || has_parent_dir_component(rewritten.as_ref()) - } - #[test] - fn unsafe_link_equivalent_detects_all_unsafe_patterns() { + fn is_unsafe_link_path_detects_all_unsafe_patterns() { // Windows drive prefix - assert!(is_unsafe_link_equivalent("C:/file")); + assert!(is_unsafe_link_path("C:/file")); // ... (update remaining calls)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cli/src/command/core/path.rs` around lines 1068 - 1072, Replace the test-local helper is_unsafe_link_equivalent with direct calls to the public function is_unsafe_link_path so tests exercise the actual API; locate tests that call is_unsafe_link_equivalent (which currently uses strip_absolute_path_bsdtar and has_parent_dir_component) and update them to call is_unsafe_link_path with the same inputs, removing the duplicate logic and keeping assertions unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@cli/src/command/core/path.rs`:
- Around line 1068-1072: Replace the test-local helper is_unsafe_link_equivalent
with direct calls to the public function is_unsafe_link_path so tests exercise
the actual API; locate tests that call is_unsafe_link_equivalent (which
currently uses strip_absolute_path_bsdtar and has_parent_dir_component) and
update them to call is_unsafe_link_path with the same inputs, removing the
duplicate logic and keeping assertions unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 747d4526-802f-477d-9349-0d2186c8528e
📒 Files selected for processing (2)
cli/src/command/core/path.rscli/src/command/extract.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- cli/src/command/extract.rs
b6fdeb1 to
0ae693b
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
cli/src/command/core/path.rs (1)
686-694: Consider adding a test combining Windows-style paths with--strip-components.The current tests verify Windows prefix stripping and component stripping separately. A test explicitly combining both (e.g.,
c:/a/b/cwithstrip_components=1) would document the expected interaction and help prevent future regressions.📝 Example test
#[test] fn editor_windows_prefix_with_strip_components() { let editor = PathnameEditor::new(Some(1), None, false, true); // c:/a/b/c: strip 1 component from a/b/c -> b/c (after prefix removal) let name = editor.edit_entry_name(Path::new("c:/a/b/c")).unwrap(); // On both Unix and Windows, should strip "c:/" prefix then one component assert!(name.as_str() == "b/c" || name.as_str() == "a/b/c"); // Note: exact behavior depends on whether prefix counts as component }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cli/src/command/core/path.rs` around lines 686 - 694, Add a unit test that exercises Windows-style prefixes together with strip-components by creating a PathnameEditor (use PathnameEditor::new with Some(1) for strip_components) and calling edit_entry_name on a Windows path like "c:/a/b/c"; assert the resulting string matches the expected post-prefix-and-component-stripped form (reference PathnameEditor::new and PathnameEditor::edit_entry_name to locate where to add the test) so the code documents and prevents regressions in the interaction between Windows prefix stripping and component stripping.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@cli/src/command/core/path.rs`:
- Around line 686-694: Add a unit test that exercises Windows-style prefixes
together with strip-components by creating a PathnameEditor (use
PathnameEditor::new with Some(1) for strip_components) and calling
edit_entry_name on a Windows path like "c:/a/b/c"; assert the resulting string
matches the expected post-prefix-and-component-stripped form (reference
PathnameEditor::new and PathnameEditor::edit_entry_name to locate where to add
the test) so the code documents and prevents regressions in the interaction
between Windows prefix stripping and component stripping.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c84117da-1132-44c5-8929-39c58786a2c9
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (4)
cli/Cargo.tomlcli/src/command/core/path.rscli/src/command/extract.rscli/tests/cli/stdio/path_traversal.rs
✅ Files skipped from review due to trivial changes (1)
- cli/Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (2)
- cli/src/command/extract.rs
- cli/tests/cli/stdio/path_traversal.rs
967fcb9 to
58df187
Compare
58df187 to
a27382a
Compare
Summary by CodeRabbit