Skip to content

Improve implementation of remove_dir_all_recursive for unix#154829

Open
asder8215 wants to merge 1 commit intorust-lang:mainfrom
asder8215:remove_dir_all_iterative
Open

Improve implementation of remove_dir_all_recursive for unix#154829
asder8215 wants to merge 1 commit intorust-lang:mainfrom
asder8215:remove_dir_all_iterative

Conversation

@asder8215
Copy link
Copy Markdown
Contributor

@asder8215 asder8215 commented Apr 4, 2026

This PR essentially changes the underlying remove_dir_all implementation used by unix and I believe uefi platforms from a recursive-based approach to an iterative version. The order it removes directories and files from the recursive version should be preserved in this iterative version. I'm curious if this iterative approach could show any performance improvement over the current recursive based approach that remove_dir_all has on unix/uefi platforms.

I also made a comment on how this iterative-based approach is redundantly holding a PathBuf in the tuple (which could incur a big allocation cost if we are removing deeply nested directories). This could be optimized if we are able to retrieve a reference to the path that ReadDir struct is holding, as it contains this information in its InnerReadDir's root field; I believe this would also make the iterative approach allocate less memory than the recursive approach as we would only need the main directory/nested directories &Path values when we are removing it.

I can make an ACP for ReadDir struct root path to be accessible via a method if that would be appropriate.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 4, 2026
@rustbot
Copy link
Copy Markdown
Collaborator

rustbot commented Apr 4, 2026

r? @jhpratt

rustbot has assigned @jhpratt.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

  • Owners of files modified in this PR: @ChrisDenton, libs
  • @ChrisDenton, libs expanded to 8 candidates
  • Random selection from Mark-Simulacrum, jhpratt

@asder8215 asder8215 changed the title Changed implementation of remove_dir_all from recursive to an iterative version Change implementation of remove_dir_all from recursive to an iterative version Apr 4, 2026
// in the ReadDir struct through a method that return &Path, we can reduce memory usage
// allocated to this VecDeque.

let mut directories = VecDeque::new();
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, noting that this could use the std library LinkedList implementation if the resizing cost from VecDeque is undesirable.

@jhpratt
Copy link
Copy Markdown
Member

jhpratt commented Apr 7, 2026

@rustbot reroll

@rustbot rustbot assigned Mark-Simulacrum and unassigned jhpratt Apr 7, 2026
@Mark-Simulacrum
Copy link
Copy Markdown
Member

Can you share more about the motivation for doing this? You mention a performance improvement, but don't mention any specific benchmarks (I personally would expect the syscall cost to dominate any improvement here, but I could be wrong).

@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 11, 2026
@asder8215
Copy link
Copy Markdown
Contributor Author

asder8215 commented Apr 11, 2026

Can you share more about the motivation for doing this? You mention a performance improvement, but don't mention any specific benchmarks (I personally would expect the syscall cost to dominate any improvement here, but I could be wrong).

As it stands now, there are some things in the PR that I need to add that I believe would result in an improvement over the current recursive version of this function. I made an ACP for a couple of changes that I think would be nice to introduce to allow for this improvement. I admit that I'm running off intuition and haven't done benchmarking for this yet.

However, my motivation for this comes from seeing how ReadDir was constructed through each platform's readdir function + seeing that DirEntry returns a PathBuf. This is the unix's readdir function, but in all the platform implementations (unix, uefi, windows, etc.) it takes in a &Path value and allocates a copy of that to keep as a PathBuf inside the ReadDir struct. I think the bigger issue here is that if we're going to receive a PathBuf value from a DirEntry, and we're recursing on that entry because it's a (nested) directory, we might as well let that PathBuf value be owned by the ReadDir struct rather than allocating that path again.

From the curr impl of remove_dir_all on unix/uefi, just pointing out pertinent parts of the code here in comments:

fn remove_dir_all_recursive(path: &Path) -> io::Result<()> {
    for child in fs::read_dir(path)? { // <-- allocating for a PathBuf here
        let result: io::Result<()> = try {
            let child = child?;
            if child.file_type()?.is_dir() {
                // child.path() returns a PathBuf. instead of passing a ref
                // for `ReadDir` to make a copy of it, just let it be owned
                // by `ReadDir` directly
                remove_dir_all_recursive(&child.path())?;
            } else {
                fs::remove_file(&child.path())?;
            }
        };
        if let Err(err) = &result
            && err.kind() != io::ErrorKind::NotFound
        {
            return result;
        }
    }
    ignore_notfound(fs::remove_dir(path))
}

The other thing to point out is that because ReadDir has an owned PathBuf in its struct (or inner struct), we can call something like ignore_notfound(fs::remove_dir(readdir.path()) and get a path reference of the directory we are reading from from ReadDir itself; at this point, the recursive version seems to be unnecessarily allocating a &Path in each recursive call for the nested directory when we only need those &Path from the nested directory at the time of removing it (when the ReadDir iterator is exhausted). I think these two together made me think that an iterative version might as well replace this recursive version.

I did read through the libs team notes on this proposal, and they made a good point that specialization could work here (which makes the std::fs::owned_read_dir part unnecessary). I'm aware that specialization needs to be fleshed out in Rust, but I do wonder if min_specialization would optimize the redundant allocation here.

@asder8215
Copy link
Copy Markdown
Contributor Author

asder8215 commented Apr 12, 2026

I was thinking about this yesterday and I think we could preserve the recursive implementation, but we could change the signature of the function to fn remove_dir_all_recursive(read_dir: ReadDir) -> io::Result<()>? If we can have ReadDir be constructed through both &Path/PathBuf (using specialization to avoid redundant allocation for PathBuf) and expose a method that returns a path reference from ReadDir, then we can optimize this function (avoids a syscall for heap allocating ReadDir on the VecDeque/LinkedList and instead allocating it on the stack, which it already did before).

@asder8215 asder8215 changed the title Change implementation of remove_dir_all from recursive to an iterative version Improve implementation of remove_dir_all_recursive for unix Apr 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants