Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filter_entry misbehaves when contents_first is enabled #171

Open
selendym opened this issue Feb 4, 2023 · 5 comments · May be fixed by #198
Open

filter_entry misbehaves when contents_first is enabled #171

selendym opened this issue Feb 4, 2023 · 5 comments · May be fixed by #198

Comments

@selendym
Copy link

selendym commented Feb 4, 2023

Hello.

It seems that when contents_first is enabled, filter_entry misbehaves when the deferred directories are filtered out.

Example:

fn main() {
    let _ = std::fs::create_dir_all("/tmp/walkdir/foo");
    let _ = std::fs::create_dir_all("/tmp/walkdir/bar");
    let _ = std::fs::File::create("/tmp/walkdir/foo/file");
    let _ = std::fs::File::create("/tmp/walkdir/bar/file");

    println!("Without filter_entry:");
    for entry in walkdir::WalkDir::new("/tmp/walkdir")
        .contents_first(true)
        .into_iter()
    {
        println!("  {entry:?}");
    }

    println!("With filter_entry:");
    for entry in walkdir::WalkDir::new("/tmp/walkdir")
        .contents_first(true)
        .into_iter()
        .filter_entry(|entry| !entry.file_type().is_dir())
    {
        println!("  {entry:?}");
    }
}

Output:

Without filter_entry:
  Ok(DirEntry("/tmp/walkdir/bar/file"))
  Ok(DirEntry("/tmp/walkdir/bar"))
  Ok(DirEntry("/tmp/walkdir/foo/file"))
  Ok(DirEntry("/tmp/walkdir/foo"))
  Ok(DirEntry("/tmp/walkdir"))
With filter_entry:
  Ok(DirEntry("/tmp/walkdir/bar/file"))

Expected output for the latter part:

With filter_entry:
  Ok(DirEntry("/tmp/walkdir/bar/file"))
  Ok(DirEntry("/tmp/walkdir/foo/file"))

This seems to be caused by double popping IntoIter.stack_list:

As a fix, a check for contents_first in https://github.com/BurntSushi/walkdir/blob/master/src/lib.rs#L1051 could perhaps work.
This would not fix direct usage of skip_current_dir, however.

Best regards.

@kartonrad
Copy link

This also happens when setting .follow_symlinks(false), while filtering out hidden entries,
and then encountering a hidden symlink on windows

filter_entries tries to delete the entry twice, wich leads to the iterator exiting the current folder without looking at any other entries

@kartonrad
Copy link

just ran into this by chance

@kenchou
Copy link

kenchou commented Dec 5, 2023

I encountered a similar issue.

test case:

mkdir -p /tmp/test-walkdir/{a,b,c}
for entry in walkdir::WalkDir::new("/tmp/test-walkdir")
    .contents_first(true)
    .sort_by(|a, b| a.file_name().cmp(&b.file_name()))
    .filter_entry(|e| e.file_name().to_string_lossy() != "a")
    .into_iter()
{
    println!("{}", entry.path().display());
}

output:

/tmp/test-walkdir

missing items b, c

output after change filter to .filter_entry(|e| e.file_name().to_string_lossy() != "b")

/tmp/test-walkdir/a
/tmp/test-walkdir

missing item c

It seems to have broken the iterator after encountering a filtered entry.

@wtachau

This comment was marked as duplicate.

@Daimonion1980
Copy link

Thanks @kikuomax for the proposed fix.

I think there should be one more thing to consider.
When I use filter_entry to filter out hidden folder (for example .git folders) I want the skip dir function to work. Because everything inside a folder the filter is trying to hide, should be hidden.
When using contents_first option and your proposed fix, only the the last part of the folder, which matches the filter criteria entry in the end of the path, will be sorted out. Of course the filter could be even more complex to match the criteria that also parts of the path will be sorted out, but I think that this should not be the intention of filter_entry.

Wouldn't it be better to provide the contents_first property to the iterator (e.g. deal with the direction of the iterator)?
Maybe something like this? https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.rev

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants