Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent behaviour of skip_current_dir when combined with same_file_system (false) and symlinks to other file-systems #166

Open
cipriancraciun opened this issue Aug 20, 2022 · 6 comments

Comments

@cipriancraciun
Copy link

When one uses follow_links (true) but also same_file_system (true), and walkdir encounters a symlink to a folder on a different file-system (which will thus not be recursed-into), it will report it as a file or folder (assuming the pointed-to exists), and thus calling skip_current_dir will in fact skip iterating the parent folder of that symlink.

I don't know if this is actually an issue within walkdir itself, or the code that uses walkdir in such a manner.

Perhaps the best solution is to add a new method DirEntry::is_recursed that states if the current entry is or not actually the subject of recursion, and thus if calling skip_current_dir will affect this entry and not its parent.

Another solution, perhaps best if viewed from a security point-of-view, is to treat symlinks to entities outside the current file-system as broken links, thus the DirEntry::file_type and DirEntry::metadata would actually belong to the symlink and not to the pointed-to file or folder.

@BurntSushi
Copy link
Owner

I unfortunately can't make sense of your first paragraph. You mention the symlink is not recursed into, but also talk about skip_current_dir... So I'm not sure what you mean.

I think examples will help, or else I'll have to repro it when I get time. (Which might be a while.)

@BurntSushi
Copy link
Owner

To be clear, I'm unsure what you're actually calling inconsistent.

@cipriancraciun
Copy link
Author

I'll try to summarize the issue:

  • one uses walkdir with follow_links (true) and same_file_system (true); (plus sorting by name, although I don't think this is essential;)
  • let's take a folder to walk, let's call it r, which contains:
  • a symlink a that points to a folder outside the same file-system r resides on;
  • a symlink b that points to a folder within the same file-system r resides on;
  • both folders that a and b point to are not empty;
  • a file c;

When iterating, walkdir first returns a which .file_type().is_dir() (based on the .metadata() of the pointed-to folder according to follow_links (true)), then it returns b which also is .file_type().is_dir(), then it returns children from b, then it returns c. As expected it skips over a's children because it's not on the same file-system although follow_symlinks (true) was set.

However, if one tries to asses both a and b based only on the .metadata() returned by walkdir::DirEntry, one doesn't see any difference between them, both are .path_is_symlink() and both are folders according to .file_type().is_dir(). Thus if (somewhere down the line) some filter code decides that one of these folders should be not recursed-into and calls .skip_current_dir() the behaviour differs in the two cases:

  • if one calls .skip_current_dir() just after obtaining a from .next(), walkdir stops recursing the parent of a, namely r;
  • if one calls .skip_current_dir() just after obtaining b from .next(), walkdir stops recursing b, thus returning other children from r;

The problem is that walkdir treats both a and b as symlinks to folders (thus the .metadata() is that of the pointed to folder) when it comes for DirEntry purposes, but then it treats them differently when it comes to recursion and the behaviour of .skip_current_dir().

(For example I had tool that uses walkdir and it worked just fine by ignoring two folders that happened to be symlinks to folders on the same file-system, but then when one of the folders was moved on a different file-system, the tool started to misbehave by stopping the iteration of the parent folder.)

My proposal was to fix this corner-case by either making .skip_current_dir() take into account this particularity, or better, by making the DirEntry that represents a in our case be something akin to a "broken symlink", i.e. the .metadata() should return the lstat of the symlink.

To summarize: when it comes to symlinks, .same_file_system (true) should take priority over .follow_links (true), and the code should behave (only when presented with a symlink pointing to another file-system and same_file_system (true)) just as if walkdir was configured with .follow_links (false).

For example in handle_entry one should also check if .opts.same_file_system was set and in that case only self.follow(dent) only if the symlink points inside the same file-system as root.

https://github.com/BurntSushi/walkdir/blob/master/src/lib.rs#L821-L823

@BurntSushi
Copy link
Owner

Thank you for that thorough explanation! That makes things much clearer.

by making the DirEntry that represents a in our case be something akin to a "broken symlink", i.e. the .metadata() should return the lstat of the symlink.

So I do like this answer, but I also worry that it might be problematic to represent a non-broken symlink as if it were a broken symlink. It's difficult to come up with concrete problems here, but I am worried. On the other hand, if one thinks of "same file system" as a conceptual model where anything outside the file system the root directory is on is considered to basically not exist, then I think it does make sense...

Otherwise, I think it seems like the least worst option. I'd prefer not to add any new public APIs to fix this for example, because it's not reasonable to expect folks to deeply reason about this and correctly use some new API to work around it. The other option, making skip_current_dir aware of this peculiarity, also seems less clean to me.

PRs for this are welcome, as I'm not sure when I'll have time to get around to this myself.

@cipriancraciun
Copy link
Author

PRs for this are welcome, as I'm not sure when I'll have time to get around to this myself.

So, you would accept a PR which implements the "broken symlink" workaround as described above?

(I'm not promissing much, but if I get some time I'll try to implement it. At the moment I've made a workaround in my code that doesn't rely on .skip_current_dir().)

@BurntSushi
Copy link
Owner

BurntSushi commented Aug 20, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants