Improve case-sensitive path comparison #20

Open
2 of 3 tasks
ForNeVeR opened this issue Apr 21, 2024 · 1 comment
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

ForNeVeR (Owner) commented Apr 21, 2024

I suggest the following changes.

  1. Introduce three different path comparator kinds.
    • Textual only. This one should operate on strict string equality, and be named accordingly (something like StrictStringPathComparer?).

    • Platform-default comparer: should implement case-sensitive comparison on Linux, and case-insensitive (probably with corresponding relaxations related to Unicode normalization) on Windows and macOS.

    • File-system-aware comparer: for each compared path component, should check the actual case sensitivity of the corresponding file system subtree. For non-existent paths, it should apply the platform-dependent policy that determines the case sensitivity of new subdirectories (is it normally inherited from the parent directory?).

      This one is obviously IO-intensive, so I'm thinking of introducing some sort of "sensitivity cache" that'd store the checked paths and subtrees in a trie data structure, and would be used for one or multiple operations (probably one cache per comparer instance, with the ability to reset it manually).

  2. Allow the paths to use different comparers, with platform-default used by default, as the one giving the best precision without losing performance due to intensive IO.
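
To make the first two comparer kinds concrete, here is a minimal sketch in Python (not the library's own language, and not its API). The function names and the `platform` parameter are hypothetical, and using NFC normalization plus `casefold()` for the case-insensitive branch is only an assumption about what the "relaxations" could look like; real Windows and macOS rules differ in details.

```python
import sys
import unicodedata


def strict_equal(a: str, b: str) -> bool:
    """Textual comparer: plain string equality, no platform knowledge."""
    return a == b


def platform_default_equal(a: str, b: str, platform: str = sys.platform) -> bool:
    """Platform-default comparer (hypothetical sketch).

    Case-insensitive on Windows and macOS, case-sensitive elsewhere.
    """
    if platform.startswith(("win", "darwin")):
        # Assumption: NFC + casefold approximates the relaxed comparison.
        def fold(s: str) -> str:
            return unicodedata.normalize("NFC", s).casefold()

        return fold(a) == fold(b)
    return a == b
```

Passing `platform` explicitly is just a convenience for trying the Windows or macOS policy on another machine; a real comparer would more likely be a separate object chosen once per path.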
@ForNeVeR ForNeVeR added enhancement New feature or request help wanted Extra attention is needed labels Apr 21, 2024
Kataane added a commit to Kataane/TruePath that referenced this issue Jul 20, 2024
ForNeVeR (Owner, Author) commented

Since @Kataane asked a question about the "file-system-aware" comparer in #84, I decided to elaborate on it here.

You see, in the real world, there is no such thing as a "case-sensitive operating system". There is a "case-sensitive path", or a "subtree", if you will. So, in the harsh reality, each path on the disk has its own comparison rules!

On Windows, you can control this on a per-path basis using fsutil file setCaseSensitiveInfo; see details here.

On macOS there are some other crazy ways to switch this, and on Linux, this is obviously at least a per-mount point thing (as most common drivers try to support Windows case-insensitivity natively).

The third path comparer would request this information from the actual file systems that are inspected, during path comparison, and use it when needed.
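
One portable way to "request this information" is to probe the directory itself. The sketch below is an assumption, not something this issue prescribes: it creates a temporary file and checks whether the same name with swapped letter case resolves to it. The function name is hypothetical, and the probe needs write access to the directory; a real implementation would probably prefer platform APIs where they exist.

```python
import os
import tempfile


def directory_is_case_sensitive(directory: str) -> bool:
    """Probe whether children of `directory` are looked up case-sensitively."""
    # The prefix contains letters, so swapcase() always yields a different name.
    with tempfile.NamedTemporaryFile(prefix="CaseProbe_", dir=directory) as probe:
        head, name = os.path.split(probe.name)
        swapped = os.path.join(head, name.swapcase())
        # If the swapped-case name also resolves, lookups ignore case.
        return not os.path.exists(swapped)
```

The result describes only the probed directory, which matches the point above: the answer has to be obtained per path, not per operating system.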

In particular, let's imagine this scenario: you are on Windows, and have the following directory structure:

C:\ [case-insensitive, default]
C:\Path [case-sensitive]
C:\Path\Subpath [case-sensitive]
C:\Path\Subpath\Insensitive [case-insensitive, say it was manually restored after creating this dir]

And our comparer is asked a question: are paths C:\Path\SubPath\Insensitive and C:\Path\Subpath\Insensitive equal or not?

I imagine it should work like this:

  • Compare item by item
    • C:\: equal in both paths, good
    • Path: equal in both, still good
    • Subpath vs SubPath: not equal, investigation required
      • check the case sensitivity contract for the path C:\Path\
      • store it in cache (for faster comparison in future)
      • return false: paths are different

So, as the result of comparing paths C:\Path\SubPath\Insensitive and C:\Path\Subpath\Insensitive, we get the result false, and the cache (that might be kept per comparer instance for now) gets information about C:\Path\ (that its children are stored in a case-sensitive way).
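
The walk-through above can be sketched as follows, in Python for brevity. Everything here is hypothetical: the class name, the injected `is_case_sensitive` callback (standing in for a real file system query), and a plain dictionary used as the cache instead of the trie mentioned earlier. For simplicity the sketch keys the cache by the first path's spelling of the parent directory.

```python
from typing import Callable, Dict


class FileSystemAwareComparer:
    """Hypothetical sketch of the component-by-component comparison."""

    def __init__(self, is_case_sensitive: Callable[[str], bool]):
        self._lookup = is_case_sensitive  # asks the "file system" about a directory
        self.cache: Dict[str, bool] = {}  # directory -> are its children case-sensitive?

    def _children_case_sensitive(self, parent: str) -> bool:
        if parent not in self.cache:
            self.cache[parent] = self._lookup(parent)
        return self.cache[parent]

    def equals(self, a: str, b: str) -> bool:
        parts_a = a.rstrip("\\").split("\\")
        parts_b = b.rstrip("\\").split("\\")
        if len(parts_a) != len(parts_b):
            return False
        parent = ""
        for x, y in zip(parts_a, parts_b):
            if x != y:
                # Different even when ignoring case: never equal.
                if x.casefold() != y.casefold():
                    return False
                # Differ only by case: the parent directory decides.
                if self._children_case_sensitive(parent):
                    return False
            parent += x + "\\"
        return True


# The directory structure from the scenario, as a fake file system.
sensitivity = {
    "C:\\": False,
    "C:\\Path\\": True,
    "C:\\Path\\Subpath\\": True,
    "C:\\Path\\Subpath\\Insensitive\\": False,
}
comparer = FileSystemAwareComparer(lambda d: sensitivity.get(d, False))
result = comparer.equals("C:\\Path\\SubPath\\Insensitive",
                         "C:\\Path\\Subpath\\Insensitive")
```

With this fake file system, `result` is `False` and `comparer.cache` holds a single entry for `C:\Path\`, since the file system is only consulted when two components differ by case alone.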

Obviously, this will require quite a lot of work from us, and it will be quite slow in practice (orders of magnitude slower than the default comparers). But I believe it is a "must have" feature of a file system path library.

babaruh added a commit to babaruh/TruePath that referenced this issue Sep 17, 2024
ForNeVeR added a commit that referenced this issue Oct 6, 2024
ForNeVeR added a commit that referenced this issue Oct 6, 2024