Cannot handle certain normalized forms of unicode. Example: é (e + U+0301) vs é (U+00E9) #16

Open
Jwink3101 opened this issue Feb 27, 2023 · 24 comments

Comments

@Jwink3101

I am using FUSE-T with rclone. I am, however, 99% sure this is a FUSE-T issue and not rclone.

I have a file with the following character: é. That is e + U+0301, or UTF-8 encoded: e\xcc\x81. This consistently breaks listing the directory. If I change it to the precomposed form é, which is U+00E9 or \xc3\xa9 in UTF-8, it works fine.
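For readers who want to see the two forms side by side, here is a minimal Python sketch (not from the original report) that prints the UTF-8 bytes of each form and shows that they only compare equal after normalization. The helper name `utf8_hex` is made up for this illustration.

```python
import unicodedata

decomposed = "e\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT (NFD)
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE (NFC)


def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return " ".join(f"{byte:02x}" for byte in text.encode("utf-8"))


print(utf8_hex(decomposed))       # 65 cc 81
print(utf8_hex(precomposed))      # c3 a9
print(decomposed == precomposed)  # False: different code point sequences

# After normalizing to a common form the two strings are identical.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

Both strings render as the same glyph, which is why the difference is easy to miss in a directory listing.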

Rclone handles the name just fine in listing and I even see it in the rclone logs. I have the following files to demonstrate (I added files around it to see if anything listed first):

Source Dir:

$ ls -1

before
test1 (e + U+0301) é.txt
test2 (U+00E9) é.txt
z-after
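A possible way to recreate this source directory is sketched below in Python. It is an assumption about how such files could be produced, not the reporter's actual procedure; the function name `create_test_files` and the temporary directory prefix are hypothetical. Escape sequences are used so that the editor or shell cannot silently re-normalize the names.

```python
import os
import tempfile
from typing import List

# File names from the listing above, with the accented characters spelled
# out as escapes so that each form is unambiguous.
NAMES = [
    "before",
    "test1 (e + U+0301) e\u0301.txt",  # decomposed: e + combining acute
    "test2 (U+00E9) \u00e9.txt",       # precomposed: single code point
    "z-after",
]


def create_test_files(directory: str) -> List[str]:
    """Create one small file per name in NAMES and return the paths."""
    paths = []
    for name in NAMES:
        path = os.path.join(directory, name)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(name + "\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    source = tempfile.mkdtemp(prefix="mount_test_source_")
    for path in create_test_files(source):
        # repr() makes the combining character visible in the output.
        print(repr(os.path.basename(path)))
```

Whether the names are stored byte-for-byte as written depends on the underlying filesystem, so it is worth checking the result with `ls | hexdump -C` before mounting.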

Mount Dir:

$ ls
ls: fts_read: Permission denied

The rclone -vv log is: rclog.log. This seems to indicate rclone isn't having the issue but rather FUSE-T.

Are there other tests I can do to assist? Additional logging? It is 100% reproducible on my machine, so just let me know.

@macos-fuse-t
Owner

I cannot reproduce your issue.
This is what I did:
Created a file with \xcc\x81 name encoding (é) on Linux and Mac machines
Mounted the Linux machine with sshfs and listed the folder. It worked as expected.
Mounted the Mac with a loopback fs (fusexmp_fh in the libfuse examples). ls worked as expected.
I wonder which target platform are you using with rclone? I assume that fuse-t is the latest version.

@Jwink3101
Author

I assume that fuse-t is the latest version.

According to brew, I am (I ran brew upgrade fuse-t).

I wonder which target platform are you using with rclone?

A local backend for testing. A crypt-wrapped local backend is where I discovered the issue.


I do not know how to run the loopback fs example but am willing to try if you show me. I did install sshfs with brew install fuse-t-sshfs and didn't have the issue.

Maybe my assertion that this is FUSE-T and not rclone is incorrect.

I created a forum post with some additional detail. Included in that is the full log where I noticed the lines:

2023/02/27 07:13:23 DEBUG : Adding "-o modules=iconv,from_code=UTF-8,to_code=UTF-8-MAC" for macOS
2023/02/27 07:13:23 DEBUG : Local file system at /Users/jwinokur/Desktop/mount_test/source: Mounting with options: ["-o" "attr_timeout=1" "-o" "fsname=/Users/jwinokur/Desktop/mount_test/source" "-o" "subtype=rclone" "-o" "max_readahead=131072" "-o" "atomic_o_trunc" "-o" "daemon_timeout=600" "-o" "volname=Users jwinokur Desktop mount_test source" "-o" "noappledouble" "-o" "modules=iconv,from_code=UTF-8,to_code=UTF-8-MAC"]

I wonder if those are causing issues. I will wait for the rclone developers to confirm if/how to disable that config but can you also test to see if enabling them causes the problem? And help deduce if it is (a) the appropriate flag and (b) the appropriate response?

@ncw

ncw commented Feb 27, 2023

Does FUSE-T use macOS UTF-8 NFD form for UTF-8 internally like OSXFUSE does? If not then the "-o modules=iconv,from_code=UTF-8,to_code=UTF-8-MAC" will be doing the wrong thing.

If not then this makes it subtly incompatible with OSXFUSE.

This issue shows the NFD problem macfuse/macfuse#585 and the reason why rclone applies that iconv rule.

If FUSE-T only shows UTF-8 NFC in its external interfaces then that is probably a good design decision, but it is different to what OSXFUSE does.

@macos-fuse-t
Owner

It looks like an encoding issue, although I think FUSE-T handles it correctly as tested with sshfs and loopback fs. I will look into the issue though.
I didn't know there are different UTF-8 encodings.

@macos-fuse-t
Owner

Unicode conversion is handled by this go module: https://pkg.go.dev/unicode/utf8

@Jwink3101
Author

For what it's worth, adding the rclone flag -o modules=iconv,from_code=UTF-8,to_code=UTF-8 to disable the conversion, as suggested by @ncw on the forum post, fixes it.

I am not sure where this bug belongs at this point. Does FUSE-T treat incompatibility with OSXFUSE as a bug? Should a more modern (and awesome by the way!!!) project inherit the technical debt of its predecessor?

@macos-fuse-t
Owner

Just checked and It's possible to normalize strings as UTF-8 NFD as opposed to NFC, but I'm not sure it should be a default option. macOS Finder is happy with whatever encoding I throw at it. I can add a mount flag "-o utf8-nfd", would that be helpful?

@ncw

ncw commented Feb 28, 2023

Just checked and It's possible to normalize strings as UTF-8 NFD as opposed to NFC, but I'm not sure it should be a default option. macOS Finder is happy with whatever encoding I throw at it.

I think OSXFUSE needs to have the NFD form as that is what the kernel interfaces of macOS are expecting.

I'm guessing since FUSE-T plugs into the NFS layer that the NFS layer is dealing with the NFC to NFD translation for you?

I can add a mount flag "-o utf8-nfd", would that be helpful?

This should be the default if you want 100% compatibility with OSXFUSE I think.

I'm not sure that is a good idea though as NFD encoding is a pain to deal with.

We can write that you'll need -o modules=iconv,from_code=UTF-8,to_code=UTF-8 in the docs for rclone and FUSE-T or perhaps get rclone to auto detect FUSE-T somehow (any ideas?).

@macos-fuse-t
Owner

I don't think NFD is needed for macOS anymore, perhaps that was true for older macOS but now you can create a file containing whatever encoding and it would be shown fine.

@ncw

ncw commented Feb 28, 2023

I don't know a great deal about this, but I found a nice explainer here: https://eclecticlight.co/2021/05/08/explainer-unicode-normalization-and-apfs/

So I think you are right for APFS but HFS+ volumes will still require NFD. I've no idea on the relative popularity of these things (not a mac user) so maybe it is irrelevant now.

@kapitainsky

kapitainsky commented Jun 19, 2023

This is still an issue on macOS.

The -o modules=iconv,from_code=UTF-8,to_code=UTF-8 option only partly solves the problem: NFC-encoded folder and file names no longer disappear, but they are still not accessible from Finder.

For some reason iconv and FUSE-T do not work together as expected.

Here is my test, served by rclone mount (FUSE-T):

Original data NFC and NFD encoded folder and file:

drwxr-xr-x  1 kptsky  staff  0 Jun 19 12:07 NFCééééDIR
drwxr-xr-x  1 kptsky  staff  0 Jun 19 12:08 NFDééééDIR
-rw-r--r--  1 kptsky  staff  6 Jun 15 19:20 NFCéééFILE.txt
-rw-r--r--  1 kptsky  staff  4 Jun 15 07:10 NFDéééFILE.txt

This is also what I see in mount with -o modules=iconv,from_code=UTF-8,to_code=UTF-8 but:

image

The NFC file cannot be opened.

The NFD one works.

You can also see that the NFC and NFD files have different icons: the NFC one gets the generic txt icon because a preview cannot be generated.

When I try -o modules=iconv,from_code=UTF-8,to_code=UTF-8-MAC, only the NFC-encoded items are visible and accessible from Finder:

image

This is a bit of a surprise, as I would expect this option to produce consistent NFD, as in the example below:

$ echo -e "éé\c" | hexdump -C
00000000  c3 a9 65 cc 81                                    |..e..|

$ echo -e "éé\c" | iconv -f UTF-8 -t UTF-8 | hexdump -C
00000000  c3 a9 65 cc 81                                    |..e..|

$ echo -e "éé\c" | iconv -f UTF-8 -t UTF-8-MAC | hexdump -C
00000000  65 cc 81 65 cc 81                                 |e..e..|
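The same byte sequences can be reproduced without iconv. The sketch below uses Python's `unicodedata` module and assumes that, for this particular character, UTF-8-MAC behaves like plain NFD; to my understanding Apple's variant leaves a few code point ranges undecomposed, so the two are not interchangeable in general.

```python
import unicodedata

# Same input as the echo commands above: a precomposed é (NFC) followed by
# a decomposed é (NFD).
mixed = "\u00e9e\u0301"

as_given = mixed.encode("utf-8")
as_nfd = unicodedata.normalize("NFD", mixed).encode("utf-8")
as_nfc = unicodedata.normalize("NFC", mixed).encode("utf-8")

print(as_given.hex(" "))  # c3 a9 65 cc 81     (matches UTF-8 -> UTF-8)
print(as_nfd.hex(" "))    # 65 cc 81 65 cc 81  (matches UTF-8 -> UTF-8-MAC)
print(as_nfc.hex(" "))    # c3 a9 c3 a9        (what an NFC mount would see)
```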

Looking for a possible solution, I was thinking:

  1. As @macos-fuse-t mentioned, a mount flag "-o utf8-nfd" could be added. I think it would be extremely useful, and maybe also "-o utf8-nfc"? It could potentially allow a fully working solution to be developed.

  2. As described here, Apple is aware of NFC/NFD problems with NFS and suggests:

This a known NFS issue with precomposed and decomposed.
As mentioned, Linux systems preform precomposed file names (NFC), while macOS/iOS userspace frameworks all default decomposed (NFD).
So no matter what is provided to them (NFC or NFD) any pathname that comes in from an Apple framework will always be in NFD and the FS has to deal with.
You should mount your NFS share using “nfc” parameter to instruct the client to use precomposed instead of the default decomposed.
We were able to open both precomposed/decomposed files and folders while mounting with “nfc” enabled. Please let us know if it helps to resolve the issue.

mount_nfs manual page :
nfc Convert name strings to Unicode Normalization Form C (NFC) when sending them to the NFS server. This option may be used to improve interoperability with NFS clients and servers that typically use names in the NFC form.

FUSE-T uses NFS, so maybe the issue comes from this problem.

Would it be possible to add this as an optional mount flag "-o nfc"?

With these new flags I could try to find a working rclone/FUSE-T solution.

@alexfs

alexfs commented Jun 19, 2023

I can definitely add an "-o nfc" mount flag. Would that be enough to make rclone work, or is nfc-nfd/nfd-nfc conversion also needed?

@kapitainsky

I can definitely add an "-o nfc" mount flag. Would that be enough to make rclone work, or is nfc-nfd/nfd-nfc conversion also needed?

It would be great if possible. I think the ones really needed are:

"-o nfc"
"-o utf8-nfd"

And this would be nice to have:

"-o utf8-nfc"

@alexfs

alexfs commented Jun 19, 2023

I'm a bit confused. There are two peers: macOS and user.
On the macOS side, the "-o nfc" flag determines whether NFC or NFD encoding is used. If it is not given, NFD is the default.
I think what you want is an encoding flag for the user peer, "user-utf8-nfc". If it is not given, NFD is the default.

So there would be four cases:

  1. No flags meaning no conversion
  2. "-o nfc" and "user-utf8-nfc". mount executed with nfc. No conversion performed by fuse-t
  3. "-o nfc". mount executed with nfc and nfc-nfd conversion performed between macos/user
  4. "user-utf8-nfc". nfd-nfc conversion performed between macos/user
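The four cases can be written down as a small decision function. This is only a sketch of the proposal as stated in this comment: "user-utf8-nfc" is a flag suggested in the thread, not necessarily one that exists, and the function and parameter names are made up for illustration.

```python
from typing import Optional, Tuple


def conversion_between_peers(
    mount_nfc: bool, user_utf8_nfc: bool
) -> Optional[Tuple[str, str]]:
    """Return (macos_form, user_form) if a conversion is needed, else None.

    mount_nfc     -- True if "-o nfc" was given (macOS side uses NFC).
    user_utf8_nfc -- True if the proposed "user-utf8-nfc" flag was given
                     (user side uses NFC). Hypothetical flag from the thread.
    Both sides default to NFD when their flag is absent.
    """
    macos_form = "NFC" if mount_nfc else "NFD"
    user_form = "NFC" if user_utf8_nfc else "NFD"
    if macos_form == user_form:
        return None  # cases 1 and 2: names pass through unchanged
    return (macos_form, user_form)  # cases 3 and 4: convert between forms


for mount_nfc in (False, True):
    for user_nfc in (False, True):
        print(mount_nfc, user_nfc, conversion_between_peers(mount_nfc, user_nfc))
```

Read this way, a conversion is needed exactly when the two peers disagree on the form, which is why only cases 3 and 4 involve FUSE-T doing any work.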

Is this analysis correct?

@kapitainsky

Yes, your logic is correct and I wish it worked like that. However, reality seems to be different and I do not have all the answers yet.

In theory -o modules=iconv,from_code=UTF-8,to_code=UTF-8-MAC should do the trick and send to rclone everything in NFD - but it does not. As in my example above, something strange happens: NFC names are converted to NFD (and work on macOS), but any NFD ones are just gone.

The new flags would be a different approach to the old problem, which I hope we can at last tackle completely.

We might have to make some changes on the rclone side as well. I am not sure about that yet.

@kapitainsky

kapitainsky commented Jun 19, 2023

With no conversion, all NFD files work, but only when the full path is NFD as well. So if I put an NFD file into an NFC folder, it can't be opened in Finder:
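To check which paths fall into this category, each path component can be classified by normalization form. The following Python sketch is an illustration only (the function names are hypothetical) and relies on `unicodedata.is_normalized`, which is available from Python 3.8.

```python
import unicodedata
from typing import List, Tuple


def form_of(name: str) -> str:
    """Classify a single name as 'both', 'NFC', 'NFD' or 'mixed'."""
    is_nfc = unicodedata.is_normalized("NFC", name)
    is_nfd = unicodedata.is_normalized("NFD", name)
    if is_nfc and is_nfd:
        return "both"   # e.g. plain ASCII, identical in either form
    if is_nfc:
        return "NFC"
    if is_nfd:
        return "NFD"
    return "mixed"      # contains precomposed and decomposed sequences


def path_forms(path: str) -> List[Tuple[str, str]]:
    """Return (component, form) for every non-empty component of path."""
    return [(part, form_of(part)) for part in path.split("/") if part]


def fully_nfd(path: str) -> bool:
    """True if every component is already valid NFD."""
    return all(form in ("NFD", "both") for _, form in path_forms(path))


# An NFD file inside an NFC folder, the failing case described above.
example = "NFC\u00e9DIR/NFDe\u0301FILE.txt"
print([form for _, form in path_forms(example)])  # ['NFC', 'NFD']
print(fully_nfd(example))                          # False
```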

image

(white icons are files you can open)

@kapitainsky

Where it gets really confusing is that with no conversion all files are accessible in bash/zsh. It is similar to the NFS issue I referred to earlier.

https://openradar.appspot.com/FB8957502

@alexfs

alexfs commented Jun 19, 2023

OK, let's start with the "-o nfc" option for now. Later I will add more options if needed. "-o nfs" will just pass characters through between both ends.

@kapitainsky

Another question is how to programmatically distinguish between OSXFUSE and FUSE-T. OSXFUSE will slowly become history, but for now a lot of people still use it. Ideally rclone could detect which one is in use and apply the correct options.

@kapitainsky

Thank you for adding -o nfc. Preliminary testing shows that I can now safely save NFD-encoded files from macOS to the mount, and they end up in the filesystem as NFC.

Now the only problem is with NFD files already in the filesystem. I can't assume that names will always be NFC, as users can upload files by means other than the mount. For this I hope -o utf8-nfd will do the trick.

@alexfs

alexfs commented Jun 20, 2023

What would -o utf8-nfd do?

@kapitainsky

cloud ---(NFC, NFD)---> rclone ---(NFC, NFD)---> fuse-t ---(NFD)---> macOS

@alexfs

alexfs commented Jun 21, 2023

Afraid it won't work. You can't mix both NFC and NFD. With -o nfc flag macos converts everything to NFC meaning fuse-t won't know if the original file is NFC or NFD. What can be done in case -o nfc is not specified is to convert the rclone side to NFD.

@kapitainsky

Thanks for your help. Indeed, there are now two options: with -o nfc the rclone side should normalize everything to NFC, or without extra options to NFD.

But still if possible and easy to add I think -o utf8-nfd would give us more options:)
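One way the backend side could cope with names arriving in a single form while the stored names are a mix of NFC and NFD is to index the stored names by their NFC form and look requests up through that index. The Python sketch below only illustrates the idea and the ambiguity described above; it is not how rclone or FUSE-T actually handles this, and the function names are hypothetical.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List


def build_nfc_index(stored_names: Iterable[str]) -> Dict[str, List[str]]:
    """Map the NFC form of each stored name to the names as actually stored."""
    index = defaultdict(list)
    for name in stored_names:
        index[unicodedata.normalize("NFC", name)].append(name)
    return dict(index)


def resolve(index: Dict[str, List[str]], requested: str) -> List[str]:
    """Return the stored names matching a requested name in any form."""
    return index.get(unicodedata.normalize("NFC", requested), [])


stored = [
    "NFC\u00e9FILE.txt",   # stored precomposed
    "NFDe\u0301FILE.txt",  # stored decomposed
    "caf\u00e9",           # the same visible name stored twice,
    "cafe\u0301",          # once in each form
]
index = build_nfc_index(stored)

# A request arriving in NFC still finds the file stored as NFD.
print(resolve(index, "NFD\u00e9FILE.txt") == ["NFDe\u0301FILE.txt"])  # True

# Two stored names collapse to one key: the original form cannot be
# recovered from the request alone.
print(len(resolve(index, "caf\u00e9")))  # 2
```

The second lookup shows why mixing both forms behind a normalizing mount is problematic: once the request has been normalized, nothing in it says which of the two stored files was meant.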


5 participants