Preserve filenames on NFS between Windows/Linux

Is there any way of configuring the NFS client, or how the share is mounted on Windows or Linux, so that I can preserve filenames across systems?

We have a large number of files that were written on Windows and have now been moved to Google Filestore (NFSv3) so that they can be accessed from other servers. The problem is that many of the files have Swedish characters in their names (Å Ä Ö), and when these files are listed on the system other than the one they were created on, the file names become unreadable. (There is no problem with the file contents, just the names.)

Currently I am planning on programmatically renaming all the files to remove the offending characters, but would prefer to not have to do this if possible.

Below is an example of how it looks from the Windows and Linux sides. The Linux file was created on Linux and the Windows file was created on Windows.

[Screenshots: the directory listing as seen from Linux and as seen from Windows]

Best answer

This answer may not help you fix the problem, but I thought I'd give some theoretical overview that might help your (and other people's) research.

You might also want to read this.

Anyway, here we go:

There's a whole lot at play here.

  • There's the encoding used by NTFS and whatever filesystem Google Filestore is using.
  • There's the encoding supported by the programs you're using to create & view those file names.
  • There's the encoding supported by the terminal programs you're using.
  • There's the encoding supported by NFSv3.

Filesystems

On Linux, file names have only two rules: they cannot contain a slash (/), and they cannot contain the null byte (\0). Beyond that, a name is just a sequence of bytes. ASCII and UTF-8 are compatible with these rules, and those are basically the encodings you will find on Linux filesystems.
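To make the two rules concrete, here is a minimal Python sketch. The helper name is_valid_linux_name is made up for illustration, and it checks only the two rules mentioned above, nothing else. Note that the same name encoded as UTF-8 and as Latin-1 passes in both cases, which suggests the filesystem alone cannot tell you which encoding a name was written in.

```python
def is_valid_linux_name(name: bytes) -> bool:
    """Check only the two rules stated above: no slash, no null byte."""
    return b"/" not in name and b"\0" not in name


# The same human-readable name, as two different byte sequences.
utf8_name = "Åäö.txt".encode("utf-8")
latin1_name = "Åäö.txt".encode("latin-1")

print(utf8_name, is_valid_linux_name(utf8_name))
print(latin1_name, is_valid_linux_name(latin1_name))
print(b"a/b", is_valid_linux_name(b"a/b"))
```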

Windows has different ideas. NTFS stores file names as UTF-16, so something has to convert them to bytes before they reach an NFS share. There might be some configuration that's needed to have Windows emit the names in a different encoding.

Creating & Listing files

On Linux, your file names are almost always encoded in UTF-8. Tools such as ls generally don't think too much about it and just rely on the rule above that filesystems require.

Windows' dir obviously knows how to work with NTFS' character encoding, but can it read Linux's UTF-8 file names? To the best of my understanding, it can, with some configuration.
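The garbling itself can be reproduced in a few lines of Python. This is only a sketch: it assumes the Windows side used code page 1252, which is a guess on my part, and the actual code page in your setup may differ.

```python
name = "Åäö.txt"

# The same name as bytes in the two encodings (cp1252 is an assumption).
utf8_bytes = name.encode("utf-8")
cp1252_bytes = name.encode("cp1252")

# A UTF-8 name read by a program that expects cp1252.
garbled = utf8_bytes.decode("cp1252")

# A cp1252 name read by a program that expects UTF-8; bytes that are
# not valid UTF-8 are shown as the replacement character U+FFFD.
replaced = cp1252_bytes.decode("utf-8", errors="replace")

print(repr(garbled))
print(repr(replaced))

# Nothing is lost as long as the bytes are untouched: reversing the
# wrong decoding gives the original name back.
print(garbled.encode("cp1252").decode("utf-8") == name)
```

The last line is the reason the file contents and the files themselves are fine: only the interpretation of the name bytes differs between the two sides.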

Terminal

Modern Linux terminal programs are all UTF-8, but support for other character sets (because Windows) might need to be installed.

On Windows, UTF-8 seems not to have been fully supported as of last year. Maybe that's changed, or maybe you'd need another terminal. The above configuration might help.

NFS

NFSv4.1 and up have explicit support for UTF-8 and an explicit goal of Unix <-> Windows interoperability.

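NFSv3 does not have any of that. File names travel as bytes with no declared encoding, so support for anything non-ASCII is not guaranteed and depends on the client and the server happening to agree.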

I found one implementation which supports UTF-8 over NFSv3, but Google Filestore's documentation only says "supports any NFSv3-compatible client".
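Since nothing in this chain declares an encoding, it may help to look at the raw bytes of the names from the Linux side before deciding anything. Below is a rough Python sketch; classify_name and classify_directory are hypothetical helper names. Treat the result as a hint only: bytes that decode as UTF-8 are very likely UTF-8, but "not-utf-8" does not tell you which code page was used.

```python
import os


def classify_name(raw: bytes) -> str:
    """Classify raw file name bytes as 'ascii', 'utf-8', or 'not-utf-8'."""
    if raw.isascii():
        return "ascii"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf-8"
    return "utf-8"


def classify_directory(directory: bytes) -> list[tuple[bytes, str]]:
    # Passing a bytes path makes os.listdir return the names as raw bytes,
    # so no decoding happens behind our back.
    return [(raw, classify_name(raw)) for raw in sorted(os.listdir(directory))]


if __name__ == "__main__":
    for raw, kind in classify_directory(b"."):
        print(kind, raw)
```

Run it in the mounted share on Linux; names written by the Windows client that show up as "not-utf-8" are the ones that were stored in some other encoding.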

What to Do

Go ahead and rename the files. Interoperability has even more issues, e.g. different conceptions of which characters are reserved. There are so many limitations that your best bet is to make sure all file names are plain ASCII. I would even avoid things like spaces in file names; it makes life a whole lot easier.
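If you do rename programmatically, something along these lines could be a starting point. It is a sketch, not a finished tool: ascii_safe_name and plan_renames are made-up names, the mapping (Å to A, Ä to A, Ö to O, everything else outside letters, digits, dot, dash and underscore to "_") is just one possible choice, and it assumes the names have already been decoded correctly into Python strings.

```python
import re
import unicodedata


def ascii_safe_name(name: str) -> str:
    """Map a file name to plain ASCII without spaces."""
    # Split accented letters into base letter + combining mark (Å -> A + ring).
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, keeping the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside the safe set (spaces included) becomes "_".
    return re.sub(r"[^A-Za-z0-9._-]", "_", stripped)


def plan_renames(names: list[str]) -> dict[str, str]:
    """Return {old: new} for names that would change; fail on collisions."""
    plan: dict[str, str] = {}
    taken: dict[str, str] = {}
    for old in names:
        new = ascii_safe_name(old)
        if new in taken and taken[new] != old:
            raise ValueError(f"{old!r} and {taken[new]!r} both map to {new!r}")
        taken[new] = old
        if new != old:
            plan[old] = new
    return plan


if __name__ == "__main__":
    # Dry run only: print the plan instead of touching any files.
    for old, new in plan_renames(["Räksmörgås på bordet.txt", "plain.txt"]).items():
        print(f"{old} -> {new}")
```

Building the whole plan first and refusing to continue on a collision is deliberate: two different Swedish names can collapse to the same ASCII name, and you want to find that out before anything is renamed.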