If I run this from the command line on my Mac (the terminal is UTF-8 and so is the file):
tr -cd '[:print:]\n' < infile > outfile
I get a different result in the outfile than when I run the same command on a Linux system (the terminal is UTF-8 and so is the file).
What can be the reason for this?
This is a sample character that is still there when running the command on Mac: š (the character is an extended ASCII character 0x9A/s with caron). The same character is removed when running the command on Linux.
Unfortunately, as Karol C has shown below in the tr source, it does not support Unicode at all, so on Linux the command is not going to work correctly on a UTF-8 file that contains any multibyte sequences.
According to this database, the U+009A character is a control character and not a printable character. Its name is "SINGLE CHARACTER INTRODUCER". The glyph rendered on that page appears to match the description you've provided, but that is not how it is displayed on Linux. There is, however, another character that is "s with a caron". Unicode can be complicated.
According to Wikipedia, the "š" (s with caron) character is actually U+0161 for the lower case and U+0160 for the capital.
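To make the byte-level picture concrete, here is a minimal sketch. It assumes a POSIX shell with od available; the variable names s_caron, bytes and kept are only illustrative. It builds the UTF-8 encoding of U+0161 from octal escapes, dumps its bytes, and then runs the same tr filter with the locale forced to C, where every byte above 0x7F counts as non-printable.

```shell
# UTF-8 encoding of U+0161 (s with caron), written as octal escapes so the
# snippet does not depend on the encoding of this file: bytes 0xC5 0xA1.
s_caron=$(printf '\305\241')

# Dump the bytes in hex and strip od's padding.
bytes=$(printf '%s' "$s_caron" | od -An -tx1 | tr -d ' \n')
echo "$bytes"

# With LC_ALL=C, tr classifies single bytes, and neither 0xC5 nor 0xA1 is
# printable, so the whole character is deleted and only "ab" survives.
kept=$(printf 'a%sb\n' "$s_caron" | LC_ALL=C tr -cd '[:print:]\n')
echo "$kept"
```

In a UTF-8 file the character is therefore stored as two bytes, not as a single 0x9A byte. A tr that works byte by byte sees two separate non-printable bytes, which would be consistent with the Linux result described in the question.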
This also aligns with this database: