StreamWriter and special characters & ’


As I understand it, when I write a file in Notepad++ I can type the symbols ’ and & in a text file without a problem. Both are valid ASCII symbols, and they are not that exotic either: & = 38 decimal, 26 in hex; ’ = 44 decimal, 2C in hex.

I am trying to write both out with StreamWriter (.NET Core). I have tried various text encodings, but somehow it fails: depending on the encoding, one of them gets broken into \uxxxx. Is there an encoding that works for both?

My code:


filedata = "& Test ’   ";
// filedata = filedata.Replace("\\u0026", "&"); //extra added should not be needed with Test
// filedata = filedata.Replace("\\u2019", "’"); //extra added should not be needed with Test

// Write the updated content back to the file with exclusive access
using (var fileStream = new FileStream(filePath, FileMode.Truncate, FileAccess.Write, FileShare.None)) {

    // I tried various combinations below, including Encoding.ASCII and the UTF encodings
    using (var writer = new StreamWriter(fileStream, new UTF8Encoding(true))) {
        await writer.WriteAsync(filedata);
    }
}

The other application needs this file. I think it is valid UTF-8 with a BOM, which should not translate these symbols. But maybe it is something else: Notepad++ shows me the \uxxxx in clear text when I write the file with C#, while if I type the symbols in Notepad++ and reopen the file I don't see the \uxxxx.
I trust Notepad++ a bit more.

Notably, over the wire it all goes fine. It is the file saving that is literally giving me a headache.

Answer by Marc Gravell

I believe this is simply a misreading of the ASCII table; 44/0x2C is not ’, it is , (comma). The character ’ (code point 8217, sometimes written &rsquo; in HTML etc.) is a non-ASCII character, and in UTF-8 it will be written with 3 bytes: E2-80-99. I suspect your text viewer is configured to display non-ASCII (or possibly just "not valid in the selected encoding") characters with \u escaping, but that's just the tool trying to help you; those aren't the actual bytes. To comment on the bytes, only a raw hex viewer will suffice.
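One way to check the actual bytes without relying on a text viewer is to dump them as hex from C#. The following is only a minimal sketch: the class and method names (EncodingCheck, WriteUtf8WithBom, ToHex) are hypothetical and not from the question, and it writes to a MemoryStream instead of the question's FileStream so that it runs without touching the disk. It assumes the same encoding as the question, new UTF8Encoding(true).

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingCheck
{
    // Hypothetical helper: encodes text the same way the question's StreamWriter
    // does (UTF-8 with BOM) and returns the raw bytes that would land in the file.
    public static byte[] WriteUtf8WithBom(string text)
    {
        var stream = new MemoryStream();
        using (var writer = new StreamWriter(stream, new UTF8Encoding(true)))
        {
            writer.Write(text);
        } // disposing the writer flushes it and closes the stream
        return stream.ToArray(); // ToArray still works on a closed MemoryStream
    }

    // Hypothetical helper: formats bytes as hyphen-separated hex, e.g. "E2-80-99".
    public static string ToHex(byte[] bytes) => BitConverter.ToString(bytes);

    public static void Main()
    {
        Console.WriteLine((int)'&');      // 38   (ASCII)
        Console.WriteLine((int)',');      // 44   (ASCII comma)
        Console.WriteLine((int)'\u2019'); // 8217 (right single quotation mark, not ASCII)

        Console.WriteLine(ToHex(WriteUtf8WithBom("& Test \u2019")));
        // EF-BB-BF-26-20-54-65-73-74-20-E2-80-99
    }
}
```

The first three bytes (EF-BB-BF) are the UTF-8 BOM, 26 is the ampersand, and the trailing E2-80-99 is the quotation mark. To inspect a file that has already been written, the same ToHex helper could be applied to File.ReadAllBytes(filePath). If those bytes are present, the file itself contains no \u escapes.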