How to replace the invisible 'alt-255' symbols in plain text

I have a plain text file that looks fine when opened as a Text Document; there is nothing weird about it. But when I open it in MS Word and turn on the "Show paragraphs" option, some of the spaces show up as a symbol similar to the degree symbol (a little circle; you can see it if you type alt+255 in an MS Word document). I am wondering how to get rid of it. It shows up because at some places in the outgoing string I had &nbsp;'s that I removed, but I guess there is an after-effect.

I hope that someone can help. It is really annoying.

There is 1 best solution below

The problem is likely to be one of character sets. In my testing, alt-number input didn't work in Windows, so I typed the character in a text editor called SciTE and copied and pasted it into Windows. The pasted alt-255 character showed up as the degree symbol with the "Show paragraphs" option on, but was saved as byte A0. This is the Windows-1252 character for a non-breaking space (which matches what you would expect, given that the symbols appear where you had non-breaking spaces).
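To confirm what the file really contains, it can help to look at the raw bytes rather than the decoded text. The following is a minimal sketch, not something from my original test; the class and method names (ByteInspector, ToHex, CountA0) are made up for illustration. Counting A0 bytes is only meaningful if the file is in a single-byte encoding such as Windows-1252, because in UTF-8 the same byte value can appear inside multi-byte sequences.

```csharp
using System;

public static class ByteInspector
{
    // Formats raw bytes as space-separated uppercase hex pairs, e.g. "61 A0 62".
    public static string ToHex(byte[] bytes)
    {
        // BitConverter.ToString yields "61-A0-62"; swap the dashes for spaces.
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    // Counts bytes equal to 0xA0, the non-breaking space in Windows-1252.
    // Assumption: the data is in a single-byte encoding.
    public static int CountA0(byte[] bytes)
    {
        int count = 0;
        foreach (byte b in bytes)
        {
            if (b == 0xA0)
            {
                count++;
            }
        }
        return count;
    }
}
```

For a file containing the bytes 61 A0 62 A0 63, calling ByteInspector.ToHex(System.IO.File.ReadAllBytes("testfile.txt")) would give "61 A0 62 A0 63", and CountA0 would report 2.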

C# strings are Unicode internally, and File.ReadAllText assumes a file is UTF-8 unless it finds a byte order mark or is told otherwise. In my case the file is the bytes 61 A0 62 A0 63, which is "a b c" (where the spaces are actually non-breaking spaces). When C# loads this, it reads the a, b and c correctly, but a lone A0 byte is not a valid UTF-8 sequence (or the beginning of one), so it is loaded as Unicode character 65533 (REPLACEMENT CHARACTER), which is what the decoder emits when it finds an uninterpretable byte.
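The difference can be seen by decoding the same five bytes both ways. This is a sketch under assumptions: the DecodeDemo class and its members are hypothetical names, and the call to Encoding.RegisterProvider assumes .NET Core 3.0 or later (or .NET 5+), where code page 1252 is not available until the code pages provider is registered.

```csharp
using System.Text;

public static class DecodeDemo
{
    // The sample bytes: 'a', 0xA0, 'b', 0xA0, 'c'.
    public static readonly byte[] Sample = { 0x61, 0xA0, 0x62, 0xA0, 0x63 };

    // Decodes as UTF-8. Each lone 0xA0 byte is invalid and becomes U+FFFD.
    public static string AsUtf8(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    // Decodes as Windows-1252. Each 0xA0 byte becomes U+00A0 (non-breaking space).
    public static string As1252(byte[] bytes)
    {
        // Assumption: running on .NET Core 3.0+ / .NET 5+, where this
        // registration is required before GetEncoding(1252) will succeed.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252).GetString(bytes);
    }
}
```

With the sample bytes, AsUtf8 returns "a\uFFFDb\uFFFDc" and As1252 returns "a\u00A0b\u00A0c", so only the second string contains characters that a replace of character 160 can match.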

In my test, if I load the file and specify that the encoding is code page 1252, it correctly loads the non-breaking space, and I can then use string.Replace to replace it.

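        // On .NET Core / .NET 5+, call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first.
        string result = File.ReadAllText("testfile.txt", System.Text.Encoding.GetEncoding(1252));
        // Replace each non-breaking space (U+00A0, decimal 160) with an ordinary space.
        result = result.Replace((char)160, ' ');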

The bottom line is to make sure that you load this file with the correct encoding so that the character is interpreted correctly. Assuming that you generated the file yourself, you should know what encoding it uses.
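If the encoding really is unknown, one possible heuristic (an assumption on my part, not something I tested against your file) is to try a strict UTF-8 decode first and fall back to Windows-1252 only when the bytes are not valid UTF-8. SafeDecoder and its methods are hypothetical names, and the RegisterProvider call again assumes .NET Core 3.0+ or .NET 5+.

```csharp
using System.Text;

public static class SafeDecoder
{
    // Tries strict UTF-8 first; falls back to Windows-1252 if the bytes are not valid UTF-8.
    public static string Decode(byte[] bytes)
    {
        // Second argument (throwOnInvalidBytes: true) makes the decoder throw
        // instead of silently emitting U+FFFD for invalid input.
        var strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            return strictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            // Assumption: .NET Core 3.0+ / .NET 5+, where 1252 must be registered.
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
            return Encoding.GetEncoding(1252).GetString(bytes);
        }
    }

    // Replaces every non-breaking space (U+00A0) with an ordinary space.
    public static string ReplaceNbsp(string text)
    {
        return text.Replace('\u00A0', ' ');
    }
}
```

This is a guess at the encoding, not a guarantee: some Windows-1252 byte sequences happen to be valid UTF-8 as well, so knowing the encoding up front remains the more reliable option.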

One last note: as I mentioned in a comment, your problem sounds like it might be that you are not stripping out the non-breaking spaces as you think you are, since they seem to be in your saved file. Though the above answers the question of how to get rid of them in a file, you would be better off dealing with the problem at the source and never putting them in the file in the first place. Perhaps open another question with details of how you create your file, asking why it saves out the non-breaking spaces.