Converting string from CP866 to UTF8


I have a database (MSSQL) with a table of translations for product names. One of the languages is Russian.

Example of a database entry: ¸ą¤®åą ­Øā«ģ. Using Universal Cyrillic decoder I managed to find out that it is Прдохранитль and that the source encoding is CP866. I need to get it into Windows-1257 or UTF-8.

How to do this in C#?

I tried something like

string line = "¸ą¤®åą ­Øā«ģ";

Encoding cp866 = Encoding.GetEncoding("CP866");
Encoding w1257 = Encoding.GetEncoding("windows-1257");
byte[] cp866Bytes = cp866.GetBytes(line);
byte[] w1257Bytes = Encoding.Convert(cp866, w1257, cp866Bytes);
var lineFinal = w1257.GetString(w1257Bytes);

Could anyone help me?

The result for the given code is ?a?¤Raa -Oa?<g


There are 2 best solutions below

Best answer

Leaving aside the question of how such a string could end up in the database in the first place, you can convert it like this:

string line = "¸ą¤®åą ­Øā«ģ";
Encoding w1257 = Encoding.GetEncoding("windows-1257");
Encoding cp866 = Encoding.GetEncoding("CP866");            
var lineFinal = cp866.GetString(w1257.GetBytes(line));

This works because your original string appears to have been decoded with code page 1257, while the underlying bytes are actually CP866.

Note that this specific string is still a bit damaged: it decodes to Предохр нитель, while the correct word is Предохранитель (so there is a space in place of the 8th character, а). However, the original string also contains a space at this position, so this damage is not a result of the decoding (you probably just copied it incorrectly into the question).
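One plausible explanation for the lost а, offered as a sketch rather than a diagnosis: in CP866 the letter а is byte 0xA0, and in Windows-1257 byte 0xA0 is the no-break space (U+00A0). If that no-break space is turned into an ordinary space somewhere along the way (for example when copying the text), the byte becomes 0x20 and decodes to a plain space. The `MojibakeRepair` class and its `Repair` method are hypothetical names used only for illustration. The `RegisterProvider` line assumes .NET Core or .NET 5+, where legacy code pages are not available by default; on .NET Framework it should not be needed.

```csharp
using System;
using System.Text;

static class MojibakeRepair
{
    // Hypothetical helper: encode the garbled text as Windows-1257 bytes,
    // then decode those same bytes as CP866.
    public static string Repair(string garbled)
    {
        Encoding w1257 = Encoding.GetEncoding(1257);
        Encoding cp866 = Encoding.GetEncoding(866);
        return cp866.GetString(w1257.GetBytes(garbled));
    }

    static void Main()
    {
        // Assumption: .NET Core / .NET 5+, where code pages 866 and 1257
        // must be made available through this provider first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // No-break space (U+00A0) is byte 0xA0 in Windows-1257, which is "а" (U+0430) in CP866.
        Console.WriteLine(Repair("\u00A0") == "\u0430"); // True

        // An ordinary space is byte 0x20 in both code pages, so it stays a space.
        Console.WriteLine(Repair(" ") == " ");           // True
    }
}
```

If the value read from the database contains U+00A0 at that position, the word should come out intact; if it already contains a plain space, the letter is lost before any decoding happens.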

Another answer

Your problem is that you are doing it the other way around. line does not contain Cyrillic; the characters you are looking at are Windows-1257 characters. When you call GetBytes with an encoding, each character is mapped to that encoding's byte for it; the text is not reinterpreted as that encoding. Calling cp866.GetBytes(line) therefore only corrupts it further.

Also realize that text in .NET has no encoding (or, at least, no encoding you need to care about). A String is just a String, a series of Unicode characters. Encoding only becomes relevant when you need the text as bytes.
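To make that concrete, here is a minimal sketch showing one character from the question's string, ą (U+0105), turned into bytes by two different encodings, and the Windows-1257 byte then read back as CP866. The class name `EncodingBytesDemo` is made up for the example, and the `RegisterProvider` call assumes .NET Core or .NET 5+; on .NET Framework it should not be needed.

```csharp
using System;
using System.Text;

class EncodingBytesDemo
{
    static void Main()
    {
        // Assumption: .NET Core / .NET 5+, where legacy code pages
        // have to be registered before GetEncoding can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        string text = "\u0105"; // the single character "ą"

        // The same string yields different bytes depending on the encoding used.
        byte[] asUtf8 = Encoding.UTF8.GetBytes(text);
        byte[] as1257 = Encoding.GetEncoding(1257).GetBytes(text);

        Console.WriteLine(BitConverter.ToString(asUtf8)); // C4-85
        Console.WriteLine(BitConverter.ToString(as1257)); // E0

        // Reading the Windows-1257 byte 0xE0 as CP866 gives the Cyrillic "р" (U+0440).
        string asCyrillic = Encoding.GetEncoding(866).GetString(as1257);
        Console.WriteLine(asCyrillic == "\u0440");        // True
    }
}
```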

We know that those characters, when encoded as Windows-1257, produce exactly the byte values that show the correct text when read as CP866. At the moment, though, they are a plain Unicode String, not Windows-1257 bytes. So you first need to convert the string to Windows-1257 bytes, and then interpret those bytes as CP866.

String line = "¸ą¤®åą ­Øā«ģ";
Encoding cp866 = Encoding.GetEncoding("CP866");
Encoding w1257 = Encoding.GetEncoding("windows-1257");
Byte[] w1257Bytes = w1257.GetBytes(line);
String lineFinal = cp866.GetString(w1257Bytes);