MySQL: mix of encoding in same column

157 Views Asked by At

I use OCS Inventory for inventorying of our PCs. It works almost fine, but... but 99% of these PCs works under Windows(R): XP and Seven. And there no any standard for WMI information.

For example, OCS Agent (resident program on target PC, which collects data and sends them to OCS Server) may get response as "Realtek PCIe GBE Family Controller (1 Gb/s)" (pure English, encoding is not an issue), or as "¨£ ¡¨â­®¥ á¥â¥¢®¥ ¯®¤ª«î祭¨¥ Intel(R) 82566DM-2 (100 Mb/s)" (incorrect, Russian, CP1251), or as "Realtek RTL8169/8110 Family Gigabit Ethernet NIC - Ìèíèïîðò ïëàíèðîâùèêà ïàêåòîâ (100 Mb/s)" (incorrect, Russian, CP866). Or "VIA Rhine III Fast Ethernet Adapter - Минипорт планировщика пакетов (100 Mb/s)" (correct, Russian, UTF8 or another Unicode)

I have no way to make Windows return these results in only one encoding: it seems to be up to the minds of driver manufacturers: which encoding was used in the driver's .INF file, that encoding will be returned by WMI.

Mostly it is not a problem, but my bosses very dislike to see "hieroglyphs" in quarterly reports about the state of our IT infrastructure. And they seems to be right.

Is there any way to re-encode single fields from one encoding into another on-the-fly? I will not re-encode the entire column because of mixed contents. I can roughly determine which encoding was used, but I don't know how to re-encode using only MySQL's SQL language to produce correct reports.

Any working suggestions please?

1

There are 1 best solutions below

0
On

It looks like the only way was writing the stored procedure. I made it. Left column (castbin) is the text before processing, right (converted) is the same text after processing

enter image description here

Let see the code

...
#above and below are regular MySQL statements
@castbin:=cast(networks.description as char character set binary) as castbin,
@convv:=convert(repcxaxex(@castbin) using cp866) as converted
...

@castbin variable used just for clarification and readability. All the job done by repCxAxEx function, named after "replace 0xCx 0xAx with 0xCx 0xEx". Here's the very inoptimal but working code of this function (taken from MySQL Studio)

See corrected text of the function below in update section

First we looking if the string is convertable, then doing two conversions - for 0xC2 and 0xC3 prefixes, because these prefixes means different encoding errors.

UPDATE: Thorough testing catched some mistakes.

Somewhere inside the query, I use group_concat operator to merge the values of several properties of same PC into one long string with "\n" as separator. In such cases the conversion may not work.

-- --------------------------------------------------------------------------------
-- Routine DDL
-- Note: comments before and after the routine body will not be stored by the server
-- --------------------------------------------------------------------------------
DELIMITER $$

CREATE DEFINER=`root`@`%` FUNCTION `repcxaxex`(s2c text charset binary) RETURNS blob
begin
/*исправляем проблемы с кодировками*/
    set @i:=0;
    set @s:=s2c;
    set @altered:='0';
    if ((left(@s,1)=char(0xc2)) or instr(@s,char(32,0xc2)) or instr(@s,char(0x0a,0xc2))) then
    while @i<16 do
        set @s:=replace(@s,char(0xc3,(160+@i)),char(0xd3,(224+@i)));
        set @i:=@i+1;
    end while;
    set @i:=1;
    while @i<4 do
        set @s:=replace(@s,char(0xc0+@i),'');
        set @s:=replace(@s,char(0xd0+@i),'');
        set @i:=@i+1;
    end while;
    set @altered:='1';
    end if;
    set @i:=0;
    if ((left(@s,1)=char(0xc3)) or instr(@s,char(32,0xc3)) or instr(@s,char(0x0a,0xc3))) then
    while @i<16 do
        set @s:=replace(@s,char(0xc3,(176+@i)),char(0xd3,(224+@i)));
        set @i:=@i+1;
    end while;
    set @i:=1;
    while @i<4 do
        set @s:=replace(@s,char(0xc0+@i),'');
        set @s:=replace(@s,char(0xd0+@i),'');
        set @i:=@i+1;
    end while;
    set @altered:='2';
    end if;
/*Добавляем 0 или 1 в начало строки, чтобы показать, конвертировали ее или нет
выводить надо будет, начиная со второго символа*/
    set @s=concat(@altered,@s);
    return @s;
end

On exit the function adds one digit into the beginning of returned string. The digit itself is determined by the @altered variable, and its code is self-explaining.

More correct form of invocation is:

If (left(repcxaxex(string-to-convert),1)='0',string-to-convert,mid(convert(repcxaxex(string-to-convert) using cp866),2))