portable way to sort column according to chinese pinyin in rails

1.2k Views Asked by At

Is there a portable way to sort columns according to Chinese pinyin (汉语拼音) in rails using either Activerecord or standard SQL statement, regardless of the underlying database configuration. If this is not possible, what is recommend way to perform it on postgresql.

Here provide an approach for mysql database with CHARSET as gb2312.

    SQL code

        mysql> create table t_cosler( 
             -> f_PY char primary key, 
             -> cBegin SMALLINT UNSIGNED not null, 
             -> cEnd SMALLINT UNSIGNED not null -> ); 
        Query OK, 0 rows affected (0.09 sec) 

        mysql> insert into t_cosler values 
        -> ('A',0xB0A1,0xB0C4), 
        -> ('B',0xB0C5,0xB2C0), 
        -> ('C',0xB2C1,0xB4ED), 
        -> ('D',0xB4EE,0xB6E9), 
        -> ('E',0xB6EA,0xB7A1), 
        -> ('F',0xB7A2,0xB8C0),  
        -> ('G',0xB8C1,0xB9FD), 
        -> ('H',0xB9FE,0xBBF6), 
        -> ('J',0xBBF7,0xBFA5),  
        -> ('K',0xBFA6,0xC0AB), 
        -> ('L',0xC0AC,0xC2E7), 
        -> ('M',0xC2E8,0xC4C2), 
        -> ('N',0xC4C3,0xC5B5), 
        -> ('O',0xC5B6,0xC5BD), 
        -> ('P',0xC5BE,0xC6D9), 
        -> ('Q',0xC6DA,0xC8BA), 
        -> ('R',0xC8BB,0xC8F5), 
        -> ('S',0xC8F6,0xCBF9), 
        -> ('T',0xCBFA,0xCDD9), 
        -> ('W',0xCDDA,0xCEF3), 
        -> ('X',0xCEF4,0xD188), 
        -> ('Y',0xD1B9,0xD4D0), 
        -> ('Z',0xD4D1,0xD7F9);

         Query OK, 23 rows affected (0.16 sec) Records: 23 Duplicates: 0 Warnings: 0 
        mysql> select * from o_personnel;
        +------+------------+ 
        | A_Id | A_UserName | 
        +------+------------+ 
        | 1 | 首先 | 
        | 2 | 检查 | 
        | 3 | 我们 | 
        | 4 | 的二 | 
        | 5 | 进制 | 
        | 6 | 是否 | 
        | 7 | 适合 | 
        | 8 | 你的 | 
        | 9 | 平台 | 

        +------+------------+ 9 rows in set (0.00 sec) 
     mysql> select p.*,c.*
    -> from o_personnel p , t_cosler c
    -> where  CONV(HEX(left(A_UserName,1)),16,10) between c.cBegin and c.cEnd;

+------+------------+------+--------+-------+
| A_Id | A_UserName | f_PY | cBegin | cEnd  |
+------+------------+------+--------+-------+
|    4 | 的二       | D    |  46318 | 46825 |
|    2 | 检查       | J    |  48119 | 49061 |
|    5 | 进制       | J    |  48119 | 49061 |
|    8 | 你的       | N    |  50371 | 50613 |
|    9 | 平台       | P    |  50622 | 50905 |
|    1 | 首先       | S    |  51446 | 52217 |
|    6 | 是否       | S    |  51446 | 52217 |
|    7 | 适合       | S    |  51446 | 52217 |
|    3 | 我们       | W    |  52698 | 52979 |
+------+------------+------+--------+-------+
9 rows in set (0.00 sec)

mysql>
2

There are 2 best solutions below

0
On BEST ANSWER

To solve this problem, I have just written a gem toPinyin, simply gem install toPinyin

require 'toPinyin'
words = "
检查
我们
的二
进制
是否
适合
你的
平台".split("\n")

words.sort! {|a ,b|   a.pinyin.join <=> b.pinyin.join }
0
On

I'm going to go out on a limb and say "No, not regardless of the database configuration using SQL". I'm not sure what controls Rails sorting.

The sort order in SQL database management systems is controlled by a collation. Depending on the dbms, you might be able to set the collation at the server, database, table, or column level. And, depending on the dbms, you might even be able to specify a collation to be used at run time in a query.

I'm pretty sure that SQL Server supports all those levels. PostgreSQL does not. I don't know about MySQL.

To sort pinyin, choose an appropriate collation. (I don't know which one is appropriate.) Some of the details are under "Locale Support"--not under "collation"--in the PostgreSQL docs.

I'm not sure to what extent ruby relies on anything from the operating system--locale settings, code pages, character encoding--to do its own sorting. But if it relies on anything from the OS, I'd have to say it's probably not 100% reliable. (In that different implementations under different operating systems might have slightly different results.)