Sort Multiple language data in SQL Server by collation (Why my query does not work)

2k Views Asked by At

I have a table column which have multiple language data.

I need to sort according to my specified language like Hindi.

How to achieve it though collation or something else?

  • Column name: Comments
  • Type : Nvarchar(MAx)

Sample column data

    This is an example
    To je příklad.
    هذا مثال على ذلك.
    उदाहरण है.
    यह एक उदाहरण है.
    ಈ ಒಂದು ಉದಾಹರಣೆಯಾಗಿದೆ.

Output after 'Hindi' sorting should be :

उदाहरण है.
यह एक उदाहरण है.
This is an example
To je příklad.
هذا مثال على ذلك.
ಈ ಒಂದು ಉದಾಹರಣೆಯಾಗಿದೆ.

Script used:

CREATE TABLE dbo.Data
( unicodeData NVARCHAR(200)
)
GO
INSERT INTO dbo.Data ( unicodeData)
VALUES
 ( N'This is an example')
, (N'यह एक उदाहरण है.')
, (N'उदाहरण है.')
, (N'ಈ ಒಂದು ಉದಾಹರಣೆಯಾಗಿದೆ.')
, (N'एक उदाहरण है.')
, (N'هذا مثال على ذلك.')
, (N'To je příklad.');
GO

select * from dbo.unicodeData
order by unicodedata Collate Indic_General_100_CI_AI

2

There are 2 best solutions below

4
On

Use Order by Column COLLATE syntax. Need to use Indic_General_90_CI_AS collation. Since it seems that

The reference says that Hindi (deprecated in this(2005) release) discontinued and replaced by Indic_General_90_CI_AS (Unicode only).

SELECT Comments from TABLEX
ORDER BY Comments 
COLLATE Indic_General_90_CI_AS ASC;

Collation and sort normally used one language only. You want multiple language sorting, you need to add custom values to your data. I added languageSortingPriority column to your table, then used languageSortingPriority column for your sorting.

CREATE TABLE dbo.unicodeData
( 
  languageUsed VARCHAR(50)
  ,languageSortingPriority int
, unicodeData NVARCHAR(200)
, nonUnicodeData VARCHAR(200) -- same data in a normal VARCHAR column for comparison
, comments VARCHAR(100)
)
GO
INSERT INTO dbo.unicodeData (languageUsed, languageSortingPriority,unicodeData, nonUnicodeData, comments)
VALUES
 ('English', 2, N'This is an example', N'This is an example', NULL)
, ('Hindi', 1,N'यह एक उदाहरण है.', N'यह एक उदाहरण है.', 'Using the preceding N in both strings but VARCHAR is still a ?')
, ('Hindi', 1, N'यह एक उदाहरण है.', 'यह एक उदाहरण है.', 'Not using the preceding N in both strings so both are a ?')
, ('Kannada' ,3, N'ಈ ಒಂದು ಉದಾಹರಣೆಯಾಗಿದೆ.', N'ಈ ಒಂದು ಉದಾಹರಣೆಯಾಗಿದೆ.', NULL)
, ('Arabic', 3,N'هذا مثال على ذلك.', N'هذا مثال على ذلك.', NULL)
, ('Czech', 2 ,N'To je příklad.', N'To je příklad.', NULL);
GO

select * from dbo.unicodeData
order by languageSortingPriority,unicodedata Collate Indic_General_100_CI_AI DESC

See sql fiddle here.

0
On

Good day, I have already answered the OP question at MSDN forums. basically you need to sort by each one of the collations.

you can see the original thread here, with the answer: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/ccc1d16f-926f-46c8-8579-b2eecf661e7c/sort-miultiple-language-data-in-sql-serevr-by-collation?forum=transactsql

You can see an article on this issue on TechNet here: http://social.technet.microsoft.com/wiki/contents/articles/31194.t-sql-sort-data-by-multiple-languages.aspx

I hope this useful :-)