After reading "What is the difference between char, nchar, varchar, and nvarchar in SQL Server?" I have a question.
I'm using MS SQL Server 2008 R2
DECLARE @T TABLE
(
C1 VARCHAR(20) COLLATE Chinese_Traditional_Stroke_Order_100_CS_AS_KS_WS,
C2 NVARCHAR(20) COLLATE Chinese_Traditional_Stroke_Order_100_CS_AS_KS_WS
)
INSERT INTO @T VALUES (N'中华人民共和国',N'中华人民共和国')
SELECT LEN(C1) AS [LEN(C1)],
DATALENGTH(C1) AS [DATALENGTH(C1)],
LEN(C2) AS [LEN(C2)],
DATALENGTH(C2) AS [DATALENGTH(C2)]
FROM @T
Returns
LEN(C1) DATALENGTH(C1) LEN(C2) DATALENGTH(C2)
----------- -------------- ----------- --------------
7 12 7 14
Why the second DATALENGTH(C1) is 12?
In your
INSERTyou are converting text from unicode to chinese codepage forC1. Most likely this process alters the text and something may be lost.Here is SQL Fiddle.
You can see that the second character
华is stored as3Finvarchar. You can also see that the last character国is also stored as3Finvarchar.3Fis a code for?. When Windows tries to convert text from unicode to the codepage and certain character can't be represented in the given codepage, the conversion function (most likelyWideCharToMultiByte) puts?for such characters .One more example. The last, but one character
和is encoded asA94Dinvarcharand8C54innvarchar. If you look it up in Character Map it will show these codes (unicode and codepage):See also:
What does it mean when my text is displayed as Question Marks?
https://www.microsoft.com/middleeast/msdn/Questionmark.aspx
This is exactly what is happening when you store a unicode literal
N'中华人民共和国'in avarcharcolumn. The unicode text is converted to multi-byte and some characters can't be represented in that code page and they are replaced by question marks?.