After reading "What is the difference between char, nchar, varchar, and nvarchar in SQL Server?" I have a question.
I'm using MS SQL Server 2008 R2.
DECLARE @T TABLE
(
    C1 VARCHAR(20)  COLLATE Chinese_Traditional_Stroke_Order_100_CS_AS_KS_WS,
    C2 NVARCHAR(20) COLLATE Chinese_Traditional_Stroke_Order_100_CS_AS_KS_WS
);
INSERT INTO @T VALUES (N'中华人民共和国', N'中华人民共和国');
SELECT LEN(C1)        AS [LEN(C1)],
       DATALENGTH(C1) AS [DATALENGTH(C1)],
       LEN(C2)        AS [LEN(C2)],
       DATALENGTH(C2) AS [DATALENGTH(C2)]
FROM @T;
Returns:
LEN(C1)     DATALENGTH(C1) LEN(C2)     DATALENGTH(C2)
----------- -------------- ----------- --------------
7           12             7           14
Why is the second value, DATALENGTH(C1), 12?
In your INSERT you are converting text from Unicode to the Chinese code page for C1. Most likely this process alters the text and something may be lost. Here is a SQL Fiddle.
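To look at the stored bytes directly, the columns can be cast to VARBINARY. This is only a minimal sketch that reuses the table variable from the question; the aliases C1_bytes, C2_bytes and C1_text are illustrative names, not something from the original post.

```sql
-- Same table variable as in the question.
DECLARE @T TABLE
(
    C1 VARCHAR(20)  COLLATE Chinese_Traditional_Stroke_Order_100_CS_AS_KS_WS,
    C2 NVARCHAR(20) COLLATE Chinese_Traditional_Stroke_Order_100_CS_AS_KS_WS
);

INSERT INTO @T VALUES (N'中华人民共和国', N'中华人民共和国');

-- CAST ... AS VARBINARY exposes the raw bytes behind each column.
-- The aliases below are illustrative only.
SELECT CAST(C1 AS VARBINARY(40)) AS C1_bytes,  -- bytes in the collation's code page
       CAST(C2 AS VARBINARY(40)) AS C2_bytes,  -- two bytes per character, little-endian
       C1                        AS C1_text    -- the text as it reads back from varchar
FROM @T;
```

Any character that did not survive the conversion should show up as the single byte 3F in C1_bytes and as a ? in C1_text.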
You can see that the second character, 华, is stored as 3F in varchar. You can also see that the last character, 国, is also stored as 3F in varchar. 3F is the code for ?. When Windows converts text from Unicode to a code page and a character can't be represented in that code page, the conversion function (most likely WideCharToMultiByte) substitutes ? for it.

One more example: the last-but-one character, 和, is encoded as A94D in varchar and 8C54 in nvarchar. If you look it up in Character Map, it will show these codes (Unicode and code page).

See also:
What does it mean when my text is displayed as Question Marks?
https://www.microsoft.com/middleeast/msdn/Questionmark.aspx
This is exactly what happens when you store the Unicode literal N'中华人民共和国' in a varchar column. The Unicode text is converted to multi-byte, and the characters that can't be represented in that code page are replaced by question marks (?). Each question mark takes one byte instead of two, which is why DATALENGTH(C1) is 12 rather than 14.
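As a rough cross-check of the byte count, the number of replaced characters can be counted by removing every ? from C1 and comparing lengths. This is a sketch under two assumptions: the original text contains no genuine question marks, and every character that survived the conversion takes two bytes in this code page. The aliases are illustrative.

```sql
DECLARE @T TABLE
(
    C1 VARCHAR(20) COLLATE Chinese_Traditional_Stroke_Order_100_CS_AS_KS_WS
);

INSERT INTO @T VALUES (N'中华人民共和国');

-- replaced_chars counts the '?' substitutions (assumes no real '?' in the input).
-- Expected relationship for this input:
--   stored_bytes = (total_chars - replaced_chars) * 2 + replaced_chars * 1
SELECT LEN(C1)                             AS total_chars,
       LEN(C1) - LEN(REPLACE(C1, '?', '')) AS replaced_chars,
       DATALENGTH(C1)                      AS stored_bytes
FROM @T;
```

With the figures reported in the question (7 characters, 12 bytes) and the two characters stored as 3F, the relationship works out as 5 × 2 + 2 × 1 = 12.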