I need urgent help: I can't get a string comparison to work. A string written to table1 of my database is UTF-8 encoded, but it still looks strange: ＳＡＤＩ
However, a string written to table2 in the same database is SADI, which looks normal.
Whenever I compare the two, the result is false.
Any idea how the comparison can be made? (It should actually return true.)
Alternatively, any idea how I can insert ＳＡＤＩ into the database as plain SADI?
Either approach would hopefully solve the problem.
In your strings, SADI is a standard ASCII string, but ＳＡＤＩ uses full-width Unicode characters. For example, Ｓ is U+FF33 'FULLWIDTH LATIN CAPITAL LETTER S' (UTF-8: 0xEF 0xBC 0xB3), but S is standard ASCII U+0053 'LATIN CAPITAL LETTER S' (UTF-8: 0x53). The other characters are similar extended Unicode characters: they look like standard Latin script, but in reality they are not.
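To see the difference yourself, here is a minimal Python sketch (Python is an assumption; the thread itself only mentions Perl and PHP) that prints each character's code point, Unicode name, and UTF-8 bytes. The full-width sample assumes all four letters are full-width forms (U+FF33, U+FF21, U+FF24, U+FF29); `describe` is a hypothetical helper name.

```python
import unicodedata

ascii_s = "SADI"
# Assumption: every letter of the table1 value is a full-width form.
fullwidth_s = "\uff33\uff21\uff24\uff29"

def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME (UTF-8: 0x.. ...)' for each character in text."""
    rows = []
    for ch in text:
        utf8 = " ".join(f"0x{b:02X}" for b in ch.encode("utf-8"))
        rows.append(f"U+{ord(ch):04X} {unicodedata.name(ch)} (UTF-8: {utf8})")
    return rows

# Compare the first letter of each string side by side.
for row in describe(fullwidth_s[:1] + ascii_s[:1]):
    print(row)

# Different code points, so a plain equality check fails.
print(ascii_s == fullwidth_s)  # False
```

The two strings render almost identically but share no code points, which is why a byte-for-byte or code-point comparison (in the database or in application code) returns false.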
How they got there is a good question. Perhaps somebody got really creative and copy-pasted something from Word? Who knows.
You can convert these strange characters back to normal ones by applying Unicode Normalization Form KC (NFKC), for example with this Perl script used as a filter (it accepts UTF-8 and outputs normalized UTF-8):
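As a stand-in for that kind of filter (a Python sketch, not the Perl script referred to above), the standard library's `unicodedata.normalize("NFKC", ...)` performs the same normalization. The function names `nfkc_filter` and `run_as_filter` are hypothetical, and `str.reconfigure()` on the standard streams assumes Python 3.7 or later.

```python
import sys
import unicodedata
from typing import Iterable, TextIO

def nfkc_filter(src: Iterable[str], dst: TextIO) -> None:
    """Write each line of src to dst after NFKC normalization."""
    for line in src:
        dst.write(unicodedata.normalize("NFKC", line))

def run_as_filter() -> None:
    """Read UTF-8 from stdin and write NFKC-normalized UTF-8 to stdout."""
    # Force UTF-8 on both streams regardless of the platform's locale default.
    sys.stdin.reconfigure(encoding="utf-8")
    sys.stdout.reconfigure(encoding="utf-8")
    nfkc_filter(sys.stdin, sys.stdout)
```

Calling `run_as_filter()` from a script's `if __name__ == "__main__":` block lets you pipe a data dump through it, for example `python nfkc.py < dump.txt > clean.txt`.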
In PHP, the equivalent is `Normalizer::normalize($str, Normalizer::FORM_KC)`.
This requires the intl extension.
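Both options from the question can be covered the same way: normalize before inserting so table1 stores plain SADI, or normalize both sides before comparing. A minimal Python sketch, assuming the full-width sample value below; `normalize_text` and `same_text` are hypothetical helper names.

```python
import unicodedata

def normalize_text(value: str) -> str:
    """Fold compatibility characters (e.g. full-width Latin) into their plain forms."""
    return unicodedata.normalize("NFKC", value)

def same_text(a: str, b: str) -> bool:
    """Compare two strings after NFKC-normalizing both of them."""
    return normalize_text(a) == normalize_text(b)

# Hypothetical sample values mirroring the question.
table1_value = "\uff33\uff21\uff24\uff29"  # full-width, as stored in table1
table2_value = "SADI"                       # plain ASCII, as stored in table2

print(table1_value == table2_value)           # False: different code points
print(same_text(table1_value, table2_value))  # True after NFKC
# Option 2: normalize before the INSERT so table1 stores plain "SADI".
print(normalize_text(table1_value))           # SADI
```

Keep in mind that NFKC is a lossy fold: besides full-width letters it also rewrites other compatibility characters, for example the ligature "ﬁ" becomes "fi" and the superscript "²" becomes "2". Normalizing on insert is therefore simplest if you never need the original form back.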