I use the mb_convert_encoding function to convert UTF-8 text to SJIS.
Before conversion:でんぱ組 出会いの歌26 カミソヤマ ユニ
After conversion: て?んは?組 出会いの歌26 カミソヤマ ユニ
Non-convertible characters: て?んは?
Code used for the conversion:
$str = mb_convert_encoding('でんぱ組 出会いの歌26 カミソヤマ ユニ', "SJIS", "UTF-8");
で as 1 grapheme is here only a rendering of 2 composed Unicode codepoints, て and ◌゙ (not to be confused with the codepoint ゛, which can't be combined). The former can be translated from UTF-8 to Shift-JIS, the latter cannot. The same goes for ぱ: it is combined from は and ◌゚ instead of being one single character.

UTF-8 bytes:
で (1 codepoint, U+3067): e3 81 a7
て + ◌゙ (U+3066 U+3099): e3 81 a6 e3 82 99
ぱ (1 codepoint, U+3071): e3 81 b1
は + ◌゚ (U+306F U+309A): e3 81 af e3 82 9a

Shift-JIS or CP932 bytes:
で: 82 c5
て: 82 c4
ぱ: 82 cf
は: 82 cd

Just because you see 1 grapheme (e.g. で or ぱ) in Unicode (e.g. in UTF-8), it doesn't mean it is built from 1 codepoint. You can trust neither your eyes nor your user's input: it may or may not really be 1 codepoint. You have to normalize your UTF-8 text (e.g. to the NFC form) before converting it to Shift-JIS. The 2 codepoints (U+3066 and U+3099) that make up 1 grapheme then become 1 codepoint (U+3067), which can be translated to Shift-JIS without problems (82 c5).

In PHP the intl extension must be installed; then you can use normalizer_normalize(). The result of that function can be fully converted to Shift-JIS.
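A minimal sketch of normalizing first and converting afterwards, assuming PHP 7 or later with the intl and mbstring extensions loaded. The helper name utf8_to_sjis is hypothetical and not part of the original code, and the input is written with \u{...} escapes so that the decomposed codepoints are unambiguous:

```php
<?php
// Hypothetical helper (name is an assumption): normalize to NFC, then convert.
function utf8_to_sjis(string $utf8): string
{
    // NFC composes base character + combining mark into one codepoint where possible.
    $nfc = normalizer_normalize($utf8, Normalizer::FORM_C);
    if ($nfc === false) {
        // false signals a failure, for example invalid UTF-8 input.
        throw new InvalidArgumentException('Could not normalize input');
    }
    return mb_convert_encoding($nfc, 'SJIS', 'UTF-8');
}

// Decomposed input: U+3066 U+3099 (renders as one grapheme), U+3093,
// then U+306F U+309A (also one grapheme).
$decomposed = "\u{3066}\u{3099}\u{3093}\u{306F}\u{309A}";

echo bin2hex($decomposed), "\n";               // e381a6e38299e38293e381afe3829a
echo bin2hex(utf8_to_sjis($decomposed)), "\n"; // 82c582f182cf
```

Dumping the bytes with bin2hex() before and after is a quick way to check whether a string that looks fine on screen actually contains combining marks.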