I'm updating a PHP app that imports CSV files encoded in UTF-16 (from Google Keyword Planner) and converts the values to UTF-8.
Up to PHP 8.0 this works as expected, but from PHP 8.1 onward a ? is appended to the values after the conversion from UTF-16 to UTF-8:
var_dump(mb_convert_encoding("\0008\0008\0000\000", "UTF-8", "UTF-16"));
// Output with PHP 8.1.3 - 8.1.13, 8.2.0:
// string(4) "880?"
// Output with PHP 7.4.32, 8.0.8 - 8.0.26:
// string(3) "880"
Your source is equivalent to "\x00\x38\x00\x38\x00\x30\x00", which is 7 bytes and therefore an invalid length for UTF-16, which always needs 2 or 4 bytes per character. Solution: provide proper input. The confusion may come from misreading the octal notation: "\000" is already a complete escape (a single NUL byte), so the trailing "\000" adds one lone byte rather than a character. This is much easier to see without mixing escape notation and literal characters:
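| Notation | '880' | '8800' |
|---|---|---|
| Hex escapes only | "\x00\x38\x00\x38\x00\x30" | "\x00\x38\x00\x38\x00\x30\x00\x30" |
| Hex escapes with literal digits | "\x008\x008\x000" | "\x008\x008\x000\x000" |
| Octal escapes only | "\000\070\000\070\000\060" | "\000\070\000\070\000\060\000\060" |
| Octal escapes with literal digits | "\0008\0008\0000" | "\0008\0008\0000\0000" |
| Concatenation | "\x00". '8'. "\x00". '8'. "\x00". '0' | "\x00". '8'. "\x00". '8'. "\x00". '0'. "\x00". '0' |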
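To check this yourself, you can dump the raw bytes with strlen() and bin2hex() before converting. The sketch below compares the literal from the question with a corrected one, then wraps the conversion in a small guard. The helper name utf16_to_utf8 is hypothetical (it is not part of mbstring), and it assumes the input has no BOM and is big-endian, as the output in the question suggests.

```php
<?php
// The literal from the question: the trailing "\000" is one stray NUL byte.
$broken = "\0008\0008\0000\000";
// The same text without the stray byte: three complete 2-byte code units.
$fixed = "\0008\0008\0000";

// Inspect the actual bytes instead of reading the escapes by eye.
var_dump(strlen($broken), bin2hex($broken)); // 7 bytes: 00380038003000
var_dump(strlen($fixed), bin2hex($fixed));   // 6 bytes: 003800380030

// Hypothetical helper: refuse input that cannot be valid UTF-16
// because its byte length is odd.
function utf16_to_utf8(string $bytes): string
{
    if (strlen($bytes) % 2 !== 0) {
        throw new InvalidArgumentException(
            'UTF-16 input must have an even number of bytes'
        );
    }
    return mb_convert_encoding($bytes, 'UTF-8', 'UTF-16');
}

var_dump(utf16_to_utf8($fixed)); // string(3) "880"
```

An even length is only a necessary condition, so this guard catches truncated or padded input like the one in the question, not every malformed UTF-16 sequence.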