mb_convert_encoding() with UTF-16 input in PHP >= 8.1

657 Views Asked by Daniel At 17 August 2025 at 22:45

I'm updating a PHP app that imports CSV files encoded in UTF-16 (exported from Google Keyword Planner) and converts the values to UTF-8.

Up to PHP 8.0 this works as expected, but from PHP 8.1 on a ? is appended to the values after the conversion from UTF-16 to UTF-8:

var_dump(mb_convert_encoding("\0008\0008\0000\000", "UTF-8", "UTF-16"));

// Output with PHP 8.1.3 - 8.1.13, 8.2.0:
// string(4) "880?"

// Output with PHP 7.4.32, 8.0.8 - 8.0.26:
// string(3) "880"


There are 2 best solutions below

AmigoJack On 29 December 2022 at 17:45 BEST ANSWER

Your source is equal to "\x00\x38\x00\x38\x00\x30\x00", which is 7 bytes and therefore an invalid length for UTF-16, which always needs 2 or 4 bytes per character.

You were lucky that PHP up to 8.0 silently accepted the first 6 bytes and dropped the 7th,
while PHP 8.1 and later produces more correct output: it decodes the input as big-endian UTF-16 (the default for "UTF-16" when there is no byte order mark) and emits a ? to tell you that there is an incomplete 4th character, because only 1 byte is left for it.
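
To make the byte count visible, here is a minimal sketch that dumps the string from the question and rejects odd-length input before converting. The helper name utf16_to_utf8() is hypothetical (it is not a PHP built-in), and it assumes the mbstring extension is loaded:

```php
<?php
// The string from the question: each \000 is an octal escape for one NUL byte,
// and the digit that follows it is a literal character.
$input = "\0008\0008\0000\000";

echo strlen($input), "\n";   // 7
echo bin2hex($input), "\n";  // 00380038003000

// Hypothetical helper (not part of PHP): refuse input that cannot be valid
// UTF-16 because its byte length is odd, instead of letting a stray "?" through.
function utf16_to_utf8(string $bytes, string $from = 'UTF-16'): string
{
    if (strlen($bytes) % 2 !== 0) {
        throw new InvalidArgumentException(
            'UTF-16 input must have an even number of bytes, got ' . strlen($bytes)
        );
    }
    return mb_convert_encoding($bytes, 'UTF-8', $from);
}
```

Whether to throw, trim the trailing byte, or log and continue is a design choice for the import; throwing is used here only because it makes the malformed input impossible to miss.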

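Solution: provide proper input with an even number of bytes. The extra byte may also come from misreading the octal notation: in "\0008" the escape \000 is a single NUL byte and the 8 is a literal character. This is much easier to see without mixing escape notation and literals: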

approach	only 6 bytes (value `'880'`)	make it 8 bytes (value `'8800'`)
full hexadecimal notation	`"\x00\x38\x00\x38\x00\x30"`	`"\x00\x38\x00\x38\x00\x30\x00\x30"`
mixed hexadecimal notation	`"\x008\x008\x000"`	`"\x008\x008\x000\x000"`
full octal notation	`"\000\070\000\070\000\060"`	`"\000\070\000\070\000\060\000\060"`
mixed octal notation	`"\0008\0008\0000"`	`"\0008\0008\0000\0000"`
concatenated string to make it more clear	`"\x00". '8'. "\x00". '8'. "\x00". '0'`	`"\x00". '8'. "\x00". '8'. "\x00". '0'. "\x00". '0'`
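
As a quick check of the 6-byte column, the following sketch prints the length and hex dump of each notation from the table. The array keys are only labels chosen for this illustration:

```php
<?php
// The five 6-byte spellings from the table above.
$variants = [
    'full hex'     => "\x00\x38\x00\x38\x00\x30",
    'mixed hex'    => "\x008\x008\x000",
    'full octal'   => "\000\070\000\070\000\060",
    'mixed octal'  => "\0008\0008\0000",
    'concatenated' => "\x00" . '8' . "\x00" . '8' . "\x00" . '0',
];

foreach ($variants as $label => $bytes) {
    // Show the byte count and the raw bytes for each spelling.
    printf("%-12s %d bytes  %s\n", $label, strlen($bytes), bin2hex($bytes));
}
// Every row reports 6 bytes and the hex dump 003800380030.
```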

Rick James On 17 September 2023 at 19:27

Avoid PHP, simply use MySQL and its LOAD DATA INFILE. Be sure to set the character set to utf16 or utf16le, depending on the "endian-ness".
