Convert &#55357;&#56911; to Emoji in HTML using PHP

We have a bunch of surrogate-pair (or 2-byte UTF-8?) characters such as &#55357;&#56911;, which is the prayer hands emoji stored in UTF-8 text as two separate numeric character references. When rendered in a browser, this string shows up as two ?? characters.

example: &#55357;&#56911;

I need to convert those to the hands emoji using PHP, but I simply cannot find a combination of iconv, utf8_decode, html_entity_decode etc. to pull it off.

This site converts the &#55357;&#56911; properly:

http://www.convertstring.com/EncodeDecode/HtmlDecode

Paste the following string in there:

Please join me in this prayer. &#55357;&#56911;❤️

You will notice that the surrogate pair (&#55357;&#56911;) converts to 🙏

The site claims to use HtmlDecode, but I cannot find anything in PHP that does the same. I have tried iconv, html_entity_decode, and a few public libraries.

I admit I am no expert when it comes to converting between character encodings!

There are 2 best solutions below

3
Tyler F On BEST ANSWER

I was not able to find a function to do this, but this works:

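$str = "Please join me in this prayer. &#55357;&#56911;❤️";
// Match two consecutive numeric references of five characters each (a surrogate pair).
$newStr = preg_replace_callback("/&#.....;&#.....;/", function($matches){return convertToEmoji($matches);}, $str);
print_r($newStr);

function convertToEmoji($matches){
    $newStr = $matches[0];
    // Turn "&#55357;&#56911;" into "55357##56911##" so it can be split on "##".
    $newStr = str_replace("&#", '', $newStr);
    $newStr = str_replace(";", '##', $newStr);
    $myEmoji = explode("##", $newStr);
    // Each surrogate is four hex digits; together they form the UTF-16BE bytes.
    $newStr = dechex((int) $myEmoji[0]) . dechex((int) $myEmoji[1]);
    $newStr = hex2bin($newStr);
    // Requires an iconv build that supports UTF-16BE.
    return iconv("UTF-16BE", "UTF-8", $newStr);
}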
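
To make the hex step above easier to follow, here is a small sketch of the intermediate values for the pair in the question. The variable names are only illustrative. It also checks that the two numbers really fall in the UTF-16 high and low surrogate ranges, which the answer's pattern does not verify.

```php
<?php
// Decimal values taken from the question's string: &#55357;&#56911;
$high = 55357;
$low  = 56911;

// dechex() returns lowercase hex with no zero padding.
// Surrogates always lie in D800..DFFF, so each one is exactly four hex digits
// and the concatenation is the four UTF-16BE bytes that hex2bin() expects.
$hex = dechex($high) . dechex($low);

// A valid pair is a high surrogate (D800..DBFF) followed by a low one (DC00..DFFF).
$isSurrogatePair = $high >= 0xD800 && $high <= 0xDBFF
                && $low  >= 0xDC00 && $low  <= 0xDFFF;

echo $hex, PHP_EOL;          // d83dde4f
var_dump($isSurrogatePair);  // bool(true)
```

Note that a five-digit number above 65535 would produce five hex digits, so the concatenated string could have an odd length and hex2bin() would reject it. That is one reason to check the ranges before converting.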
0
mickmackusa On

I'd like to take a moment to clean up TylerF's working code.

Code: (3v4l.org Demo)

$str = "Please join me in this prayer. &#55357;&#56911;❤️";
echo preg_replace_callback(
         "/&#(\d{5});&#(\d{5});/",
         function($m) {
             return iconv("UTF-16BE", "UTF-8", hex2bin(dechex($m[1]) . dechex($m[2])));
         },
         $str
     );

Original Output:

Please join me in this prayer. 🙏❤️

Current Output:

Warning: iconv(): Wrong encoding, conversion from "UTF-16BE" to "UTF-8" is not allowed

  • Changed dots to digit character matching and employed capture groups to simplify subsequent processes.
  • No more str_replace() or explode() calls in the custom function.
  • No single-use variable declarations.

Same technique with PHP 7.4 arrow function syntax (Sandbox demo that actually works):

$str = "Please join me in this prayer. &#55357;&#56911;❤️";
var_export(
    preg_replace_callback(
        "/&#(\d{5});&#(\d{5});/",
        fn($m) => iconv("UTF-16BE", "UTF-8", hex2bin(dechex($m[1]) . dechex($m[2]))),
        $str
    )
);
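
Since the iconv() call above can fail on builds that do not support UTF-16BE (as the warning shows), here is a minimal sketch of the same idea without iconv. It is a different technique from the answers: it combines the two surrogates arithmetically into one code point and encodes that as UTF-8 by hand. The function names codePointToUtf8 and decodeSurrogateEntities are hypothetical, chosen only for this example.

```php
<?php
// Hypothetical helper: encode a supplementary code point (U+10000..U+10FFFF)
// as a four-byte UTF-8 sequence. Surrogate pairs only ever produce this range.
function codePointToUtf8(int $cp): string
{
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: replace "&#HIGH;&#LOW;" surrogate pairs with the real character.
function decodeSurrogateEntities(string $html): string
{
    return preg_replace_callback(
        '/&#(\d{5});&#(\d{5});/',
        function (array $m): string {
            $high = (int) $m[1];
            $low  = (int) $m[2];

            // Leave the text untouched unless this really is a surrogate pair.
            if ($high < 0xD800 || $high > 0xDBFF || $low < 0xDC00 || $low > 0xDFFF) {
                return $m[0];
            }

            // Standard UTF-16 decoding: 10 bits from each surrogate, plus 0x10000.
            $cp = 0x10000 + (($high - 0xD800) << 10) + ($low - 0xDC00);

            return codePointToUtf8($cp);
        },
        $html
    );
}

echo decodeSurrogateEntities('Please join me in this prayer. &#55357;&#56911;❤️'), PHP_EOL;
```

For the pair in the question this yields U+1F64F, whose UTF-8 bytes are F0 9F 99 8F. Pairs of five-digit references that are not surrogates pass through unchanged.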