How to detect MacRoman encoding in PHP?


PHP's mb_detect_encoding() doesn't understand the MacRoman encoding. My app allows users to upload data in CSV format, and I need to convert it to UTF-8 myself because the users are not tech-savvy. I will never be able to get all of them to understand and control the encoding of their files.

This is what I'm doing:

$encoding_detection_order = array('UTF-8', 'UTF-7', 'ASCII', 'ISO-8859-1', 'EUC-JP', 'SJIS', 'eucJP-win', 'SJIS-win', 'JIS', 'ISO-2022-JP', );

$encoding = mb_detect_encoding($value, $encoding_detection_order, true);

$converted_value = iconv($encoding, 'UTF-8//TRANSLIT', $value);

This works great for most situations, but if my user is on a Mac and saves the CSV in MacRoman encoding, the above code will usually detect the text as ISO-8859-1, which is wrong and causes iconv() to produce bad output.
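A minimal way to reproduce the misdetection, as a sketch only: the candidate list is shortened here to keep the example small, and the input is a hand-built byte string standing in for an uploaded CSV value. It assumes the mbstring extension is loaded.

```php
<?php
// "Jaimé" as MacRoman bytes: é is the single byte 0x8E.
$value = "Jaim\x8e";

// Shortened candidate list (an assumption made for this example).
$candidates = ['UTF-8', 'ASCII', 'ISO-8859-1'];

// Strict mode rejects UTF-8 and ASCII because 0x8E is invalid in both.
// ISO-8859-1 accepts every byte value, so it is the candidate left over.
$detected = mb_detect_encoding($value, $candidates, true);

var_dump($detected);
```

Because ISO-8859-1 never rejects a byte, it acts as a catch-all for any single-byte encoding that is not in the list, MacRoman included.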

For example, the accented e in Jaimé has a hex value of 0x8e in MacRoman. In ISO-8859-1, 0x8e is a C1 control character, and in the closely related Windows-1252 it is Ž, so when I convert it to UTF-8 I just get the UTF-8 version of that character (Ž in my output) when I should be getting é.
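To make the byte-level difference concrete, here is a small sketch that converts the single byte 0x8e to UTF-8 under three source encodings and prints the resulting bytes as hex. The helper to_utf8_or_null() is a hypothetical name for this example, and the encoding name 'MACINTOSH' is an assumption: glibc and GNU libiconv accept it, but other iconv builds may not, in which case the helper returns null.

```php
<?php
// Hypothetical helper: convert $bytes from $from to UTF-8, or return null
// if this iconv build does not recognise the encoding name.
function to_utf8_or_null(string $bytes, string $from): ?string
{
    $out = @iconv($from, 'UTF-8', $bytes);
    return $out === false ? null : $out;
}

$byte = "\x8e";

foreach (['MACINTOSH', 'WINDOWS-1252', 'ISO-8859-1'] as $name) {
    $utf8 = to_utf8_or_null($byte, $name);
    // Print hex rather than the character itself, since the
    // ISO-8859-1 result (U+008E) is a non-printable control character.
    printf("%-12s => %s\n", $name, $utf8 === null ? '(unsupported)' : bin2hex($utf8));
}
```

Where the names are supported, the MacRoman result should be c3a9 (é), the Windows-1252 result c5bd (Ž), and the ISO-8859-1 result c28e (the control character U+008E). All three conversions succeed without error, which is why a wrong guess goes unnoticed.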


I need to be able to dynamically differentiate MacRoman from other encodings so that I convert it properly.
