How to produce iso-8859-1 output on some requests in PHP?


I have a PHP application which generates a simple CSV file using league/csv. Some of the columns contain names/addresses which might have non-ASCII characters. My client requires that the output CSV file be encoded in iso-8859-1 instead of utf-8, as it is currently.

I believe my problem can be reduced to the following (where response is from laravel):


        $headers = [
            'Content-type' => "text/csv; charset=iso-8859-1",
            'Content-Disposition' => 'attachment; filename="CLI.csv"'
        ];
        return response()->stream(function() {

            $fh = fopen('php://output', 'wb');
            fwrite($fh, "Vià Cittè\n");
            fwrite($fh, mb_convert_encoding("Vià Cittè\n", 'iso-8859-1'));
            fwrite($fh, mb_convert_encoding("Vià Cittè\n", 'iso-8859-1', 'utf-8'));
            fwrite($fh, iconv('utf-8', 'iso-8859-1', "Vià Cittè\n"));
            fwrite($fh, utf8_decode("Vià Cittè\n"));
            fwrite($fh, utf8_encode("Vià Cittè\n"));
            fclose($fh);

        }, 200, $headers);

I would expect at least some of the lines to be Vià Cittè\n encoded in iso-8859-1, but they all come out wrong. This is what I see when I open the output file using iso-8859-1 as encoding:

[image: the output file opened as iso-8859-1]

It appears that the output gets re-encoded as utf-8 for some reason.

Can someone tell me how I can avoid this re-encoding issue?


In my real code I'm not writing directly with fopen; I use league/csv with its Writer and CharsetConverter. I have made various attempts, but the result is the same as described above.

Note: I'm currently using PHP 7.3 on Linux. The PHP server is inside a Docker container behind an nginx proxy (which is in a different Docker container).

Answer by Álvaro González

You have several valid conversions, together with obvious random attempts. It's all a matter of doing some proper testing.

| Raw | Unicode | UTF-8 | ISO-8859-1 |
|-----|---------|-------|------------|
| à | U+00E0 LATIN SMALL LETTER A WITH GRAVE | C3 A0 | E0 |
| è | U+00E8 LATIN SMALL LETTER E WITH GRAVE | C3 A8 | E8 |

$utf8 = "\u{00E0}\u{00E8}";
var_dump($utf8, bin2hex($utf8));

$latin1 = [
    utf8_decode($utf8),
    iconv('UTF-8', 'ISO-8859-1', $utf8),
    mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8'),
];
var_dump(array_map('bin2hex', $latin1));

Assuming everything is configured to use UTF-8 (we aren't cavemen living in 1995), you'll see:

string(4) "àè"
string(8) "c3a0c3a8"
array(3) {
  [0]=>
  string(4) "e0e8"
  [1]=>
  string(4) "e0e8"
  [2]=>
  string(4) "e0e8"
}
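
The same byte-level inspection can be applied to what the client actually receives, to tell whether the Latin-1 bytes survived or were re-encoded somewhere along the way. This is only a minimal sketch, assuming the mbstring extension is loaded; `guessEncoding()` is a hypothetical helper name, not a PHP built-in, and its verdict is a heuristic (some ISO-8859-1 byte sequences happen to be valid UTF-8 too).

```php
<?php
// Hypothetical diagnostic helper: classifies a byte string by looking at its raw bytes.
function guessEncoding(string $bytes): string
{
    // No byte above 0x7F: plain ASCII, identical in UTF-8 and ISO-8859-1.
    if (!preg_match('/[\x80-\xFF]/', $bytes)) {
        return 'ASCII';
    }
    // High bytes that form well-formed UTF-8 sequences.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return 'UTF-8';
    }
    // High bytes that are not well-formed UTF-8: a single-byte encoding is likely.
    return 'not UTF-8 (possibly ISO-8859-1)';
}

echo guessEncoding("\xE0\xE8"), "\n";         // not UTF-8 (possibly ISO-8859-1)
echo guessEncoding("\xC3\xA0\xC3\xA8"), "\n"; // UTF-8
```

Running it on the first bytes of the downloaded file, next to a `bin2hex()` dump, shows which side of the pipeline changed the data.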

I'd skip utf8_decode() because of its extremely confusing name (nobody checks the manual to see what it actually does); it is also deprecated as of PHP 8.2. The other ones mainly differ in how they handle missing characters:

$utf8 = "€";
var_dump($utf8);

$latin1 = [
    iconv('UTF-8', 'ISO-8859-1', $utf8), # Notice: iconv(): Detected an illegal character in input string
    iconv('UTF-8', 'ISO-8859-1//IGNORE', $utf8),
    iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $utf8),
    mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8'),
];
var_dump(array_map('bin2hex', $latin1));

string(3) "€"
array(4) {
  [0]=>
  string(0) ""
  [1]=>
  string(0) ""
  [2]=>
  string(6) "455552" ------> EUR
  [3]=>
  string(2) "3f" ----------> ?
}
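
Since silent replacement with `?` is easy to miss in a CSV export, one option is to detect lossy conversions with a round trip before writing each row. The following is a sketch under stated assumptions, not the asker's actual code: `toLatin1Strict()` and `putLatin1Row()` are hypothetical helper names, mbstring is assumed to be loaded, and plain `fputcsv()` stands in for league/csv.

```php
<?php
// Hypothetical helper: converts UTF-8 to ISO-8859-1 and fails loudly if anything is lost.
function toLatin1Strict(string $utf8): string
{
    $latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    // Round trip: if converting back does not reproduce the input,
    // at least one character was replaced by the substitute character.
    if (mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1') !== $utf8) {
        throw new InvalidArgumentException('Not representable in ISO-8859-1: ' . $utf8);
    }
    return $latin1;
}

// Hypothetical helper: writes one CSV row with every field converted.
function putLatin1Row($handle, array $fields): void
{
    fputcsv($handle, array_map('toLatin1Strict', $fields), ',', '"', '\\');
}

// Write to an in-memory stream and inspect the resulting bytes.
$fh = fopen('php://memory', 'w+b');
putLatin1Row($fh, ["Vi\u{00E0}", "Citt\u{00E8}"]);
rewind($fh);
$hex = bin2hex(stream_get_contents($fh));
fclose($fh);

echo $hex, "\n"; // 5669e02c43697474e80a
```

Passing a field such as "€" makes `toLatin1Strict()` throw instead of writing `?`, so the caller decides whether to transliterate, drop, or reject the value.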