What encoding does ZipArchive use to store file names inside the created archive?

6.9k views

I'm using PHP's ZipArchive class to generate a zip archive. I use the second parameter of the addFile() method to set the file's name inside the archive (the real file on disk has a different name). Some of the names must contain French accents (such as é). When I download the archive, the accents aren't displayed correctly in the file names. What encoding should I use for the file names? (The application uses UTF-8.)

Answer (score 3)

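The ZIP specification (PKWARE's APPNOTE) defines IBM code page 437 as the original default for file names and uses bit 11 of each entry's general-purpose flag to mark names stored as UTF-8. Many archive tools ignore that flag and assume the system's code page instead, so in practice you have to guess which encoding the tool opening the archive will use. Try CP1252 first, then go from there.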
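
To see which case applies to a given archive, you can inspect that flag directly. This is a minimal sketch, not part of ZipArchive: `zipNameIsUtf8()` is a hypothetical helper, and it assumes you pass it the first bytes of a local file header (for typical archives the first one sits at offset 0, but self-extracting archives prepend other data).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: reports whether a ZIP local file header declares its
 * file name as UTF-8, i.e. whether bit 11 (0x0800, the "language encoding
 * flag") of the general-purpose flag is set. If it is not set, the name is
 * nominally IBM code page 437, although many tools assume the local code page.
 *
 * $header must hold at least the first 8 bytes of a local file header.
 */
function zipNameIsUtf8(string $header): bool
{
    if (strlen($header) < 8) {
        throw new InvalidArgumentException('Header must be at least 8 bytes long');
    }

    // Little-endian layout: signature (4 bytes), version needed (2), flags (2).
    $fields = unpack('Vsignature/vversion/vflags', $header);

    if ($fields['signature'] !== 0x04034b50) { // "PK\x03\x04"
        throw new InvalidArgumentException('Not a ZIP local file header');
    }

    return ($fields['flags'] & 0x0800) !== 0;
}
```

For example, `zipNameIsUtf8(file_get_contents('archive.zip', false, null, 0, 8))` checks the first entry of a hypothetical `archive.zip`.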

Answer (score 6)

Use a DOS code page. My file names have Cyrillic characters, so I convert them from cp1251 (Windows) to cp866 (DOS) before passing the file name to $zip->addFile().

EDIT 2024-02-20: here is what I use on Linux to convert UTF-8 file names containing Cyrillic characters to the DOS code page:

```php
function utf8cp866($t) {
    // No conversion needed on Windows. PHP_OS_FAMILY (PHP 7.2+) is used because
    // stristr(PHP_OS, 'WIN') also matches "Darwin" (macOS).
    if (PHP_OS_FAMILY === 'Windows') return $t;

    // Replace or remove characters before converting:
    // - Ukrainian і/І look identical to Latin i/I but are different characters,
    //   and the DOS cp866 table doesn't include them;
    // - double quotes prevent the entry from being unzipped with the correct path
    //   (they aren't valid in Windows file names);
    // - « and » don't exist in the DOS table.
    $t = str_replace(
        ['і', 'І', '"', '«', '»'],
        ['i', 'I', '',  '',  ''],
        $t
    );
    return iconv('utf-8', 'cp866', $t);
}
```

Answer (score 1)

This is PHP bug #53948; see the official bug report.

Suggested workaround (it worked for me):

```php
$zip->addFile($file, iconv("UTF-8", "CP852", $local_name));
```
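
iconv() returns false when it can't convert its input, and passing that false on to addFile() won't give the entry the name you expect. A minimal sketch of a wrapper that checks for this, assuming the iconv extension is loaded; `toDosEntryName()` is a hypothetical helper, and which code page names are accepted (and whether unmappable characters fail or get substituted) depends on the iconv library PHP was built against.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: converts a UTF-8 entry name to a DOS code page before
 * it is passed to ZipArchive::addFile(), throwing instead of silently passing
 * false along when iconv() reports a failure.
 */
function toDosEntryName(string $utf8Name, string $codePage = 'CP852'): string
{
    // iconv() returns false on failure (and raises a notice, suppressed here).
    $converted = @iconv('UTF-8', $codePage, $utf8Name);
    if ($converted === false) {
        throw new RuntimeException(
            sprintf('Cannot convert "%s" to %s', $utf8Name, $codePage)
        );
    }
    return $converted;
}

// Usage with the workaround above (hypothetical $zip, $file, $local_name):
// $zip->addFile($file, toDosEntryName($local_name, 'CP852'));
```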

Answer (score 0)

It depends on the Windows locale. For example, on a French Windows system, the zip support built into Windows uses IBM850 as the file name encoding.
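
Following that idea, the target code page can be chosen per locale. The sketch below maps a few locale tags to the OEM code pages Windows uses for them (437 for US English, 850 for Western European languages such as French, 852 for Central European, 866 for Russian); the table is illustrative and incomplete, and `oemCodePageForLocale()` is a hypothetical helper whose result could be passed to iconv() as the target encoding.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the OEM ("DOS") code page that Windows uses for
 * the given locale, for use as iconv()'s target encoding.
 * Accepts tags such as "fr_FR", "fr-FR" or "fr_FR.UTF-8".
 * Unknown locales fall back to CP437, the original ZIP default.
 */
function oemCodePageForLocale(string $locale): string
{
    // Normalize "fr-FR" to "fr_FR" and drop any ".UTF-8"-style suffix.
    $key = explode('.', str_replace('-', '_', $locale))[0];

    // Illustrative, incomplete table of Windows OEM code pages.
    $map = [
        'en_US' => 'CP437', // US English
        'en_GB' => 'CP850', // UK English
        'fr_FR' => 'CP850', // Western European
        'de_DE' => 'CP850',
        'es_ES' => 'CP850',
        'it_IT' => 'CP850',
        'pl_PL' => 'CP852', // Central European
        'cs_CZ' => 'CP852',
        'hu_HU' => 'CP852',
        'ru_RU' => 'CP866', // Cyrillic
    ];

    return $map[$key] ?? 'CP437';
}

// Example: $name = iconv('UTF-8', oemCodePageForLocale('fr_FR'), 'Café.txt');
```

Keep in mind that the server can't reliably know the locale of whoever unzips the archive; a browser's Accept-Language header is at best a hint.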