How to write a FileTypeDetector for zip archives?

For this package, one of my next steps is to write a series of FileTypeDetector implementations so that Files.probeContentType() is smarter than it is by default (the default file type detector relies on "file name extensions" only).

As the javadoc of the aforementioned method mentions, this method relies on FileTypeDetector implementations being declared in a META-INF/services file.

I have first tested with a simple provider to detect PNG files using the file header:

public final class PngFileTypeDetector
    extends FileTypeDetector
{
    private static final byte[] PNG_HEADER = {
        (byte) 0x89,
        (byte) 0x50, (byte) 0x4E, (byte) 0x47,
        (byte) 0x0D, (byte) 0x0A,
        (byte) 0x1A,
        (byte) 0x0A
    };

    private static final int PNG_HEADER_SIZE = PNG_HEADER.length;

    @Override
    public String probeContentType(final Path path)
        throws IOException
    {
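        final byte[] buf = new byte[PNG_HEADER_SIZE];
        int filled = 0;

        try (
            final InputStream in = Files.newInputStream(path);
        ) {
            // A single read() may legally return fewer bytes than requested,
            // so keep reading until the buffer is full or the stream ends
            int count;
            while (filled < PNG_HEADER_SIZE
                && (count = in.read(buf, filled, PNG_HEADER_SIZE - filled)) != -1)
                filled += count;
        }

        if (filled != PNG_HEADER_SIZE)
            return null;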

        return Arrays.equals(buf, PNG_HEADER) ? "image/png" : null;
    }
}

It works. Now, after a quick glance at the API, I thought this would be a good way to detect whether a file was a zip:

public final class ZipFileTypeDetector
    extends FileTypeDetector
{
    @Override
    public String probeContentType(final Path path)
        throws IOException
    {
        // Rely on what the JDK has to offer...
        try (
            final InputStream in = Files.newInputStream(path);
            final ZipInputStream z = new ZipInputStream(in);
        ) {
            z.getNextEntry();
            return "application/zip";
        } catch (ZipException ignored) {
            return null;
        }
    }
}

The content of META-INF/services/java.nio.file.spi.FileTypeDetector was this:

com.github.fge.filesystem.ftd.PngFileTypeDetector
com.github.fge.filesystem.ftd.ZipFileTypeDetector

With the current tests, it worked; for the zip I created an empty zip file, for the PNG test I used this image.

Full test:

public final class FileTypeDetectorTest
{
    private FileSystem fs;
    private Path path;

    @BeforeMethod
    public void initfs()
        throws IOException
    {
        fs = MemoryFileSystemBuilder.newLinux().build("testfs");
        path = fs.getPath("/foo");
    }

    @DataProvider
    public Iterator<Object[]> samples()
    {
        final List<Object[]> list = new ArrayList<>();

        String resourcePath;
        String mimeType;

        resourcePath = "/ftd/sample.png";
        mimeType = "image/png";
        list.add(new Object[] { resourcePath, mimeType });

        resourcePath = "/ftd/sample.zip";
        mimeType = "application/zip";
        list.add(new Object[] { resourcePath, mimeType });

        return list.iterator();
    }

    @Test(dataProvider = "samples")
    public void fileTypeDetectionTest(final String resourcePath,
        final String mimeType)
        throws IOException
    {
        @SuppressWarnings("IOResourceOpenedButNotSafelyClosed")
        final InputStream in
            = FileTypeDetectorTest.class.getResourceAsStream(resourcePath);

        if (in == null)
            throw new IOException(resourcePath + " not found in classpath");

        try (
            final InputStream inref = in;
        ) {
            Files.copy(inref, path);
        }

        assertThat(Files.probeContentType(path)).isEqualTo(mimeType);
    }

    @AfterMethod
    public void closefs()
        throws IOException
    {
        fs.close();
    }
}

However...

If I invert the list of implementations in the services file, so that the file now reads:

com.github.fge.filesystem.ftd.ZipFileTypeDetector
com.github.fge.filesystem.ftd.PngFileTypeDetector

then the PNG file is detected as being a zip file!

After some debugging I noticed that:

  • opening the PNG as a ZipInputStream did not fail...
  • ... and .getNextEntry() returned null!

I'd have expected at least .getNextEntry() to throw ZipException.

Why didn't it? How can I detect reliably whether a file is a zip?

Further note: this is for Paths; therefore anything File is unusable.

There are 1 best solutions below

Why didn't it?

Well, the Javadoc for getNextEntry() says that a ZipException is thrown "if a ZIP file error has occurred" and an IOException "if an I/O error has occurred".

Based on that wonderfully helpful information (cough), we can't make any assumptions that it will throw an exception if it encounters an invalid entry.
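To see the behaviour in isolation, here is a minimal sketch that feeds bytes that are clearly not a zip into a ZipInputStream. The class name ZipInputStreamProbe and the helper firstEntry() are made up for this illustration. On the OpenJDK builds I am aware of, getNextEntry() returns null when the next bytes are not a local file header (or when the stream ends before a full header could be read), which is my understanding of why both the PNG and the empty zip got through the detector in the question: it never looks at the return value.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public final class ZipInputStreamProbe
{
    // Hypothetical helper: returns whatever getNextEntry() reports first
    public static ZipEntry firstEntry(final byte[] content)
        throws IOException
    {
        try (
            final ZipInputStream z
                = new ZipInputStream(new ByteArrayInputStream(content));
        ) {
            return z.getNextEntry();
        }
    }

    public static void main(final String... args)
        throws IOException
    {
        // Start of a PNG header, zero-padded; definitely not a zip
        final byte[] notAZip = new byte[64];
        notAZip[0] = (byte) 0x89;
        notAZip[1] = (byte) 0x50;
        notAZip[2] = (byte) 0x4E;
        notAZip[3] = (byte) 0x47;

        // No exception is expected here; inspect the returned value instead
        System.out.println(firstEntry(notAZip));
    }
}
```

So a null check on getNextEntry() would reject the PNG, but it would likely reject an empty zip as well, since an empty archive has no local file header to read.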

How can I detect reliably whether a file is a zip?

The ZIP file format specification, which originated with PKZIP, can be found here. While it's all a good read :), take a look at section 4, and 4.3.16 in particular. It specifies the "End of central directory record", which all ZIP files have (even empty ones).
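As a rough sketch of how a detector could look for that record using only Path-based APIs: the class below reads the tail of the file through Files.newByteChannel() and scans backwards for the record's signature (0x06054b50, little endian). The class name EocdZipFileTypeDetector and the helper hasEndOfCentralDirectory() are hypothetical, and the sketch assumes the file system provider returns a channel that supports size() and position(). It also assumes the record is the very last thing in the file (the comment length must match the bytes remaining), so archives with trailing junk would be rejected, and it does not validate the central directory itself.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.spi.FileTypeDetector;

public final class EocdZipFileTypeDetector
    extends FileTypeDetector
{
    // "PK\005\006" read as a little-endian 32-bit integer
    private static final int EOCD_SIGNATURE = 0x06054b50;
    // Fixed part of the record: 22 bytes, the last 2 being the comment length
    private static final int EOCD_MIN_SIZE = 22;
    private static final int COMMENT_LENGTH_OFFSET = 20;
    private static final int MAX_COMMENT_SIZE = 0xFFFF;

    @Override
    public String probeContentType(final Path path)
        throws IOException
    {
        return hasEndOfCentralDirectory(path) ? "application/zip" : null;
    }

    public static boolean hasEndOfCentralDirectory(final Path path)
        throws IOException
    {
        try (
            final SeekableByteChannel channel = Files.newByteChannel(path);
        ) {
            final long size = channel.size();
            if (size < EOCD_MIN_SIZE)
                return false;

            // The record sits within the last 22 + 65535 bytes of the file
            final int tailSize
                = (int) Math.min(size, EOCD_MIN_SIZE + MAX_COMMENT_SIZE);
            final ByteBuffer tail = ByteBuffer.allocate(tailSize)
                .order(ByteOrder.LITTLE_ENDIAN);

            channel.position(size - tailSize);
            while (tail.hasRemaining())
                if (channel.read(tail) < 0)
                    return false;

            // Scan backwards; accept a signature only if its comment length
            // field accounts exactly for the bytes that follow the record
            for (int pos = tailSize - EOCD_MIN_SIZE; pos >= 0; pos--) {
                if (tail.getInt(pos) != EOCD_SIGNATURE)
                    continue;
                final int commentSize
                    = tail.getShort(pos + COMMENT_LENGTH_OFFSET) & 0xFFFF;
                if (pos + EOCD_MIN_SIZE + commentSize == tailSize)
                    return true;
            }

            return false;
        }
    }
}
```

Because this only returns "application/zip" when the record is found, the order of the entries in the services file should no longer matter for the PNG sample.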