efficiently get count from ZipInputStream in Kotlin


Note: I DO NOT have direct access to the file, just an InputStream. Right now I read through the ZipInputStream twice, the first time just to get the entry count (by the way, having the total size would be even better). I'm using the following code to unpack the zip. Note that it creates a sequence, but it's a sequence that can only be iterated through once.

        ZipInputStream(inputStream).use { zipInputStream ->
            generateSequence { zipInputStream.nextEntry }.forEach { entry ->

                val file = File(outputDirectory, entry.name)
                // Zip Slip guard: compare canonical paths only. Also accepting a match on
                // the raw `path` would let an entry like "../evil" through, because the
                // raw path still starts with the output directory.
                // (Assumes outputDirStr = outputDirectory.canonicalPath + File.separator.)
                if (!file.canonicalPath.startsWith(outputDirStr)) {
                    throw SecurityException("Dangerous Zip entry: $entry")
                }

                if (entry.isDirectory) {
                    file.mkdirs()
                } else {
                    file.parentFile?.mkdirs()
                    file.outputStream().use { fileOutputStream ->
                        zipInputStream.copyTo(fileOutputStream)
                    }
                }
            }
        }
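
The one-pass limitation comes from generateSequence itself: the overload that takes only a next-value function returns a sequence constrained to a single iteration, and iterating it a second time throws IllegalStateException. A minimal sketch that shows this on a small zip built in memory (buildDemoZip and countTwiceOnSameSequence are hypothetical helpers made up for the demo):

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.util.zip.ZipEntry
import java.util.zip.ZipInputStream
import java.util.zip.ZipOutputStream

// Hypothetical helper: builds a small zip in memory so the demo needs no files.
fun buildDemoZip(names: List<String>): ByteArray {
    val bytes = ByteArrayOutputStream()
    ZipOutputStream(bytes).use { zos ->
        for (name in names) {
            zos.putNextEntry(ZipEntry(name))
            zos.write(name.toByteArray())
            zos.closeEntry()
        }
    }
    return bytes.toByteArray()
}

// Counts the entries once, then tries a second pass over the same sequence.
// Returns the first count and whether the second pass was rejected.
fun countTwiceOnSameSequence(zip: ByteArray): Pair<Int, Boolean> =
    ZipInputStream(ByteArrayInputStream(zip)).use { zis ->
        val entries = generateSequence { zis.nextEntry }
        val firstCount = entries.count()
        val secondPassRejected = try {
            entries.count() // second iterator() call on a constrain-once sequence
            false
        } catch (e: IllegalStateException) {
            true
        }
        firstCount to secondPassRejected
    }
```

Even if the sequence allowed a second iterator, the ZipInputStream underneath is already at its end after the first pass, so a second walk always needs a freshly opened stream.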

Right now I run the loop above without unpacking anything, just calling count() on the sequence, and then run the loop again to unpack. But it's slow: it spends quite a long time just getting the count. The goal is to show progress while unpacking (I'm unpacking rather large zips in an Android app). So the question really is: what's the most efficient way to get the count? It seems like there should be a nice way to create a single sequence that I can walk twice, once to get the count and once to unpack.
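
A sequence that can be walked twice is possible, but only if every walk reopens the source, so it does not make the counting pass any cheaper. ZipInputStream.getNextEntry() has to read through the rest of the current entry (inflating it, for deflated entries) before it can reach the next header, so a count-only pass costs roughly as much as a full decompression. A sketch under the assumption that you can obtain a fresh InputStream on demand; openStream is a hypothetical factory (on Android it might wrap contentResolver.openInputStream(uri)), and zipEntryNames is an illustrative name:

```kotlin
import java.io.InputStream
import java.util.zip.ZipInputStream

// A re-iterable sequence of zip entry names. Every call to iterator() opens a
// fresh stream via the hypothetical `openStream` factory, so each pass re-reads
// (and re-inflates) the whole archive: walking it twice costs two full passes.
// Note: the stream is closed only when a pass runs to completion.
fun zipEntryNames(openStream: () -> InputStream): Sequence<String> = Sequence {
    iterator {
        ZipInputStream(openStream()).use { zis ->
            while (true) {
                // nextEntry skips the previous entry's data by reading it,
                // which is why even a count-only pass is slow on large archives.
                val entry = zis.nextEntry ?: break
                yield(entry.name)
            }
        }
    }
}
```

For example, zipEntryNames { contentResolver.openInputStream(uri)!! }.count() would give the count, but a second walk to unpack would read the whole archive again.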

Answer by johngray1965:

Since I'm constructing the zip file myself, I just write the file count into a file and add it to the zip as the first entry. Here's an extract of the zipper:

        val count = inputDirectory.walkTopDown().count()
        var current = 0
        var lastPercent = 0
        val outputZipFile = File(outputDirectory, BACKUP_FILE)
        val countFile = File(inputDirectory, COUNT_FILE)
        IOUtils.writeString(countFile.toOkioPath(), count.toString())
        ZipOutputStream(BufferedOutputStream(FileOutputStream(outputZipFile))).use { zos ->
            addFileToZip(countFile, inputDirectory, zos)
            inputDirectory.walkTopDown().forEach { file ->
                if (file.name != COUNT_FILE && file.name != BACKUP_FILE) {
                    current++
                    val percent = (current * PERCENT_MULTIPLIER) / count
                    if (percent != lastPercent) {
                        update(percent)
                        lastPercent = percent
                    }
                    addFileToZip(file, inputDirectory, zos)
                }
            }
        }
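
If writing a temporary count file into inputDirectory is undesirable, a variation is to write the count straight into the archive as its first entry, with no file on disk. A minimal sketch; COUNT_ENTRY_NAME and writeCountEntry are hypothetical names, not taken from the answer:

```kotlin
import java.util.zip.ZipEntry
import java.util.zip.ZipOutputStream

// Hypothetical entry name; it must match the name the unzip side checks for.
val COUNT_ENTRY_NAME = "count.txt"

// Writes `count` as a zip entry directly from memory.
// Call it before adding any other entries so readers see it first.
fun writeCountEntry(zos: ZipOutputStream, count: Int) {
    zos.putNextEntry(ZipEntry(COUNT_ENTRY_NAME))
    zos.write(count.toString().toByteArray(Charsets.UTF_8))
    zos.closeEntry()
}
```

In the zipper above, calling writeCountEntry(zos, count) first would take the place of writing countFile and adding it with addFileToZip, with COUNT_ENTRY_NAME set to whatever name the unzip side compares against (COUNT_FILE in the answer).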

IOUtils.writeString, as you might guess, just writes the count out to a file. addFileToZip is just a few lines of code that builds the zip entry and writes it into the ZipOutputStream.
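
addFileToZip itself isn't shown. A hypothetical version matching the call addFileToZip(file, inputDirectory, zos) might look like the sketch below; the naming rules (paths relative to inputDirectory, "/" separators, a trailing "/" for directories, skipping the root directory that walkTopDown() also yields) are assumptions, not the author's code:

```kotlin
import java.io.File
import java.util.zip.ZipEntry
import java.util.zip.ZipOutputStream

// Hypothetical reconstruction of addFileToZip: stores `file` under its path
// relative to `baseDir`, using '/' separators as the zip format expects.
fun addFileToZip(file: File, baseDir: File, zos: ZipOutputStream) {
    val relative = file.relativeTo(baseDir).invariantSeparatorsPath
    if (relative.isEmpty()) return // walkTopDown() also yields baseDir itself; skip it
    if (file.isDirectory) {
        // Directory entries are marked by a trailing slash and carry no data
        zos.putNextEntry(ZipEntry("$relative/"))
    } else {
        zos.putNextEntry(ZipEntry(relative))
        file.inputStream().use { it.copyTo(zos) }
    }
    zos.closeEntry()
}
```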

The unzip now looks like this:

        var count = 0
        var current = 0
        var lastPercent = 0

        ZipInputStream(inputStream).use { zipInputStream ->
            generateSequence { zipInputStream.nextEntry }.forEach { entry ->

                current++
                if (count > 0) {
                    val percent = (current * PERCENT_MULTIPLIER) / count
                    if (percent != lastPercent) {
                        update(percent)
                        lastPercent = percent
                    }
                }

                val file = File(outputDirectory, entry.name)
                // Zip Slip guard: compare canonical paths only; also accepting a match on
                // the raw `path` would let "../" entries through.
                // (Assumes outputDirStr = outputDirectory.canonicalPath + File.separator.)
                if (!file.canonicalPath.startsWith(outputDirStr)) {
                    throw SecurityException("Dangerous Zip entry: $entry")
                }

                if (entry.isDirectory) {
                    file.mkdirs()
                } else {
                    file.parentFile?.mkdirs()

                    if (file.name == COUNT_FILE) {
                        // Read the count entry instead of extracting it. trim() tolerates a
                        // trailing newline; don't close this reader, it would close zipInputStream.
                        count = zipInputStream.bufferedReader().readText().trim().toInt()
                        Timber.d("restoreFiles count: $count")
                    } else {
                        file.outputStream().use { fileOutputStream ->
                            zipInputStream.copyTo(fileOutputStream)
                        }
                    }
                }
            }
        }

It works, and there's virtually no cost to getting the count during the unzip.
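
A self-contained round trip of the same idea, trimmed so it runs without the Android pieces (Timber, the output directory, the UI update callback): the first entry carries the total, and progress is reported as later entries are visited, all in a single pass. walkWithProgress and the entry name count.txt are hypothetical; the stored count is assumed to include the count entry itself, and 100 stands in for PERCENT_MULTIPLIER:

```kotlin
import java.io.InputStream
import java.util.zip.ZipInputStream

// Single pass over a zip whose first entry (named `countEntryName`) holds the
// total number of entries. Entry data is only drained here; a real unzip
// would copy it to disk instead.
fun walkWithProgress(input: InputStream, countEntryName: String, update: (Int) -> Unit) {
    var count = 0
    var current = 0
    var lastPercent = 0
    ZipInputStream(input).use { zis ->
        generateSequence { zis.nextEntry }.forEach { entry ->
            current++
            if (entry.name == countEntryName) {
                // Reads only to the end of this entry; the reader is not closed
                count = zis.bufferedReader().readText().trim().toInt()
            } else {
                zis.readBytes() // stand-in for copying the entry to a file
            }
            if (count > 0) {
                val percent = current * 100 / count
                if (percent != lastPercent) {
                    update(percent)
                    lastPercent = percent
                }
            }
        }
    }
}
```

With a count entry holding "4" followed by three files, update receives 25, 50, 75 and 100.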