Is there a way to estimate the possible compression ratio of a file just by reading it?
You know, some files are more compressible than others... my software has to report the percentage of space that compression could save for each file.
e.g.
Compression Ratio: 50%
-> I can save 50% of my file's space if I compress it
Compression Ratio: 99%
-> I can save only 1% of my file's space if I compress it
Java - Calculate File Compression
Asked by Oneiros. 3 answers below.

Not possible without examining the file. The only thing you can do is give an approximate ratio by file extension, based on statistics gathered from a reasonably large sample by doing actual compression and measuring. For example, a statistical analysis will likely show that .zip and .jpg files are not very compressible, while files like .txt and .doc may be highly compressible.
Such results are rough guidance only and will probably be way off in some cases, as a file extension carries absolutely no guarantee of compressibility: the file could contain anything, no matter what its extension suggests.
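For illustration, a lookup along these lines could back that heuristic. The class name and every ratio below are placeholders, not measured statistics; a real table would be built by compressing a large sample of files per extension and averaging:

    import java.util.Map;

    // Sketch of the extension-lookup heuristic. All ratios are placeholder
    // values for illustration only, not measured data.
    public class ExtensionRatioGuess {
        private static final Map<String, Double> TYPICAL_RATIO = Map.of(
                "txt", 0.35,  // placeholder: plain text usually compresses well
                "doc", 0.45,  // placeholder
                "jpg", 0.98,  // placeholder: already compressed, little to gain
                "zip", 0.99   // placeholder: already compressed
        );

        /** Returns an estimated compressed/original ratio, or 1.0 if unknown. */
        public static double estimateRatio(String fileName) {
            int dot = fileName.lastIndexOf('.');
            String ext = dot >= 0 ? fileName.substring(dot + 1).toLowerCase() : "";
            return TYPICAL_RATIO.getOrDefault(ext, 1.0);
        }

        public static void main(String[] args) {
            System.out.printf("report.txt -> estimated ratio %.0f%%%n",
                    estimateRatio("report.txt") * 100);
        }
    }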
UPDATE: Assuming you can examine the file, you can use the java.util.zip APIs to read the raw file, compress it, and see what the before/after difference is.
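A minimal sketch of that measurement, assuming the file is small enough to hold in memory (class and method names are mine):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.zip.Deflater;
    import java.util.zip.DeflaterOutputStream;

    // Reads the whole file, deflates it in memory, and compares sizes.
    // Fine for small files; prefer a streaming approach for large ones.
    public class CompressionProbe {
        public static double compressionRatio(Path file) throws IOException {
            byte[] original = Files.readAllBytes(file);
            if (original.length == 0) return 1.0;
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
            try (DeflaterOutputStream out = new DeflaterOutputStream(compressed, deflater)) {
                out.write(original);
            } // close() finishes the deflate stream
            deflater.end();
            return (double) compressed.size() / original.length;
        }

        public static void main(String[] args) throws IOException {
            double ratio = compressionRatio(Path.of(args[0]));
            System.out.printf("Compression ratio: %.0f%% (you save %.0f%%)%n",
                    ratio * 100, (1 - ratio) * 100);
        }
    }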

First, this will depend largely on the compression method you choose. And second, I seriously doubt it's possible without time and space costs comparable to actually doing the compression. I'd say your best bet is to compress the file, keeping track of the size of the output you've produced so far and dropping/freeing it (once you're done with it, obviously) instead of writing it out.
To actually do this, unless you really want to implement it yourself, it'll probably be easiest to use the java.util.zip package, in particular the Deflater class and its deflate method.
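A minimal sketch of that streaming idea, counting deflated bytes into a scratch buffer that is immediately reused so nothing is retained (names are mine):

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.zip.Deflater;

    // Streams the file through a Deflater, counting output bytes instead of
    // storing them, so memory use stays constant regardless of file size.
    public class StreamingRatio {
        public static double compressionRatio(Path file) throws IOException {
            Deflater deflater = new Deflater();
            byte[] in = new byte[8192];
            byte[] scratch = new byte[8192]; // output is counted, then discarded
            long inputBytes = 0, outputBytes = 0;
            try (InputStream is = Files.newInputStream(file)) {
                int n;
                while ((n = is.read(in)) != -1) {
                    inputBytes += n;
                    deflater.setInput(in, 0, n);
                    while (!deflater.needsInput()) {
                        outputBytes += deflater.deflate(scratch);
                    }
                }
                deflater.finish();
                while (!deflater.finished()) {
                    outputBytes += deflater.deflate(scratch);
                }
            } finally {
                deflater.end();
            }
            return inputBytes == 0 ? 1.0 : (double) outputBytes / inputBytes;
        }
    }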
First, a bit of information theory. Two results from that field matter here: Shannon's entropy, which gives a statistical lower bound on the average number of bits needed per symbol, and Kolmogorov complexity, the length of the shortest possible description of the data, which is provably not computable.
So, you can't find the compressed size without evaluating the actual compression. But if you need an approximation, you can rely on Shannon's entropy and build a simple statistical model; a very simple solution is sketched below.
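A minimal sketch of such a model, assuming an order-0 (per-byte frequency) estimate: it computes the entropy H = -sum(p * log2 p) in bits per byte, and H/8 is the estimated compressed/original ratio (class name is mine):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    // Order-0 Shannon entropy estimate: counts byte frequencies and computes
    // H = -sum(p * log2 p) bits per byte. Estimated ratio = H / 8.
    public class EntropyEstimate {
        public static double estimatedRatio(Path file) throws IOException {
            byte[] data = Files.readAllBytes(file);
            if (data.length == 0) return 1.0;
            long[] counts = new long[256];
            for (byte b : data) {
                counts[b & 0xFF]++;
            }
            double entropy = 0.0; // bits per byte
            for (long c : counts) {
                if (c == 0) continue;
                double p = (double) c / data.length;
                entropy -= p * (Math.log(p) / Math.log(2));
            }
            return entropy / 8.0; // fraction of the original size
        }

        public static void main(String[] args) throws IOException {
            double ratio = estimatedRatio(Path.of(args[0]));
            System.out.printf("Estimated compression ratio: %.0f%%%n", ratio * 100);
        }
    }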
Your estimate will be more or less in line with ZIP's default compression algorithm (deflate). A more advanced version of the same idea exists (be aware it uses lots of memory!): it uses entropy to determine block boundaries, applying segmentation to divide the file into homogeneous chunks of data.
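As a simplified illustration of that segmentation idea, the sketch below uses fixed-size blocks rather than entropy-driven boundaries (that simplification, and all names, are my assumption):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    // Estimates entropy per fixed-size block rather than over the whole file,
    // so heterogeneous regions (e.g. text followed by embedded images) each
    // get their own statistics.
    public class BlockEntropyEstimate {
        private static final int BLOCK_SIZE = 64 * 1024;

        public static double estimatedRatio(Path file) throws IOException {
            byte[] data = Files.readAllBytes(file);
            if (data.length == 0) return 1.0;
            double estimatedBits = 0.0;
            for (int start = 0; start < data.length; start += BLOCK_SIZE) {
                int end = Math.min(start + BLOCK_SIZE, data.length);
                long[] counts = new long[256];
                for (int i = start; i < end; i++) counts[data[i] & 0xFF]++;
                int len = end - start;
                for (long c : counts) {
                    if (c == 0) continue;
                    double p = (double) c / len;
                    // each occurrence of a symbol costs -log2(p) bits
                    estimatedBits -= c * (Math.log(p) / Math.log(2));
                }
            }
            return estimatedBits / (8.0 * data.length);
        }
    }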