The following code produces files which are deterministic (the shasum is the same) for two strings.
try (
    FileOutputStream fos = new FileOutputStream(saveLocation);
    GZIPOutputStream zip = new GZIPOutputStream(fos, GZIP_BUFFER_SIZE);
    BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(zip, StandardCharsets.UTF_8));
) {
    writer.append(str);
}
Produces:
a.gz f0200d53f7f9b35647b5dece0146d72cd1c17949
However, if I take the file on the command line and re-zip it, I get a different result:
> gunzip -n a.gz ;gzip -n a ; shasum a.gz
50f478a9ceb292a2d14f1460d7c584b7a856e4d9 a.gz
How can I get it to match the original SHA using /usr/bin/gzip and gunzip?
I think that the problem is likely to be the Gzip file header.
The Gzip format has provision for including a file name and file timestamp in the file headers. (I see you are using the -n option when uncompressing and recompressing ... which is probably correct here.)
The Gzip format also includes an "operating system id" in the header. This is supposed to identify the source file system type; e.g. 0 for FAT, 3 for UNIX, and so on.
Either of these could lead to differences in the Gzip files and hence different hashes.
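To see which header field differs, it can help to print the fixed 10-byte header of each file. The following is a minimal sketch, assuming the layout from the Gzip spec (magic, compression method, flags, 4-byte little-endian timestamp, extra flags, OS id). The class and method names (GzipHeaderPeek, gzipBytes, describeHeader) are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of the original code.
class GzipHeaderPeek {

    // Compress a string in memory with the JDK's GZIPOutputStream.
    static byte[] gzipBytes(String str) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream zip = new GZIPOutputStream(bos)) {
            zip.write(str.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    // Describe the fixed 10-byte gzip header field by field.
    static String describeHeader(byte[] gz) {
        // MTIME is stored little-endian in bytes 4..7.
        long mtime = (gz[4] & 0xffL)
                | ((gz[5] & 0xffL) << 8)
                | ((gz[6] & 0xffL) << 16)
                | ((gz[7] & 0xffL) << 24);
        return String.format("magic=%02x%02x cm=%d flg=0x%02x mtime=%d xfl=%d os=%d",
                gz[0] & 0xff, gz[1] & 0xff, gz[2] & 0xff, gz[3] & 0xff,
                mtime, gz[8] & 0xff, gz[9] & 0xff);
    }

    public static void main(String[] args) throws IOException {
        // With a file argument, describe that file; otherwise describe a sample.
        byte[] gz = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : gzipBytes("hello");
        System.out.println(describeHeader(gz));
    }
}
```

Running it once on the Java-produced a.gz and once on the re-zipped a.gz should show whether the flg, mtime, xfl or os fields differ. If the headers turn out to be identical, the difference must lie in the compressed data itself.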
If I was going to solve this myself, I would start by using cmp to see where the compressed file differences start, and then od to identify what the differences are. Refer to the Gzip file format spec to figure out what the differences mean.
Assuming that the difference is the OS id, I don't think there is a practical way to solve this with the gzip and gunzip commands.
I looked at the source code for GZIPOutputStream in Java 11, and it is not promising. The hard-wiring is in a private method and would be next to impossible to "fix" by subclassing or reflection. You could copy the code and fix it that way, but then you have to maintain your variant GZIPOutputStream class indefinitely.
(I would be looking at changing the application ... or whatever ... so that I didn't need the checksums to be identical. You haven't said why you are doing this. If it is for testing purposes only, try looking for a different way to implement the tests.)
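If the goal is only to check that two archives hold the same data, one way to avoid depending on identical compressed bytes is to hash the decompressed content instead. This is a sketch of that idea, not a fix for the header mismatch. The class and method names (GzipContentHash, sha1OfUncompressed) are hypothetical, and SHA-1 is chosen only because it is what shasum uses by default.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of the original code.
class GzipContentHash {

    // Return the hex SHA-1 of the data inside a gzip stream,
    // ignoring the gzip header and how the data was compressed.
    static String sha1OfUncompressed(InputStream gzStream)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        try (GZIPInputStream in = new GZIPInputStream(gzStream)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                sha1.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : sha1.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java GzipContentHash.java a.gz
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(sha1OfUncompressed(in));
        }
    }
}
```

The value printed for a.gz should equal what shasum reports for the uncompressed file a, whichever tool produced the archive.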