In `org.apache.hadoop.io.compress.GzipCodec`, the `GzipOutputStream` is not closed, which causes a memory leak. How do I close the `GzipOutputStream`? Should other streams be closed as well? Is there a good alternative to calling `close()` explicitly?

Spark version is 2.1.0 and Hadoop version is 2.8.4.
```java
sparkPairRdd.saveAsHadoopFile(outputPath, String.class, String.class,
        MultipleTextOutputFormat.class, GzipCodec.class);
```
If I am understanding the `GzipCodec` class correctly, its purpose is to create various compressor and decompressor streams and return them to the caller. It is not responsible for closing those streams; that is the responsibility of the caller.

> How do I close the GzipOutputStream?

You simply call `close()` on the object. If `saveAsHadoopFile` is using `GzipCodec` to create a `GzipOutputStream`, then that method is responsible for closing it.

> Should other streams be closed as well?

The same as for a `GzipOutputStream`: call `close()` on them.
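For illustration, here is a minimal sketch of what explicit closing looks like when you create a stream from the codec yourself, outside of Spark. The class name and output path are made up for the example; `GzipCodec`, `ReflectionUtils.newInstance`, and `createOutputStream` are the actual Hadoop API.

```java
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class ExplicitCloseExample {
    public static void main(String[] args) throws Exception {
        // The codec needs a Configuration; ReflectionUtils injects it.
        GzipCodec codec = ReflectionUtils.newInstance(GzipCodec.class, new Configuration());

        OutputStream fileOut = Files.newOutputStream(Paths.get("explicit.gz"));
        CompressionOutputStream gzipOut = codec.createOutputStream(fileOut);
        try {
            gzipOut.write("some data\n".getBytes(StandardCharsets.UTF_8));
        } finally {
            // Closing the compression stream finishes the gzip output and
            // closes the wrapped file stream as well.
            gzipOut.close();
        }
    }
}
```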
> Is there a good alternative to calling close() explicitly?

As an alternative, you could manage a stream created by `GzipCodec` using try-with-resources. But if you are asking whether there is a way to avoid managing the streams properly, then the answer is no.
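A minimal sketch of the try-with-resources version, under the same assumptions as above (a hypothetical output path, writing through the codec directly rather than through Spark):

```java
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class TryWithResourcesExample {
    public static void main(String[] args) throws Exception {
        GzipCodec codec = ReflectionUtils.newInstance(GzipCodec.class, new Configuration());

        // Both streams are closed automatically, in reverse order of creation,
        // even if the write throws.
        try (OutputStream fileOut = Files.newOutputStream(Paths.get("example.gz"));
             CompressionOutputStream gzipOut = codec.createOutputStream(fileOut)) {
            gzipOut.write("some data\n".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```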
If you are actually encountering a storage leak that you think is due to `saveAsHadoopFile` not closing the streams it opens, please provide a minimal reproducible example that we can look at. It could be a bug in Hadoop ... or you could be using it incorrectly.