Spark empty _metadata file in parquet output


I am using an Oozie workflow to generate a Parquet file. Occasionally, when I try to read the file using Spark, I get the following exception:

java.io.IOException: Could not read footer: java.lang.RuntimeException: hdfs://ip-10-1-2-243.ec2.internal:8020/path/to/file/_metadata is not a Parquet file (too small)

After deleting the _metadata file, I can read the rest of the files normally. I would like to know what causes Spark to output an empty _metadata file, and how I can avoid it in the future.
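For reference, the workaround I apply by hand (deleting the broken summary file before reading) can be sketched as below. This is only a rough illustration, not a fix for the root cause. It assumes the output directory is reachable as a local path (for HDFS the same checks would have to go through an HDFS client instead of `os`), and the function names are made up for this example. The 12-byte threshold reflects the Parquet layout of a 4-byte `PAR1` magic, a 4-byte footer length, and a closing 4-byte `PAR1` magic, which as far as I can tell is what the "too small" message is checking.

```python
import os

PARQUET_MAGIC = b"PAR1"
# Smallest possible Parquet file: leading magic + 4-byte footer length + trailing magic.
MIN_PARQUET_SIZE = len(PARQUET_MAGIC) + 4 + len(PARQUET_MAGIC)

# Summary files that may be written next to the part files.
SUMMARY_NAMES = ("_metadata", "_common_metadata")


def find_broken_summary_files(output_dir):
    """Return summary files in output_dir that are too small to be Parquet.

    Hypothetical helper: assumes output_dir is a local filesystem path.
    """
    broken = []
    for name in SUMMARY_NAMES:
        path = os.path.join(output_dir, name)
        if os.path.isfile(path) and os.path.getsize(path) < MIN_PARQUET_SIZE:
            broken.append(path)
    return broken


def remove_broken_summary_files(output_dir):
    """Delete the too-small summary files and return the paths that were removed."""
    removed = find_broken_summary_files(output_dir)
    for path in removed:
        os.remove(path)
    return removed
```

Running `remove_broken_summary_files` on the output directory before the Spark read mirrors the manual deletion. I have also seen it suggested to disable summary files altogether by setting the Hadoop property `parquet.enable.summary-metadata` to `false` before writing, but I have not confirmed that this applies to my Spark and Parquet versions.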
