I am using an Oozie workflow to generate a Parquet file. Occasionally, when I try to read the file using Spark, I get the following exception:
java.io.IOException: Could not read footer: java.lang.RuntimeException: hdfs://ip-10-1-2-243.ec2.internal:8020/path/to/file/_metadata is not a Parquet file (too small)
After deleting the _metadata file, I can read the rest of the files normally. I would like to know what causes Spark to write an empty _metadata file, and how I can avoid it in the future.
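To make the "too small" condition concrete, here is a minimal sketch of the kind of check the reader appears to be performing. It assumes the standard Parquet layout (a 4-byte `PAR1` magic at the start, then at the end a 4-byte footer length followed by another `PAR1`), so any file shorter than 12 bytes cannot be valid. The function name `looks_like_parquet` is hypothetical, and the sketch only handles a local copy of the file (for example one fetched from HDFS first); it is not what Spark itself runs.

```python
import os

PARQUET_MAGIC = b"PAR1"
# Leading magic + 4-byte footer length + trailing magic (assumed standard layout).
MIN_PARQUET_SIZE = len(PARQUET_MAGIC) + 4 + len(PARQUET_MAGIC)


def looks_like_parquet(path):
    """Return True if a local file has the minimum size and magic bytes of a Parquet file.

    Hypothetical helper for illustration; it does not parse or validate the footer.
    """
    size = os.path.getsize(path)
    if size < MIN_PARQUET_SIZE:
        # This is the situation reported as "is not a Parquet file (too small)".
        return False
    with open(path, "rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        tail = f.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

Running this against a local copy of the _metadata file would show whether it is empty or truncated rather than merely unreadable by Spark.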