Parquet file not writing correctly

552 Views Asked by At

I've recently upgraded Parquet.Net to version 4.6.0, which necessitated changing a lot of the method calls to their asynchronous versions.

This code creates a file without any thrown errors:

        string file = @"c:\temp\test.parquet";
        var dataFields = new DataField[2];
        dataFields[0] = new DataField("dtUTC", DataType.Int64);
        dataFields[1] = new DataField("val", DataType.Double);
        var schema = new ParquetSchema(dataFields);
        var dtUTC = new long[3];
        var val = new double[3];

        using (Stream fileStream = System.IO.File.OpenWrite(file))
        {
            using (var parquetWriterTask = ParquetWriter.CreateAsync(schema, fileStream))
            {
                parquetWriterTask.Wait();
                var parquetWriter = parquetWriterTask.Result;
                using (ParquetRowGroupWriter groupWriter = parquetWriter.CreateRowGroup())
                {
                    var col0 = Array.CreateInstance(typeof(long), dtUTC.Length);
                    for(int i=0;i< dtUTC.Length;i++) col0.SetValue(dtUTC[i], i);
                    var writeT = groupWriter.WriteColumnAsync(new Parquet.Data.DataColumn(dataFields[0], col0));
                    writeT.Wait();
                    var col1 = Array.CreateInstance(typeof(double), val.Length);
                    for (int i = 0; i < val.Length; i++) col1.SetValue(val[i], i);
                    var writeT2 = groupWriter.WriteColumnAsync(new Parquet.Data.DataColumn(dataFields[1], col1));
                    writeT2.Wait();
                }
            }
        }

However, when I try to read that file, either using a 3rd party application, or my own code shown below, I get errors along the lines of

Destination is too short

or

Unable to read beyond the end of the stream

(Depending on which application I am using to read the file)

The code I use to read is:

        using (Stream file = System.IO.File.OpenRead(fileName))
        {
            using (var readerTask = ParquetReader.CreateAsync(file))
            {
                readerTask.Wait();
                var reader = readerTask.Result;
                if (reader.RowGroupCount != 1) throw new Exception("reader.RowGroupCount = " + reader.RowGroupCount);
                var dataFields = reader.Schema.GetDataFields();
                using (var gr = reader.OpenRowGroupReader(0))
                {
                    Parquet.Data.DataColumn[] columns = parquet.getCols(gr, dataFields);// dataFields.Select(gr.ReadColumn).ToArray();
                    output($"\tGroup 0: columns.Length={columns.Length}");
                }
            }
        }

What am I doing wrong when writing this parquet file?

0

There are 0 best solutions below