How can I convert parquet-dotnet's columns to individual models?


parquet-dotnet has an example I'm trying to work with that looks like this:

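using System.IO;
using System.Linq;
using Parquet;
using Parquet.Data;

// open the file stream
using (Stream fileStream = System.IO.File.OpenRead("c:\\test.parquet"))
{
   // open the parquet reader on top of the stream
   using (var parquetReader = new ParquetReader(fileStream))
   {
      // only data fields carry values, so take those from the schema
      DataField[] dataFields = parquetReader.Schema.GetDataFields();

      // enumerate the row groups in the file
      for(int i = 0; i < parquetReader.RowGroupCount; i++)
      {
         using (ParquetRowGroupReader groupReader = parquetReader.OpenRowGroupReader(i))
         {
            // read every column of this row group, one DataColumn per field
            DataColumn[] columns = dataFields.Select(groupReader.ReadColumn).ToArray();
         }
      }
   }
}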

My concern is with the line that populates columns. Suppose the data, viewed as a table, looks like this:

ID  Name
1   Test1
1   Test2

I want to map this data from the parquet file to a model with exactly that shape, one object per row. The problem is that the data comes out of columns looking like this:

columns[0].Data[0] - 1
columns[0].Data[1] - 1

columns[1].Data[0] - Test1
columns[1].Data[1] - Test2

This might be a little hard to follow, but essentially columns is an array with one entry per field, and each entry holds an array of values. That array contains every value for that column in the row group. So I'm having a hard time figuring out how to match the value at a given array position in one column with the value at the same position in the other columns, so that each row stays together.
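To make the index matching concrete, here is a minimal sketch of the pivot, not a definitive implementation. It assumes both columns are flat (no repeated or nested fields) and come from the same row group, so position i in one array belongs to the same row as position i in the other. The Item model and the ColumnPivot helper are hypothetical names, and the two plain arrays stand in for columns[0].Data and columns[1].Data, which the library exposes as untyped System.Array values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model matching the ID/Name table above.
public class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class ColumnPivot
{
    // idData and nameData stand in for columns[0].Data and columns[1].Data.
    public static List<Item> ToItems(Array idData, Array nameData)
    {
        if (idData.Length != nameData.Length)
            throw new ArgumentException("Columns must contain the same number of values.");

        var items = new List<Item>(idData.Length);
        for (int row = 0; row < idData.Length; row++)
        {
            items.Add(new Item
            {
                // GetValue boxes the element, so this works for int[] and int?[] alike;
                // Convert.ToInt32 turns a null value into 0.
                Id = Convert.ToInt32(idData.GetValue(row)),
                Name = (string)nameData.GetValue(row)
            });
        }
        return items;
    }
}

public static class Program
{
    public static void Main()
    {
        Array ids = new[] { 1, 1 };
        Array names = new[] { "Test1", "Test2" };

        foreach (Item item in ColumnPivot.ToItems(ids, names))
            Console.WriteLine($"{item.Id} {item.Name}");
        // prints:
        // 1 Test1
        // 1 Test2
    }
}
```

In the reader loop, a call like this would sit inside the row-group block, with the results of each row group appended to one overall list.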

Also, I can't use the library's normal deserialization, because some fields in the parquet file have odd names like __$something. Since $ is not valid in a C# identifier, I can't map those to a similarly named property. Any ideas?
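One possible way around the naming problem is an explicit lookup from the field name in the file to a setter on the model, so the property name no longer has to match. The sketch below is only an illustration under assumptions: the Row model, its Something property, and the FieldMapper helper are hypothetical, and each (name, values) pair stands in for what the library provides as columns[i].Field.Name and columns[i].Data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model; Something receives the values of the "__$something" field.
public class Row
{
    public int Id { get; set; }
    public string Something { get; set; }
}

public static class FieldMapper
{
    // Field name as stored in the file -> setter on the model.
    private static readonly Dictionary<string, Action<Row, object>> Setters =
        new Dictionary<string, Action<Row, object>>
        {
            ["ID"] = (row, value) => row.Id = Convert.ToInt32(value),
            ["__$something"] = (row, value) => row.Something = (string)value,
        };

    // Each pair is (field name, column values); all columns are assumed
    // to be flat and to come from the same row group.
    public static List<Row> Build(IReadOnlyList<KeyValuePair<string, Array>> columns)
    {
        var rows = new List<Row>();
        if (columns.Count == 0) return rows;

        int rowCount = columns[0].Value.Length;
        for (int i = 0; i < rowCount; i++) rows.Add(new Row());

        foreach (var column in columns)
        {
            // Fields without a registered setter are skipped.
            if (!Setters.TryGetValue(column.Key, out var set)) continue;

            for (int i = 0; i < rowCount; i++)
                set(rows[i], column.Value.GetValue(i));
        }
        return rows;
    }
}

public static class Program
{
    public static void Main()
    {
        var columns = new List<KeyValuePair<string, Array>>
        {
            new KeyValuePair<string, Array>("ID", new[] { 1, 1 }),
            new KeyValuePair<string, Array>("__$something", new[] { "Test1", "Test2" }),
        };

        foreach (Row row in FieldMapper.Build(columns))
            Console.WriteLine($"{row.Id} {row.Something}");
    }
}
```

Keeping the mapping in one dictionary means a renamed or newly added field only requires a new entry, at the cost of boxing every value on the way in.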

