How to create a dynamic schema for parquet from a SQL Server dynamic query of tables?

852 Views Asked by At

This question is about Parquet.Net:

https://github.com/aloneguid/parquet-dotnet

I am trying to use Parquet.Net to build a parquet file from a couple of database tables that I have joined together. The table columns can be dynamic, so I can create a dynamic query using

// The @TableName can be read in from my list of tables and a loop
// can then get all the names in a string of columnNames
// (e.g. TableId, ColumnName1, ... , ColumnNameN
SELECT name
FROM sys.columns
WHERE object_id = OBJECT_ID(@TableName)
  AND name <> N'ID';

Once I have the column names in a string, I am stuck on how to dynamically create a schema for Parquet.Net to write to, as the examples below has it predefined:

//create data columns with schema metadata and the data you need
var idColumn = new DataColumn(
    new DataField<int>("id"),
    new int[] { 1, 2 });

var cityColumn = new DataColumn(
    new DataField<string>("city"),
    new string[] { "London", "Derby" });

// create file schema
var schema = new Schema(idColumn.Field, cityColumn.Field);

using (Stream fileStream = System.IO.File.OpenWrite("c:\\test.parquet"))
{
    using (var parquetWriter = new ParquetWriter(schema, fileStream))
    {
            // create a new row group in the file
            using (ParquetRowGroupWriter groupWriter = parquetWriter.CreateRowGroup())
            {
                groupWriter.WriteColumn(idColumn);
                groupWriter.WriteColumn(cityColumn);
            }
    }
}

Is this possible to do in .NET, perhaps using a reflection class or something else?

0

There are 0 best solutions below