Exception when reading Parquet data from Azure Blob Storage (using ChoETL)

I am currently using the ChoETL library to read Parquet data. This is the code:

BlobServiceClient blobServiceClient = new BlobServiceClient(azureStorage);
BlobContainerClient container = blobServiceClient.GetBlobContainerClient(contenedor);
var blobs = container.GetBlobs().Where(x => x.Name.Contains(".parquet"));

try
{
    foreach (var item in blobs)
    {
        var blob = container.GetBlobClient(item.Name);
        await blob.OpenReadAsync();
        // Here I'm trying to read the Parquet file, as shown in the official documentation https://github.com/Cinchoo/ChoETL/wiki/QuickParquetLoad
        foreach (dynamic e in new ChoParquetReader(outStream))
        {
            Console.WriteLine("Id: " + e.Id + " FormNumber: " + e.FormNumber);
        }
    }
}
catch (Exception ex)
{
    throw ex;
}

When I run it, it throws an error on this line:

foreach (dynamic e in new ChoParquetReader(outStream))
{
    Console.WriteLine("Id: " + e.Id + " FormNumber: " + e.FormNumber);
}

Is there a solution? I tried Parquet.Net, but I don't like it.

There are 2 best solutions below

I cannot find where outStream is defined in your code, but I think that is the problem. You need to use the Stream provided by blob.OpenReadAsync():

BlobServiceClient blobServiceClient = new BlobServiceClient(azureStorage);
BlobContainerClient container = blobServiceClient.GetBlobContainerClient(contenedor);
var blobs = container.GetBlobs().Where(x => x.Name.Contains(".parquet"));

try
{
    foreach (var item in blobs)
    {
        var blob = container.GetBlobClient(item.Name);
        using var stream = await blob.OpenReadAsync();
        // Pass the stream returned by OpenReadAsync to the reader, as shown in https://github.com/Cinchoo/ChoETL/wiki/QuickParquetLoad
        foreach (dynamic e in new ChoParquetReader(stream))
        {
            Console.WriteLine("Id: " + e.Id + " FormNumber: " + e.FormNumber);
        }

    }
}
catch (Exception)
{
    // Rethrow with "throw;" so the original stack trace is preserved ("throw ex;" resets it).
    throw;
}

I was getting this error too, while reading a Parquet file from an S3 bucket:

System.MethodAccessException: 'Attempt by method 'ChoETL.ChoParquetReader`1.Create(System.IO.StreamReader)' to access method 'Parquet.ParquetReader..ctor(System.IO.Stream, Parquet.ParquetOptions, Boolean)' failed.'

I tracked it down to a referenced project having Parquet.Net in its dependencies, which seems to conflict with ChoETL. All good after removing any references to Parquet.Net in any projects in the solution.
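
If it is not obvious which project pulls in Parquet.Net, a small check like the one below can show which Parquet assembly your application actually resolves at runtime. This is only a rough sketch: it assumes the Parquet.Net types live in an assembly named "Parquet" (inferred from the Parquet.ParquetReader type in the exception message), and the class name ParquetAssemblyCheck is made up for illustration. Run it from the same project that fails.

```csharp
using System;
using System.IO;
using System.Reflection;

class ParquetAssemblyCheck
{
    static void Main()
    {
        // Assumption: Parquet.Net ships its types in an assembly named "Parquet".
        const string assemblyName = "Parquet";

        try
        {
            // Resolve the assembly the same way the runtime would for this application.
            Assembly asm = Assembly.Load(assemblyName);
            AssemblyName name = asm.GetName();

            // Print the version and file location so a conflicting copy can be traced to its source.
            Console.WriteLine($"Loaded {name.Name} {name.Version} from {asm.Location}");
        }
        catch (FileNotFoundException)
        {
            Console.WriteLine($"No assembly named '{assemblyName}' could be resolved.");
        }
        catch (FileLoadException ex)
        {
            // The assembly was found but could not be loaded, for example because of a version mismatch.
            Console.WriteLine($"Assembly '{assemblyName}' was found but failed to load: {ex.Message}");
        }
    }
}
```

If the version printed is not the one that the ChoETL Parquet package brings in as its own dependency, some other project in the solution is probably referencing Parquet.Net directly, which would match what I saw.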