Creating a file as a stream and uploading to Azure


I am using the ChoETL and ChoETL.Parquet libraries to create a Parquet file from some other data. Creating the file locally works fine:

    using (ChoParquetWriter parser = new ChoParquetWriter($"..\\..\\..\\parquet_files\\{club}_events.parquet"))
    {
        parser.Write(events);
    }

In this snippet, events is a list of objects containing strings; the writer converts them to Parquet data.

So far I have written the code to upload to Azure, but it needs a local file as input.

    BlobServiceClient blobServiceClient = new BlobServiceClient("REDACTED");
    var containerClient = blobServiceClient.GetBlobContainerClient("base-test");
    BlobClient blobClient = containerClient.GetBlobClient($"Base/{RequestTime.Year}/{RequestTime.Month}/{RequestTime.Day}/{RequestTime.Hour}/{RequestTime.Minute}/events.parquet");
    // The using declaration disposes the stream when it goes out of scope,
    // so an explicit Close() call is not needed.
    using FileStream uploadFileStream = File.OpenRead("..\\..\\..\\events.parquet");
    await blobClient.UploadAsync(uploadFileStream, true);

Instead, I need the Parquet file to be created in memory and then uploaded to Azure Blob Storage. How can I do this? To clarify: it is the Parquet file itself that needs to be uploaded.

Best answer

You can call BlockBlobClient.OpenWriteAsync to get a writable stream for the blob and pass that stream to ChoParquetWriter. The writer then writes the Parquet data directly to the Azure blob, so no local file is needed.

For example:

    // Required namespaces:
    // using Azure.Storage.Blobs;
    // using Azure.Storage.Blobs.Models;
    // using Azure.Storage.Blobs.Specialized;   // GetBlockBlobClient, BlockBlobClient
    // using ChoETL;

    List<EmployeeRecSimple> objs = new List<EmployeeRecSimple>();

    EmployeeRecSimple rec1 = new EmployeeRecSimple();
    rec1.Id = 1;
    rec1.Name = "Mark";
    objs.Add(rec1);

    EmployeeRecSimple rec2 = new EmployeeRecSimple();
    rec2.Id = 2;
    rec2.Name = "Jason";
    objs.Add(rec2);

    // connectionString is your storage account connection string.
    BlobServiceClient blobServiceClient = new BlobServiceClient(connectionString);
    var desContainer = blobServiceClient.GetBlobContainerClient("output");
    var desBlob = desContainer.GetBlockBlobClient("my.parquet");
    var options = new BlockBlobOpenWriteOptions {
        HttpHeaders = new BlobHttpHeaders {
            // MimeMapping lives in System.Web (.NET Framework only); on .NET Core / .NET 5+
            // set a literal content type such as "application/octet-stream" instead.
            ContentType = MimeMapping.GetMimeMapping("parquet"),
        },
        // Progress updates about the data transfer
        ProgressHandler = new Progress<long>(
            progress => Console.WriteLine("Progress: {0} bytes written", progress))
    };

    // Open a write stream on the blob (overwrite: true) and let the writer serialize into it.
    using (var outStream = await desBlob.OpenWriteAsync(true, options).ConfigureAwait(false))
    using (ChoParquetWriter parser = new ChoParquetWriter(outStream)) {
        parser.Write(objs);
    }

    public partial class EmployeeRecSimple
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }
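
If the file really does have to be built in memory first, as the question asks, a minimal sketch could look like the one below. It assumes that ChoParquetWriter accepts any writable Stream (as the example above suggests) and that the upload goes through BlobClient.UploadAsync(Stream, bool). The ParquetUploader class and the UploadAsParquetAsync method are hypothetical names introduced only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using ChoETL;

public static class ParquetUploader
{
    // Hypothetical helper: serializes the records to Parquet in memory,
    // then uploads the resulting bytes to the given blob.
    public static async Task UploadAsParquetAsync(
        BlobClient blobClient, IEnumerable<object> records)
    {
        byte[] parquetBytes;
        using (var buffer = new MemoryStream())
        {
            // Disposing the writer finishes the Parquet file in the buffer.
            using (var writer = new ChoParquetWriter(buffer))
            {
                writer.Write(records);
            }

            // MemoryStream.ToArray still works if the writer closed the stream.
            parquetBytes = buffer.ToArray();
        }

        // Upload from a fresh read-only stream positioned at the start.
        using (var upload = new MemoryStream(parquetBytes, writable: false))
        {
            await blobClient.UploadAsync(upload, overwrite: true).ConfigureAwait(false);
        }
    }
}
```

With the question's variables this would be called as await ParquetUploader.UploadAsParquetAsync(blobClient, events). The trade-off is that the whole file is held in memory before the upload starts, whereas the OpenWriteAsync approach streams the data to the blob as it is written.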
