Processing Event Hub Capture AVRO files with Azure Data Lake Analytics

1.3k Views Asked by At

I'm attempting to extract data from AVRO files produced by Event Hub Capture. In most cases this works flawlessly. But certain files are causing me problems. When I run the following U-SQL job, I get the error:

USE DATABASE Metrics;
USE SCHEMA dbo;

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
REFERENCE ASSEMBLY [Avro];
REFERENCE ASSEMBLY [log4net];

USING Microsoft.Analytics.Samples.Formats.ApacheAvro;
USING Microsoft.Analytics.Samples.Formats.Json;
USING System.Text;

//DECLARE @input string = "adl://mydatalakestore.azuredatalakestore.net/event-hub-capture/v3/{date:yyyy}/{date:MM}/{date:dd}/{date:HH}/{filename}";
DECLARE @input string = "adl://mydatalakestore.azuredatalakestore.net/event-hub-capture/v3/2018/01/16/19/rcpt-metrics-us-es-eh-metrics-v3-us-0-35-36.avro";


@eventHubArchiveRecords =
    EXTRACT Body byte[], 
            date DateTime, 
            filename System.String
    FROM @input
    USING new AvroExtractor(@"
        {
            ""type"":""record"",
            ""name"":""EventData"",
            ""namespace"":""Microsoft.ServiceBus.Messaging"",
            ""fields"":[
                {""name"":""SequenceNumber"",""type"":""long""},
                {""name"":""Offset"",""type"":""string""},
                {""name"":""EnqueuedTimeUtc"",""type"":""string""},
                {""name"":""SystemProperties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes""]}},
                {""name"":""Properties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes""]}},
                {""name"":""Body"",""type"":[""null"",""bytes""]}
            ]
        }
    ");

@json =
    SELECT Encoding.UTF8.GetString(Body) AS json
    FROM @eventHubArchiveRecords;

OUTPUT @json
TO "/outputs/Avro/testjson.csv"
USING Outputters.Csv(outputHeader : true, quoting : true);

I get the following error:

Unhandled exception from user code: "The given key was not present in the dictionary."

An unhandled exception from user code has been reported when invoking the method 'Extract' on the user type 'Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor'

Am I correct in assuming the problem is within the AVRO file produced by Event Hub Capture, or is there something wrong with my code?

3

There are 3 best solutions below

0
Florian Mader On

The current implementation only supports primitive types, not complex types of the Avro specification at the moment.

0
Manu Cohen Yashar On

You have to build and use an extractor based on apache avro and not use the sample extractor provided by MS. We went the same path

0
Jim Lane On

The Key Not Present error is referring to the fields in your extract statement. It's not finding the data and filename fields. I removed those fields and your script runs correctly in my ADLA instance.