How can I write to delta from protobuf encoded events?


I have an Azure Event Hub, which offers a Kafka-compatible interface, carrying protobuf-encoded events. I'd like to find an efficient way to continuously react to those events and write them to Delta.

I could use Databricks for this, but it's too costly for such a simple operation; I don't need big-data tooling.

I've also looked at Azure Stream Analytics, but its cost is still relatively high for such a simple job.

I found a "highly efficient daemon" called kafka-delta-ingest, which would be perfect, but it only works with Avro or JSON.

How can I write to Delta without using costly big-data tooling?


Answered by bazza:

If you're receiving protobuf-encoded messages (events), you have the option of re-encoding them as JSON, which you can then pass on to Delta. The pattern would be:

  1. Pull an event off Azure Event Hub.
  2. Decode it using the Google protocol buffer parse method.
  3. Re-serialise the resulting object as JSON into a local string variable.
  4. Push the string variable into kafka-delta-ingest.
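The steps above can be sketched in Python. To keep the sketch self-contained it hand-decodes a made-up two-field message straight from the wire format; in real code you would use protoc-generated classes plus `google.protobuf.json_format.MessageToJson`, and the Kafka consume/produce calls are omitted:

```python
import json

# Hypothetical message, hand-encoded for illustration:
#   message Event { string id = 1; uint64 count = 2; }
RAW = b"\x0a\x03abc\x10\x2a"  # id = "abc", count = 42

def read_varint(buf, pos):
    """Decode a protobuf base-128 varint starting at pos."""
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7

def decode_event(buf):
    """Decode the hypothetical Event message into a dict."""
    event, pos = {}, 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field, wire_type = tag >> 3, tag & 0x07
        if field == 1 and wire_type == 2:      # length-delimited string
            length, pos = read_varint(buf, pos)
            event["id"] = buf[pos:pos + length].decode("utf-8")
            pos += length
        elif field == 2 and wire_type == 0:    # varint
            event["count"], pos = read_varint(buf, pos)
        else:
            raise ValueError(f"unexpected field {field}")
    return event

# Re-serialise as JSON, ready to push to kafka-delta-ingest's input topic.
payload = json.dumps(decode_event(RAW))
print(payload)  # {"id": "abc", "count": 42}
```

The decode/re-serialise step is cheap enough to run in a small container or function app, which is the point of avoiding the heavier tooling.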

The way you format the object as JSON varies by language. In C++ it's MessageToJsonString in the util library (google/protobuf/util/json_util.h), available at least as far back as version 3.11.2. Java has com.google.protobuf.util.JsonFormat, C# has JsonFormatter and JsonParser, and Go has protojson.

I don't know offhand whether the JSON output by the C# JSON formatter will be accepted by the C++ JSON parser, but proto3 does define a canonical JSON mapping (with quirks such as 64-bit integers being rendered as JSON strings), so the implementations should interoperate. I'd still verify it: the use of terms like "format" instead of "serialise" suggests it was intended for pretty output rather than a formal contract between sender and receiver.
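One way to sanity-check cross-implementation stability is to compare parsed structures rather than raw strings, since whitespace and key order legitimately differ between formatters. The two payloads below are made up for illustration, standing in for output from two different language SDKs:

```python
import json

# Hypothetical renderings of the same message from two SDKs.
# A raw string comparison would fail on whitespace and key order,
# so parse both and compare the resulting structures instead.
from_csharp = '{"id":"abc","count":42}'
from_cpp    = '{ "count": 42, "id": "abc" }'

assert json.loads(from_csharp) == json.loads(from_cpp)
```

A quick round-trip test like this across the languages you actually use is cheaper than discovering a mismatch in production.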