I am able to use avro-tools-1.7.7.jar to take JSON data and an Avro schema and output a binary Avro file, as shown here: https://github.com/miguno/avro-cli-examples#json-to-avro. However, I want to do this programmatically using the Avro Python API: https://avro.apache.org/docs/1.7.7/gettingstartedpython.html.
Their example shows how to write one record at a time into a binary Avro file:
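import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

# Parse the schema definition (user.avsc) from disk.
schema = avro.schema.parse(open("user.avsc").read())
# Avro container files are binary, so open the output in "wb" mode.
writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
# Each append() serializes one record (a dict) against the schema.
writer.append({"name": "Alyssa", "favorite_number": 256})
writer.append({"name": "Ben", "favorite_number": 7, "favorite_color": "red"})
# close() flushes any buffered records and closes the underlying file.
writer.close()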
My use case is writing all of the records from a JSON file at once, as the avro-tools jar does, but in Python code. I do not want to shell out and execute the jar. This will be deployed to Google App Engine, if that matters.
This can be accomplished with fastavro. For example, given the schema in the link (twitter.avsc) and the JSON file twitter.json, you can use something like the following script to write out an Avro file: