Changing the schema of an Avro file when writing to it in append mode

I'm looking for a way to modify the schema of an existing Avro file in Python. Take the following example, which uses the fastavro package. First I write out some initial records with their corresponding schema:

from fastavro import writer, parse_schema

schema = {
    'name': 'test',
    'type': 'record',
    'fields': [
        {'name': 'id', 'type': 'int'},
        {'name': 'val', 'type': 'long'},
    ],
}
records = [
    {u'id': 1, u'val': 0.2},
    {u'id': 2, u'val': 3.1},
]
with open('test.avro', 'wb') as f:
    writer(f, parse_schema(schema), records)

Uh oh, I've got some more records, but they contain None values. I'd like to append these records to the Avro file and modify the schema accordingly:

more_records = [
    {u'id': 3, u'val': 1.5},
    {u'id': 2, u'val': None},
]
schema['fields'][1]['type'] = ['long', 'null']

with open('test.avro', 'a+b') as f:
    writer(f, parse_schema(schema), more_records)

Instead of overwriting the schema, this results in an error:

ValueError: Provided schema {'type': 'record', 'name': 'test', 'fields': [{'name': 'id', 'type': 'int'}, {'name': 'val', 'type': ['long', 'null']}], '__fastavro_parsed': True, '__named_schemas': {'test': {'type': 'record', 'name': 'test', 'fields': [{'name': 'id', 'type': 'int'}, {'name': 'val', 'type': ['long', 'null']}]}}} does not match file writer_schema {'type': 'record', 'name': 'test', 'fields': [{'name': 'id', 'type': 'int'}, {'name': 'val', 'type': 'long'}], '__fastavro_parsed': True, '__named_schemas': {'test': {'type': 'record', 'name': 'test', 'fields': [{'name': 'id', 'type': 'int'}, {'name': 'val', 'type': 'long'}]}}}

Is there a workaround for this? The fastavro docs suggest it's not possible, but I'm hoping someone knows of a way!

Cheers

There is 1 best solution below


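The append API in fastavro does not currently support this. An Avro file stores a single writer schema in its header, and every record in the file is encoded against it, so appended records have to match that schema. You could open an issue in the fastavro repository and discuss whether something like this makes sense.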