How do I get fastavro to support logical types?


I am trying to use a logical type in Avro, reading and writing with the Python fastavro library, but the logicalType annotation seems to have no effect. The code below is taken from the fastavro page; I have altered the time field in the schema by annotating it with the logical type time-millis, in accordance with the current Avro specification. (As a side note, I have seen people use TIMESTAMP_MILLIS, but I do not know why, since the Avro page has time-millis.) When I run this code, the output on stdout is exactly the same as the output of the same code without the logical type annotation. I was expecting to see something that looked like a time, e.g., 13:14:15.1234. Yet the fastavro page claims that fastavro now supports Avro logical types. How can I get it to do so? Thanks!

from fastavro import writer, reader, parse_schema

schema = {
    'doc': 'A weather reading.',
    'name': 'Weather',
    'namespace': 'test',
    'type': 'record',
    'fields': [
        {'name': 'station', 'type': 'string'},
        {'name': 'time', 'type': 'int', 'logicalType': 'time-millis'},
        {'name': 'temp', 'type': 'int'},
    ],
}
parsed_schema = parse_schema(schema)

# 'records' can be an iterable (including generator)
records = [
    {u'station': u'011990-99999', u'temp': 0, u'time': 1433269388},
    {u'station': u'011990-99999', u'temp': 22, u'time': 1433270389},
    {u'station': u'011990-99999', u'temp': -11, u'time': 1433273379},
    {u'station': u'012650-99999', u'temp': 111, u'time': 1433275478},
]

# Writing
with open('weather.avro', 'wb') as out:
    writer(out, parsed_schema, records)

# Reading
with open('weather.avro', 'rb') as fo:
    for record in reader(fo):
        print(record)

The output to stdout, whether the logicalType annotation is present or removed, is the same:

{'station': '011990-99999', 'time': 1433269388, 'temp': 0}
{'station': '011990-99999', 'time': 1433270389, 'temp': 22}
{'station': '011990-99999', 'time': 1433273379, 'temp': -11}
{'station': '012650-99999', 'time': 1433275478, 'temp': 111}

I can see that the schemas in the output files are different between the two versions:

With logicalType specified:

"fields": [{"name": "station", "type": "string"}, {"logicalType": "time-millis", "name": "time", "type": "int"}, {"name": "temp", "type": "int"}]

Without logicalType specified:

"fields": [{"name": "station", "type": "string"}, {"name": "time", "type": "int"}, {"name": "temp", "type": "int"}]

But this makes no difference in the output.

1 Answer

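OK, the answer is that the logicalType attribute has to be part of the type itself, not of the field. The field's type must be written as a nested schema object, with logicalType inside it, rather than as a plain string with logicalType placed next to it. In the above example, the schema should be defined as follows: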

schema = {
    'doc': 'A weather reading.',
    'name': 'Weather',
    'namespace': 'test',
    'type': 'record',
    'fields': [
        {'name': 'station', 'type': 'string'},
        {'name': 'time', 'type': {'type': 'int', 'logicalType': 'time-millis'}},
        {'name': 'temp', 'type': 'int'},
    ],
}