I am trying to use a logical type in Avro, reading and writing with the Python fastavro library, but the logicalType annotation seems to have no effect at all. The code below is taken from the fastavro page; I have altered the time field in the schema definition by annotating it with the logical type time-millis, in accordance with the current Avro specification. (On a side note, I have seen people use TIMESTAMP_MILLIS, but I do not know why, since the Avro page has time-millis.)

When I run this code, the output on stdout is exactly the same as the output of the same code without the logical type annotation. I was expecting to see something that looked like a time, e.g., 13:14:15.1234. Yet the fastavro page mentioned above claims that fastavro now supports Avro logical types. How can I get it to do so? Thanks!
from fastavro import writer, reader, parse_schema

schema = {
    'doc': 'A weather reading.',
    'name': 'Weather',
    'namespace': 'test',
    'type': 'record',
    'fields': [
        {'name': 'station', 'type': 'string'},
        {'name': 'time', 'type': 'int', 'logicalType': 'time-millis'},
        {'name': 'temp', 'type': 'int'},
    ],
}
parsed_schema = parse_schema(schema)

# 'records' can be an iterable (including generator)
records = [
    {u'station': u'011990-99999', u'temp': 0, u'time': 1433269388},
    {u'station': u'011990-99999', u'temp': 22, u'time': 1433270389},
    {u'station': u'011990-99999', u'temp': -11, u'time': 1433273379},
    {u'station': u'012650-99999', u'temp': 111, u'time': 1433275478},
]

# Writing
with open('weather.avro', 'wb') as out:
    writer(out, parsed_schema, records)

# Reading
with open('weather.avro', 'rb') as fo:
    for record in reader(fo):
        print(record)
The output to stdout, whether the logicalType annotation is present or removed, is the same:
{'station': '011990-99999', 'time': 1433269388, 'temp': 0}
{'station': '011990-99999', 'time': 1433270389, 'temp': 22}
{'station': '011990-99999', 'time': 1433273379, 'temp': -11}
{'station': '012650-99999', 'time': 1433275478, 'temp': 111}
I can see that the schemas in the output files are different between the two versions:
With logicalType specified:
"fields": [{"name": "station", "type": "string"}, {"logicalType": "time-millis", "name": "time", "type": "int"}, {"name": "temp", "type": "int"}]
Without logicalType specified:
"fields": [{"name": "station", "type": "string"}, {"name": "time", "type": "int"}, {"name": "temp", "type": "int"}]
But this makes no difference in the output.
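OK, the answer is that the type specification must itself be written as a schema, so the syntax is different: logicalType belongs inside a nested type object, next to the underlying 'int', rather than at the field level next to name. In the above example, the schema should be defined as follows: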
schema = {
    'doc': 'A weather reading.',
    'name': 'Weather',
    'namespace': 'test',
    'type': 'record',
    'fields': [
        {'name': 'station', 'type': 'string'},
        {'name': 'time', 'type': {'type': 'int', 'logicalType': 'time-millis'}},
        {'name': 'temp', 'type': 'int'},
    ],
}