Data Conversion for a field using AVRO


I am new to Avro. We have started using Avro schemas to read data.

Now we have a use case where I need to truncate the data while reading.

Suppose my Avro schema looks like this:

{
    "name": "table",
    "namespace": "csd",
    "type": "record",
    "fields": [
        {"name": "CustId", "type": "string"},
        {"name": "ProductID", "type": "string"},
        {"name": "time", "type": "long"}
    ]
}

The data looks like this:

{
    "CustId": "abc1234",
    "ProductID": "ABC1234567",
    "time": 123456789
}

When I read the data, I want to truncate the ProductID field. In the example above, ProductID is ABC1234567, and I want it truncated to its first 5 characters: ABC12.
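If nothing in the schema can do this, the fallback is to truncate after the record has been read. A minimal sketch in plain Java, assuming the decoded record is represented as a Map from field name to value (a hypothetical stand-in for whatever record type you actually get back, such as a GenericRecord). TruncateExample and truncate are illustrative names, not Avro API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TruncateExample {

    // Returns at most maxLen characters of s; null and short values pass through unchanged.
    static String truncate(String s, int maxLen) {
        if (s == null || s.length() <= maxLen) {
            return s;
        }
        return s.substring(0, maxLen);
    }

    public static void main(String[] args) {
        // Hypothetical: the decoded record as a plain map of field name to value.
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("CustId", "abc1234");
        record.put("ProductID", "ABC1234567");
        record.put("time", 123456789L);

        // Post-process only the ProductID field, after reading.
        record.put("ProductID", truncate((String) record.get("ProductID"), 5));

        System.out.println(record); // {CustId=abc1234, ProductID=ABC12, time=123456789}
    }
}
```

Note that String.substring counts UTF-16 code units, so a value containing characters outside the Basic Multilingual Plane could be cut in the middle of a surrogate pair.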

Is there anything I can specify in the schema to truncate it?


This is a possible start. SpecificDatumReader contains the following conversion logic. It depends on your generated class overriding the getConversion method to return a conversion for the field. The schema compiler would need a hook to inject that conversion object into the generated class; I've been looking for that hook.

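// Excerpt from org.apache.avro.specific.SpecificDatumReader
@Override
protected void readField(Object r, Schema.Field f, Object oldDatum,
                         ResolvingDecoder in, Object state)
    throws IOException {
  if (r instanceof SpecificRecordBase) {
    // Ask the (generated) record class whether this field position has a Conversion.
    Conversion<?> conversion = ((SpecificRecordBase) r).getConversion(f.pos());

    Object datum;
    if (conversion != null) {
      // Decode the value, then convert it using the field's logical type.
      datum = readWithConversion(
          oldDatum, f.schema(), f.schema().getLogicalType(), conversion, in);
    } else {
      // No conversion for this field: decode the value as-is.
      datum = readWithoutConversion(oldDatum, f.schema(), in);
    }

    getData().setField(r, f.name(), f.pos(), datum);
  } else {
    super.readField(r, f, oldDatum, in, state);
  }
}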
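To make that dispatch concrete, here is a simplified, library-free model of it: the record class answers "which conversion applies to field position N?", and the reader either converts the decoded value or stores it unchanged. FieldConversion, TruncatingRecord and readFields are hypothetical names standing in for Avro's Conversion, a generated SpecificRecordBase subclass and readField. The real reader also passes the field's logical type to the conversion, which this sketch omits.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of the readField dispatch; none of these types are Avro API.
class ConversionDispatchSketch {

    // Stand-in for Conversion<?>: turns a decoded value into the final value.
    interface FieldConversion extends Function<Object, Object> {}

    // Stand-in for a generated SpecificRecordBase subclass.
    static class TruncatingRecord {
        final Object[] values = new Object[3]; // CustId, ProductID, time

        // Mirrors getConversion(int field): null means "no conversion for this field".
        FieldConversion getConversion(int pos) {
            if (pos == 1) { // ProductID: keep at most 5 characters
                return v -> {
                    String s = v.toString();
                    return s.length() > 5 ? s.substring(0, 5) : s;
                };
            }
            return null;
        }
    }

    // Mirrors the branch in readField: convert if a conversion exists, else store as-is.
    static TruncatingRecord readFields(List<Object> decodedValues) {
        TruncatingRecord r = new TruncatingRecord();
        for (int pos = 0; pos < decodedValues.size(); pos++) {
            FieldConversion conversion = r.getConversion(pos);
            Object decoded = decodedValues.get(pos);
            r.values[pos] = (conversion != null) ? conversion.apply(decoded) : decoded;
        }
        return r;
    }

    public static void main(String[] args) {
        TruncatingRecord r = readFields(Arrays.asList("abc1234", "ABC1234567", 123456789L));
        System.out.println(Arrays.toString(r.values)); // [abc1234, ABC12, 123456789]
    }
}
```

In Avro itself, a truncating Conversion would presumably need to be tied to a logical type on the ProductID field, since readWithConversion is handed f.schema().getLogicalType(); that is why the generated class, and therefore the schema compiler, has to know about it.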