Spark Encoders.bean not working for generic class


I have a generic Java class and two plain attribute classes:

public class Person<T> {
    private String name;
    private List<T> attributes;
    // getters and setters omitted
}

public class AttributeOne {
    // some fields
}

public class AttributeTwo {
    // some fields
}

and I want to convert a Spark Dataset into a list of Java Person<T> objects (each record of the input Dataset becomes one Person<T> object).

I first tried passing the generic type T to the Encoder. The code looks something like this:

val ds = inputDS.map(<logic to convert input dataset to Person object>)(Encoders.bean(classOf[Person[T]]))

ds.write.json(...)

The code compiles and runs without error, but the output shows that only the name field is encoded successfully; the generic attributes field is not. The output data looks like this:

{"name": "name-1", "attributes": [{}, {}, {}]}
{"name": "name-2", "attributes": [{}, {}]}
...

This means the Encoder recognized how many attribute elements there are (the attributes list is not empty), but each attribute element is serialized as an empty {}.

I realized that the Encoder needs to know the concrete types in order to generate the serialization/deserialization code, and the generic type parameter T alone is not enough information.
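
As a sanity check (just a sketch; it assumes Person has the usual getName/getAttributes getters and setters), reflection shows the encoder has nothing concrete to work with: the getter still reports the unresolved type variable in List<T>, and the schema the encoder infers has an empty element struct, matching the {} objects in the output (depending on the Spark version, building an encoder from the raw class may even fail outright). Person[_] stands in for Person[T] below.

import org.apache.spark.sql.Encoders

// Encoders.bean reflects over the bean's getters. For the generic Person class
// the getter still reports the unresolved type variable T, so the encoder has
// no field information for the list elements.
println(classOf[Person[_]].getMethod("getAttributes").getGenericReturnType)
// prints: java.util.List<T>

// The inferred schema shows the same gap: an array whose element struct is empty.
Encoders.bean(classOf[Person[_]]).schema.printTreeString()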

So I explicitly passed the exact type to the Encoder:

val ds = inputDS.map(<logic to convert input dataset to Person<AttributeOne> object>)(Encoders.bean(classOf[Person[AttributeOne]]))

ds.write.json(...)

However, the result is exactly the same as with the generic type T: the name field is encoded, but the attributes field is not (although the Encoder still recognized how many attribute elements there are).
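
I suspect this is because of type erasure: classOf[Person[AttributeOne]] is the exact same runtime Class object as the raw Person class, so the encoder still only sees List<T> on the getter. One idea I have not tried yet (an untested sketch; PersonOfAttributeOne is just a hypothetical name) is to pin the type argument in a small non-generic, top-level subclass, since the type argument given to a generic superclass is kept in the class metadata and is visible to reflection:

// Erasure check: the explicit type argument never reaches the Encoder.
println(classOf[Person[AttributeOne]] == classOf[Person[AttributeTwo]])  // true

// Untested sketch: a non-generic, top-level subclass that fixes T = AttributeOne.
// The superclass type argument survives erasure, so reflection may be able to
// resolve getAttributes() to List<AttributeOne> here.
class PersonOfAttributeOne extends Person[AttributeOne]

val ds = inputDS.map(<same mapping logic, but building a PersonOfAttributeOne>)(Encoders.bean(classOf[PersonOfAttributeOne]))

ds.write.json(...)

The subclass could just as well be a one-line Java class; either way the point would be to put the concrete element type somewhere the bean encoder's reflection can see it. But I don't know whether that is the intended way to handle this.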

Any suggestions?
