I am looking for a way to refactor this code and make it cleaner. I am certain that there is a way but haven't been able to figure it out.
I am working with avro4s and need to enable serialisation for Kafka, with two requirements:
- provide serializers for different case classes (say UserCreated and UserDeleted)
- for each case class, provide both binary and json serialisation
The solution I came up with is creating two serialisers that take the parent trait as input, and in each of them I first pattern match on the input and then choose a binary or json serialization.
In the code below, the key issues are those AvroOutputStream.binary[T] or AvroOutputStream.json[T]. I can't find a way to reduce the code duplication.
trait KafkaMessage
case class UserDeleted(name: String, age: Int) extends KafkaMessage
case class UserCreated(name: String, age: Int) extends KafkaMessage

class KafkaMessageBinarySerializer extends Serializer[KafkaMessage] with Serializable {
  override def serialize(topic: String, data: KafkaMessage): Array[Byte] = {
    val byteStream = new ByteArrayOutputStream()
    // TODO: optimise this
    val output = data match {
      case e: UserCreated =>
        val output = AvroOutputStream.binary[UserCreated].to(byteStream).build()
        output.write(e)
        output
      case e: UserDeleted =>
        val output = AvroOutputStream.binary[UserDeleted].to(byteStream).build()
        output.write(e)
        output
    }
    output.close()
    byteStream.toByteArray
  }
}

class KafkaMessageJsonSerializer extends Serializer[KafkaMessage] with Serializable {
  override def serialize(topic: String, data: KafkaMessage): Array[Byte] = {
    val byteStream = new ByteArrayOutputStream()
    // TODO: optimise this
    val output = data match {
      case e: UserCreated =>
        val output = AvroOutputStream.json[UserCreated].to(byteStream).build()
        output.write(e)
        output
      case e: UserDeleted =>
        val output = AvroOutputStream.json[UserDeleted].to(byteStream).build()
        output.write(e)
        output
    }
    output.close()
    byteStream.toByteArray
  }
}
The problem is not so much with serialization or Kafka as with types and pattern matching.
My solution screams to be improved, but my fiddling around didn't produce anything nice. I would love to get some advice on how to improve this.
Do you mean something like this:
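A rough, library-free sketch of the idea, not the exact avro4s code. Here `Builder` is a hypothetical stand-in for an avro4s output stream builder, `DispatchingSerializer` and `ToyTextSerializer` are made-up names, and the text encoding is a toy used only so the example runs without dependencies. In real code the base class would presumably also extend Kafka's `Serializer[KafkaMessage]`.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets.UTF_8

sealed trait KafkaMessage
final case class UserCreated(name: String, age: Int) extends KafkaMessage
final case class UserDeleted(name: String, age: Int) extends KafkaMessage

// Hypothetical stand-in for an avro4s output stream builder for type A.
trait Builder[A] {
  def write(value: A, out: ByteArrayOutputStream): Unit
}

// Pairs a value with a builder for exactly that value's type.
// The type member T ties the two together (path-dependent type).
sealed trait Dispatched {
  type T
  def value: T
  def builder: Builder[T]
}

object Dispatched {
  def apply[A](a: A, b: Builder[A]): Dispatched =
    new Dispatched {
      type T = A
      val value: A = a
      val builder: Builder[A] = b
    }
}

// Shared plumbing lives here; each concrete serializer only implements dispatch.
abstract class DispatchingSerializer {
  protected def dispatch(data: KafkaMessage): Dispatched

  final def serialize(data: KafkaMessage): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    val d = dispatch(data)
    // d.value has type d.T and d.builder has type Builder[d.T], so this type-checks.
    d.builder.write(d.value, out)
    out.toByteArray
  }
}

// Toy serializer for illustration only; this is not Avro output.
object ToyTextSerializer extends DispatchingSerializer {
  private val created: Builder[UserCreated] =
    (v, out) => out.write(s"created:${v.name}".getBytes(UTF_8))
  private val deleted: Builder[UserDeleted] =
    (v, out) => out.write(s"deleted:${v.name}".getBytes(UTF_8))

  protected def dispatch(data: KafkaMessage): Dispatched = data match {
    case e: UserCreated => Dispatched(e, created)
    case e: UserDeleted => Dispatched(e, deleted)
  }
}
```

With avro4s, the two `Builder` values would be replaced by whatever `AvroOutputStream.binary[...]` or `AvroOutputStream.json[...]` returns for each case class, and `serialize` would call `.to(out).build()` before writing.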
This logic would let you create a lot of different serializers, only having to implement dispatch in each of them. Since Dispatched uses path-dependent types, there is no issue with making sure that the Builder matches the value you want to put inside.
To reduce it further, you can look inside and see what these .json and .binary methods do:
They basically pass 2 implicit parameters and hardcode 1 flag. So you could do:
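A minimal sketch of that shape, assuming nothing about the actual avro4s signatures. `Format`, `Encode`, `FormatAware` and the toy encodings below are all hypothetical names invented for this illustration. The point is only that the format becomes an ordinary argument while the type-specific part is resolved implicitly, so one serializer class covers both formats.

```scala
import java.nio.charset.StandardCharsets.UTF_8

sealed trait KafkaMessage
final case class UserCreated(name: String, age: Int) extends KafkaMessage
final case class UserDeleted(name: String, age: Int) extends KafkaMessage

// The one thing that differs between the binary and JSON code paths.
sealed trait Format
object Format {
  case object Binary extends Format
  case object Json extends Format
}

// Hypothetical stand-in for the implicit evidence a library requires per type.
trait Encode[A] {
  def apply(value: A, format: Format): Array[Byte]
}

object Encode {
  // Toy encodings for illustration only; neither of these is Avro.
  private def render(tag: String, name: String, age: Int, format: Format): Array[Byte] =
    format match {
      case Format.Json   => s"""{"type":"$tag","name":"$name","age":$age}""".getBytes(UTF_8)
      case Format.Binary => s"$tag|$name|$age".getBytes(UTF_8)
    }

  implicit val userCreated: Encode[UserCreated] =
    (v, f) => render("UserCreated", v.name, v.age, f)
  implicit val userDeleted: Encode[UserDeleted] =
    (v, f) => render("UserDeleted", v.name, v.age, f)
}

object FormatAware {
  // The format is passed explicitly; the encoder is found implicitly from A.
  def write[A](value: A, format: Format)(implicit enc: Encode[A]): Array[Byte] =
    enc(value, format)
}

// One serializer class, parameterised by format, replaces the two near-duplicates.
final class KafkaMessageSerializer(format: Format) {
  def serialize(data: KafkaMessage): Array[Byte] = data match {
    case e: UserCreated => FormatAware.write(e, format)
    case e: UserDeleted => FormatAware.write(e, format)
  }
}
```

The pattern match is still needed, because the implicit has to be resolved for a concrete case class at compile time, but it now appears only once.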
This would let you share code even further, but at the expense of even more common code. So whether it makes sense or not depends on how many of these codecs you'd have to implement. If you needed to write a lot of them, and each of them would have a long list of subtypes to dispatch to, then something like this:
could make sense.
But if there are only a few of them, then I'd leave the code as is.