Is there a straightforward way to dynamically re-format a Kafka topic from JSON Schema to Avro?


Due to framework restrictions, certain producers can only produce data encoded with JSON Schema. However, Avro is the standard on our platform, and I want to be able to provide every topic (also) in Avro.

First of all, I know that JSON Schema and Avro do not offer the same range of semantics; for example, they behave differently regarding compatibility and regarding data types. I am aware that there might not be a full-fledged solution to this, but maybe I'm overlooking a way to cover what is relevant for us (we are far from using the full range of either schema language).

Secondly, the re-formatting has to work dynamically, without us managing data models ourselves. It needs to allow adding (optional) fields to the source schema, which should then also be added to the target schema.
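For the narrow case described here (a limited subset of both schema languages, with new fields only ever added as optional), the derivation step itself can be small. The following is a minimal sketch, assuming flat JSON Schema objects whose properties each have a single primitive type; the function name `json_schema_to_avro` and the type mapping (`integer` to `long`, `number` to `double`) are illustrative choices, not part of any Confluent library. Properties not listed in `required` become `["null", T]` unions with a `null` default, which is what makes adding them a compatible change on the Avro side. Nested objects, arrays, `format`, `$ref` and Avro name validation are deliberately left out.

```python
import json
from typing import Optional

# Mapping assumed for this sketch: JSON Schema "integer" -> Avro "long",
# "number" -> Avro "double". Other choices (int, float, decimal) are possible.
PRIMITIVE_MAP = {
    "string": "string",
    "integer": "long",
    "number": "double",
    "boolean": "boolean",
}


def json_schema_to_avro(json_schema: dict, record_name: str, namespace: Optional[str] = None) -> dict:
    """Derive a flat Avro record schema from a flat JSON Schema object.

    Only top-level properties with a single primitive type are handled;
    anything else raises ValueError rather than being guessed.
    """
    if json_schema.get("type") != "object":
        raise ValueError("only JSON Schema objects are supported")
    required = set(json_schema.get("required", []))
    fields = []
    for name, prop in json_schema.get("properties", {}).items():
        json_type = prop.get("type")
        # A list-valued "type" (e.g. ["string", "null"]) is not supported here.
        avro_type = PRIMITIVE_MAP.get(json_type) if isinstance(json_type, str) else None
        if avro_type is None:
            raise ValueError(f"unsupported type for field {name!r}: {json_type!r}")
        if name in required:
            fields.append({"name": name, "type": avro_type})
        else:
            # Optional JSON Schema property -> nullable Avro field with a null
            # default ("null" must come first in the union for that default).
            fields.append({"name": name, "type": ["null", avro_type], "default": None})
    schema = {"type": "record", "name": record_name, "fields": fields}
    if namespace:
        schema["namespace"] = namespace
    return schema


if __name__ == "__main__":
    order_v2 = {
        "type": "object",
        "properties": {"id": {"type": "string"}, "quantity": {"type": "integer"}},
        "required": ["id"],
    }
    print(json.dumps(json_schema_to_avro(order_v2, "Order", "com.example"), indent=2))
```

Avro schema resolution matches fields by name rather than by position, so the derived record can keep the JSON Schema's property order even when a new optional property appears in the middle of the object.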

I know ksqlDB can do this "easily" with a CSAS (CREATE STREAM AS SELECT) statement, but that has some limitations for productive, operational use cases. The ksqlDB statement (even with SELECT *) would have to be re-run every time the source changes in order to "pull" the new fields; it does not automatically evolve with the source schema. In addition, the evolution might not be possible at the ksqlDB level at all, because only fields added at the end of a projection are allowed, whereas the Schema Registry allows (or allowed) other kinds of evolution.

Another thought was to use Kafka Streams, consuming with KafkaJsonSchemaSerde and producing with GenericAvroSerde. Ideally, I would cache the schema ID, and whenever a change in the JSON Schema is detected on the consumer side (a new schema ID for the source), I would programmatically check for the changes and, based on them, trigger a schema evolution of the target Avro schema before continuing to write the data to the Avro topic with the new, derived schema. This, too, would only be possible within set boundaries of what kind of evolution is allowed at the source level, but those boundaries are more likely to be enforceable, since they are less strict.
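The caching idea above boils down to a small piece of bookkeeping: read the source schema ID, derive and register a target schema only the first time that ID is seen, and refuse changes outside the agreed boundaries. The sketch below shows that control flow in Python under several assumptions: record values use the Confluent wire format (magic byte `0`, then a 4-byte big-endian schema ID, then the payload), so the ID can be read before deserializing; `fetch_source`, `derive` and `register` are hypothetical hooks that would be backed by a Schema Registry client and a JSON-Schema-to-Avro mapping; and the allowed evolution is "add fields with defaults only", which is one conservative choice of boundary. `DerivedSchemaTracker` and `is_additive_with_defaults` are names invented for this sketch; in a Kafka Streams application the same logic would have to be ported to Java.

```python
import struct
from typing import Callable, Dict, Optional

MAGIC_BYTE = 0  # Confluent wire format: magic byte, 4-byte big-endian schema ID, payload


def source_schema_id(raw: bytes) -> int:
    """Read the schema ID from a Confluent-framed record value without deserializing it."""
    if len(raw) < 5 or raw[0] != MAGIC_BYTE:
        raise ValueError("value is not in the Confluent wire format")
    return struct.unpack(">I", raw[1:5])[0]


def is_additive_with_defaults(old: dict, new: dict) -> bool:
    """Boundary assumed for this sketch: keep every existing field with the
    same type, and only add new fields that carry a default."""
    old_fields = {f["name"]: f for f in old["fields"]}
    new_fields = {f["name"]: f for f in new["fields"]}
    for name, field in old_fields.items():
        if name not in new_fields or new_fields[name]["type"] != field["type"]:
            return False
    return all("default" in f for name, f in new_fields.items() if name not in old_fields)


class DerivedSchemaTracker:
    """Maps source (JSON Schema) IDs to target (Avro) schema IDs, deriving and
    registering a target schema only the first time a source ID is seen."""

    def __init__(
        self,
        fetch_source: Callable[[int], dict],  # hypothetical: source ID -> JSON Schema
        derive: Callable[[dict], dict],  # hypothetical: JSON Schema -> Avro schema
        register: Callable[[dict], int],  # hypothetical: Avro schema -> target ID
    ):
        self._fetch_source = fetch_source
        self._derive = derive
        self._register = register
        self._target_ids: Dict[int, int] = {}
        self._latest: Optional[dict] = None

    def target_schema_id(self, source_id: int) -> int:
        cached = self._target_ids.get(source_id)
        if cached is not None:
            return cached  # fast path: this source schema was already handled
        derived = self._derive(self._fetch_source(source_id))
        if self._latest is not None and not is_additive_with_defaults(self._latest, derived):
            raise ValueError(f"source schema {source_id} implies a non-additive Avro change")
        target_id = self._register(derived)
        self._target_ids[source_id] = target_id
        self._latest = derived
        return target_id
```

Adding only fields that have defaults keeps consecutive derived schemas both backward and forward compatible under Avro's resolution rules, which is why it was chosen as the boundary here. One case the sketch leaves open: a record carrying a previously unseen *older* source ID (for example from a producer that has not been upgraded yet) is checked against the newest derived schema and rejected, so a real implementation would need a policy for that.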

Is there a more straightforward way to approach this?
