parallel writes different topics from single stream topic

341 Views Asked by At

I have a stream which gives messages map to two different map() call and further is filtered and written to two different topics.

KStream<String, byte[]>[] stream = builder.<String, byte[]>stream("source-topic");

stream.map(logic1OnData).filter(
                (key, value) -> {
                    if (key == null || value == null)
                        return false;
                    return value.data() != null;
                }).to("topic1", Produced.with(Serdes.String(), Serdes.String())

stream.map(logic2OnData).filter(
                (key, value) -> {
                    if (key == null || value == null)
                        return false;
                    return value.data() != null;
                }).to("topic2", Produced.with(Serdes.String(), Serdes.String())

Is there a way I can run stream.map(logc1OnData)... and stream.map(logic2OnData) parallel? Looks like they are running one after other i.e. the first map is executed and written to topic1 and then second map is executed and written to topic2 FYI.. I don't want num.threads.count as my stream input is from single topic and I am running multiple instances of the same application to read from source-topic topic to achieve parallelism while consuming.

What I am looking is parallelism while executing and writing to different topics

1

There are 1 best solutions below

3
On

What you are looking at is the order in which your operations are added to the topology. Once the topology is executed the recorder will flow through the otpology in the order they arrive but logic2OnData will not wait for logic1OnData to finish processing before it runs.

If you are concerned about performance you can look into stream threads if you want to get more parallelism.

EDIT: it seems I may have miss-understood the question.

A single sub-topology does not let you run each branch with parallelism. However you can use repartition() to make a logic2OnData into its own sub-topology and everything after the repartition() call will be able to run in parallel with everything before it.