I am trying to generate stream data to simulate a situation where I receive two Integer values, at different time intervals, with timestamps, using Kafka as the connector.
I am using Flink as the consumer, but I don't know what the best solution for the producer is. (Java syntax preferred over Scala if possible.)
Should I produce the data directly to Kafka? If so, what is the best way to do it? Or is it better to produce the data from Flink as a producer, send it to Kafka, and consume it with Flink again at the end? How can I do that from Flink? Or perhaps there is another easy way to generate stream data and pass it to Kafka.
If so, please put me on the track to achieve it.
As David also mentioned, you can create a dummy producer in plain Java using the KafkaProducer API to schedule and send messages to Kafka as you wish. Similarly, you can do that with Flink if you want multiple simultaneous producers. With Flink you will need to write separate jobs for the producer and the consumer. Kafka enables an asynchronous processing architecture rather than acting as a queue that couples the two ends, so it is better to keep the producer and consumer jobs separate.
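As a starting point, here is a minimal sketch of such a dummy producer. The topic name `test-topic` and broker address `localhost:9092` are placeholders, it assumes the `kafka-clients` dependency is on the classpath, and for simplicity the integers are sent as strings (swap in `IntegerSerializer` and a matching deserialization schema on the Flink side if you need raw Integer payloads):

```java
import java.util.Properties;
import java.util.Random;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DummyProducer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        Random random = new Random();
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            while (true) {
                long timestamp = System.currentTimeMillis();
                String value = Integer.toString(random.nextInt(100));
                // null partition and null key: the default partitioner spreads
                // records across partitions; the explicit timestamp is stored
                // with the record, matching your timestamped-values requirement
                producer.send(new ProducerRecord<>("test-topic", null, timestamp, null, value));
                Thread.sleep(random.nextInt(1000)); // irregular send intervals
            }
        }
    }
}
```

The random sleep simulates values arriving in different time ranges; adjust the scheduling logic to whatever arrival pattern you want to test.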
But think a little more about the intention of your test:

- If you want to test multiple simultaneous producers for the same topic, the messages may carry a null or non-null key.
- If one producer is enough, a few internal scenarios you can cover include a back pressure test: make the producer push more messages than the consumer can handle.
- If you want to test how Flink's parallel consumer instances connect to individual partitions and observe their behavior, use single or multiple producers, but the message key should be non-null (see the consumer sketch after this list).
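For that last scenario, a minimal Flink consumer sketch might look like the following. It assumes the `flink-connector-kafka` dependency with the pre-1.14 `FlinkKafkaConsumer` API (newer versions use `KafkaSource` instead); the topic, group id, and broker address are placeholders:

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class ConsumerJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Each Kafka partition is read by at most one consumer subtask, so
        // setting the parallelism to the topic's partition count lets you
        // observe the per-partition assignment described above.
        env.setParallelism(4);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", "flink-test");              // placeholder

        env.addSource(new FlinkKafkaConsumer<>("test-topic", new SimpleStringSchema(), props))
           .map(Integer::parseInt) // values were produced as stringified integers
           .print();

        env.execute("Kafka consumer test");
    }
}
```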
There are more ideas you may want to test, and each of them will need something specific to be done, or avoided, in the producer.
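And if you prefer your other option, generating the stream inside Flink and sinking it to Kafka, a producer job could be sketched like this (again assuming the pre-1.14 connector API, where `FlinkKafkaProducer` has since been replaced by `KafkaSink`; topic and broker values are placeholders):

```java
import java.util.Properties;
import java.util.Random;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class ProducerJob {

    // Synthetic source emitting stringified random integers at irregular intervals
    public static class RandomIntSource implements SourceFunction<String> {
        private volatile boolean running = true;

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            Random random = new Random();
            while (running) {
                ctx.collect(Integer.toString(random.nextInt(100)));
                Thread.sleep(random.nextInt(1000));
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder

        env.addSource(new RandomIntSource())
           .addSink(new FlinkKafkaProducer<>("test-topic", new SimpleStringSchema(), props));

        env.execute("Kafka producer test");
    }
}
```

Run this job and the consumer job as two separate Flink applications, as noted above.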
You can check out https://github.com/abhisheknegi/twitStream for pulling tweets using Java APIs, in case needed.