Kafka wordcount not updating counts

1k Views Asked by At

I'm starting experimenting Kafka Streams. I follow https://kafka.apache.org/0110/documentation/streams/quickstart.

My sandbox is a box running Ubuntu 16.04.2 LTS, Kafka 0.11.0.0 and Scala 2.11.11.

As explained in the Kafka Streams Quickstart guide, here are the steps I followed :

echo -e "all streams lead to kafka\nhello kafka streams\njoin kafka summit" > file-input.txt

bin/kafka-topics.sh --create \
    --zookeeper localhost:2181 \
    --replication-factor 1 \
    --partitions 1 \
    --topic streams-file-input

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic streams-file-input < file-input.txt

bin/kafka-run-class.sh org.apache.kafka.streams.examples.wordcount.WordCountDemo

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
    --topic streams-wordcount-output \
    --from-beginning \
    --formatter kafka.tools.DefaultMessageFormatter \
    --property print.key=true \
    --property print.value=true \
    --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
    --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer

When looking at the streams-wordcount-output by using the latter command, my standard output shows the following :

all 1
streams 1
lead    1
to  1
kafka   1
hello   1
kafka   2
streams 2
join    1
kafka   3
summit  1

Then, without interrupting the bin/kafka-console-consumer.sh command, I re-run the console producer as follow :

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic streams-file-input < file-input.txt

I'm surprised the standard output doesn't change to reflect the changes in troduced by this new addition. In my understanding, the file-input.txt was used to produce additional data, and so the word count should have refreshed (all tokens shall now be counted twice). What is wrong with my reasoning ?

1

There are 1 best solutions below

0
On BEST ANSWER

The word count demo is designed to stop after 5 seconds as seen in the source: https://github.com/apache/kafka/blob/0.11.0.0/streams/examples/src/main/java/org/apache/kafka/streams/examples/wordcount/WordCountDemo.java#L88

See the latest version of the above source for one that doesn't stop after 5 seconds, but only when you hit ctrl-c: https://github.com/apache/kafka/blob/0.11.0/streams/examples/src/main/java/org/apache/kafka/streams/examples/wordcount/WordCountDemo.java