Find the word having the maximum count using a Trident topology

How can I find the word with the maximum count in a word count topology built with Trident? Here is the link to the Trident word count topology I am starting from: https://github.com/nathanmarz/storm-starter/blob/master/src/jvm/storm/starter/trident/TridentWordCount.java

There is 1 best solution below

DP63 On

The Trident API provides max and maxBy operations, which return the maximum value on each partition of a batch of tuples in a Trident stream.

So, after calculating the count of each word as shown below:

Stream wordCountsStream = topology.newStream("spout1", spout)
        .parallelismHint(16)
        .each(new Fields("sentence"), new Split(), new Fields("word"))
        .groupBy(new Fields("word"))
        .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))
        .parallelismHint(16)
        .newValuesStream();

Use maxBy on the "count" field to get the word with the maximum count in each partition:

wordCountsStream.maxBy("count");
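
Because maxBy works on each partition of a batch, a stream with several partitions can emit several "maximum" tuples per batch, one per partition. The plain-Java sketch below only illustrates that behaviour and does not use Storm at all: PartitionMaxSketch, WordCount, and the sample data are hypothetical names made up for this example. It compares taking the maximum per partition with taking it after merging everything into a single partition, which is roughly what a global repartition (for example calling global() on the stream before maxBy) should give you within a batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class PartitionMaxSketch {

    /** Hypothetical stand-in for a Trident tuple with fields ["word", "count"]. */
    public static final class WordCount {
        public final String word;
        public final long count;

        public WordCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return word + "=" + count;
        }
    }

    /** Entry with the largest count in one partition; empty if the partition has no tuples. */
    public static Optional<WordCount> maxByCount(List<WordCount> partition) {
        return partition.stream().max(Comparator.comparingLong((WordCount wc) -> wc.count));
    }

    /** One maximum per non-empty partition, mimicking a per-partition maxBy. */
    public static List<WordCount> maxPerPartition(List<List<WordCount>> partitions) {
        List<WordCount> result = new ArrayList<>();
        for (List<WordCount> partition : partitions) {
            maxByCount(partition).ifPresent(result::add);
        }
        return result;
    }

    /** Single maximum after merging all partitions into one. */
    public static Optional<WordCount> globalMax(List<List<WordCount>> partitions) {
        List<WordCount> merged = new ArrayList<>();
        for (List<WordCount> partition : partitions) {
            merged.addAll(partition);
        }
        return maxByCount(merged);
    }

    public static void main(String[] args) {
        // Made-up batch split across two partitions.
        List<List<WordCount>> batch = Arrays.asList(
                Arrays.asList(new WordCount("the", 7), new WordCount("cow", 2)),
                Arrays.asList(new WordCount("moon", 3), new WordCount("jumped", 5)));

        System.out.println("per partition: " + maxPerPartition(batch));
        System.out.println("global:        " + globalMax(batch).orElse(null));
    }
}
```

With two partitions the per-partition version returns two candidates, while the merged version returns a single word. Also keep in mind that newValuesStream() emits the counts updated in each batch, so the maximum is computed over the tuples of that batch rather than over every word seen so far.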