Storm topology: one-to-many (random) from spout to bolt


I'm using KafkaSpout to read from all six partitions of a Kafka topic. The first bolt in the topology converts the byte stream into a struct (via an IDL definition), looks up a value in a database, and passes these values to a second bolt, which writes everything to Cassandra.

There are several issues occurring:

  1. Many failures reported for the Kafka spout.
  2. The first bolt shows a "capacity" of > 2.0 in the Storm UI.
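
For context, the Storm UI describes capacity as (number executed * average execute latency) / measurement window, so a value around 1.0 means the bolt is busy for essentially the whole window. A minimal sketch of that arithmetic follows; the tuple count and latency are hypothetical numbers chosen for illustration, not measurements from this topology.

```java
public class CapacityEstimate {
    /**
     * Capacity as described in the Storm UI:
     * (tuples executed * average execute latency in ms) / window length in ms.
     */
    static double capacity(long executed, double avgExecuteLatencyMs, long windowMs) {
        return (executed * avgExecuteLatencyMs) / windowMs;
    }

    public static void main(String[] args) {
        long windowMs = 10L * 60 * 1000;   // 10-minute window
        long executed = 120_000;           // hypothetical tuple count
        double latencyMs = 10.0;           // hypothetical average execute latency
        System.out.println(capacity(executed, latencyMs, windowMs)); // prints 2.0
    }
}
```

Read this way, a capacity above 1.0 suggests the executor is being asked to do more work than fits in the window.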

I've tried to increase the parallelism, but it appears that Storm will only distribute 1:1 from the KafkaSpout to the first bolt. I'm guessing that issue 1 is a result of tuple timeouts caused by the first bolt.

What I want to do: have the KafkaSpouts (limited to one per Kafka partition) send their tuples to a random instance of the first bolt, so that I can run many more bolt instances than spouts. The first and second bolts would be 1:1, but spout to first bolt should be 1:many.

Currently I'm using LocalOrShuffleGrouping for both connections (spout -> bolt -> bolt).


Edit:

(Re)reading the Storm docs, I see this passage:

Shuffle grouping: Tuples are randomly distributed across the bolt's tasks in a way such that each bolt is guaranteed to get an equal number of tuples.

Yet when I look at the load on the executors for my first bolt, I see everything concentrated on 6 of them, seemingly ignoring the other 24.

I'm missing some large clue here.
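
One possible explanation, offered as a sketch rather than a diagnosis: the local-or-shuffle grouping is documented to send tuples only to target tasks in the same worker process when any exist, and to behave like a normal shuffle otherwise. The toy simulation below is plain Java with no Storm dependency. The layout (30 bolt tasks spread over 5 workers, all 6 spouts in worker 0) and the round-robin choice are assumptions made for illustration, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class GroupingSketch {
    /** Hypothetical placement: bolt task t runs in worker (t % workers). */
    static int workerOf(int boltTask, int workers) {
        return boltTask % workers;
    }

    /**
     * Returns the number of tuples received by each bolt task.
     * With preferLocal = true, a spout only chooses among bolt tasks in its
     * own worker, falling back to all tasks if none are local.
     * With preferLocal = false, every bolt task is a candidate.
     */
    static int[] distribute(int boltTasks, int workers, int[] spoutWorkers,
                            int tuplesPerSpout, boolean preferLocal) {
        int[] counts = new int[boltTasks];
        for (int spoutWorker : spoutWorkers) {
            List<Integer> candidates = new ArrayList<>();
            for (int t = 0; t < boltTasks; t++) {
                if (!preferLocal || workerOf(t, workers) == spoutWorker) {
                    candidates.add(t);
                }
            }
            if (candidates.isEmpty()) {
                // No local task: behave like a plain shuffle.
                for (int t = 0; t < boltTasks; t++) {
                    candidates.add(t);
                }
            }
            // Round-robin stands in for random choice to keep the result deterministic.
            for (int i = 0; i < tuplesPerSpout; i++) {
                counts[candidates.get(i % candidates.size())]++;
            }
        }
        return counts;
    }

    /** Number of bolt tasks that received at least one tuple. */
    static int busyTasks(int[] counts) {
        int busy = 0;
        for (int c : counts) {
            if (c > 0) {
                busy++;
            }
        }
        return busy;
    }

    public static void main(String[] args) {
        int[] spoutWorkers = {0, 0, 0, 0, 0, 0}; // assumed: all 6 spouts in worker 0
        int[] local = distribute(30, 5, spoutWorkers, 1000, true);
        int[] shuffle = distribute(30, 5, spoutWorkers, 1000, false);
        System.out.println("local-or-shuffle busy tasks: " + busyTasks(local)); // prints 6
        System.out.println("shuffle busy tasks: " + busyTasks(shuffle));        // prints 30
    }
}
```

Under these assumptions only the 6 co-located bolt tasks receive tuples, which would match the pattern described above. Whether the real cluster is laid out this way would have to be checked against the executor-to-worker assignment in the Storm UI.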
