storm + kafka: understanding ack, fail and latency


I'm using the KafkaSpout to consume from 2 Kafka topics each of which has 6 partitions. The spout goes to a single bolt to unpack the relevant bytes and then to a second bolt for further processing.

When I look at the storm-ui the numbers aren't making much sense and I'm hoping someone can shed some light.

  1. The Kafka spout says it 'acked' ~3600 tuples and failed ~73M. Looking at the bolts in the next group, I see that some have acked ~73M with 0 failed while others have acked ~1.3M (no fails). Shouldn't these numbers line up somehow?

  2. The 'complete latency' in the row for the spout is ~2500ms while the execute and process latency for the bolts ranges from <1ms to ~50ms. Again - what's the correspondence?

Yes - this topology has some major issues (see this related question).

I'm attaching an image of the UI in hopes of someone helping me understand it.

[Image: screenshot of the Storm UI]

Best answer

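It sounds like your tuples are timing out, and the spout is being notified of each timeout as a failure. The zero fails reported for the bolts mean that no bolt explicitly failed a tuple, so the failures counted at the spout must come from tuples that were not fully acked within the message timeout (topology.message.timeout.secs, 30 seconds by default).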

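As noted in your other question, only some of the bolt tasks are receiving all the work. That creates severe backlogs, which are the likely cause of the tuple timeouts. It would also explain the latency gap: complete latency is measured at the spout, from the moment a tuple is emitted until its whole tuple tree is acked, so it includes time spent waiting in queues, whereas execute and process latency only cover the time a tuple spends inside a single bolt.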
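To make the backlog argument concrete, here is a toy model in plain Java. It is not Storm code and does not use the Storm API; the class name, the emit rate, the 4 ms execute time and the 3-second timeout are all made-up values chosen only for illustration. It shows how tuples can be counted as failed at the spout while no bolt ever fails one, and how complete latency grows far beyond the per-bolt execute time when all tuples land on one task.

```java
/** Toy queueing model (NOT Storm code): why spout fails and bolt fails can diverge. */
public class TimeoutSketch {

    /** Counters as the spout would report them in this toy model. */
    static final class SpoutStats {
        long acked;          // tuples completed within the timeout
        long failed;         // tuples that exceeded the timeout
        long maxCompleteMs;  // largest emit-to-finish time observed
    }

    /**
     * Emits one tuple every emitIntervalMs. Each bolt task handles one tuple
     * at a time and needs execMs per tuple. With shuffle == false every tuple
     * goes to task 0 (a skewed grouping); otherwise tuples go round-robin.
     * The bolt itself never fails a tuple; only the timeout produces fails.
     */
    static SpoutStats simulate(int tuples, int tasks, boolean shuffle,
                               long emitIntervalMs, long execMs, long timeoutMs) {
        long[] freeAt = new long[tasks]; // time at which each task becomes idle
        SpoutStats stats = new SpoutStats();
        for (int i = 0; i < tuples; i++) {
            long emittedAt = i * emitIntervalMs;
            int task = shuffle ? i % tasks : 0;
            long startedAt = Math.max(emittedAt, freeAt[task]); // wait in queue if busy
            long finishedAt = startedAt + execMs;
            freeAt[task] = finishedAt;

            long completeMs = finishedAt - emittedAt; // queue wait + execute time
            stats.maxCompleteMs = Math.max(stats.maxCompleteMs, completeMs);
            if (completeMs > timeoutMs) {
                stats.failed++;
            } else {
                stats.acked++;
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        SpoutStats skewed = simulate(10_000, 6, false, 1, 4, 3_000);
        SpoutStats shuffled = simulate(10_000, 6, true, 1, 4, 3_000);
        System.out.println("skewed:   acked=" + skewed.acked + " failed=" + skewed.failed
                + " maxCompleteMs=" + skewed.maxCompleteMs);
        System.out.println("shuffled: acked=" + shuffled.acked + " failed=" + shuffled.failed
                + " maxCompleteMs=" + shuffled.maxCompleteMs);
    }
}
```

In the skewed run the single busy task falls further behind with every tuple, so almost everything times out even though each tuple only needs 4 ms of execute time. The model leaves out replays: a real KafkaSpout would re-emit the failed tuples, which makes the backlog worse.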

As suggested in the comment on your other question, you can switch to shuffle grouping to spread the workload evenly across all bolt tasks, and you could probably also increase the message timeout (topology.message.timeout.secs) in your topology config.
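A minimal sketch of what those two changes might look like, assuming Storm 1.0 or later (earlier releases use the backtype.storm package prefix instead of org.apache.storm). The component ids, the parallelism of 12 and the 120-second timeout are hypothetical placeholders, and the spout and bolts are passed in because their actual classes are not shown in the question.

```java
import org.apache.storm.Config;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.TopologyBuilder;

public class ShuffleWiringSketch {

    // Hypothetical parallelism hint; tune it to your cluster and partition count.
    private static final int PARALLELISM = 12;

    /** Wires spout -> unpack bolt -> processing bolt using shuffle grouping. */
    static StormTopology buildTopology(IRichSpout kafkaSpout,
                                       IRichBolt unpackBolt,
                                       IRichBolt processBolt) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", kafkaSpout, PARALLELISM);

        // shuffleGrouping distributes tuples randomly and evenly across
        // all tasks of the receiving bolt.
        builder.setBolt("unpack", unpackBolt, PARALLELISM)
               .shuffleGrouping("kafka-spout");
        builder.setBolt("process", processBolt, PARALLELISM)
               .shuffleGrouping("unpack");

        return builder.createTopology();
    }

    /** Raises topology.message.timeout.secs above the 30-second default. */
    static Config buildConfig() {
        Config conf = new Config();
        conf.setMessageTimeoutSecs(120); // placeholder value, not a recommendation
        return conf;
    }
}
```

Raising the timeout only hides a backlog, so it is probably best treated as a stopgap while the grouping is fixed.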