Maximum spout capacity


I'm using Heron to run streaming analytics on IoT data. The architecture currently has a single spout with a parallelism of 1.

I'm trying to benchmark how much data Heron can hold in the queue it uses internally at the spout.

I'm experimenting with setMaxSpoutPending() by passing different values to it. Is there any limit on the value that can be passed to this method?

Can that limit be raised by changing the system configuration or by giving the topology more resources?


Best answer

If you have one spout and one bolt, max spout pending is the best way to control the number of pending tuples. There is no hard upper limit: max spout pending can be increased indefinitely. However, raising it beyond a certain point increases the probability of timeout errors, and in the worst case the topology makes no forward progress at all. A higher max spout pending value also typically requires more heap for the spout and for the other components of the topology.
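To make the timeout and heap trade-off concrete, here is a rough back-of-the-envelope sketch in Java. It rests on a simplified model that is an assumption, not something taken from Heron's internals: the bolt acks tuples at a steady rate, so the last tuple in a full pending window waits roughly window size divided by ack rate. The class name, method names, and all numbers are hypothetical and purely illustrative.

```java
// Simplified model (an assumption, not Heron's internals): tuples are acked
// at a steady rate, so a full pending window drains in pending / rate seconds.
class PendingEstimate {

    // Estimated seconds until the last tuple in a full pending window is acked.
    static double drainSeconds(int maxSpoutPending, double ackedTuplesPerSecond) {
        if (ackedTuplesPerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        return maxSpoutPending / ackedTuplesPerSecond;
    }

    // Largest pending window that still drains within the message timeout.
    static long maxPendingWithinTimeout(double ackedTuplesPerSecond, int timeoutSeconds) {
        return (long) Math.floor(ackedTuplesPerSecond * timeoutSeconds);
    }

    // Rough memory needed to hold a full pending window, in bytes.
    static long pendingBytes(int maxSpoutPending, int avgTupleBytes) {
        return (long) maxSpoutPending * avgTupleBytes;
    }

    public static void main(String[] args) {
        double rate = 2_000.0; // hypothetical: the bolt acks 2,000 tuples per second
        int timeout = 30;      // hypothetical message timeout, in seconds
        System.out.println(maxPendingWithinTimeout(rate, timeout));
        System.out.println(drainSeconds(100_000, rate));
        System.out.println(pendingBytes(100_000, 512));
    }
}
```

With these made-up numbers, a window of 100,000 tuples would take about 50 seconds to drain, which is longer than the 30-second timeout, so some tuples would likely time out. Measured ack rates from your own topology should replace the placeholders.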

Answer

Max spout pending (MSP) is used to control the topology's ingestion rate; it tells Storm the maximum number of tuples that may be unacknowledged at any given time. If the MSP is lower than the parallelism of the topology, it can become a bottleneck. On the other hand, increasing the MSP beyond the topology's parallelism level can leave the topology 'flooded' and unable to keep up with the inbound tuples. In that situation tuples exceed the topology's message timeout, and Storm attempts to replay the timed-out tuples while still feeding in new ones. Storm stops feeding new inbound tuples only when the MSP limit is reached.

So yes, you can tweak it, but keep an eye out for a rising number of timed-out tuples, which indicates that your topology is overwhelmed.
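The flooding behaviour described above can be illustrated with a toy simulation in Java. This is only a sketch under stated assumptions: time advances in discrete ticks, a single bolt acks a fixed number of tuples per tick in FIFO order, and timed-out tuples are simply counted and dropped from the window (a real topology would replay them, which makes matters worse). The class and parameter names are hypothetical and none of this is Storm or Heron code.

```java
import java.util.ArrayDeque;

// Toy model of a spout throttled by max spout pending (not Storm/Heron code).
class MspSimulation {

    // Returns how many tuples timed out during the simulated run.
    static int run(int maxSpoutPending, int emitPerTick, int ackPerTick,
                   int timeoutTicks, int ticks) {
        // Emit tick of every unacked tuple, oldest first.
        ArrayDeque<Integer> pending = new ArrayDeque<>();
        int timedOut = 0;
        for (int now = 0; now < ticks; now++) {
            // Spout: emit only while the pending window has room.
            for (int i = 0; i < emitPerTick && pending.size() < maxSpoutPending; i++) {
                pending.addLast(now);
            }
            // Bolt: ack the oldest tuples, up to its per-tick capacity.
            for (int i = 0; i < ackPerTick && !pending.isEmpty(); i++) {
                pending.pollFirst();
            }
            // Timeout: tuples pending for too long are counted as failed and
            // dropped from the window (a real topology would replay them).
            while (!pending.isEmpty() && now - pending.peekFirst() >= timeoutTicks) {
                pending.pollFirst();
                timedOut++;
            }
        }
        return timedOut;
    }

    public static void main(String[] args) {
        // Hypothetical load: the spout can emit 10 tuples per tick, the bolt acks 5.
        System.out.println("MSP 20:   " + run(20, 10, 5, 10, 100) + " timed out");
        System.out.println("MSP 1000: " + run(1000, 10, 5, 10, 100) + " timed out");
    }
}
```

In this model the small window keeps the backlog short enough that nothing times out, while the large window lets the backlog grow until tuples start exceeding the timeout.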

BTW, if you're processing IoT events, you may be able to increase parallelism by partitioning the spout's tuples by device ID (one tuple stream per device) using fields grouping.
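As a rough illustration of the idea behind grouping by device ID, the Java sketch below routes each event to a bolt task by hashing its device ID. It is a conceptual stand-in, not the routing code Storm or Heron actually uses; the class name, the method name, the sample IDs, and the task count of 4 are all hypothetical.

```java
// Conceptual sketch of hash-based routing by device ID (not Storm/Heron code).
class DeviceGroupingSketch {

    // Pick a bolt task for a device ID; the same ID always maps to the same task.
    static int taskFor(String deviceId, int numTasks) {
        if (numTasks <= 0) {
            throw new IllegalArgumentException("numTasks must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(deviceId.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        String[] events = {"dev-1", "dev-2", "dev-1", "dev-3", "dev-2"};
        int numTasks = 4; // hypothetical bolt parallelism
        for (String id : events) {
            System.out.println(id + " -> task " + taskFor(id, numTasks));
        }
    }
}
```

Because every event from a given device lands on the same task, per-device state can stay local to that task while different devices are processed in parallel.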