I need to process data from a set of streams, applying the same elaboration to each stream independently from the other streams.
I've already looked at frameworks like Storm, but it appears to allow the processing of static streams only (e.g. tweets from Twitter), while I need to process data from each user separately.
A simple example of what I mean could be a system where each user can track his GPS location and see statistics like average velocity, acceleration, burnt calories and so on in real time. Of course, each user would have his own stream(s), and the system should process the stream of each user separately, as if each user had his own dedicated topology processing his data.
Is there a way to achieve this with a framework like Storm, Spark Streaming or Samza?
It would be even better if Python is supported, since I already have a lot of code I'd like to reuse.
Thank you very much for your help
Using Storm, you can group data with the fields-grouping connection pattern if you have a user-id in your tuples. This ensures that data is partitioned by user-id: all tuples with the same user-id go to the same bolt instance, so you get logical substreams. A single bolt instance receives the tuples of multiple users, though, so your code needs to be able to process multiple groups/substreams, typically by keeping separate state for each user-id. But Storm supports your use case for sure. It can also run Python code through its multi-lang protocol.
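To make the "multiple groups per bolt instance" point concrete, here is a minimal sketch in plain Python. It does not use the Storm API. `PerUserSpeedTracker`, `process` and `route` are hypothetical names made up for illustration, and the tuple layout (user-id, timestamp in seconds, x/y position in metres) is an assumption. The idea is only that the per-user logic keeps its state in a dictionary keyed by user-id, and that some routing step (fields grouping in Storm) always sends the same user-id to the same instance.

```python
import math
import zlib
from collections import defaultdict


class PerUserSpeedTracker:
    """Hypothetical stand-in for the logic inside one bolt instance.

    It keeps independent running state for every user-id it sees.
    """

    def __init__(self):
        # user_id -> last position plus running totals for that user only
        self._state = defaultdict(
            lambda: {"last": None, "distance": 0.0, "elapsed": 0.0}
        )

    def process(self, user_id, t, x, y):
        """Handle one tuple; return the user's average speed so far, or None."""
        s = self._state[user_id]
        if s["last"] is not None:
            t0, x0, y0 = s["last"]
            s["distance"] += math.hypot(x - x0, y - y0)
            s["elapsed"] += t - t0
        s["last"] = (t, x, y)
        if s["elapsed"] > 0:
            return s["distance"] / s["elapsed"]
        return None


def route(user_id, num_tasks):
    """Illustrates the effect of fields grouping (not Storm's actual code):
    the same user-id always maps to the same task index."""
    return zlib.crc32(user_id.encode("utf-8")) % num_tasks


if __name__ == "__main__":
    tasks = [PerUserSpeedTracker() for _ in range(2)]
    events = [
        ("alice", 0, 0.0, 0.0),
        ("bob", 0, 0.0, 0.0),
        ("alice", 10, 30.0, 40.0),
        ("bob", 10, 0.0, 5.0),
    ]
    for user_id, t, x, y in events:
        speed = tasks[route(user_id, len(tasks))].process(user_id, t, x, y)
        print(user_id, speed)
```

In a real topology the routing is done by Storm itself once the bolt is connected with a fields grouping on the user-id field; only the keyed-state part would live in your own Python code.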