Per-user stream processing

273 Views Asked by Marco DallaG At 07 June 2025 at 13:05

I need to process data from a set of streams, applying the same elaboration to each stream independently from the other streams.

I've already seen frameworks like storm, but it appears that it allows the processing of static streams only (i.e. tweets form twitter), while I need to process data from each user separately.

A simple example of what I mean could be a system where each user can track his gps location and see statistics like average velocity, acceleration, burnt calories and so on in real time. Of course, each user would have his own stream(s) and the system should process the stream of each user separately, as if each user had its own dedicated topology processing his data.

Is there a way to achieve this with a framework like storm, spark streaming or samza?

It would be even better if python is supported, since I already have a lot of code I'd like to reuse.

Thank you very much for your help

Original Q&A

There are 3 best solutions below

Matthias J. Sax On 17 June 2015 at 09:35

Using Storm, you can group data using fields-grouping connection pattern if you have a user-id in your tuples. This ensures, that data is partitioned by user-id and thus you get logical substreams. Your code only needs to be able to process multiple groups/substreams, because a single bolt instance gets multiple groups for processing. But Storm supports your use case for sure. It also can run Python code.

Sajith Eshan On 02 May 2018 at 12:39

You can use WSO2 Stream Processor to achieve this. You can partition the input stream by user-name and process events pertain to each user separately. The processing logic has to be written in Siddhi QL which is a SQL like language.

WSO2 SP also has a python wrapper to, it will allow you do perform administrative tasks such as submitting, editing jobs. But you can't write processing logic using python code.

Jakob Homan On 14 July 2015 at 04:12

In Samza, similar to Storm, one would partition the individual streams on some user ID. This would guarantee that the same processor would see all the events for some particular user (as well as other user IDs that the partition function [a hash, for instance] assigns to that processor). Your description sounds like something that would more likely run on the client's system rather than being a server-side operation, however.

Non-JVM language support has been proposed for Samza, but not yet implemented.

Per-user stream processing

There are 3 best solutions below

Related Questions in APACHE-SPARK

Related Questions in STREAM

Related Questions in SPARK-STREAMING

Related Questions in APACHE-STORM

Related Questions in APACHE-SAMZA

Trending Questions

Popular # Hahtags

Popular Questions