StreamInsight Performance Issue

I'm using StreamInsight 2.1 and running into unexpected performance problems.

I have one input adapter delivering financial data at anywhere from 5,000 to 10,000 events per second. I then have a large number of queries operating against that input. Each query hooks up to the exact same passthrough query, so I have 1,000 queries using the exact same input data.

To test that the system would be able to handle this, I created 1000 queries that did nothing but passthrough (from d in fullStream select d) the events to an output adapter which only Releases the event.

When I run 1,000 queries this way, the system cannot keep up with the stream. It falls farther and farther behind. If I trim it to 100 queries, the system keeps up perfectly.

Have I simply run into the performance wall with StreamInsight? Is it not able to handle the type of solution I am building? Or am I doing something stupid here.... Any help would be great, not sure what else to try to make it faster. I need it to be able to execute way more than 1000 queries and I need to run way more complicated queries than this.

There are 3 best solutions below

2
On

I think you may be having performance issues because of your current approach.

First off, let's cover the differences between the editions of StreamInsight. Standard edition has only 1 scheduler thread while Premium has one per core. The Evaluation edition is equivalent to Premium.

I think the way to fix this is to reduce the number of queries you have. If you are creating 1000 queries (each with their own instance of an output adapter) I can see where you are going to have issues. On a quad-core machine, you are going to have 4 scheduler threads trying to run 1000 queries.

Are your queries that are arranged "horizontally" doing the same thing? If so, see if you can consolidate them. For instance, if I needed to do a query like the "Price>5 Vol<2k" for 5 different stocks, I would write it in such a way that I can handle all 5 stocks in a standing query that sends all the results to 1 output adapter. If a client is "subscribing" to results from a query, that's something that can/should be handled by your output adapter. You could also turn results on and off for certain stocks by streaming in reference data.
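
To make the "let the output adapter handle subscriptions" idea concrete, here is a minimal sketch in plain C#. It is not StreamInsight API: SubscriptionDispatcher, Subscribe and Dispatch are hypothetical names, and I'm assuming your output adapter would call Dispatch once per result event it dequeues from the single consolidated query.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of StreamInsight: one consolidated query
// feeds one output adapter, which fans results out to subscribers by symbol.
public sealed class SubscriptionDispatcher
{
    private readonly Dictionary<string, List<Action<string, decimal>>> _subscribers =
        new Dictionary<string, List<Action<string, decimal>>>();

    // Register a client callback for one stock symbol.
    public void Subscribe(string symbol, Action<string, decimal> callback)
    {
        List<Action<string, decimal>> callbacks;
        if (!_subscribers.TryGetValue(symbol, out callbacks))
        {
            callbacks = new List<Action<string, decimal>>();
            _subscribers[symbol] = callbacks;
        }
        callbacks.Add(callback);
    }

    // Assumed to be called once per result event by the output adapter.
    // Returns the number of subscribers that were notified.
    public int Dispatch(string symbol, decimal price)
    {
        List<Action<string, decimal>> callbacks;
        if (!_subscribers.TryGetValue(symbol, out callbacks))
        {
            return 0; // nobody is subscribed to this symbol
        }
        foreach (var callback in callbacks)
        {
            callback(symbol, price);
        }
        return callbacks.Count;
    }
}
```

This sketch is not thread-safe. If clients subscribe while events are flowing, you would need a lock or a concurrent collection around the dictionary.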

Take a look at the sample below. "sourceStream" is going to be my raw stock data coming from the data source. "referenceStream" is going to be some configuration streamed in from a reference data source (i.e. SQL). The success or failure of the join will throttle the events that get passed on for further processing.

var myPrice5Vol2kSourceStream = from s in sourceStream
                                join r in referenceStream
                                on s.StockSymbol equals r.StockSymbol
                                select s;
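
If you want to see the join-as-throttle behaviour without a StreamInsight server, the same shape can be tried with plain LINQ to Objects. This is only a sketch under assumptions: StockTick, SymbolConfig and ConsolidatedQueryDemo are made-up names, in-memory arrays stand in for the event streams, and the price/volume thresholds are my reading of the "Price>5 Vol<2k" example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical event payloads standing in for the real stream types.
public sealed class StockTick
{
    public string StockSymbol { get; set; }
    public decimal Price { get; set; }
    public int Volume { get; set; }
}

public sealed class SymbolConfig
{
    public string StockSymbol { get; set; }
}

public static class ConsolidatedQueryDemo
{
    // One query covers every enabled symbol. The inner join drops any tick
    // whose symbol has no matching row in the reference data.
    public static List<StockTick> Filter(IEnumerable<StockTick> source,
                                         IEnumerable<SymbolConfig> reference)
    {
        return (from s in source
                join r in reference on s.StockSymbol equals r.StockSymbol
                where s.Price > 5m && s.Volume < 2000
                select s).ToList();
    }

    public static void Main()
    {
        var ticks = new[]
        {
            new StockTick { StockSymbol = "MSFT", Price = 10m, Volume = 1500 },
            new StockTick { StockSymbol = "MSFT", Price = 3m,  Volume = 500  },
            new StockTick { StockSymbol = "GOOG", Price = 8m,  Volume = 100  },
            new StockTick { StockSymbol = "IBM",  Price = 6m,  Volume = 1000 }
        };
        // Only MSFT and IBM are "switched on" in the reference data.
        var enabled = new[]
        {
            new SymbolConfig { StockSymbol = "MSFT" },
            new SymbolConfig { StockSymbol = "IBM" }
        };

        foreach (var tick in Filter(ticks, enabled))
        {
            Console.WriteLine("{0} {1} {2}", tick.StockSymbol, tick.Price, tick.Volume);
        }
    }
}
```

Here the GOOG tick is dropped because it has no reference row, and the second MSFT tick is dropped by the price filter, so one query does the work that would otherwise take one query per symbol. Note that duplicate rows for a symbol in the reference data would duplicate the output, so the reference source should have one row per symbol.
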
0
On

This does sound like a scale-out problem. You have established that you can run 100 queries on your server without any problem. Then, in your comments on other answers, you talk about tens of thousands of customers adding thousands of queries. With that many customers, I suspect that you will be able to afford to add new servers to meet the demand.

So increase the throughput by spreading the load, through - I don't know - some form of distributed computing perhaps?

1
On

Each query needs a thread to execute. You have 1000 queries. So you need how many threads? Right. Actually, StreamInsight will use the thread pool to limit the number of threads created. So ... you'll have a limited number of threads to execute your queries. You'll wind up spending more time doing context switches than actually executing your queries.

I don't understand why you even need 1000 queries. We've built apps that take in hundreds of sensors from multiple sources and analyze them together ... and gotten over 100K events/sec. At the end of the day, it's poor design of your app, not poor performance on StreamInsight's part, that is causing the issue.

You really need to take some time and rethink how you are going about this. No matter how you slice it, your current approach is going to cause you issues. And ... consider this ... is each input adapter creating its own thread to listen to the inbound and enqueue events? How many threads do you think that adds to the mix?