Here's my stream analytics topology
EventHubSource => Job A (HoppingWindow every second) => EventHubA
EventHubSource => Job B (HoppingWindow every second) => EventHubB
- Each job has a different consumer group in EventHubSource.
- Each job is embarrassingly parallel and consumes only 14% SU resources.
When testing the JobA and JobC, the difference between the windowEnd and the original Event Time is just some few millisecond (~300), which is ok (latency from my producer + eventhub + stream analytics processing time).
But when I join both streams in a new Job C like this:
EventHubA
\
=> Job C (Join Datediff = 0 and timestamp by windowEnd)
/
EventHubB
This produces some output, but the problems comes here:
The real events are multiple minutes apart even if they were pushed at the same time by Job A and B (same windowEnd)
When I inspect the data coming out from EventHub A and B, the difference between the windowEnd and the real event timestamp ranges between 39 and 44 minutes, for all of them. But when testing like mentionned above, it was only 300ms.
The worst part here is that when I run it in prod, it only emits some dozen events and stops, even if the input count is still in the thousands.
It's been weeks I'm working on this and everytime I'm dealing with some cryptic behavior from ASA, my topology is quite simple and I'm only using simple hopping windows of 1s hop, this shouldn't take weeks of tweaking and trial errors without even understanding what's happening.
For people who used ASA and AWS Kinesis analytics, did you find Kinesis analytics simpler to work with ? What annoys me here in ASA is the unpredictable behavior and issues without error messages (I activated log analytics and no error was there...)
Sorry to hear you encountered some issues with ASA. I see you have a 1 second hopping windows, but what is the total size of the windows and what is your approximate throughput?
Regarding the delay: Looking are your question, I think your ASA job may not have enough CPU resources, and then the event processing is delayed. Unfortunately this is not visible in the current SU% metric, but we plan to show metrics for both CPU and memory in the future. To confirm this is the root cause, you can check the number of backlogged events in the job diagram. If there are lot of events backlogged, you may need to increase the number of SUs for this job.
You also mentioned the job stops after a dozen output, do you see an error message in the logs?