Algorithm/Programming for processing

I am using Spark Streaming (coding in Java), and I want to understand how to design an algorithm for the following problem. I am relatively new to MapReduce and need some assistance in designing the algorithm.

Here is the problem in detail.

Problem Details

Input:

  1. The initial input line looks like:

Template: (Pattern)(timestamp, message)

Example: (3)(12/5/2014 01:00:01, message)

where 3 is the pattern type.

  2. I have converted this into a DStream with key = (P1, P2), where P1 and P2 are the pattern classes for the input line, and value = (pattern_id, timestamp, message). Hence each tuple of the DStream looks as follows:

Template: (P1,P2),(pattern_id, timestamp, string)

Example: (3,4),(3, 12/5/2014 01:00:01, message)

Here 3 and 4 are pattern types that are paired together.

  3. I have a model in which each pattern pair has an associated time difference. The model is stored in a HashMap with key-value pairs such as:

Template: (P1,P2)(Time Difference)

Example: (3,4)(2:20)

where the time difference in the model is 2:20; if two messages of pattern types 3 and 4 respectively appear in the stream, the program should output an anomaly when the difference between their timestamps is greater than 2:20 (a sketch of these data shapes follows below).
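For concreteness, here is a minimal Java sketch of the data shapes described above. The timestamp format, the `classifyPatterns` helper, and the interpretation of "2:20" as 2 minutes 20 seconds are assumptions made purely for illustration; only the ((P1,P2),(pattern_id, timestamp, message)) shape and the (P1,P2) -> time-difference model come from the problem statement.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

import scala.Tuple2;
import scala.Tuple3;

public class PatternShapes {

    // Assumed timestamp format, based on the example "12/5/2014 01:00:01".
    static final DateTimeFormatter TS_FORMAT =
            DateTimeFormatter.ofPattern("d/M/yyyy HH:mm:ss");

    // Parse one raw line "(3)(12/5/2014 01:00:01, message)" into
    // ((P1,P2), (pattern_id, timestamp, message)).
    static Tuple2<Tuple2<Integer, Integer>, Tuple3<Integer, LocalDateTime, String>>
            parseLine(String line) {
        int patternId = Integer.parseInt(line.substring(1, line.indexOf(')')));
        String rest = line.substring(line.indexOf(')') + 2, line.length() - 1);
        String[] parts = rest.split(",", 2);
        LocalDateTime ts = LocalDateTime.parse(parts[0].trim(), TS_FORMAT);
        String message = parts[1].trim();
        Tuple2<Integer, Integer> key = classifyPatterns(patternId);
        return new Tuple2<>(key, new Tuple3<>(patternId, ts, message));
    }

    // Hypothetical stand-in for whatever logic assigns the (P1,P2) key,
    // e.g. pattern 3 is paired with pattern 4.
    static Tuple2<Integer, Integer> classifyPatterns(int patternId) {
        return patternId == 3 ? new Tuple2<>(3, 4)
                              : new Tuple2<>(patternId, patternId + 1);
    }

    // The learned model: (P1,P2) -> maximum allowed time difference.
    static Map<Tuple2<Integer, Integer>, Duration> buildModel() {
        Map<Tuple2<Integer, Integer>, Duration> model = new HashMap<>();
        // "2:20", read here as 2 minutes 20 seconds for illustration.
        model.put(new Tuple2<>(3, 4), Duration.ofMinutes(2).plusSeconds(20));
        return model;
    }
}
```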

What would be the best way to model this in Spark Streaming?

What I have tried so far

  1. Created the DStream described in step 2 from the input in step 1.
  2. Created a broadcast variable to send the map learned in the model (step 3 above) to all the workers.
  3. I am stuck trying to work out the algorithm for generating the anomaly stream in Spark Streaming; in particular, I cannot figure out how to express the check as an associative function suitable for a stream operation (see the sketch below).

Here is the current code: https://gist.github.com/nipunarora/22c8e336063a2a1cc4a9
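For reference, here is a minimal sketch of how steps 1 and 2 above might look, together with the anomaly condition the model implies. This is not taken from the linked gist; the class and parameter names are assumptions, and the open question of how to associate the two messages of a pair across the stream (step 3) is deliberately left out.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.Map;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

import scala.Tuple2;
import scala.Tuple3;

public class AnomalySketch {

    // Step 2 of "What I have tried": ship the learned model to the workers.
    static Broadcast<Map<Tuple2<Integer, Integer>, Duration>> broadcastModel(
            JavaSparkContext sc, Map<Tuple2<Integer, Integer>, Duration> model) {
        return sc.broadcast(model);
    }

    // The anomaly condition from the problem statement: two messages whose
    // pattern types form a modelled pair are anomalous when their timestamps
    // differ by more than the modelled time difference.
    static boolean isAnomaly(Broadcast<Map<Tuple2<Integer, Integer>, Duration>> model,
                             Tuple2<Integer, Integer> key,
                             Tuple3<Integer, LocalDateTime, String> first,
                             Tuple3<Integer, LocalDateTime, String> second) {
        Duration allowed = model.value().get(key);
        if (allowed == null) {
            return false; // no model entry for this pattern pair
        }
        Duration observed = Duration.between(first._2(), second._2()).abs();
        return observed.compareTo(allowed) > 0;
    }
}
```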
