How do I find the probability of subsequences obtained from sequences in a given dataset?


I have a dataset (CSV file) of link sequences, with the order-placed status of each sequence. I have obtained the subsequences and their counts using the prefixSpan algorithm (as described here). I also want to find the probability of each subsequence leading to order placed = 1. Suppose the links are a, b, c and d, and their sequences and order statuses are as follows in a data frame:

    Link sequences    Order status
    a,b,c,a,c,c       0
    a,c,b,c           1
    a,b,d,c,b,c       1
    a,c,b,c           0
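Before mining, each row has to become a list of links plus a 0/1 status. A minimal sketch with the standard `csv` module, assuming the file has exactly the two columns shown above (`Link sequences`, `Order status`) and that each sequence is a quoted, comma-separated field; the real file's layout may differ:

```python
import csv
import io

# Hypothetical CSV content mirroring the table above (assumed column names and quoting).
CSV_TEXT = '''Link sequences,Order status
"a,b,c,a,c,c",0
"a,c,b,c",1
"a,b,d,c,b,c",1
"a,c,b,c",0
'''

def load_sequences(text):
    """Return (sequences, statuses): a list of link lists and a parallel list of 0/1 ints."""
    sequences, statuses = [], []
    for row in csv.DictReader(io.StringIO(text)):
        # The quoted field "a,b,c,..." arrives as one string; split it into links.
        sequences.append(row["Link sequences"].split(","))
        statuses.append(int(row["Order status"]))
    return sequences, statuses

sequences, statuses = load_sequences(CSV_TEXT)
print(sequences)
print(statuses)  # [0, 1, 1, 0]
```

For a real file, pass the contents of `open(path, newline="")` instead of the inline string.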

These are the subsequences I get from the prefixSpan algorithm with minimum support = 4:

    Subsequence    Support
    [a]            4
    [a,b]          4
    [a,b,c]        4
    [a,c]          4
    [a,c,c]        4
    [b]            4
    [b,c]          4
    [c]            4
    [c,c]          4

What changes should I make to the prefixSpan code from the link above so that it also reports the probability, as follows?

    Subsequence    Support    Prob
    [a]            4          0.5
    [a,b]          4          0.5
    [a,b,c]        4          0.5
    [a,c]          4          0.5
    [a,c,c]        4          0.5
    [b]            4          0.5
    [b,c]          4          0.5
    [c]            4          0.5
    [c,c]          4          0.5
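The linked prefixSpan code is not shown here, so this is a standalone sketch rather than a patch to it. The idea: store each projected-database entry as (sequence id, start position) instead of a bare suffix. The ids of the sequences supporting a pattern are then always available, so the probability is just the sum of their statuses divided by the support. The function name `prefixspan_with_prob` is hypothetical, and sequences are assumed to be flat lists of single links (no itemsets).

```python
from collections import defaultdict

def prefixspan_with_prob(sequences, statuses, min_support):
    """PrefixSpan over flat sequences, returning (pattern, support, prob) tuples.

    prob = sum of order statuses of the supporting sequences / support.
    """
    results = []

    def mine(prefix, projected):
        # For every item, record the first position where it occurs in each
        # projected sequence (at most one entry per sequence id).
        first_pos = defaultdict(dict)  # item -> {seq_id: position}
        for sid, start in projected:
            for pos in range(start, len(sequences[sid])):
                item = sequences[sid][pos]
                if sid not in first_pos[item]:
                    first_pos[item][sid] = pos
        for item in sorted(first_pos):
            occ = first_pos[item]
            support = len(occ)
            if support < min_support:
                continue
            pattern = prefix + [item]
            prob = sum(statuses[sid] for sid in occ) / support
            results.append((pattern, support, prob))
            # Project: each supporting sequence continues right after the matched item.
            mine(pattern, [(sid, pos + 1) for sid, pos in occ.items()])

    mine([], [(sid, 0) for sid in range(len(sequences))])
    return results

sequences = [list("abcacc"), list("acbc"), list("abdcbc"), list("acbc")]
statuses = [0, 1, 1, 0]
for pattern, support, prob in prefixspan_with_prob(sequences, statuses, min_support=4):
    print(pattern, support, prob)
```

On the example data this yields the same nine patterns as the table, in the same order, each with support 4 and probability 0.5. Keeping sequence ids costs nothing extra in PrefixSpan, since support is already the number of distinct sequences in the projected database.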

The probability of a subsequence is calculated as follows:

Add up the order-placed statuses of all sequences in which the subsequence is present, and divide by the number of those sequences. For example:

P(subsequence [a,c,c]) = (0 + 1 + 1 + 0) / 4 = 0.5
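For a single pattern, the formula can also be evaluated directly, which is handy for cross-checking a miner's output. A minimal brute-force sketch; the helper names `is_subsequence` and `order_probability` are hypothetical:

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in the same order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator past the match, enforcing order.
    return all(item in it for item in pattern)

def order_probability(pattern, sequences, statuses):
    """Mean order status over sequences containing pattern; None if no sequence does."""
    hits = [s for seq, s in zip(sequences, statuses) if is_subsequence(pattern, seq)]
    return sum(hits) / len(hits) if hits else None

sequences = [list("abcacc"), list("acbc"), list("abdcbc"), list("acbc")]
statuses = [0, 1, 1, 0]
print(order_probability(["a", "c", "c"], sequences, statuses))  # 0.5
```

This checks every sequence for every pattern, so it is only practical for spot checks, not for enumerating all subsequences.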