About the Apriori Algorithm

236 Views Asked by At

I'm trying to find the frequent itemsets of given datas. In this case it is a simple example about the number of cars, if people in a age are married or not.

The sets of 1-itemsets and 2- itemsets are the following:

---Freq. 1-itemsets---> 9 times!
[
(1) [age:[20,24]]
(2) [age:[25,29]]
(1) [age:[30,34]]
(1) [age:[35,39]]
(2) [married:[no]]
(3) [married:[yes]]
(1) [num_cars:[0,0]]
(2) [num_cars:[1,1]]
(2) [num_cars:[2,2]]
]

---Freq. 2-itemsets---> 14 times!
[
(1) [age:[20,24],married:[no]]
(1) [age:[20,24],num_cars:[1,1]]
(1) [age:[25,29],married:[no]]
(1) [age:[25,29],married:[yes]]
(1) [age:[25,29],num_cars:[0,0]]
(1) [age:[25,29],num_cars:[1,1]]
(1) [age:[30,34],married:[yes]]
(1) [age:[30,34],num_cars:[2,2]]
(1) [age:[35,39],married:[yes]]
(1) [age:[35,39],num_cars:[2,2]]
(1) [married:[no],num_cars:[0,0]]
(1) [married:[no],num_cars:[1,1]]
(1) [married:[yes],num_cars:[1,1]]
(2) [married:[yes],num_cars:[2,2]]
]

(Don't care about the number in the parenthesis, it's just the frequent of this itemset; Given min_support in this example is 0.1)

Now I want to get the freq. 3-itemsets from the freq. 2-itemsets. In this case, I can combine two freq. 2-itemsets, whos intersection has one element. Now I have to check if all subsets (with size of 2) of this combination are element in the freq. 2-itemsets.

If I do this, I get the following:

---Freq. 3-itemsets---> 6 times!
[
(1) [age:[20,24],married:[no],num_cars:[1,1]]
(1) [age:[25,29],married:[no],num_cars:[0,0]]
(0) [age:[25,29],married:[no],num_cars:[1,1]]
(1) [age:[25,29],married:[yes],num_cars:[1,1]]
(1) [age:[30,34],married:[yes],num_cars:[2,2]]
(1) [age:[35,39],married:[yes],num_cars:[2,2]]
]

But now, as you can see, I get one freq. 3-itemset, which has a frequent of 0. Thus it shouldn't be in the set of freq. 3-itemsets.

If I let compute this example from e.g. Weka (http://www.cs.waikato.ac.nz/ml/weka/), the said itemset doesn't appear in the results.

But I can generate it from combining {age:[25,29],married:[no]} and {married:[no],num_cars:[1,1]}.

So my question is:

Do I make a mistake in generating the frequent itemsets (that my program builds this one above) or do I have simply to filter the generated candidates first if the subsets are element of freq. 2-itemsets and after that if the frequent is more than 0 ???

I hope I could explain my problem clearly...

Thank you for your help!!

1

There are 1 best solutions below

0
On

For an itemset of cardinality k to be frequent, it is necessary but not sufficient for all of its (k-1)-element subsets to be frequent. After you generate candidate k-element itemsets by combining (k-1)-element itemsets and checking the subsets, you need to scan the data again and throw away the candidates that are not truly frequent.