Apriori Algorithm for text


I am taking a data mining course, and we have to run the Apriori algorithm on a data set of text items, i.e. strings.

['acornSquash', 'cottageCheese', 'laundryDetergent', 'oatmeal', 'onions', 'pizza', 'tomatoes', 'yogurt']
['bread', 'cinnamon', 'grapefruit', 'juiceBoxes', 'mayo', 'pastaSauce', 'pepper', 'waterBottles', 'yogurt']

Can I get any code or help to run the Apriori algorithm?

Thanks in advance


There is 1 best solution below


The link below contains source code for a basic Apriori implementation.

https://github.com/ak94/Apriori/

Go through the README file.

By basic implementation I mean that it does not implement any of the efficiency improvements such as the hash-based technique, partitioning, sampling, transaction reduction, or dynamic itemset counting.

The code scans the whole dataset on every pass. However, it is memory efficient, as it always reads the input from the file rather than storing it in memory.
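For orientation, a basic Apriori of this kind can be sketched in Python as follows. This is not the code from the linked repository: the function name `apriori` and the use of `min_support` as an absolute transaction count are assumptions made for illustration. Unlike the linked code, this sketch keeps the transactions in memory and works on strings directly. Like it, it rescans every transaction on each pass.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    min_support is an absolute transaction count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k:
                # Prune step: every (k-1)-subset must itself be frequent.
                if all(frozenset(sub) in current
                       for sub in combinations(union, k - 1)):
                    candidates.add(union)

        # Full scan of the dataset to count each candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


transactions = [
    ['acornSquash', 'cottageCheese', 'laundryDetergent', 'oatmeal',
     'onions', 'pizza', 'tomatoes', 'yogurt'],
    ['bread', 'cinnamon', 'grapefruit', 'juiceBoxes', 'mayo',
     'pastaSauce', 'pepper', 'waterBottles', 'yogurt'],
]
result = apriori(transactions, min_support=2)
for itemset, count in result.items():
    print(sorted(itemset), count)
```

With only the two transactions from the question and a minimum support count of 2, the only frequent itemset is `yogurt`, since it is the only item that appears in both.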

Since you are currently taking this course, I assume this is code you will want to write on your own first.

To read more about the Apriori algorithm, I would recommend http://www3.cs.stonybrook.edu/~cse634/lecture_notes/07apriori.pdf

Read, understand, and try to implement it on your own.

Now, let's talk about how to implement it. The code at the link I posted works on numbers, i.e. its input file contains itemsets as numbers instead of text (as in your case).

What you can simply do is write a program that maps each text item to a particular number.

For example, suppose your data set contained

[ 'oatmeal', 'onions', 'pizza', 'tomatoes', 'yogurt']
[ 'tomatoes', 'pepper', 'waterBottles', 'yogurt']

After mapping, it would look like

1 2 3 4 5 -1

4 6 7 5 -1

(-1 represents the end of a particular transaction, as in the code.)
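A minimal Python sketch of this mapping step is shown here. The function names `build_item_map` and `encode_transactions` are hypothetical, and the one-line-per-transaction layout with a trailing -1 is assumed from the example above, so check the README for the exact input format the linked code expects.

```python
def build_item_map(transactions):
    """Assign each distinct item an integer ID, in order of first appearance."""
    item_to_id = {}
    for transaction in transactions:
        for item in transaction:
            if item not in item_to_id:
                item_to_id[item] = len(item_to_id) + 1  # IDs start at 1
    return item_to_id


def encode_transactions(transactions, item_to_id, terminator=-1):
    """Render each transaction as a line of IDs ending with the terminator."""
    lines = []
    for transaction in transactions:
        ids = [str(item_to_id[item]) for item in transaction]
        lines.append(" ".join(ids + [str(terminator)]))
    return lines


transactions = [
    ['oatmeal', 'onions', 'pizza', 'tomatoes', 'yogurt'],
    ['tomatoes', 'pepper', 'waterBottles', 'yogurt'],
]
item_to_id = build_item_map(transactions)
encoded = encode_transactions(transactions, item_to_id)
print("\n".join(encoded))
# prints:
# 1 2 3 4 5 -1
# 4 6 7 5 -1
```

Keep `item_to_id` around (or save it to a file), since the same map is needed to translate the results back to text.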

Then you use this input file with your code (either the same as in the link or your own in a different language).

After you get the frequent itemsets from running the program, you can transform them back using the map you used earlier.
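The reverse step can be sketched in Python like this. The function name `decode_itemsets` is hypothetical, and the sketch assumes the frequent itemsets have already been parsed from the program's output into collections of integer IDs. The IDs follow the example above, and the itemsets shown are the ones a minimum support count of 2 gives on those two transactions.

```python
def decode_itemsets(itemsets, item_to_id):
    """Translate itemsets of integer IDs back into sorted lists of item names."""
    id_to_item = {item_id: item for item, item_id in item_to_id.items()}
    return [sorted(id_to_item[i] for i in itemset) for itemset in itemsets]


# Same mapping as in the example: IDs assigned in order of first appearance.
item_to_id = {
    'oatmeal': 1, 'onions': 2, 'pizza': 3, 'tomatoes': 4,
    'yogurt': 5, 'pepper': 6, 'waterBottles': 7,
}

# tomatoes (4) and yogurt (5) are the only items in both example transactions.
frequent = [{4}, {5}, {4, 5}]
decoded = decode_itemsets(frequent, item_to_id)
print(decoded)
# prints: [['tomatoes'], ['yogurt'], ['tomatoes', 'yogurt']]
```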