Apriori Algorithm in Data Mining - How to resolve a TypeError regarding TransactionEncoder() in Python?


I am trying to use the apriori algorithm in a Python program, but I get a TypeError on the line 'te_ary = te.fit(dataset).transform(dataset)'. I suspect it is because I am reading my dataset from a file on my computer rather than typing it into the Jupyter notebook by hand. I expected any problem to come from the line where I declare 'frequent_itemsets', but the traceback points to the te.fit line instead.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from apyori import apriori


from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

filename = '/Users/emitsch/Documents/Database 1.csv'

#loading the excel spreadsheet file with my database
dataset = pd.read_csv(filename, header = None)

te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)

And this is the error:

TypeError                                 Traceback (most recent call last)
<ipython-input-19-ff180148a5c5> in <module>
      1 te = TransactionEncoder()
----> 2 te_ary = te.fit(dataset).transform(dataset)
      3 df = pd.DataFrame(te_ary, columns=te.columns_)
      4 frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)

    //anaconda3/lib/python3.7/site-packages/mlxtend/preprocessing/transactionencoder.py in fit(self, X)
         54         unique_items = set()
         55         for transaction in X:
    ---> 56             for item in transaction:
         57                 unique_items.add(item)
         58         self.columns_ = sorted(unique_items)

    TypeError: 'int' object is not iterable

There is 1 best solution below

Sandipan Dey On

Here is a simple example with a tiny transactions dataset (5 items with item IDs 1 to 5, and 4 transactions):

df = pd.DataFrame([[1, 2, pd.NA, pd.NA], 
                   [1, 3, pd.NA, pd.NA], 
                   [2, 3, 4, 5],
                   [1, 4, 5, pd.NA]], columns=['item1','item2','item3','item4'])
df.head()
#    item1  item2   item3   item4
#0   1      2       <NA>    <NA>
#1   1      3       <NA>    <NA>
#2   2      3       4       5
#3   1      4       5       <NA>

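TransactionEncoder expects a list of lists, one list of items per transaction. Iterating over a DataFrame yields its column labels, and with header=None those labels are the integers 0, 1, 2, ..., so fit() ends up trying to iterate over an int, which raises the TypeError. So preprocess the DataFrame into a list of lists first, dropping the missing values: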

dataset = [[item for item in row if item is not pd.NA] for row in df.values]
dataset
# [[1, 2], [1, 3], [2, 3, 4, 5], [1, 4, 5]]
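
The question loads its data with pd.read_csv(..., header=None), where empty cells come back as NaN rather than pd.NA, so the 'is not pd.NA' check would not filter them out. A minimal sketch of the same preprocessing for that case follows. The csv_text content and the item names are hypothetical stand-ins for the asker's file, which is not shown, and the sketch assumes one transaction per row with short rows padded by empty cells:

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file: one transaction per row,
# short rows padded with empty cells so every row has four fields.
csv_text = (
    "milk,bread,,\n"
    "milk,eggs,,\n"
    "bread,eggs,butter,jam\n"
    "milk,butter,jam,\n"
)

raw = pd.read_csv(io.StringIO(csv_text), header=None)

# Iterating over a DataFrame yields its column labels, which are
# integers here because header=None was used.
column_labels = list(raw)

# Turn each row into a plain list and drop the missing cells (NaN).
transactions = [
    [item for item in row if pd.notna(item)]
    for row in raw.values.tolist()
]
print(transactions)
```

The resulting transactions list can then be passed to TransactionEncoder in place of the DataFrame.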

Finally, fit a TransactionEncoder on the dataset and run the apriori algorithm to compute the frequent itemsets:

te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
frequent_itemsets = apriori(df, min_support=0.5, use_colnames=True)
frequent_itemsets
#   support itemsets
#0  0.75    (1)
#1  0.50    (2)
#2  0.50    (3)
#3  0.50    (4)
#4  0.50    (5)
#5  0.50    (4, 5)
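
The support of an itemset is the fraction of transactions that contain every item in it, and apriori keeps only itemsets whose support is at least min_support. As a rough cross-check that does not need mlxtend, individual supports can be computed by hand. The support() helper below is a hypothetical name introduced for illustration, not part of mlxtend:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    wanted = set(itemset)
    hits = sum(1 for transaction in transactions if wanted <= set(transaction))
    return hits / len(transactions)


# Same list of lists as in the preprocessing step above.
dataset = [[1, 2], [1, 3], [2, 3, 4, 5], [1, 4, 5]]

print(support({1}, dataset))     # 0.75
print(support({4, 5}, dataset))  # 0.5
print(support({1, 3}, dataset))  # 0.25, below min_support=0.5
```

This brute-force check is only practical for tiny datasets; apriori avoids scoring every candidate by extending only itemsets that are already frequent.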