After running the apriori function from mlxtend, the transaction numbers associated with the resulting frequent itemsets are dropped.
How can I keep the transaction numbers?
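import pandas as pd
from mlxtend.frequent_patterns import apriori

df = pd.read_csv('association_rule_items_fullmech.csv')
# one row per transaction_doc, one column per text value;
# size() counts occurrences (summing the IDs themselves would give 0 for transaction_doc 0)
basket = (df.groupby(['transaction_doc', 'text'])['transaction_doc'].size().unstack().reset_index().fillna(0).set_index('transaction_doc'))
# one-hot encoding
def encode_units(x):
    if x <= 0:
        return 0
    elif x >= 1:
        return 1
basket_sets = basket.applymap(encode_units)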
The basket_sets DataFrame essentially looks like this (a small arbitrary example, but with the same structure):
transaction_doc - text | "string1" | "string2" | "string3" |
---|---|---|---|
0 | 1 | 0 | 1 |
1 | 1 | 1 | 0 |
2 | 0 | 0 | 1 |
I then apply the apriori function:
frequent_itemsets = apriori(basket_sets, min_support=0.001, use_colnames=True)
However, after the apriori function runs, transaction_doc, which indicates which document each text comes from, disappears from the index. What I get back is the frequent itemsets with a reset index. I want to be able to retain the transaction_doc values after the apriori function is applied.
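To make the goal concrete, here is a minimal sketch of the kind of result I am after. It is only an illustration under stated assumptions: the one-hot table reuses the toy values from the example above, frequent_itemsets is built by hand in the shape apriori returns with use_colnames=True (a support column and an itemsets column of frozensets) so the snippet runs without mlxtend, and transactions_for and the transaction_docs column are hypothetical names, not part of mlxtend.

```python
import pandas as pd

# Toy one-hot table with the same structure as basket_sets (values are made up).
basket_sets = pd.DataFrame(
    {"string1": [1, 1, 0], "string2": [0, 1, 0], "string3": [1, 0, 1]},
    index=pd.Index([0, 1, 2], name="transaction_doc"),
)

# Hand-built stand-in for the apriori output (assumption: same columns as
# apriori(..., use_colnames=True), i.e. 'support' and 'itemsets' of frozensets).
frequent_itemsets = pd.DataFrame(
    {
        "support": [2 / 3, 1 / 3, 2 / 3, 1 / 3, 1 / 3],
        "itemsets": [
            frozenset({"string1"}),
            frozenset({"string2"}),
            frozenset({"string3"}),
            frozenset({"string1", "string2"}),
            frozenset({"string1", "string3"}),
        ],
    }
)


def transactions_for(itemset, onehot):
    """Hypothetical helper: index labels of rows that contain every item in itemset."""
    mask = onehot[list(itemset)].astype(bool).all(axis=1)
    return onehot.index[mask].tolist()


# Attach the matching transaction_doc values to each frequent itemset.
frequent_itemsets["transaction_docs"] = frequent_itemsets["itemsets"].apply(
    lambda itemset: transactions_for(itemset, basket_sets)
)

print(frequent_itemsets)
```

The lookup goes back to the one-hot table because the apriori output only carries support and itemsets. On a large table this row scan per itemset may be slow, so it should be read as a starting point rather than a tuned solution.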