Why isn't my FP growth code returning the right set of prefix paths?

186 Views Asked by At

I'm working on implementing the FP growth algorithm, and currently I can get an FP tree set up from a set of transactions. The next step is mining the prefix paths and building trees from them. Here's my Node class:

class Node:
    def __init__(self, name, count, parent):
        self.name = name
        self.count = count
        self.parent = parent
        self.children = {}

There are more functions inside the class for creating the tree, but those aren't really relevant for my current problem (I think). Each node has a parent, and some nodes have children, and the tree is stored as just the object representing the root node. The children of each node are stored in a dictionary associated with the node whose keys are the items' names and values are the node objects.

I have the following function that works on single node instances for finding the prefix paths (basically the parent, then parent of the parent, and so on until the root node):

def get_prefix(node, path=[], level=0):
    if node.name not in path:
        path.append(node.name)
    # check if parent node is root node
    if node.parent.parent != None:
        path = get_prefix(node.parent, path, level=level+1)[0]
    if level == 0:
        path.remove(node.name)
    return [path, node.count, node.name]

This works for the following type of input:

get_prefix(root_node.children['2'].children['3'].children['1'].children['5'])

which returns

[['1', '3', '2'], 1, '5']

I guess I should note that I'm using this data set, from Data Mining: Concepts and Techniques by Han et al.:

transactions = [['1', '2', '5'],
                ['2', '4'],
                ['2', '3'],
                ['1', '2', '4'],
                ['1', '3'],
                ['2', '3'],
                ['1', '3'],
                ['1', '2', '3', '5'],
                ['1', '2', '3']]

min_support = 2

So far, all is well. But when I try to find all prefix paths with length greater than 1 in the whole tree, something always goes wrong. What I'm trying to do is write a function that takes in the root node and returns either a dictionary or list of all prefix paths for all nodes in the tree. I've attempted this a couple of times, in a couple of ways:

def get_all_prefixes(root, prefixes={}):
    for child in list(root.children.values()):
        print('On child node ' + child.name)
        # get prefix path info for current node
        prefix_list = get_prefix(child)
        # if item from node already in prefixes keys, then
        # append prefix path results to value
        if len(prefix_list[0]) > 1:
            if prefix_list[2] in prefixes.keys():
                print('Appending ' + str(prefix_list[0]) + ' for ' + str(child.name))
                prefixes[prefix_list[2]].append([prefix_list[0], prefix_list[1]])
            # otherwise create new key in prefixes
            else:
                print('Inititalizing ' + str(prefix_list[0]) + ' for ' + str(child.name))
                prefixes[prefix_list[2]] = [[prefix_list[0], prefix_list[1]]]
        print('Prefix path before recursion:')
        print(prefixes)
        prefixes = get_all_prefixes(child, prefixes)
        print('Prefix path after recursion:')
        print(prefixes)
#         get_all_prefixes(child, prefixes)
    
    return prefixes

This returns:

{'1': [[['2', '3'], 2], [['2', '3'], 2], [['2', '3'], 2]],
 '5': [[['2', '3'], 1], [['2', '3'], 1]],
 '4': [[['2', '3'], 1], [['2', '3'], 1]],
 '3': [[['2', '3'], 4], [['2', '3'], 2]]}

And another attempt:

def get_prefix_list(root, prefixes=[]):
    cur_prefix = get_prefix(root)
    prefixes.append([cur_prefix, root.name, root.count])
    for child in root.children.values():
#         prefixes.append(get_prefix_list(child)[0])
        get_prefix_list(child)
    return prefixes

which returns

[[[['2', '3'], 0, 'root'], 'root', 0],
 [[['2', '3'], 7, '2'], '2', 7],
 [[['2', '3'], 2, '1'], '1', 2],
 [[['2', '3'], 1, '5'], '5', 1],
 [[['2', '3'], 1, '4'], '4', 1],
 [[['2', '3'], 1, '4'], '4', 1],
 [[['2', '3'], 4, '3'], '3', 4],
 [[['2', '3'], 2, '1'], '1', 2],
 [[['2', '3'], 1, '5'], '5', 1],
 [[['2', '3'], 2, '3'], '3', 2],
 [[['2', '3'], 2, '1'], '1', 2]]

I'm not sure what I'm doing wrong, and I'm at the point where no matter how long I spend thinking about this I just end up doing the same things in a slightly different (and still wrong) way.

The recursive calls could very well be the problem, it's been a while since I've had to use recursion. What's the right way to go about traversing this tree and collecting prefix paths as I go?

EDIT: To clarify, FP growth is frequent pattern growth, as explained here: https://scholar.google.com/scholar_url?url=https://web.iitd.ac.in/~bspanda/fptree.pdf&hl=en&sa=X&ei=Wt1aYbjZHdWR6rQPoZmruAk&scisig=AAGBfm2FZzJEsEwGApOtUPol2gdVBtQ38Q&oi=scholarr

PARTIAL SOLUTION: I implemented the following bit of code:

def create_path(node, path):
    if node.parent != None:
        path.append(node.name)
        create_path(node.parent, path)

def get_prefix_list(root, prefix_list=[]):
    for child in root.children.values():
        path = []
        create_path(child, path)
        if len(path) > 1:
            prefix_list.append([path[1:], child.name, child.count])
        for x in get_prefix_list(child):
            if x not in prefix_list:
                prefix_list.append([x, child.name, child.count])
    return prefix_list

def get_prefix_dict(root, min_sup):
    prefix_list = get_prefix_list(root)
    prefix_dict = {}
    for x in prefix_list:
        if x[1] in prefix_dict.keys():
            prefix_dict[x[1]].append([x[0], x[2]])
        else:
            prefix_dict[x[1]] = [[x[0], x[2]]]
    return prefix_dict

This returns the correct dictionary of prefix paths for the initial FP tree, which is great. However, once I get to the step where I have to find prefix paths from the trees created from mining the prefix path, I run into problems. Here's one of the prefix path-generated trees, for item '5':

Name: root
Level: 0
Count: 0
Children: ['2']

   Name: 2
   Level: 1
   Count: 2
   Parent: root
   Children: ['1']

      Name: 1
      Level: 2
      Count: 2
      Parent: 2
      Children: ['3']

         Name: 3
         Level: 3
         Count: 1
         Parent: 1
         Children: []

This is what we expect, and looks good so far. But when I try to run my prefix-finding function on it:

get_prefix_dict(cond_fp_trees['5'], min_support)

I get this nonsense:

{'1': [[['2'], 2],
  [['3', '2'], 2],
  [['3'], 2],
  [['2'], 2],
  [['2'], 2],
  [['2'], 2]],
 '5': [[['1', '2'], 1], [['1', '3', '2'], 1]],
 '4': [[['1', '2'], 1], [['2'], 1]],
 '3': [[['2'], 4], [['1', '2'], 1], [['1', '2'], 1], [['1', '2'], 1]]}

Now I'm really confused, since there are no '4' items even in that tree. Where are these items coming from? Why doesn't my function behave on this tree the way it did on the initial FP tree?

0

There are 0 best solutions below