Grouping items into subsets (power set)

306 Views Asked by At

Let's say I have the following:

john: [a, b, c, d]
bob:  [a, c, d, e]
mary: [a, b, e, f]

Or reformatted slightly so you can easily see the groupings:

john: [a, b, c, d]
bob:  [a,    c, d, e]
mary: [a, b,       e, f]

What's the most common or most efficient algorithm to generate the following grouping?

[john, bob, mary]: [a]
[john, mary]:      [b]
[john, bob]:       [c,d]
[bob, mary]:       [e]
[mary]:            [f]
[john]:            []
[bob]:             []

After quickly googling, it seems like the above keys represent the "power set". So I was planning on the following impl:

1) Generate power set {{j, b, m}, {j, m}, {j, b} {b, m}, {m}, {j}, {b}} // j = john, b = bob, m = mary

2) Generate set of all letters: {a, b, c, d, e, f}

3) Iterate over subsets, and for each letter, see if letter exists in all elements of the subset

So...

subset = {j, b, m}

letter = a
    j contains a? true
    b contains a? true
    m contains a? true
        * add a to subset {j, b, m}

letter = b
    j contains b? true
    b contains b? false, continue

letter = c
    j contains c? true
    b contains c? true
    m contains c? false, continue
.....

subset = {j, m}
.....

Is there a better solution?

EDIT: The above algorithm is flawed. For instance, {j, m} would also contain "a", which I don't want. I suppose I can simply modify it so that in each iteration, I also check to see if this letter is "NOT IN" the elements not in this set. So in this case, I would also check:

if b does not contain a
2

There are 2 best solutions below

2
On BEST ANSWER

Step 3 (Iterate over subsets) would be inefficient, as it does either "j contain a" or "a not in j" for each element in your power set.

Here is what I would suggest:

1) Generate power set {{j, b, m}, {j, m}, {j, b} {b, m}, {m}, {j}, {b}}. You don't need this step is you don't care about empty mappings in the final output.

2) Iterate over all the elements in your original data structure and construct following:

[a] : [j, b, m]
[b] : [j, m]
[c] : [j, b]
[d] : [j, b]
[e] : [b, m]
[f] : [m]

3) Reverse the above structure and aggregate (using maps of [j, b,...] to list of [a,b...] should do the trick) to get this:

[j, b, m] : [a]
[j, m] : [b]
[j, b] : [c, d]
[b, m] : [e]
[m] : [f]

4) Compare 3 with 1 to fill in the remaining empty mappings.

Edit: Hers is the complete code in Java

    // The original data structure. mapping from "john" to [a, b, c..] 
    HashMap<String, HashSet<String>> originalMap = new HashMap<String, HashSet<String>>();

    // The final data structure. mapping from power set to [a, b...]
    HashMap<HashSet<String>, HashSet<String>> finalMap = new HashMap<HashSet<String>, HashSet<String>>();

    // Intermediate data structure. Used to hold [a] to [j,b...] mapping
    TreeMap<String, HashSet<String>> tmpMap = new TreeMap<String, HashSet<String>>();

    // Populate the original dataStructure
    originalMap.put("john", new HashSet<String>(Arrays.asList("a", "b", "c", "d")));
    originalMap.put("bob", new HashSet<String>(Arrays.asList("a", "c", "d", "e")));
    originalMap.put("mary", new HashSet<String>(Arrays.asList("a", "b", "e", "f")));

    // Hardcoding the powerset below. You can generate the power set using the algorithm used in googls guava library.
    // powerSet function in https://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/common/collect/Sets.java
    // If you don't care about empty mappings in the finalMap, then you don't even have to create the powerset
    finalMap.put(new HashSet<String>(Arrays.asList("john", "bob", "mary")), new HashSet<String>());
    finalMap.put(new HashSet<String>(Arrays.asList("john", "bob")), new HashSet<String>());
    finalMap.put(new HashSet<String>(Arrays.asList("bob", "mary")), new HashSet<String>());
    finalMap.put(new HashSet<String>(Arrays.asList("john", "mary")), new HashSet<String>());
    finalMap.put(new HashSet<String>(Arrays.asList("john")), new HashSet<String>());
    finalMap.put(new HashSet<String>(Arrays.asList("bob")), new HashSet<String>());
    finalMap.put(new HashSet<String>(Arrays.asList("mary")), new HashSet<String>());

    // Iterate over the original map to prepare the tmpMap.
    for(Entry<String, HashSet<String>> entry : originalMap.entrySet()) {
        for(String value : entry.getValue()) {
            HashSet<String> set = tmpMap.get(value);
            if(set == null) {
                set = new HashSet<String>();
                tmpMap.put(value, set);
            }
            set.add(entry.getKey());
        }
    }

    // Iterate over the tmpMap result and add the values to finalMap
    for(Entry<String, HashSet<String>> entry : tmpMap.entrySet()) {
        finalMap.get(entry.getValue()).add(entry.getKey());
    }

    // Print the output
    for(Entry<HashSet<String>, HashSet<String>> entry : finalMap.entrySet()) {
        System.out.println(entry.getKey() +" : "+entry.getValue());
    }

The output of the code above would be :

[bob] : []
[john] : []
[bob, mary] : [e]
[bob, john] : [d, c]
[bob, john, mary] : [a]
[mary] : [f]
[john, mary] : [b]
4
On

You could achieve this with two maps/dictionaries, one being the 'inverse' of the other. For the first map the 'key' would be the name, and the 'value' would be a list of characters. The second map has the letters as keys and a list of the names associated with it as the value.

In Python

nameDict = {'john' : ['a', 'b', 'c', 'd'], 'bob' : ['a', 'c', 'd', 'e'], 'mary' : ['a', 'b', 'e', 'f']}

reverseDict = {}
for key,values in nameDict.items():
    for v in values:
        if v in reverseDict.keys():
            reverseDict[v].append(key)
        else:
            reverseDict[v] = [key] # If adding v to dictionary for the first time it needs to be as a list element

# Aggregation
finalDict = {}
for key,values in reverseDict.items():
    v = frozenset(values)
    if v in finalDict.keys():
        finalDict[v].append(key)
    else:
        finalDict[v] = [key] 

Here, reverseDict contains the mapping you want with a -> [john, bob, mary], b -> [john, mary] etc. You can also check if john does not contain a by checking if the list returned by reverseDict['a'] contains john.

[EDIT] Added aggregation into finalDict.

You can use frozensets as dictionary keys so finalDict now contains the correct results. Printing out the dictionary:

frozenset({'bob', 'mary'})
['e']

frozenset({'mary'})
['f']

frozenset({'john', 'bob'})
['c', 'd']

frozenset({'john', 'mary'})
['b']

frozenset({'john', 'bob', 'mary'})
['a']