Constructing a modified powerset in Rust - filtering out certain subsets based on their content

50 Views Asked by At

What is the most efficient way (and/or most idiomatic) to make a modified powerset of vector of 2-tuples in Rust? By modified, what I mean is the following rule

(a,b),(c,b) not in any set, for any a,b,c.

For example if I have [(a,b)(c,b)] as the input set, the output set would be [[], [(a,b)], [(c,b)]], noting that the input set itself is excluded by this rule. The output will be of references to elements in the original list to avoid duplicate allocations.

One known property of the input list is that every pair will have exactly 1 partner with a matching 2nd element.

I have noticed some great responses on this thread but I am having trouble adapting any of the solutions to my needs in an efficient way.

Here are a few ideas I have for how to approach this:

  1. as each list is created or after all lists are created, filter by checking if they follow the rule. This checking seems very bad, like nesting 2 for loops of each element for a given list. Another option for checking would be storing the second element of each pair, the benefit being only 1 pass is needed per list, but with extra allocation needed.
  2. Making powersets in stages. Because each 2-tuple has exactly 1 pair with a matching 2nd element, I would somehow mark or separate these pairs in a preprocessing step. This might look something like making a powerset of each unmatched 2-tuple and then making powerset of those sets.

I feel like strategy number 2 has promise in terms of efficiency, but it is quite vague in its current form and I am having trouble working out concrete steps of how to make it work.

Any advice? I wonder if any of this work I am putting into making the construction be more complex is actually not gaining me any efficiency and that just the naive filtering (as described in 1) may be sufficient.

1

There are 1 best solutions below

0
Karl On

I believe I have a set of concrete steps that solves the problem using strategy number 2.

I have made the simple assumption that the input is now a list of the matched_pairs of length n i.e. the pairs whose 2nd coordinate match.

The strategy is to construct the set of all n length combinations of the pairs, doing so by:

  1. Generate all binary strings S of length n
  2. for s in S, and for integer i in s, make set I by including matched_pairs[i][0] if s[i]==0 and matched_pairs[i][1] if s[i]==1

Then take the power set of the resulting sets knowing none can contain 2 matched pairs, and all permutations of the other pairs are accounted for:

  1. Take a power set of each set in I, concatenating into the results
  2. Sort and remove duplicates from the results and return

Here are the implementation details:

Step 1&2: The following code returns sets of length n= matched_pairs.len() and 2^n of them. It is every combination using one of each pair. Not happy how I ended up using strings, I believe there is a prettier way to do it with bit manipulation, bit I could not produce such code.

fn one_of_each(matched_pairs:Vec<Vec<(usize,usize)>>) -> Vec<Vec<(usize,usize)>> {
    let n = matched_pairs.len();
    let m = 2_i32.pow(n as u32);
    let mut S = vec![];
    for x in 0..m {
        S.push(format!("{:0n$b}", x));
    }
    let mut result = vec![];
    for s in S {
        let mut I = vec![];
        for (i,c) in s.chars().enumerate() {
            if c == '0' {
                I.push(matched_pairs[i][0]);
            } else {
                I.push(matched_pairs[i][1]);
            }
        }
        result.push(I);
    }
    result
}

Step 3&4: The following code does the rest of the work. I believe it's possible to do this without making duplicates and then using dedup() but I couldn't produce such code. I believe it would involve stepping through matched_pairs and push onto duplicating lists, though I found accumulating such lists while keep them separate enough to not get duplicates or overwrite difficult.

fn find_f_set(s: Vec<Vec<(usize,usize)>>) -> Vec<Vec<(usize,usize)>> {
    //assume that s is a set of pairs
    let mut results = vec![];
    for set in one_of_each(s) {
        let powerset = powerset(set);
        for subset in powerset {
            results.push(subset);
        } 
    }
    results.sort();
    results.dedup();
    results
}

Here is the code in a working playground.

So this result works, and it produces less duplication than the naive method. That said, very open to improvements, especially one that does not produce duplicates.