Logic to find minimum number of strings for complete coverage of characters

1.6k Views Asked by At

I have set of strings

[abcd, efgh, abefg]

How to find the minimum number of strings that covers all the characters (abcdefgh)

Answer would be abcd and efgh. But what would be the algorithm to find this answer?

4

There are 4 best solutions below

0
On BEST ANSWER

Thanks everyone for the response, Finally completed it, have given the algorithm below in simple words as a refernce for others

Sub optimize_strings()

Capture list of strings in an array variable & number of strings in an integer

Initialize array of optimized strings as empty & pointer to it as zero

Get the list of all characters in an array & number of characters in a variable

Do While number of characters>0

Reset the frequency of all characters as zero & then calculate the frequency of all characters in uncovered strings in separate array

Reset the number of uncovered characters for each strings as zero & then calculate the number of uncovered characters in each strings in separate array

Sort the characters in characters array in ascending order based on their characters frequency array

Fetch list of strings that contains the character present in the top of the character array & place them in filtered strings array

Bubble sort filtered strings array in descending order based on the number of uncovered characters which was stored in step 2 of this loop

Store the Top of the filtered strings array in optimized strings array & increase its pointer to 1

Iterate through all the characters in the optimized string & remove all the characters present in it from characters array

Loop

Print the result of optimized strings present in optimized strings array

End Sub

3
On

The "set cover problem" can be reduced to your problem. You can read about it on Wikipedia link. There is no known polynomial solution for it.

@j_random_hacker: That's what I meant. Corrected.

@Yuvaraj: Check the following pseudo code:

str = input string
S = input set
for each subset s of S in ascending order of cardinality:
    if s covers str
        return s
return none
2
On

python

>>> a="abcd efgh abefg"
>>> set(a)
set(['a', ' ', 'c', 'b', 'e', 'd', 'g', 'f', 'h'])
>>> ''.join(set(a))
'a cbedgfh'
>>> ''.join(set(a)-set(' '))
'acbedgfh'
0
On

If you want to check every possible combination of strings to find the shortest combination which covers a set of characters, there are two basic approaches:

  1. Generating every combination of strings, and for each one, checking whether it covers the whole character set.
  2. For each character in the set, making a list of strings it appears in, and then combining those lists to find combinations of strings which cover the character set.

(If the number of characters or strings is too big to check all combinations in reasonable time, you'll have to use an approximation algorithm, which will find a good-enough solution, but can't guarantee to find the optimal solution.)

The first approach generates N! combinations of strings (where N is the number of strings) so e.g. for 13 strings that is more than 2^32 combinations, and for 21 strings more than 2^64. For large numbers of strings, this may become too inefficient. On the other hand, the size of the character set doesn't have much impact on the efficiency of this approach.

The second approach generates N lists of indexes pointing to string (where N is the number of characters in the set), and each of these lists holds at most M indexes (where M is the number of strings). So there are potentially M^N combinations. However, the number of combinations that are actually considered is much lower; consider this example with 8 characters and 8 strings:

character set: abcdefg
strings: 0:pack, 1:my, 2:bag, 3:with, 4:five, 5:dozen, 6:beige, 7:eggs
string matches for each character:
a: [0,2]
b: [2,6]
c: [0]
d: [5]
e: [4,5,6,7]
f: [4]
g: [2,6,7]
optimal combinations (size 4):
[0,2,4,5] = ["pack,"bag","five","dozen"]
[0,4,5,6] = ["pack,"five","dozen","beige"]

Potentially there are 2x2x1x1x4x1x3 = 48 combinations. However, if string 0 is selected for character "a", that also covers character "c"; if string 2 is selected for character "a", that also covers characters "b" and "g". In fact, only three combinations are ever considered: [0,2,5,4], [0,6,5,4] and [2,0,5,4].

If the number of strings is much greater than the number of characters, approach 2 is the better choice.

code example 1

This is a simple algorithm which uses recursion to try all possible combinations of strings to find the combinations which contain all characters.
Run the code snippet to see the algorithm find solutions for 12 strings and the whole alphabet (see console for output).

// FIND COMBINATIONS OF STRINGS WHICH COVER THE CHARACTER SET
function charCover(chars, strings, used) {
    used = used || [];
    // ITERATE THROUGH THE LIST OF STRINGS
    for (var i = 0; i < strings.length; i++) {
        // MAKE A COPY OF THE CHARS AND DELETE THOSE WHICH OCCUR IN THE CURRENT STRING
        var c = chars.replace(new RegExp("[" + strings[i] + "]","g"), "");
        // MAKE A COPY OF THE STRINGS AFTER THE CURRENT STRING
        var s = strings.slice(i + 1);
        // ADD THE CURRENT STRING TO THE LIST OF USED STRINGS
        var u = used.concat([strings[i]]);
        // IF NO CHARACTERS ARE LEFT, PRINT THE LIST OF USED STRINGS
        if (c.length == 0) console.log(u.length + " strings:\t" + u)
        // IF CHARACTERS AND STRINGS ARE LEFT, RECURSE WITH THE REST
        else if (s.length > 0) charCover(c, s, u);
    }
}

var strings = ["the","quick","brown","cow","fox","jumps","over","my","lazy","cats","dogs","unicorns"];
var chars = "abcdefghijklmnopqrstuvwxyz";
charCover(chars, strings);

You can prune some unnecessary paths by adding this line after the characters are removed with replace():

        // IF NO CHARS WERE DELETED, THIS STRING IS UNNECESSARY
        if (c.length == chars.length) continue;

code example 2

This is an algorithm which firsts creates a list of matching strings for every character, and then uses recursion to combine the lists to find combinations of strings that cover the character set.
Run the code snippet to see the algorithm find solutions for 24 strings and 12 characters (see console for output).

// FIND COMBINATIONS OF STRINGS WHICH COVER THE CHARACTER SET
function charCover(chars, strings) {
    // CREAT LIST OF STRINGS MATCHING EACH CHARACTER
    var matches = [], min = strings.length, output = [];
    for (var i = 0; i < chars.length; i++) {
        matches[i] = [];
        for (var j = 0; j < strings.length; j++) {
            if (strings[j].indexOf(chars.charAt(i)) > -1) {
                matches[i].push(j);
            }
        }
    }
    combine(matches);
    return output;
    // RECURSIVE FUNCTION TO COMBINE MATCHES
    function combine(matches, used) {
        var m = []; used = used || [];
        // COPY ONLY MATCHES FOR CHARACTERS NOT ALREADY COVERED
        for (var i = 0; i < matches.length; i++) {
            for (var j = 0, skip = false; j < matches[i].length; j++) {
                if (used.indexOf(matches[i][j]) > -1) {
                    skip = true;
                    break;
                }
            }
            if (! skip) m.push(matches[i].slice());
        }
        // IF ALL CHARACTERS ARE COVERED, STORE COMBINATION
        if (m.length == 0) {
            // IF COMBINATION IS SHORTER THAN MINIMUM, DELETE PREVIOUSLY STORED COMBINATIONS
            if (used.length < min) {
                min = used.length;
                output = [];
            }
            // CONVERT INDEXES TO STRINGS AND STORE COMBINATION
            var u = [];
            for (var i = 0; i < used.length; i++) {
                u.push(strings[used[i]]);
            }
            output.push(u);
        }
        // RECURSE IF CURRENT MINIMUM NUMBER OF STRINGS HAS NOT BEEN REACHED
        else if (used.length < min) {
            // ITERATE OVER STRINGS MATCHING NEXT CHARACTER AND RECURSE
            for (var i = 0; i < m[0].length; i++) {
                combine(m, used.concat([m[0][i]]));
            }
        }
    }
}

var strings = ["the","quick","brown","fox","jumps","over","lazy","dogs","pack","my","bag","with","five","dozen","liquor","jugs","jaws","love","sphynx","of","black","quartz","this","should","do"];
var chars = "abcdefghijkl";
var result = charCover(chars, strings);
for (var i in result) console.log(result[i]);

This algorithm can be further optimised to avoid finding duplicate combinations with the same strings in different order. Sorting the matches by size before combining them may also improve efficiency.