I stumbled upon this question and I'm unable to solve it. Any help is much appreciated.
You are given an array S made of N strings and an integer K. Choose at most K letters from the alphabet that will allow you to build as many strings from array S as possible. Any of the chosen letters can be used multiple times when building the strings. What is the maximum number of strings that can be built?
Example:
S = ["a","aa","ab","bb","bc","bd"] K = 1 answer = 2; pick letter 'a' to build strings ["a","aa"]
I seriously have no idea how to solve this...
I think this problem would be NP-hard, except that there are only 26 letters. That bound makes the problem easy.
The first step is to pick either the problem or its complement to solve. Let's say there are U letters that are actually used in strings.
If K <= U/2, then just try all C(U,K) ways to choose K of U letters and find the largest number of words you can build. There are at most 10,600,400 combinations.
If K > U/2. then try all C(U,U-K) ways to choose U-K letters, and find the combination that is used in the fewest number of words. The remaining words are the ones that can be built with the letters you don't choose. Again there are at most 10,600,400 combinations to try.
This takes linear time. There's a constant factor of 10 million, but with careful optimization you can make this quite fast.
I would probably transform each word into a bit-mask of the letters that it uses, make an array of bit masks for each chosen combination, and then match each word against each mask. If I'm trying to build words, I add 1 for every mask where
(word&mask)==word. If I'm counting hits, I add 1 for every mask where(word&mask)!=0. Precalculating some mask subsets can speed this up.