I want a string shuffling algorithm that takes the following arguments:
void CRLimitedShuffle(char* Str, uint8_t MaxConsecutiveRepetition);
The function should perform a random shuffle on the input string. However:
- a
MaxConsecutiveRepetitionrestriction must be observed, i.e., no more thanMaxConsecutiveRepetitionconsecutive identical characters are allowed in the result string. - all correct shuffling outcomes must appear with equal probability.
For example:
Str="aaaaabbbbb";
MaxConsecutiveRepetition=3;
For the arguments above, "aabbbaaabb" is a correct result while "aaabaabbbb" is incorrect because it has 4 consecutive "b"s, more than MaxConsecutiveRepetition.
A naive idea is to review the shuffled results and reshuffle them if they don't meet the requirements. However, for some extreme cases, performance may be extremely low. For example:
Str="aaaabbbbbbbbbb";
MaxConsecutiveRepetition=2;
In this example, only "bbabbabbabbabbabb" is the correct output, and all other randomly generated sequences are discarded. In this way, the function will continue to loop until the unique correct string is generated with a very small probability.
Another idea is to generate all possible permutations, then eliminate those that are wrong and draw randomly from the remaining correct permutations. But the complexity of this algorithm will be factorial.
The final idea is to make a mathematical modification of Fisher-Yates: start with a random character, and then multiply the probability of the previous character being drawn by an attenuation coefficient each time the next character is drawn. For example, if the previous character has been consecutively drawn for MaxConsecutiveRepetition times, the attenuation coefficient should become 0 to ensure that the next character cannot be the same as the previous one. However, I do not know how to calculate this attenuation coefficient exactly to ensure that all the correct permutations are generated with equal probability.
I need an algorithm that meets both requirements listed above.
OOPS
I accidentally solved the problem of limiting how many places there was a repeat. And not how many repeats there were in a row. So I solved the wrong problem.
I've left the original solution. But added the desired one at the end.
The idea sounded simpler than it was.
The problem is finding how many permutations have the desired pattern. That can be calculated recursively in terms of:
And then you can memoize to speed it up. That function looks like this (in Python, and using the fact that tuples of tuples can be used as a dictionary key):
It turns out to be a little more convenient from how we'll use it to start with a dictionary of frequency to chars. So we need the following little helper.
And now we can actually find our permutation.
And demonstrate it in action.
To quote Jason Calcanis, "We don't do it because it's easy. We do it because we thought it would be easy."
I seriously underestimated how fiddly this code was going to be. And then once I was going, I wanted to finish...
And now for the right problem.