A shuffling algorithm that limits the number of consecutive repetitions?

Question

A shuffling algorithm that limits the number of consecutive repetitions?

150 Views Asked by 埃博拉酱 At 06 September 2023 at 08:57

I want a string shuffling algorithm that takes the following arguments:

void CRLimitedShuffle(char* Str, uint8_t MaxConsecutiveRepetition);

The function should perform a random shuffle on the input string. However:

a MaxConsecutiveRepetition restriction must be observed, i.e., no more than MaxConsecutiveRepetition consecutive identical characters are allowed in the result string.
all correct shuffling outcomes must appear with equal probability.

For example:

Str="aaaaabbbbb";
MaxConsecutiveRepetition=3;

For the arguments above, "aabbbaaabb" is a correct result while "aaabaabbbb" is incorrect because it has 4 consecutive "b"s, more than MaxConsecutiveRepetition.

A naive idea is to review the shuffled results and reshuffle them if they don't meet the requirements. However, for some extreme cases, performance may be extremely low. For example:

Str="aaaabbbbbbbbbb";
MaxConsecutiveRepetition=2;

In this example, only "bbabbabbabbabbabb" is the correct output, and all other randomly generated sequences are discarded. In this way, the function will continue to loop until the unique correct string is generated with a very small probability.

Another idea is to generate all possible permutations, then eliminate those that are wrong and draw randomly from the remaining correct permutations. But the complexity of this algorithm will be factorial.

The final idea is to make a mathematical modification of Fisher-Yates: start with a random character, and then multiply the probability of the previous character being drawn by an attenuation coefficient each time the next character is drawn. For example, if the previous character has been consecutively drawn for MaxConsecutiveRepetition times, the attenuation coefficient should become 0 to ensure that the next character cannot be the same as the previous one. However, I do not know how to calculate this attenuation coefficient exactly to ensure that all the correct permutations are generated with equal probability.

I need an algorithm that meets both requirements listed above.

Original Q&A

There are 1 best solutions below

**btilly** · Answer 1 · 2023-09-06T23:24:00.557000

OOPS

I accidentally solved the problem of limiting how many places there was a repeat. And not how many repeats there were in a row. So I solved the wrong problem.

I've left the original solution. But added the desired one at the end.

The idea sounded simpler than it was.

The problem is finding how many permutations have the desired pattern. That can be calculated recursively in terms of:

The maximum repetitions left.
How many of the last used character there is.
A partition representing the distribution of frequency counts.

And then you can memoize to speed it up. That function looks like this (in Python, and using the fact that tuples of tuples can be used as a dictionary key):

answers_count = {}
def _count_answers(frequency_count, max_repetition, last_freq):
    partition = []
    for freq, count in sorted(frequency_count.items()):
        if 0 < count:
            partition.append((freq, count))

    if 0 == len(partition):
        if last_freq <= max_repetition + 1:
            return 1
        else:
            return 0

    key = (max_repetition, last_freq, tuple(partition))
    if key not in answers_count:
        # Better safe than sorry, make a copy.
        frequency_count = frequency_count.copy()

        answers = 0
        # Can we repeat here?
        if 1 < max_repetition and 1 < last_freq:
            answers += _count_answers(
                frequency_count, max_repetition - 1, last_freq - 1)

        # We will return this character to the pool.
        return_freq = last_freq - 1
        if 0 < return_freq:
            frequency_count[return_freq] = 1 + frequency_count.get(return_freq, 0)

        # The list is to avoid iterating as we are modifying.
        for freq, count in list(frequency_count.items()):
            # Adjust for one of these being taken
            if count == 1:
                frequency_count.pop(freq)
            else:
                frequency_count[freq] -= 1

            if freq == return_freq:
                if 1 < count:
                    answers += (count - 1) * _count_answers(frequency_count, max_repetition, freq)
            else:
                answers += count * _count_answers(frequency_count, max_repetition, freq)

            # Restore the key.
            frequency_count[freq] = count

        answers_count[key] = answers
    return answers_count[key]

It turns out to be a little more convenient from how we'll use it to start with a dictionary of frequency to chars. So we need the following little helper.

def count_answers(frequency_chars, max_repetition, last_freq):
    frequency_count = {}
    for freq, chars in frequency_chars.items():
        if 0 < len(chars):
            frequency_count[freq] = len(chars)
    return _count_answers(frequency_count, max_repetition, last_freq)

And now we can actually find our permutation.

def shuffle_limited_repetition (string, max_repetition):
    # Calculate the frequency of each characters.
    char_frequency = {}
    for char in string:
        char_frequency[char] = 1 + char_frequency.get(char, 0)

    # Turn that around to get the characters with a given frequency.
    frequency_chars = {}
    for char, freq in char_frequency.items():
        if freq in frequency_chars:
            frequency_chars[freq].append(char)
        else:
            frequency_chars[freq] = [char]

    # Find how many permutations there are and choose one.
    total_answers = count_answers(frequency_chars, max_repetition, 1)
    if total_answers == 0:
        return None
    chosen = random.randrange(total_answers)

    # We will build up an array of characters, and join it to get our answer.
    final_chars = []

    # Our state is:
    # - frequency_chars (which we have)
    # - the last character we used
    # - how many copies there were
    #
    # So initialize the rest of that state with...
    # "We put down nothing, and there are no more copies of it."
    last_char = None
    last_freq = 1

    # Until our permutation is as big as we want...
    while len(final_chars) < len(string):
        # Can we put down last_char again?
        if 0 < max_repetition and 1 < last_freq:
            # Is our choice the one where we put down last_char again?
            if chosen < count_answers(frequency_chars, max_repetition - 1, last_freq - 1):
                # It is, record that, update the state, then skip.
                final_chars.append(last_char)
                max_repetition -= 1
                last_freq -= 1
                continue
            else:
                # Skip past all of teh choices where we put down the last_char again.
                chosen -= count_answers(frequency_chars, max_repetition - 1, last_freq - 1)

        # Return last_char to frequency_chars.
        return_freq = last_freq - 1 # What it had been, minus the one we put down.
        if 0 == return_freq:
            pass # Just kidding, we're done with this character.
        elif return_freq in frequency_chars:
            # Add it to existing frequency.
            frequency_chars[return_freq].append(last_char)
        else:
            # This frequency now exists with 1 char in it.
            frequency_chars[return_freq] = [last_char]

        # The list(...) construct is to avoid modifying frequency_chars
        # as we're iterating over it.
        for freq, chars in list(frequency_chars.items()):
            # NOTE: chars is a reference to frequency_chars[freq]
            # So altering our copy, alters the original.

            # Simulate taking a character, how many solutions is that?
            tmp_char = chars.pop()
            count = count_answers(frequency_chars, max_repetition, freq)
            chars.append(tmp_char)

            # How many characters can we take?
            count_chars = len(chars)
            if freq == return_freq:
                # We can't take the one we just put back!
                count_chars -= 1

            # Are we going to take one of these characters?
            if chosen < count_chars * count:
                # We are, we need to figure out which one.
                i = 0
                while count <= chosen:
                    i += 1
                    chosen -= count
                # Now update our state to indicate our choice.
                last_char = chars.pop(i)
                last_freq = freq

                if 0 == len(chars):
                    # Remove this frequency if we're done with it.
                    frequency_chars.pop(freq)
                final_chars.append(last_char)

                # NOTE: Exit loop here, we made the choice.
                break
            else:
                # Skip past all of these choices.
                chosen -= count_chars * count

    return ''.join(final_chars)

And demonstrate it in action.

for i in range(100):
    print(shuffle_limited_repetition("aaaabbbbc", 3))

To quote Jason Calcanis, "We don't do it because it's easy. We do it because we thought it would be easy."

I seriously underestimated how fiddly this code was going to be. And then once I was going, I wanted to finish...

And now for the right problem.

import random

def count_answers(frequency_chars, max_consecutive, used_freq):
    frequency_count = {}
    for freq, chars in frequency_chars.items():
        if 0 < len(chars):
            frequency_count[freq] = len(chars)
    return _count_answers(frequency_count, max_consecutive, used_freq)

answers_count = {}
def _count_answers(frequency_count, max_consecutive, used_freq):
    partition = []
    for freq, count in sorted(frequency_count.items()):
        if 0 < count:
            partition.append((freq, count))

    if 0 == len(partition):
        if 0 == used_freq:
            return 1
        else:
            return 0

    key = (max_consecutive, used_freq, tuple(partition))
    if key not in answers_count:
        frequency_count = frequency_count.copy()
        answers = 0

        # We will return this character to the pool.
        if 0 < used_freq:
            frequency_count[used_freq] = 1 + frequency_count.get(used_freq, 0)

        # The list is to avoid iterating as we are modifying.
        for freq, count in list(frequency_count.items()):
            # Adjust for one of these being taken
            if count == 1:
                frequency_count.pop(freq)
            else:
                frequency_count[freq] -= 1

            for i in range(1, min(freq, max_consecutive) + 1):

                these_answers = 0
                this_answer_count = _count_answers(frequency_count, max_consecutive, freq - i)
                if freq == used_freq:
                    answers += (count - 1) * this_answer_count
                else:
                    answers += count * this_answer_count

            # replace the taken one.
            frequency_count[freq] = count

        answers_count[key] = answers
    return answers_count[key]

def shuffle_limited_repetition (string, max_consecutive):
    # Calculate the frequency of each characters.
    char_frequency = {}
    for char in string:
        char_frequency[char] = 1 + char_frequency.get(char, 0)

    # Turn that around to get the characters with a given frequency.
    frequency_chars = {}
    for char, freq in char_frequency.items():
        if freq in frequency_chars:
            frequency_chars[freq].append(char)
        else:
            frequency_chars[freq] = [char]

    # Find how many permutations there are and choose one.
    total_answers = count_answers(frequency_chars, max_consecutive, 0)
    #for k, v in answers_count.items():
    #    print(k, v)
    if total_answers == 0:
        return None
    chosen = random.randrange(total_answers)

    # We will build up an array of characters, and join it to get our answer.
    final_chars = []

    # Our state is:
    # - frequency_chars (which we have)
    # - the last character we used
    # - how many copies are left
    #
    # So initialize the rest of that state with...
    # "We put down nothing, and there are no more copies of it."
    used_char = None
    used_freq = 0

    # Until our permutation is as big as we want...
    while len(final_chars) < len(string):
        #print(final_chars, frequency_chars, used_char, used_freq)
        # Return used_char to frequency_chars.
        if 0 == used_freq:
            pass # Just kidding, we're done with this character.
        elif used_freq in frequency_chars:
            # Add it to existing frequency.
            frequency_chars[used_freq].append(used_char)
        else:
            # This frequency now exists with 1 char in it.
            frequency_chars[used_freq] = [used_char]

        # The list(...) construct is to avoid modifying frequency_chars
        # as we're iterating over it.
        for freq, chars in list(frequency_chars.items()):
            is_found = False
            for i in range(1, min(freq, max_consecutive) + 1):
                # NOTE: chars is a reference to frequency_chars[freq]
                # So altering our copy, alters the original.

                # Simulate taking a character, how many solutions is that?
                count = 0
                if 0 < len(chars):
                    tmp_char = chars.pop()
                    count = count_answers(frequency_chars, max_consecutive, freq - i)
                    chars.append(tmp_char)

                # How many characters can we take?
                count_chars = len(chars)
                if freq == used_freq:
                    # We can't take the one we just put back!
                    count_chars -= 1

                # Are we going to take one of these characters?
                if chosen < count_chars * count:
                    # We are, we need to figure out which one.
                    j = 0
                    while count <= chosen:
                        j += 1
                        chosen -= count
                    # Now update our state to indicate our choice.
                    used_char = chars.pop(j)
                    for _ in range(i):
                        final_chars.append(used_char)
                    used_freq = freq - i
                    is_found = True
                    break

                else:
                    # Skip past all of these choices.
                    chosen -= count_chars * count
            if is_found:
                break

    return ''.join(final_chars)

for i in range(100):
    print(shuffle_limited_repetition("aaaabbbb", 2))

A shuffling algorithm that limits the number of consecutive repetitions?

There are 1 best solutions below

OOPS

Related Questions in ALGORITHM

Related Questions in SHUFFLE

Related Questions in FISHER-YATES-SHUFFLE

Trending Questions

Popular # Hahtags

Popular Questions