Mutate an array of discrete probabilities by excluding one value in C


I am working on a project in C where I want to progressively mutate a uint32_t under the following conditions:

  1. The probability of a bit flip starts out at 1/2 for the least significant bit (LSB), then 1/4 for the next bit to the left, 1/8 for the next, and so on (see example array).
  2. After a bit k is flipped the value of probability(k) is redistributed to all other bits according to the distribution laid out in step one.
  3. probability(k) is then set to zero.

I imagine these probabilities are best stored in a length-32 array of doubles, so a very useful answer would be a function that accepts a length-32 array of doubles and the index of the bit to exclude, and returns a modified length-32 array.

Can this be accomplished by generating a length-31 array excluding k with the procedure from step 1, multiplying each value by the value of array[k], then creating a length-32 array with array[k] = 0 and adding that to the input array (after setting input[k] = 0)?

A problem I imagine might happen but which I'm unsure how to solve:

  • In step one, the probabilities are all (1) large enough to be represented by doubles and (2) powers of 2, so they are represented exactly. However, there is no good reason why they would remain so after redistribution. The example array below sums to very nearly one (1 - 2^-32) because every entry is exactly representable. I have no reason to assume that will hold for other values. It is unclear to me how to keep choosing in a way that is equivalent to drawing from a distribution that does sum to one.

Answers

The solution has to be in C because the rest of the code in the project is. Sorry, I'm sure there are very cool ways to solve this in other languages. Probably the binomial package in R will have something that does this, but that doesn't help. A C-like language whose code I can manually adapt to C is also fine.

I'm on a desktop computer otherwise in control of the development environment, so any libraries which would make this easy are welcome. Thanks. Also I don't expect any performance constraints so code that is slow or needs to store tables and such is fine.

My example here uses doubles but that's not definitive. I'm coming here asking the question because I don't know how to do this. If you have an answer which works with integers entirely then I would love to see that.

example array

void create_array32(double array[32]) {
    int i;
    for (i = 0; i < 32; i++) {
        array[i] = pow(2, -(32 - i));
    }
}
// The output, if that is easier to work with
double example[32] = {
0.0000000002328306, 0.0000000004656613,
0.0000000009313226, 0.0000000018626451,
0.0000000037252903, 0.0000000074505806,
0.0000000149011612, 0.0000000298023224,
0.0000000596046448, 0.0000001192092896,
0.0000002384185791, 0.0000004768371582,
0.0000009536743164, 0.0000019073486328,
0.0000038146972656, 0.0000076293945312,
0.0000152587890625, 0.0000305175781250,
0.0000610351562500, 0.0001220703125000,
0.0002441406250000, 0.0004882812500000,
0.0009765625000000, 0.0019531250000000,
0.0039062500000000, 0.0078125000000000,
0.0156250000000000, 0.0312500000000000,
0.0625000000000000, 0.1250000000000000,
0.2500000000000000, 0.5000000000000000};

There are 2 best solutions below

0
On BEST ANSWER

Instead of maintaining an array of probabilities, maintain a corresponding array of selection frequencies:

uint32_t frequencies[32];

for (int i = 0; i < 32; i++) {
    frequencies[i] = (uint32_t) 1 << (31 - i);
}

If you like, you could pre-compute these starting frequencies and put them in an initializer instead of computing them at runtime.

Each time you want to make a selection,

  1. Compute an array of the cumulative sums of the frequencies:

    uint32_t cumulative[33] = {0};
    
    for (int i = 0; i < 32; i++) {
        cumulative[i + 1] = cumulative[i] + frequencies[i];
    }
    
  2. Generate a (uniformly distributed) random number x between 0 (inclusive) and cumulative[32] (exclusive).

  3. Find the value n such that cumulative[n] <= x && x < cumulative[n + 1]. This n is the selected bit number. You could use a binary search, but a linear search would be simpler, and for only 32 items, about as fast.
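The three steps above can be sketched as a single lookup function. This is only an illustration: `select_bit` and `init_frequencies` are hypothetical names, not part of the answer, and the draw `x` is passed in by the caller because `rand()` alone may not provide enough bits (`RAND_MAX` is only guaranteed to be at least 32767). The sketch keeps a running cumulative sum rather than storing the whole `cumulative` array, which gives the same result for a linear search.

```c
#include <stdint.h>

/* Fill in the starting frequencies used in the answer: 2^31, 2^30, ..., 1. */
void init_frequencies(uint32_t frequencies[32]) {
    for (int i = 0; i < 32; i++) {
        frequencies[i] = (uint32_t) 1 << (31 - i);
    }
}

/* Hypothetical helper: map a uniform draw x to a bit index n.
 * Assumes 0 <= x < (sum of all frequencies) and that the sum fits in a
 * uint32_t, which holds for the starting values (the sum is 2^32 - 1). */
int select_bit(const uint32_t frequencies[32], uint32_t x) {
    uint32_t cumulative = 0;
    for (int n = 0; n < 32; n++) {
        cumulative += frequencies[n];
        /* Equivalent to cumulative[n] <= x && x < cumulative[n + 1]. */
        if (x < cumulative) {
            return n;
        }
    }
    return -1; /* x was out of range, or every frequency is 0 */
}
```

An entry whose frequency is 0 adds nothing to the running sum, so it can never be the first index at which `x < cumulative` becomes true.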

To remove bit n from further consideration, just set its frequency to 0:

frequencies[n] = 0;

When you compute the new cumulative sums for the next selection, that will naturally both exclude n from consideration and, by computing a revised total, adjust the probabilities of all the remaining options.
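To see how zeroing one frequency shifts the remaining probabilities, here is a small sketch. The helper names (`total_frequency`, `bit_probability`, `flip_and_exclude`) are made up for illustration, and it assumes that index n in `frequencies` corresponds to bit n of the value being mutated, with bit 0 as the LSB.

```c
#include <stdint.h>

/* Sum of all frequencies; accumulated in 64 bits so it cannot overflow. */
uint64_t total_frequency(const uint32_t frequencies[32]) {
    uint64_t total = 0;
    for (int i = 0; i < 32; i++) {
        total += frequencies[i];
    }
    return total;
}

/* Effective probability of selecting bit i under the current frequencies. */
double bit_probability(const uint32_t frequencies[32], int i) {
    uint64_t total = total_frequency(frequencies);
    return total == 0 ? 0.0 : (double) frequencies[i] / (double) total;
}

/* Flip bit n of *value and exclude it from later selections. */
void flip_and_exclude(uint32_t *value, uint32_t frequencies[32], int n) {
    *value ^= (uint32_t) 1 << n;
    frequencies[n] = 0;
}
```

With the starting frequencies, excluding bit 0 drops the total from 2^32 - 1 to 2^31 - 1, so bit 1 moves from a probability of about 1/4 to about 1/2 without any of the remaining entries being rewritten.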

0
On

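int choose_bit(double array[32]) {
  double cumsum[32] = { 0 };
  compute_cumulative_sum(array, cumsum);
  // Scale the draw by the actual total so the choice stays valid even when
  // the probabilities do not sum to exactly 1.
  // https://stackoverflow.com/a/6219525
  double r = cumsum[31] * ((double)rand() / (double)RAND_MAX);
  int last = 0;
  for (int i = 0; i < 32; i++) {
    if (array[i] > 0) {
      last = i;  // remember the last bit that can still be chosen
      if (r <= cumsum[i]) {
        return i;
      }
    }
  }
  // Fallback so the function always returns a value.
  return last;
}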

int mutate_and_advance(double array[32]) {
    double gapped[32];
    double chosen_prob;  // double, to match the array and avoid losing precision

    int bit = choose_bit(array);
    create_gapped_array32(gapped, bit);
    chosen_prob = array[bit];
    array[bit] = 0;
    multiply_array_by_scalar(gapped, chosen_prob);
    add_32_arrays(array, gapped, array);
    return bit;
}

I think the above does what I need. It returns an int for now so I can test to see if it cycles through the indices the way that I want.

Helper functions and libraries below, along with a (very) rough test:


#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

void create_array32(double array[32]) {
  int i;
  for (i = 0; i < 32; i++) {
    array[i] = ldexp(1, -(32 - i));
  }
}

void create_gapped_array32(double array[32], int location) {
    int i;
    for (i = 0; i < 32; i++) {
        if (i == location) {
            array[i] = 0;
        } else {
            array[i] = ldexp(1, -(32 - i));
        }
    }
}

void compute_cumulative_sum(double arr[32], double sum[32]) {
  sum[0] = arr[0];
  for (int i = 1; i < 32; i++) {
    sum[i] = sum[i - 1] + arr[i];
  }
}

void multiply_array_by_scalar(double array[32], double scalar) {
  int i;
  for (i = 0; i < 32; i++) {
    array[i] *= scalar;
  }
}

void add_32_arrays(double left[32], double right[32], double output[32]) {
  int i;
  for (i = 0; i < 32; i++) {
    // Plain assignment: with output aliasing left, += would double-count left[i].
    output[i] = left[i] + right[i];
  }
}

// Test 

int main(void) {
  int k = 0;
  double probabilities[32] = { 0 };
  create_array32(probabilities);
  for (k = 0; k < 55; k++) {
    printf("Index: %d\n", mutate_and_advance(probabilities));
  }

  return 0;
}
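
One thing to watch in the redistribution above: as far as I can tell, the gapped array sums to less than 1 (the step-one values total 1 - 2^-32 before the gap is removed), so multiplying it by chosen_prob hands back less probability than was taken away, and the total shrinks a little on every call. A possible variant, sketched below under that assumption, divides by the sum of the gapped weights so the total is preserved. The name `redistribute` is hypothetical, and like the code above it can hand probability back to bits that were zeroed in earlier rounds.

```c
#include <math.h>

/* Hypothetical variant of the redistribution step: move p[k] onto the other
 * bits in proportion to the step-one weights 2^-(32 - i), normalized by
 * their sum so that the total probability is unchanged (up to rounding). */
void redistribute(double p[32], int k) {
    double weights[32];
    double weight_sum = 0.0;
    for (int i = 0; i < 32; i++) {
        weights[i] = (i == k) ? 0.0 : ldexp(1.0, -(32 - i));
        weight_sum += weights[i];
    }
    for (int i = 0; i < 32; i++) {
        /* weights[k] is 0, so p[k] itself is untouched by this loop. */
        p[i] += p[k] * weights[i] / weight_sum;
    }
    p[k] = 0.0;
}
```

Combined with a draw that is scaled by the current total, small rounding drift in the sum should not bias which bit gets chosen.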