C++ - Sieve of Atkin code optimizations

681 Views Asked by At

I was wondering if anyone had any suggestions on what I can do to improve the running speed of my code. I wrote a Sieve of Atkin function that returns a vector including all primes from [2, max] inclusive.

Here is my code.

void atkin_sieve (unsigned int max, std::vector <unsigned int> & primes) {
  // Sieve array up to max defaulted to false
  // Index's [0, max] correspond to numbers
  // Dynamic memory so all values default false
  bool* sieve = new bool [max + 1];
  // Square root of max number
  unsigned int sqrt_max = int (sqrt (max));
  // Unsigned integers declared to save time
  unsigned int n, x, y;

  // TODO - Explain this
  for (x = 1; x < sqrt_max; x++) {
    for (y = 1; y < sqrt_max; y++) {
      n = (4 * x * x) + (y * y);
      if (n <= max && (n % 12 == 1 || n % 12 == 5))
    sieve [n] = !sieve [n];
      n = (3 * x * x) + (y * y);
      if (n <= max && (n % 12 == 7))
    sieve [n] = !sieve [n];
      n = (3 * x * x) - (y * y);
      if (x > y && n <= max && (n % 12 == 11))
    sieve [n] = !sieve [n];
    }
  }

  // TODO - Explain this
  for (x = 5; x < sqrt_max; x++) {
    if (sieve [x]) {
      n = x * x;
      for (y = n; y <= max; y += n)
    sieve [y] = false;
    }
  }

  // Push primes 2, 3, and 5
  primes.push_back(2);
  primes.push_back(3);
  primes.push_back(5);
  // Start from prime 7, skip even numbers
  for (x = 7; x <= max; x += 2) {
    // If the number is prime
    if (sieve [x])
      // Push it into the vector
      primes.push_back(x);
  }

  // Delete the sieve array
  delete [] sieve;
}

I have a few questions about what I might be able to do better in order to optimize this function.

I initialized the sieve array as a dynamic array of booleans so that they would all default to false, is it faster to have the sieve be dynamic like this or should I keep it as a normal array?

I am storing the primes in a vector using a for loop after the algorithm has processed, is there a faster way that i can find all the primes in the sieve to store them in the vector?

Any other tips, tricks, hints, or code is welcome and very much appreciated, thank you.

2

There are 2 best solutions below

0
On

As the the other answer has suggested, profiling is the best way to work out where the time is going.

Some suggestions and comments, some of which are probably obvious

  • I don't believe you actually have a guarantee of zero initialisation on your allocation of sieve; to do so you'd need to use bool *sieve = new bool[max+1]();. If you find things are working correctly then you're either being lucky here or the running under a compiler that zero-initialises debug builds. If that's the case, try a release build with optimisations enabled and you'll see some speed up.

  • Your bool[] is most likely not going to be using up 1 bit per element, but probably a byte. You might find using a std::vector<bool> more efficient as it's typically specialised to store the booleans densely - 1 bit per bool. You'd be trading off a reduced memory footprint vs. increased complexity of reading/writing an individual boolean.

  • Try pre-sizing the primes array to max / log(max), with appropriate input validation, as an approximation to the number of primes less than max.

  • If the function is being called repeatedly in your application, try reusing previous sieve arrays to answer subsequent calls.

0
On

Depending on how many primes you are returning, using std::vector may be causing a problem, each time a vector has to grow to handle more objects than its capacity it will probably need to create a new array and copy all the values from the old array to the new. Consider using a std::list or a std::deque to avoid this issue. If you really need a vector it may be faster to loop through and count the number of primes, then reserve that much size in the vector, so that the vector never needs to grow.

You should probably do some profiling on the code - how you do this depends on your development environment. This should tell you where most of your code's time is spent. Without that info, you could spend ages optimizing one part of the code when it won't have a huge effect on the result.

At the very least add some timing code, so you can see whether any changes you make help, and by how much.

The best optimizations are usually changing the algorithm, usually doing things like loop unrolling etc are best left to the compiler, unless the code is particularly time critical (even then maybe not).

Also, make sure you are compiling with optimizations on - this can make a huge difference.