Strange performance problem

I have a container similar to this one.

template <typename Nat, typename Elt>
class NatMap {
 public:
  Elt& operator[] (Nat nat) { 
    return tab [nat.GetRaw()];
  }
 private:
  Elt tab [Nat::kBound];
};

I wanted to drop the requirement for Elt to have a default constructor:

template <typename Nat, typename Elt>
class NatMap {
 public:
  Elt& operator[] (Nat nat) { 
    return ((Elt*)tab) [nat.GetRaw()];
  }
 private:
  char tab [Nat::kBound * sizeof(Elt)];
};

I use g++-4.3, and this code runs 25% slower in my application than the previous version. Unfortunately, the slowdown does not show up in a synthetic benchmark. I guess it has something to do with compiler optimizations, aliasing, alignment, or something similar.

What should I do to get my performance back (while still not requiring a default constructor)?

Update:

I just tried the new g++-4.4, and it gave me the following warning for the latter code:

dereferencing pointer '<anonymous>' does break strict-aliasing rules
There are 2 best solutions below

Small suggestion: rather than making educated guesses, such as whether the compiler optimizations differ, you could either single-step through the code or find out with this unorthodox method.

You may be running into alignment problems. If Elt has a stricter alignment requirement than char, then constructing elements by placement into a character array may cause a lot of unaligned reads that you don't see when the compiler aligns the array for you. Or you may be running into a problem called a load-hit-store, which some processors exhibit when they write a value to memory and then read it back immediately; on those processors, the stall can be as long as the pipeline.

Or it may be something else entirely, some kind of pathological code generation by GCC.
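To make the alignment concern above concrete, here is a minimal sketch, assuming a C++11 or later compiler (alignof and alignas did not exist in the g++-4.3 era). The Elt struct, both storage structs, and the helper functions are hypothetical names made up for illustration. It only shows that a plain char buffer promises alignment 1, while alignas(Elt) forces an Elt boundary; whether a real NatMap object ends up misaligned depends on where the compiler and allocator happen to place it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical element type; any type with stricter alignment than char works.
struct Elt {
  double d;
  int i;
};

// A plain char buffer only promises alignment 1, so inside a larger
// object it may legally start at an odd offset.
struct RawStorage {
  char flag;
  char tab[4 * sizeof(Elt)];
};

// alignas (C++11) forces the buffer to start on an Elt boundary.
struct AlignedStorage {
  char flag;
  alignas(Elt) char tab[4 * sizeof(Elt)];
};

// True if the address p is a multiple of alignment.
inline bool IsAlignedFor(const void* p, std::size_t alignment) {
  assert(alignment != 0);
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// True when the aligned buffer of a local object is suitable for Elt.
inline bool AlignedBufferIsSuitable() {
  AlignedStorage s;
  return IsAlignedFor(s.tab, alignof(Elt));
}
```

Comparing offsetof(RawStorage, tab) with offsetof(AlignedStorage, tab) shows the difference: the first buffer sits right after the flag byte, the second is padded out to a multiple of alignof(Elt).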

Unfortunately, stack traces don't help track down either of these issues, as they would just show a load operation (lw, lb, etc.) that took forty cycles instead of one. The stall is in the microcode inside the CPU, not the x86 code you've written. But looking at the assembly with the -S command-line option can help you figure out what the compiler is really emitting and how it differs between your two implementations. Maybe there's some bad operation cropping up in one version.
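If alignment or aliasing does turn out to be the culprit, one possible variant to compare against is sketched below. It is not the asker's code: it assumes C++11 or later (alignas, deleted copy operations), and SmallNat, NoDefault, Set, live_, and DemoNatMap are hypothetical names added for illustration. The buffer is aligned for Elt and elements are created with placement new, so Elt needs no default constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-in for the question's Nat type.
class SmallNat {
 public:
  static const std::size_t kBound = 8;
  explicit SmallNat(std::size_t raw) : raw_(raw) {}
  std::size_t GetRaw() const { return raw_; }
 private:
  std::size_t raw_;
};

// Hypothetical element type without a default constructor.
struct NoDefault {
  explicit NoDefault(int v) : value(v) {}
  int value;
};

template <typename Nat, typename Elt>
class NatMap {
 public:
  NatMap() : live_() {}  // value-initializes every flag to false
  NatMap(const NatMap&) = delete;             // a raw byte copy would be wrong
  NatMap& operator=(const NatMap&) = delete;

  ~NatMap() {
    // Destroy only the slots that were actually constructed.
    for (std::size_t i = 0; i < Nat::kBound; ++i) {
      if (live_[i]) Slot(i)->~Elt();
    }
  }

  // Constructs (or replaces) the element at nat by copying value.
  Elt& Set(Nat nat, const Elt& value) {
    const std::size_t i = nat.GetRaw();
    assert(i < Nat::kBound);
    if (live_[i]) Slot(i)->~Elt();
    live_[i] = false;
    Elt* p = new (tab_ + i * sizeof(Elt)) Elt(value);
    live_[i] = true;
    return *p;
  }

  // Precondition: Set() has already been called for nat.
  Elt& operator[](Nat nat) { return *Slot(nat.GetRaw()); }

 private:
  // Strictly conforming C++17 code would wrap this cast in std::launder.
  Elt* Slot(std::size_t i) {
    return reinterpret_cast<Elt*>(tab_ + i * sizeof(Elt));
  }

  alignas(Elt) unsigned char tab_[Nat::kBound * sizeof(Elt)];
  bool live_[Nat::kBound];
};

// Small usage example: store an element, modify it in place, read it back.
int DemoNatMap() {
  NatMap<SmallNat, NoDefault> map;
  map.Set(SmallNat(3), NoDefault(40));
  map[SmallNat(3)].value += 2;
  return map[SmallNat(3)].value;
}
```

The live_ flags cost one extra byte per slot and can be dropped if every slot is always filled before use. Compiling this next to the original array version with g++ -O2 -S is one way to see whether operator[] generates different code in the two cases.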