Allocating an array of aligned structs

I'm trying to allocate an array of structs, and I want each struct to be aligned to 64 bytes.

I tried this (it's for Windows only for now), but it doesn't work (I tried with VS2012 and VS2013):

struct __declspec(align(64)) A
{
    std::vector<int> v;

    A()
    {
        assert(sizeof(A) == 64);
        assert((size_t)this % 64 == 0);
    }

    void* operator new[] (size_t size)
    {
        void* ptr = _aligned_malloc(size, 64); 
        assert((size_t)ptr % 64 == 0);
        return ptr;
    }

    void  operator delete[] (void* p)
    {
        _aligned_free(p);
    }
};

int main(int argc, char* argv[])
{
    A* arr = new A[200];
    return 0;
}

The assertion (size_t)this % 64 == 0 fails (the modulo returns 16). It seems to work if the struct contains only simple types, but it fails when the struct contains a std container (or some other std classes).

Am I doing something wrong? Is there a way to do this properly? (Preferably C++03 compatible, but any solution that works in VS2012 is fine.)

Edit: As hinted by Shokwav, this works:

A* arr = (A*)new std::aligned_storage<sizeof(A), 64>::type[200];
// this works too actually:
//A* arr = (A*)_aligned_malloc(sizeof(A) * 200, 64);
for (int i=0; i<200; ++i)
    new (&arr[i]) A();

So it looks like it's related to the use of new[]... I'm very curious if anybody has an explanation.

There is 1 best solution below

I wonder why you need such a large alignment requirement, especially for a struct whose only member keeps its data in a separate heap allocation. But you can do this:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <new>
#include <vector>

struct __declspec(align(64)) A
{
    // Pads the struct to 64 bytes; its first byte also holds the misalignment offset
    unsigned char padding[64 - sizeof(std::vector<int>)];
    std::vector<int> v;

    void* operator new[] (size_t size)
    {
        // Make sure the buffer will fit even in the worst case
        unsigned char* ptr = (unsigned char*)malloc(size + 63);
        if (!ptr)
            throw std::bad_alloc();

        // Find out the next aligned position in the buffer
        unsigned char* endptr = (unsigned char*)(((uintptr_t)ptr + 63) & ~(uintptr_t)63);
        // Also store the misalignment in the first padding of the structure
        unsigned char misalign = (unsigned char)(endptr - ptr);
        *endptr = misalign;
        return endptr;
    }

    void  operator delete[] (void* p)
    {
        unsigned char * ptr = (unsigned char*)p;
        // It's required to call back with the original pointer, so subtract the misalignment offset
        ptr -= *ptr;
        free(ptr);
    }
};

int main()
{
    A * a = new A[2];
    // Print both element addresses and the distance between them in bytes
    printf("%p - %p = %d\n", (void*)&a[1], (void*)&a[0], int((char*)&a[1] - (char*)&a[0]));
    return 0;
}

I did not have your _aligned_malloc and _aligned_free functions, so the implementation I'm providing does this:

  1. It allocates a larger buffer (size + 63 bytes) so that a 64-byte boundary is guaranteed to fall inside it
  2. It computes the offset from the start of the allocation to the next 64-byte boundary
  3. It stores that offset in the padding of the first structure (otherwise I would have needed a larger allocation each time)
  4. In operator delete[], the stored offset is used to recover the original pointer to pass to free()
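The arithmetic behind steps 1 and 2 can be sketched on its own. The helper names align_up and misalignment below are hypothetical (they do not appear in the code above), and the sketch assumes the alignment is a power of two:

```cpp
#include <cassert>
#include <cstddef>
#include <stdint.h>

// Hypothetical helper: round an address up to the next multiple of `alignment`.
// Assumes `alignment` is a power of two, so (alignment - 1) is a bit mask.
inline uintptr_t align_up(uintptr_t address, std::size_t alignment)
{
    const uintptr_t mask = static_cast<uintptr_t>(alignment - 1);
    return (address + mask) & ~mask;
}

// Hypothetical helper: number of bytes to skip from `address` to reach the
// aligned position. The result is always in the range 0 .. alignment - 1,
// which is why over-allocating by (alignment - 1) bytes is enough.
inline std::size_t misalignment(uintptr_t address, std::size_t alignment)
{
    return static_cast<std::size_t>(align_up(address, alignment) - address);
}
```

For a 64-byte alignment the offset never exceeds 63, so it fits in the single byte that step 3 stores.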

Outputs:

0x7fff57b1ca40 - 0x7fff57b1ca00 = 64

Warning: If there is no padding in your structure, the scheme above will corrupt data, because the misalignment offset is stored in a place that the constructors of the members will overwrite. Remember that when you write "new X[n]", "n" has to be stored "somewhere" so that delete[] can call the destructor "n" times. Usually it is stored just before the returned memory buffer (new will likely allocate the required size + 4 to store the number of elements). The scheme here avoids this.
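To see the stored element count in practice, here is a small probe. It is only a sketch: Probe and array_overhead are hypothetical names, and the size and position of the count storage are implementation details, so the value you get depends on the compiler and the element type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical probe type. The user-provided destructor makes it
// non-trivially destructible, so new[] typically has to remember the count.
struct Probe
{
    static void*       last_raw;   // pointer returned by operator new[]
    static std::size_t last_size;  // size requested from operator new[]

    int value;

    Probe() : value(0) {}
    ~Probe() {}

    void* operator new[](std::size_t size)
    {
        last_size = size;
        last_raw = std::malloc(size);
        if (!last_raw)
            throw std::bad_alloc();
        return last_raw;
    }

    void operator delete[](void* p) { std::free(p); }
};

void*       Probe::last_raw  = 0;
std::size_t Probe::last_size = 0;

// Returns how many bytes lie between the pointer returned by operator new[]
// and the first element of the array that the new[] expression produced.
std::size_t array_overhead(std::size_t count)
{
    Probe* arr = new Probe[count];
    std::size_t overhead = static_cast<std::size_t>(
        reinterpret_cast<char*>(arr) - static_cast<char*>(Probe::last_raw));
    delete[] arr;
    return overhead;
}
```

If array_overhead returns a non-zero value, the first element does not sit at the address that operator new[] returned. That would explain why "this" in the question is not a multiple of 64 even though _aligned_malloc returned an aligned block.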

Another warning: Because C++ calls this operator with some additional space included in the size for storing the array's number of elements, you might still get a "shift" in the address of your objects relative to the pointer you returned. You might need to account for it. This is what std::align does: it takes a buffer with extra space, computes the alignment like I did, and returns the aligned pointer. However, you cannot do both in the new[] overload, because the "count storage" shift is applied after operator new[] returns. What you can do is determine the size of the "count storage" once with a single allocation, and adjust the offset accordingly in the new[] implementation.
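As an illustration of the std::align route, here is a minimal sketch that sidesteps new[] entirely, in the same spirit as the placement-new workaround in the question. It assumes a C++11 compiler (alignas, std::align, deleted functions), so it is not the C++03 solution asked for, and Item and AlignedItems are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <stdint.h>
#include <vector>

// Hypothetical stand-in for the struct from the question.
// alignas(64) also rounds sizeof(Item) up to a multiple of 64.
struct alignas(64) Item
{
    std::vector<int> v;
};

// Hypothetical owner of `count` Items placed in an over-allocated malloc buffer.
class AlignedItems
{
public:
    explicit AlignedItems(std::size_t count) : raw_(0), items_(0), count_(0)
    {
        const std::size_t bytes = count * sizeof(Item);
        std::size_t space = bytes + alignof(Item) - 1;  // worst-case slack
        raw_ = std::malloc(space);
        if (!raw_)
            throw std::bad_alloc();

        // std::align advances p to the next suitably aligned address in the buffer.
        void* p = raw_;
        if (!std::align(alignof(Item), bytes, p, space)) {
            std::free(raw_);
            throw std::bad_alloc();
        }
        items_ = static_cast<Item*>(p);

        try {
            for (; count_ < count; ++count_)
                new (&items_[count_]) Item();  // construct each element in place
        } catch (...) {
            destroy();  // undo the elements built so far, then release the buffer
            throw;
        }
    }

    ~AlignedItems() { destroy(); }

    AlignedItems(const AlignedItems&) = delete;
    AlignedItems& operator=(const AlignedItems&) = delete;

    Item& operator[](std::size_t i) { return items_[i]; }
    std::size_t size() const { return count_; }

private:
    void destroy()
    {
        while (count_ > 0)
            items_[--count_].~Item();  // destroy in reverse order of construction
        std::free(raw_);               // free() needs the original pointer
        raw_ = 0;
    }

    void*       raw_;    // pointer returned by malloc
    Item*       items_;  // first aligned element inside the buffer
    std::size_t count_;  // number of constructed elements
};
```

Because the original malloc pointer is kept in the object, nothing has to be hidden inside the elements themselves, and no "count storage" shift can occur since no new[] expression is involved.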