iOS - C/C++ - Speed up Integral Image calculation

I have a method that calculates an integral image (also known as a summed-area table), which is commonly used in computer vision applications.

#include <stdint.h>
#include <stdlib.h>

// Returns a malloc'd height x width integral image; the caller must free() it.
// Assumes widthStep is the row stride of grayscaleSource in bytes.
float *Integral(unsigned char *grayscaleSource, int height, int width, int widthStep)
{
    // set up variables for data access
    // (source rows are widthStep bytes apart; output rows are width floats apart)
    const uint8_t *data = (const uint8_t *)grayscaleSource;
    float *i_data = (float *)malloc((size_t)height * width * sizeof(float));
    if (i_data == NULL)
        return NULL;

    // first row only
    float rs = 0.0f;
    for(int j=0; j<width; j++)
    {
        rs += (float)data[j];
        i_data[j] = rs;
    }

    // remaining cells are sum above and to the left
    for(int i=1; i<height; ++i)
    {
        rs = 0.0f;
        for(int j=0; j<width; ++j)
        {
            rs += data[i*widthStep+j];
            i_data[i*width+j] = rs + i_data[(i-1)*width+j];
        }
    }

    // return the integral image
    return i_data;
}

I am trying to make it as fast as possible. It seems to me that it should be able to take advantage of Apple's Accelerate.framework, or perhaps ARM's NEON intrinsics, but I can't see exactly how. The nested loop looks potentially quite slow (for real-time applications at least).

Can this be sped up with either of those, or with any other technique?

1 Answer

You can certainly vectorize the row-by-row summation, i.e. adding each row to the row above it. That is vDSP_vadd(). The running sum along each row (the horizontal direction) is vDSP_vrsum().
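
To make that split concrete, here is a rough sketch in plain, portable C rather than actual Accelerate calls. The names integral_two_pass and srcStride are made up for illustration, and the output is assumed to be tightly packed (width floats per row). Each row first gets its own running sum, and then the finished row above is added element by element. The second loop has no dependency between iterations, so it is the part a vector add can replace directly.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: integral image built from two simple passes per row.
   src rows are srcStride bytes apart (assumption); the result holds
   width floats per row and must be released with free(). */
float *integral_two_pass(const uint8_t *src, int height, int width, int srcStride)
{
    assert(src && height > 0 && width > 0 && srcStride >= width);

    float *out = malloc((size_t)height * width * sizeof *out);
    if (!out)
        return NULL;

    for (int i = 0; i < height; ++i) {
        const uint8_t *in = src + (size_t)i * srcStride;
        float *row = out + (size_t)i * width;

        /* Pass 1: running sum along the row (the horizontal step). */
        float rs = 0.0f;
        for (int j = 0; j < width; ++j) {
            rs += in[j];
            row[j] = rs;
        }

        /* Pass 2: add the finished row above, element by element.
           Iterations are independent, so this is the vector-add step. */
        if (i > 0) {
            const float *above = row - width;
            for (int j = 0; j < width; ++j)
                row[j] += above[j];
        }
    }
    return out;
}
```

If you swap the loops for vDSP calls, keep in mind that vDSP works on float data, so the 8-bit source has to be converted first, and check the documentation for the exact semantics of vDSP_vrsum() (its scale argument and how it treats the first element) before relying on it. Whether it ends up faster than the scalar loop is something to measure on the device.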

If you want to write your own vector code, the horizontal sum might be sped up by something like psadbw, but that is an Intel (SSE2) instruction, so it does not exist on ARM. Also, take a look at prefix sum algorithms, which are famously parallelizable.
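
As an illustration of why prefix sums parallelize, here is a minimal sketch of one well-known scheme (a Hillis-Steele style scan), written as scalar C. The function name prefix_sum_steps and the caller-provided scratch buffer are assumptions for this example. Each pass adds a copy of the array shifted by offset, and the offset doubles every pass, so about log2(n) passes are needed and every pass is a plain element-wise add with no dependency between iterations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: inclusive prefix sum of x[0..n-1], in place.
   tmp is caller-provided scratch space of at least n floats. */
void prefix_sum_steps(float *x, float *tmp, size_t n)
{
    assert(x && tmp);

    for (size_t offset = 1; offset < n; offset *= 2) {
        /* Elements below the offset are already final; carry them over. */
        memcpy(tmp, x, offset * sizeof *x);

        /* Independent iterations: a vector add of x with a shifted x. */
        for (size_t j = offset; j < n; ++j)
            tmp[j] = x[j] + x[j - offset];

        memcpy(x, tmp, n * sizeof *x);
    }
}
```

Note that this does more total additions than the simple serial loop (roughly n log n instead of n), so it only pays off when the per-pass adds really run in parallel or in wide SIMD registers. Treat it as a starting point to benchmark, not a guaranteed win.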