Folowing snippet is from OpenCV find_obj.cpp which is demo for using SURF,
double
compareSURFDescriptors( const float* d1, const float* d2, double best, int length )
{
double total_cost = 0;
assert( length % 4 == 0 );
int i;
for( i = 0; i best )
break;
}
return total_cost;
}
As far as I can tell it checking the euclidian distance, what I do not understand is why is it doing it in groups of 4? Why not calculate the whole thing at once?
Usually things like this are done for making SSE optimizations possible. SSE registers are 128 bits long and can contain 4 floats, so you can do the 4 subtractions using one instruction, parallelly.
Another upside: you have to check the loop counter only after every fourth difference. That makes the code faster even if the compiler doesn't use the opportunity to generate SSE code. For example, VS2008 didn't, not even with -O2: