I'm looking at a paper named "Shape Based Image Retrieval Using Generic Fourier Descriptors", but only have rudimentary knowledge of Fourier Descriptors. I am attempting to implement the algorithm on page 12 of the paper, and have some results which I can't really make too much sense out of.
If I create an small image, take calculate the FD for the image, and compare the FD to the same image which has been translated by a single pixel in the x and y directions, the descriptor is completely different, except for the first entry - which is exactly the same. Firstly, a question is, is should these descriptors be exactly the same (as the descriptor is apparently scale, rotation, and translation invariant) between the two images?
Secondly, in the paper, it mentions that descriptors of two separate images are compared by a simple Euclidean distance - therefore, by taking the Euclidean distance between the two descriptors mentioned above, the Euclidean distance would apparently be 0.
I quickly put together some Javascript code to test out the algorithm, which is below.
Does anybody have any input, ideas, ways to move forward?
Thanks, Paul
var iShape = [
0, 0, 0, 0, 0,
0, 0, 255, 0, 0,
0, 255, 255, 255, 0,
0, 0, 255, 0, 0,
0, 0, 0, 0, 0
];
var ImageWidth = 5, ImageHeight = 5, MaxRFreq = 5, MaxAFreq = 5;
// Calculate centroid
var cX = 0, cY = 0, pCount = 0;
for (x = 0; x < ImageWidth; x++) {
for (y = 0; y < ImageHeight; y++) {
if (iShape[y * ImageWidth + x]) {
cX += x;
cY += y;
pCount++;
}
}
}
cX = cX / pCount;
cY = cY / pCount;
console.log("cX = " + cX + ", cY = " + cY);
// Calculate the maximum radius
var maxR = 0;
for (x = 0; x < ImageWidth; x++) {
for (y = 0; y < ImageHeight; y++) {
if (iShape[y * ImageWidth + x]) {
var r = Math.sqrt(Math.pow(x - cX, 2) + Math.pow(y - cY, 2));
if (r > maxR) {
maxR = r;
}
}
}
}
// Initialise real / imaginary table
var i;
var FR = [ ];
var FI = [ ];
for (r = 0; r < (MaxRFreq); r++) {
var rRow = [ ];
FR.push(rRow);
var aRow = [ ];
FI.push(aRow);
for (a = 0; a < (MaxAFreq); a++) {
rRow.push(0.0);
aRow.push(0.0);
}
}
var rFreq, aFreq, x, y;
for (rFreq = 0; rFreq < MaxRFreq; rFreq++) {
for (aFreq = 0; aFreq < MaxAFreq; aFreq++) {
for (x = 0; x < ImageWidth; x++) {
for (y = 0; y < ImageHeight; y++) {
var radius = Math.sqrt(Math.pow(x - maxR, 2) +
Math.pow(y - maxR, 2));
var theta = Math.atan2(y - maxR, x - maxR);
if (theta < 0.0) {
theta += (2 * Math.PI);
}
var iPixel = iShape[y * ImageWidth + x];
FR[rFreq][aFreq] += iPixel * Math.cos(2 * Math.PI * rFreq *
(radius / maxR) + aFreq * theta);
FI[rFreq][aFreq] -= iPixel * Math.sin(2 * Math.PI * rFreq *
(radius / maxR) + aFreq * theta);
}
}
}
}
// Initialise fourier descriptor table
var FD = [ ];
for (i = 0; i < (MaxRFreq * MaxAFreq); i++) {
FD.push(0.0);
}
// Calculate the fourier descriptor
for (rFreq = 0; rFreq < MaxRFreq; rFreq++) {
for (aFreq = 0; aFreq < MaxAFreq; aFreq++) {
if (rFreq == 0 && aFreq == 0) {
FD[0] = Math.sqrt(Math.pow(FR[0][0], 2) + Math.pow(FR[0][0], 2) /
(Math.PI * maxR * maxR));
} else {
FD[rFreq * MaxAFreq + aFreq] = Math.sqrt(Math.pow(FR[rFreq][aFreq], 2) +
Math.pow(FI[rFreq][aFreq], 2) / FD[0]);
}
}
}
for (i = 0; i < (MaxRFreq * MaxAFreq); i++) {
console.log(FD[i]);
}
There are three separate normalization techniques applied here in order to make the final descriptor invariant to 1) translation and 2) scale 3) rotation.
For the translation invariance part you need to find the centroid of the shape and calculate the vector of every contour point having the centroid as the origin. This is done by substracting the x and y coordinate of the centroid from each point's coordinates, respectively. So in your code the radius and theta of each point should be computes as follows:
For the scale invariance part you need to find the maximum magnitute(or radius as you say) of every vector (already normalized for translation invariance) and divide the magnitude of each point by the maximum magnitude value. An alternative way of achieving this is to divide every fourier coefficient with the zero-frequency coefficient (first coefficient) as the scale information is represented there. As I can see in you code and in the paper, this is implemented according to the second way I described.
Finally, the rotation invariance is achieved by only keeping the magnitude of the fourier coefficients as you can see in step 6 of the paper's pseudo-code.
In addition to all these, keep in mind that in order to apply the eucidean distance for the descriptor comparison, the length of the descriptor for every shape must be the same. In FFT, the number of the final coefficients depends on the number of the contour points of the shape. The solution I have found to this is to interpolate between points in order to reach a fixed number of points for every shape.
Hope I helped, Lazaros