The Principal Component Analysis in Weka returns wrong instances

I am using Weka to do Principal Component Analysis, but I think the result is wrong.

My instances are as follows:

40.4,24.7,7.2,6.1,8.3,8.7,2.442,20
25,12.7,11.2,11,12.9,20.2,3.542,9.1
13.2,3.3,3.9,4.3,4.4,5.5,0.578,3.6
22.3,6.7,5.6,3.7,6,7.4,0.176,7.3
34.3,11.8,7.1,7.1,8,8.9,1.726,27.5
35.6,12.5,16.4,16.7,22.8,29.3,3.017,26.6
22,7.8,9.9,10.2,12.6,17.6,0.847,10.6
48.4,13.4,10.9,9.9,10.9,13.9,1.772,17.8
40.6,19.1,19.8,19,29.7,39.6,2.449,35.8
24.8,8,9.8,8.9,11.9,16.2,0.789,13.7
12.5,9.7,4.2,4.2,4.6,6.5,0.874,3.9
1.8,0.6,0.7,0.7,0.8,1.1,0.056,1
32.3,13.9,9.4,8.3,9.8,13.3,2.126,17.1
38.5,9.1,11.3,9.5,12.2,16.4,1.327,11.6
26.2,10.1,5.6,15.6,7.7,30.1,0.126,25.9

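My Java code is as follows:

import weka.attributeSelection.PrincipalComponents;

// instances is a weka.core.Instances object loaded elsewhere
PrincipalComponents pca = new PrincipalComponents();
pca.buildEvaluator(instances);
pca.setVarianceCovered(0.9);
instances = pca.transformedData(instances);
System.out.println(instances);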

The results are as follows:

-0.76617,2.661828,-0.543741,0
-0.970913,0.436367,1.69961,0
2.881824,-0.434979,0.32666,0
2.202041,-0.118079,-0.265614,0
-0.055269,0.917633,-0.825503,0
-3.389144,-0.661234,0.756936,0
0.326235,-0.94073,0.256852,0
-1.020299,0.939242,-0.408135,0
-5.193605,-0.979272,-0.020702,0
0.337214,-0.689053,-0.018816,0
2.413215,0.213961,0.314493,0
4.426397,-0.617956,0.288353,0
-0.373545,0.837791,0.108058,0
-0.347075,-0.059153,0.119701,0
-0.470905,-1.506368,-1.788153,0

But I am sure the correct result is as follows:

0.76617,2.661828,0.543741,0
0.970913,0.436367,-1.69961,0
-2.881824,-0.434979,-0.32666,0
-2.202041,-0.118079,0.265614,0
0.055269,0.917633,0.825503,0
3.389144,-0.661234,-0.756936,0
-0.326235,-0.94073,-0.256852,0
1.020299,0.939242,0.408135,0
5.193605,-0.979272,0.020702,0
-0.337214,-0.689053,0.018816,0
-2.413215,0.213961,-0.314493,0
-4.426397,-0.617956,-0.288353,0
0.373545,0.837791,-0.108058,0
0.347075,-0.059153,-0.119701,0
0.470905,-1.506368,1.788153,0

The signs (positive or negative) of the first and third columns (the first and third principal components) are reversed.

I have searched Stack Overflow for a clue to my mistake but cannot find one. Can somebody tell me whether something is wrong with my code or with the Weka code?

There is 1 best solution below

Principal component analysis is based on finding the eigenvectors and eigenvalues of the dataset's covariance (or correlation) matrix, and eigenvectors are not unique: if v is an eigenvector, so is -v. Put another way, the definition of an eigenvector is ambidextrous.

In PCA terms, you can negate all the scores for any principal component, as long as you also negate the corresponding loadings.
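As a rough illustration of that sign ambiguity, here is a minimal plain-Java sketch that does not use Weka at all. The 2x2 matrix, the class name SignFlipDemo and its helper methods are all made up for this example. It checks that both v and -v satisfy the eigenvector equation for the same eigenvalue, and that projecting a (hypothetical) centered observation onto them gives scores of opposite sign.

```java
class SignFlipDemo {
    // Multiply a square matrix by a vector.
    static double[] multiply(double[][] a, double[] v) {
        double[] out = new double[v.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < v.length; j++) {
                out[i] += a[i][j] * v[j];
            }
        }
        return out;
    }

    // True if a * v equals lambda * v within the given tolerance.
    static boolean isEigenpair(double[][] a, double[] v, double lambda, double tol) {
        double[] av = multiply(a, v);
        for (int i = 0; i < v.length; i++) {
            if (Math.abs(av[i] - lambda * v[i]) > tol) {
                return false;
            }
        }
        return true;
    }

    // Return a copy of v with every component negated.
    static double[] negate(double[] v) {
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = -v[i];
        }
        return out;
    }

    // Dot product: the score of observation x on loading vector v.
    static double dot(double[] x, double[] v) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * v[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Toy symmetric matrix standing in for a covariance matrix (an assumption).
        double[][] cov = {{2.0, 1.0}, {1.0, 2.0}};
        double s = Math.sqrt(0.5);
        double[] v = {s, s};      // unit eigenvector for eigenvalue 3
        double[] w = negate(v);   // the same direction with the opposite sign

        System.out.println(isEigenpair(cov, v, 3.0, 1e-12)); // prints true
        System.out.println(isEigenpair(cov, w, 3.0, 1e-12)); // prints true

        double[] x = {1.5, -0.5}; // hypothetical centered observation
        System.out.println(dot(x, v) + " vs " + dot(x, w)); // same magnitude, opposite sign
    }
}
```

Neither choice is more correct than the other, so a PCA implementation is free to return either one.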

Where did you get the 'correct result' from? It looks as if it came from a program or package that just happens to use a different sign convention from Weka.
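If you want to confirm that two sets of scores differ only by such a convention, one option is to compare them column by column, allowing each column to be negated. The sketch below is plain Java with a hypothetical helper, equalUpToColumnSign, which is not part of Weka. It uses the first three rows and the three principal-component columns of the two outputs in the question.

```java
class PcaSignCompare {
    // True if every column of b equals the matching column of a or its negation.
    static boolean equalUpToColumnSign(double[][] a, double[][] b, double tol) {
        if (a.length == 0 || a.length != b.length) {
            return false;
        }
        int cols = a[0].length;
        for (int j = 0; j < cols; j++) {
            boolean same = true;
            boolean flipped = true;
            for (int i = 0; i < a.length; i++) {
                if (Math.abs(a[i][j] - b[i][j]) > tol) {
                    same = false;
                }
                if (Math.abs(a[i][j] + b[i][j]) > tol) {
                    flipped = false;
                }
            }
            if (!same && !flipped) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // First three rows of the Weka output from the question.
        double[][] wekaScores = {
            {-0.76617, 2.661828, -0.543741},
            {-0.970913, 0.436367, 1.69961},
            {2.881824, -0.434979, 0.32666}
        };
        // The same rows of the output the asker expected.
        double[][] expectedScores = {
            {0.76617, 2.661828, 0.543741},
            {0.970913, 0.436367, -1.69961},
            {-2.881824, -0.434979, -0.32666}
        };
        System.out.println(equalUpToColumnSign(wekaScores, expectedScores, 1e-9)); // prints true
    }
}
```

A result of true means the two outputs describe the same principal components, just with opposite signs on some of them.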