KernelPCA explained variance is (99-100)% for all the features of the dataset


I am trying to perform PCA+LDA on the Structural Protein Sequences dataset.

The problem is that the cumulative explained variance is already at 99-100% for the very first component, so only 1 principal component is needed to keep 95% of the information.

import numpy as np

# centre the data and form the sample covariance matrix
X_mean = np.mean(X_train, axis=0)
cov_mat = (X_train - X_mean).T.dot(X_train - X_mean) / (X_train.shape[0] - 1)

# eigh is preferable to eig for a symmetric matrix: it returns real,
# ascending-sorted eigenvalues instead of possibly complex unsorted ones
eig_vals, eig_vecs = np.linalg.eigh(cov_mat)

total = sum(eig_vals)
exp_var = [(i / total) * 100 for i in sorted(eig_vals, reverse=True)]
sum_exp_var = np.cumsum(exp_var)

sum_exp_var

Output

array([ 99.99891234,  99.99947164,  99.99991711,  99.99999955,
        99.99999999, 100.        , 100.        , 100.        ,
       100.        , 100.        , 100.        , 100.        ,
       100.        , 100.        ])
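A spectrum like this usually means one feature's variance dwarfs all the others, so the first eigenvalue absorbs essentially the whole total. The snippet below is a hypothetical illustration (synthetic data, not the question's dataset) showing how a single large-scale column reproduces the cumulative curve above:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 15))
X[:, 0] *= 1000.0  # one dominant-scale column, e.g. an unscaled count feature

# same covariance/eigenvalue computation as in the question
X_mean = X.mean(axis=0)
cov_mat = (X - X_mean).T @ (X - X_mean) / (X.shape[0] - 1)
eig_vals = np.linalg.eigvalsh(cov_mat)[::-1]  # sorted descending

exp_var = 100.0 * eig_vals / eig_vals.sum()
print(np.cumsum(exp_var)[:3])  # first component already above 99.99%
```

Checking `X_train.var(axis=0)` before running PCA is a quick way to see whether one column dominates like this.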

I am trying to reduce the dimensionality from 15 to maybe 10 features.

The dataset's categorical features are encoded with OrdinalEncoder() and scaled with StandardScaler().
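If the scaling really reached the array fed into the covariance computation, the spectrum should not collapse like this: standardizing gives every column unit variance, which spreads the eigenvalues out. A quick sanity check (synthetic data, manual standardization equivalent to StandardScaler) is worth running to rule out a pipeline mix-up where the unscaled `X_train` was used downstream:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 15))
X[:, 0] *= 1000.0  # same dominant column as before scaling

# manual standardization: zero mean, unit variance per column
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

eig_vals = np.linalg.eigvalsh(np.cov(Xs, rowvar=False))[::-1]
exp_var = 100.0 * eig_vals / eig_vals.sum()
print(exp_var[0])  # now roughly 100/15 ~ 7%, nowhere near 99.99%
```

If your real `X_train` still produces a ~100% first component after a check like this, the likely culprits are the scaler being fit but its output discarded, or near-duplicate/constant columns making the matrix effectively rank one.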

Any ideas on why the first component is so dominant? Are there datasets on which KernelPCA cannot yield meaningful results?
