Dimension Reduction

I'm trying to reduce a high-dimensional dataset to 2-D. However, I don't have access to the whole dataset upfront. So I'd like to generate a function that takes an N-dimensional vector and returns a 2-dimensional vector, such that if I give it two vectors that are close in N-dimensional space, the results are close in 2-dimensional space.

I thought SVD was the answer I needed, but I can't make it work.

For simplicity, let N=3 and suppose I have 15 datapoints. If I have all the data upfront in a 15x3 matrix X, then:

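[U, S, V] = svd(X);   % X = U*S*V'
s = S;                % s is the reduced version of S (Matlab is case-sensitive)
s(3:end,3:end) = 0;   % zero out everything past the second singular value
Y = U*s;              % 15x3; the third column is now all zeros
Y = Y(:,1:2);         % keep the first two columns, giving a 15x2 result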

does what I want. But suppose I get a new datapoint, A, a 1x3 vector. Is there a way to use U, S, or V to turn A into the appropriate 1x2 vector?

If SVD is a lost cause, can someone tell me what I should be doing instead?

Note: This is Matlab code, but I don't care if the answer is C, Java, or just math. If you can't read Matlab, ask and I'll clarify.

There are 3 best solutions below

5
On BEST ANSWER

SVD is (probably) a fine approach. LSA (Latent Semantic Analysis) is built on it and uses essentially the same dimensionality-reduction technique. I've discussed that at length at: lsa-latent-semantic-analysis-how-to-code-it-in-php, or check out the LSA tag here on SO.

I realize it's an incomplete answer. Holler if you want more help!
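
For what it's worth, here is a minimal Matlab sketch of the LSA-style "folding in" idea applied to the 15x3 example from the question. It is only an illustration under assumptions: the data are random stand-ins, the names Vk, Y, A and a2 are made up for this sketch, and no centering is done (to mirror the question's code). LSA's fold-in usually also rescales by the inverse singular values; that step is left out here so the result lines up with Y = U*s from the question.

```matlab
% Hypothetical stand-in for the asker's data: 15 points in 3-D
X = rand(15, 3);

% Economy-size SVD: U is 15x3, S is 3x3, V is 3x3, and X = U*S*V'
[U, S, V] = svd(X, 'econ');

% The first two right singular vectors define a fixed 3-D -> 2-D linear map
Vk = V(:, 1:2);

% Reduced version of the original data (15x2)
Y = X * Vk;

% Sanity check: X*Vk should equal U(:,1:2)*S(1:2,1:2) up to rounding error
disp(norm(Y - U(:, 1:2) * S(1:2, 1:2)));

% A new 1x3 datapoint is reduced with the same map, no new SVD needed
A = rand(1, 3);
a2 = A * Vk;    % 1x2 result
disp(a2);
```

Because Vk has orthonormal columns, the map never stretches distances: for any two rows A and B, norm(A*Vk - B*Vk) <= norm(A - B), so points that are close in 3-D stay close in 2-D.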

0
On
% generate some random data (each row is a d-dimensional datapoint)
%data = rand(200, 4);
load fisheriris
data = meas;        % 150 instances of 4-dim

% center data
X = bsxfun(@minus, data, mean(data));

% SVD
[U, S, V] = svd(X, 'econ');     % X = U*S*V'

% let's keep k components so that 95% of the data variance is explained
variances = diag(S).^2 / (size(X,1)-1);
varExplained = 100 * variances./sum(variances);
index = 1+sum(~(cumsum(varExplained)>95));

% projected data = X*V = U*S
newX = X * V(:,1:index);
biplot(V(:,1:index), 'scores',newX, 'varlabels',{'d1' 'd2' 'd3' 'd4'});

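% mapping function (x is a row vector, or a matrix with one datapoint per row);
% new points are centered with the training mean, just as X was
mu = mean(data);
mapFunc = @(x) bsxfun(@minus, x, mu) * V(:,1:index);
mapFunc([1 2 3 4])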
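
In case the "projected data = X*V = U*S" comment looks like magic, here is the short derivation behind it, written as math rather than code. The symbols are assumptions of this sketch: $k$ is the number of components kept (index in the code), $V_k$ is the first $k$ columns of $V$, $U_k$ and $S_k$ are the matching parts of $U$ and $S$, and $\mu$ is the row vector of column means of the training data.

```latex
\begin{align*}
X &= U S V^{\top}, \qquad V^{\top} V = I \\
X V &= U S V^{\top} V = U S \\
X V_k &= U_k S_k
  && \text{(keep only the first $k$ columns)} \\
y_i &= x_i V_k
  && \text{(row $i$ of the reduced data depends only on row $i$ of $X$)} \\
y_{\text{new}} &= (a - \mu)\, V_k
  && \text{(a new raw point $a$, centered the same way as $X$)}
\end{align*}
```

The fourth line is the reason a new point can be mapped on its own: each reduced row is just the corresponding centered row times $V_k$, so nothing about $U$ or $S$ is needed once $V_k$ and $\mu$ are stored.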
0
On

I don't think there's a built-in way to update an existing SVD in Matlab. I googled "SVD update" and found this paper among the many results.