Generating a Data Set in MATLAB


I would like to know how to generate a data set in MATLAB. I need it to test feature selection algorithms on high-dimensional data. The data set should be synthetic, multivariate, and contain INTERACTING features. Synthetic data sets such as the MONK's Problems are available at http://archive.ics.uci.edu/ml/datasets/MONK%27s+Problems, but unfortunately I have no clue how to visualize, generate, or modify the data according to my needs. The goal is to run an algorithm that detects interacting features. I would be very thankful for a kind reply.


There is 1 best solution below


I'm not sure this is what you are looking for, but if I needed to do this, I would start by generating anonymous functions and generic variable names that I could apply randomly within a dataset.

For example, you could generate a dataset:

    dataset = rand(100,6);

and create a few functions which include interdependencies

    interact = @(x) x.*x;        % element-wise, so it works on any matrix size
    interact2 = @(x) x.*(x-1);

Then create a random logical index:

    y = round(rand(100,1));   % 100 rows of random 0's or 1's

Go through the dataset and apply the interact function only to rows where y is true:

    dataset(y == 1,:) = interact(dataset(y == 1,:));

Repeat the above with the other interaction functions you define, if you desire. It would probably be useful to write each result to a separate copy so that you can avoid row dependencies (see below), so generating a few datasets could be in order, i.e.

    dataset2 = dataset;       % start from a full-size copy
    dataset2(y == 1,:) = interact2(dataset(y == 1,:));
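If the goal is features that only matter jointly, one possibility is to derive a class label from a combination of columns. The following is a minimal sketch, not a definitive recipe: the XOR rule, the sample count, the number of noise columns, and the names `relevant`, `noise`, `label` and `X` are all assumptions chosen for illustration.

```matlab
% Hypothetical sketch: XOR-style interaction between two features.
rng(0);                              % fix the seed so the run is repeatable
nSamples = 200;                      % assumed sample count
nNoise   = 8;                        % assumed number of irrelevant features

relevant = rand(nSamples, 2) > 0.5;  % two binary features, f1 and f2
noise    = rand(nSamples, nNoise);   % features unrelated to the label

% The label depends on f1 and f2 jointly: it is 1 only when they differ.
label = xor(relevant(:,1), relevant(:,2));

X = [double(relevant) noise];        % nSamples-by-(2 + nNoise) feature matrix
```

Taken alone, neither of the first two columns of `X` says much about `label`, but together they determine it exactly, so a selection method that scores features one at a time would be expected to miss them while an interaction-aware method should not.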

A similar approach might be taken with variables (the MONK's example set includes some categorical variables).

    myVariable = repmat('data', 100, 1);
    listofvariables = genvarname(cellstr(myVariable));

    y = round(rand(100,1));     % logical index for the data

Randomly select a generic variable to repeat:

    applyvar = randi(100);      % random index from 1 to 100 (never 0)
    selectedVariable = listofvariables(applyvar);

Replace the entries of the variable list where y is true with your repeated variable:

    listofvariables(y == 1) = selectedVariable;

Put together the dataset(s) in some order of your choosing:

    [cellstr(num2str(dataset(:,1))) listofvariables cellstr(num2str(dataset(:,2))) cellstr(num2str(dataset2(:,2)))]
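As an alternative to converting everything to strings, mixed numeric and categorical columns could be kept in a table. This is only a sketch and assumes a MATLAB release that provides `table` and `categorical` (R2013b or later); the category names and the column names `f1`, `f2` and `group` are made up for illustration.

```matlab
% Hypothetical sketch: numeric and categorical columns side by side.
nRows  = 100;
levels = {'low'; 'mid'; 'high'};            % assumed category names

numeric1 = rand(nRows, 1);
numeric2 = rand(nRows, 1);

% Pick one of the category names at random for every row.
group = categorical(levels(randi(numel(levels), nRows, 1)));

T = table(numeric1, numeric2, group, ...
          'VariableNames', {'f1', 'f2', 'group'});
```

Keeping the numeric columns numeric means they can be passed on to an algorithm without parsing them back from text, and `T.group` can be converted to indicator columns later if the method needs that.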