Create train and test variables from loaded arff file

I want to perform multilabel classification. I have a dataset in ARFF format, which I load. However, I don't know how to convert the imported data into X and y in order to apply sklearn's train_test_split.

How can I get X and y?

import pandas as pd
import scipy.io.arff
from sklearn.model_selection import train_test_split

data, meta = scipy.io.arff.loadarff('../yeast-train.arff')
df = pd.DataFrame(data)

#Get X, y
X, y = ??? <---

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
There is 1 solution below

OK. It's a multilabel dataset in which the features are in the columns Att1, Att2, Att3, ..., Att20 and the targets are in the columns Class1, Class2, ..., Class14.

So you need to select those columns to get X and y. Do it like this:

# Replace each .... with the remaining column names
feature_cols = ['Att1', 'Att2', 'Att3', 'Att4', 'Att5' ....   'Att20']
target_cols = ['Class1', 'Class2', 'Class3', 'Class4', ....   'Class14']

X, y = df[feature_cols], df[target_cols]
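
To avoid typing every column name, the two lists can also be built from the column prefixes. The following is a minimal sketch, not a definitive implementation: it assumes the columns really are named with the prefixes "Att" and "Class" as described above, and it uses a tiny made-up ARFF snippet (ARFF_TEXT, hypothetical data) in place of yeast-train.arff so that it runs on its own. It also converts the targets to integers, because scipy's loadarff returns nominal attributes as bytes (for example b'0' and b'1').

```python
import io

import pandas as pd
from scipy.io import arff
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for yeast-train.arff, using the same naming scheme.
ARFF_TEXT = """@relation toy
@attribute Att1 numeric
@attribute Att2 numeric
@attribute Class1 {0,1}
@attribute Class2 {0,1}
@data
0.1,0.2,0,1
0.3,0.4,1,0
0.5,0.6,1,1
0.7,0.8,0,0
0.9,1.0,1,0
"""

# loadarff accepts a file name or a text file-like object.
data, meta = arff.loadarff(io.StringIO(ARFF_TEXT))
df = pd.DataFrame(data)

# Pick columns by prefix (assumes the "Att"/"Class" naming holds).
feature_cols = [c for c in df.columns if c.startswith("Att")]
target_cols = [c for c in df.columns if c.startswith("Class")]

X = df[feature_cols]
# Nominal values arrive as bytes; decode them and cast to int.
y = df[target_cols].apply(lambda col: col.str.decode("utf-8")).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```

With the real file, replace io.StringIO(ARFF_TEXT) with the path '../yeast-train.arff'. The random_state argument is only there to make the split reproducible.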