Convert dataset with string features into svmlight / libsvm format


I am trying to use the sklearn.datasets.dump_svmlight_file function to convert a dataset into an svmlight file. The problem is that some columns contain strings.

At this moment, my X array looks similar to this:

[['C' '2'
  'String1']
 ['String2' '1959-6' 'String3' 'SCnc'
  'Bld']
 ...
       ]

When I run the code, this error appears:

ValueError: could not convert string to float: 'C'

Actual code:

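from sklearn.datasets import dump_svmlight_file

# df is a pandas DataFrame; every column other than qid and relevance is a feature
qid = df['qid'].to_numpy()
y = df['relevance'].to_numpy()
X = df[df.columns.difference(['qid', 'relevance'])].to_numpy()
# Fails here: X still contains string values
dump_svmlight_file(X, y, 'dataset.dat', query_id=qid)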

So my question is: how can I encode the strings in order to fix the error?
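
One possible approach, sketched here under assumptions: the svmlight format only stores numeric values, so the string columns have to be turned into numbers before dumping. The sketch below one-hot encodes every non-numeric column with OneHotEncoder and passes the numeric columns through unchanged. The toy DataFrame and its column names (code, kind, score) are hypothetical stand-ins for the real data, and one-hot encoding is only one choice; something like OrdinalEncoder may fit better if the categories have a natural order.

```python
import pandas as pd
from scipy import sparse
from sklearn.datasets import dump_svmlight_file
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for the real df.
df = pd.DataFrame({
    "qid": [1, 1, 2],
    "relevance": [0, 1, 2],
    "code": ["C", "SCnc", "C"],      # string feature
    "kind": ["Bld", "Bld", "Ser"],   # string feature
    "score": [0.5, 1.5, 2.5],        # numeric feature
})

qid = df["qid"].to_numpy()
y = df["relevance"].to_numpy()
features = df[df.columns.difference(["qid", "relevance"])]

# Treat every non-numeric column as categorical (an assumption of this sketch).
cat_cols = features.select_dtypes(exclude="number").columns
num_cols = features.columns.difference(cat_cols)

# One-hot encode the string columns; the result is a sparse matrix.
encoder = OneHotEncoder(handle_unknown="ignore")
X_cat = encoder.fit_transform(features[cat_cols])

# Keep numeric columns as they are and stack them next to the encoded ones.
X_num = sparse.csr_matrix(features[num_cols].to_numpy(dtype=float))
X = sparse.hstack([X_num, X_cat]).tocsr()

# dump_svmlight_file accepts sparse input, so no densifying is needed.
dump_svmlight_file(X, y, "dataset.dat", query_id=qid)
```

To map the column indices in the file back to the original values, encoder.categories_ lists the categories found for each encoded column, in the order they appear in the output.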
