I am trying to use the sklearn.datasets.dump_svmlight_file function to convert a dataset into an svmlight file. The problem is that some columns are of type string.
At the moment, my X array looks similar to this:
[['C' '2' 'String1']
 ['String2' '1959-6' 'String3' 'SCnc' 'Bld']
...
]
When I run the code, this error appears:
ValueError: could not convert string to float: 'C'
Actual code:
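from sklearn.datasets import dump_svmlight_file

# df is a pandas DataFrame holding 'qid', 'relevance' and the feature columns
qid = df['qid'].to_numpy()
y = df['relevance'].to_numpy()
X = df[df.columns.difference(['qid', 'relevance'])].to_numpy()
dump_svmlight_file(X, y, 'dataset.dat', query_id=qid)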
So my question is: how can I encode the strings in order to fix the error?
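For reference, here is a minimal sketch of one possible approach, not a definitive fix: one-hot encode the string columns with sklearn.preprocessing.OneHotEncoder so that X becomes numeric before it reaches dump_svmlight_file. The toy values in X_raw, y and qid are hypothetical stand-ins for the real DataFrame, and the output goes to an in-memory buffer instead of 'dataset.dat' to keep the example self-contained.

```python
import io

import numpy as np
from sklearn.datasets import dump_svmlight_file
from sklearn.preprocessing import OneHotEncoder

# Toy stand-ins for the real data (hypothetical values, not the actual dataset).
X_raw = np.array(
    [["C", "2", "String1"],
     ["B", "1", "String2"],
     ["C", "1", "String3"]],
    dtype=object,
)
y = np.array([1, 0, 2])
qid = np.array([1, 1, 2])

# One-hot encode every string column; the result is a numeric sparse matrix
# with one output column per distinct value of each input column.
encoder = OneHotEncoder(handle_unknown="ignore")
X_encoded = encoder.fit_transform(X_raw)

# Write to an in-memory binary buffer; a file path such as 'dataset.dat'
# can be passed instead.
buffer = io.BytesIO()
dump_svmlight_file(X_encoded, y, buffer, query_id=qid)
svmlight_text = buffer.getvalue().decode("utf-8")
print(svmlight_text)
```

One-hot encoding treats every column as categorical. Columns that only look like strings but hold numbers (such as '2') may be better converted to a numeric dtype first, and high-cardinality columns will produce many sparse features, so whether this encoding suits a ranking model is a separate question.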