I would like to save my dataframe in txt format with specific delimiters (libsvm format), to look like this:
1 qid:0 0:1.465648768921554 1:-0.2257763004865357 2:0.06752820468792384 3:-1.424748186213457 4:-0.5443827245251827
1 qid:0 0:1.465648768921554 1:-0.2257763004865357 2:0.06752820468792384 3:-1.424748186213457 4:-0.5443827245251827
2 qid:0 0:0.7384665799954104 1:0.1713682811899705 2:-0.1156482823882405 3:-0.3011036955892888 4:-1.478521990367427
Notice that the first two columns (the label and the qid) are separated by a space, and each remaining column is written as index:value, where the integer before the colon identifies that column.
This is my current dataset:
import pandas as pd

data = {'label': [2,3,2],
'qid': ['qid:0', 'qid:1','qid:0'],
'0': [0, 0, 0],
'0': [0.4967, 0.4967,0.4967],
'1': [1,1,1],
'1': [0.4967, 0.4967,0.4967],
'2': [2,2,2],
'2': [0.4967, 0.4967,0.4967],
'3': [3,3,3],
'3': [0.4967, 0.4967,0.4967],
'4': [4,4,4]}
df = pd.DataFrame(data)
Is there a way to save this as txt to match that format exactly?
For context, my machine learning model was trained on a dataset in this specific txt format, and I need to match it to use it for my own dataset.
A similar question has been answered before; scikit-learn has a function for exactly this: dump_svmlight_file.
For this particular case, you need to pass the query ids through its query_id argument. That means converting the qid column to plain integers (dropping the 'qid:' prefix) and removing the additional integer columns, since the function writes the column index itself.
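Something along these lines should work. It is a minimal sketch, not tested against your real data. It assumes the frame has already been reduced to one value column per feature (named '0' to '4' here, which is my assumption about the intended layout), and it writes to an in-memory buffer so the result can be inspected.

```python
import io

import pandas as pd
from sklearn.datasets import dump_svmlight_file

# Assumed layout: one value column per feature. The separate integer
# identifier columns are left out because the writer emits the indices itself.
df = pd.DataFrame({
    "label": [2, 3, 2],
    "qid": ["qid:0", "qid:1", "qid:0"],
    "0": [0.4967, 0.4967, 0.4967],
    "1": [0.4967, 0.4967, 0.4967],
    "2": [0.4967, 0.4967, 0.4967],
    "3": [0.4967, 0.4967, 0.4967],
    "4": [0.4967, 0.4967, 0.4967],
})

feature_cols = ["0", "1", "2", "3", "4"]
X = df[feature_cols].to_numpy(dtype=float)
y = df["label"].to_numpy()

# Strip the "qid:" prefix so the query ids are plain integers.
query_id = df["qid"].str.replace("qid:", "", regex=False).astype(int).to_numpy()

# dump_svmlight_file accepts a path or a binary file-like object.
buffer = io.BytesIO()
dump_svmlight_file(X, y, buffer, zero_based=True, query_id=query_id)

text = buffer.getvalue().decode("utf-8")
print(text)
```

To write to disk, pass a file name (for example "train.txt", a placeholder) instead of the buffer. One caveat: the svmlight format is sparse, so features whose value is exactly 0 are left out of the line rather than written as index:0.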