I am new to Python and I am trying to compute the condensed distance matrix of the elements in a DataFrame column using SciPy's pdist.
This is what the data looks like; I want to use the "Sequence" column:
In [90]: print(a_10)
         Sequence  Occurrences  Size
12     FJGKFLDKFJ         4185    10
13     FJGKFLEKFJ         4074    10
15     FJGEELKJFD         3392    10
16     AFLJSFLSKD         3240    10
22     EOAIJFFEOF         2652    10
...           ...          ...   ...
29963  ELFKAJLFKA            1    10
29975  VEOIAJSEIJ            1    10
29983  ELKSJFLSEK            1    10
29989  ESKJFSLEKF            1    10
30002  ECSKCJSOEC            1    10

[3369 rows x 3 columns]
First I reshape it:
v = a_10["Sequence"].to_numpy().reshape(-1,1)
And then I try to apply pdist:
from scipy.spatial.distance import pdist  # pdist lives in scipy.spatial.distance
matrix = pdist(v, "euclidean")
But I get the following error:
ValueError: could not convert string to float: 'FJGKFLDKFJ'
Does anyone have a suggestion on how to overcome this? Thank you in advance.
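One possible direction, offered as a sketch and not as a definitive answer: pdist's built-in metrics expect numeric rows, so the strings need to be turned into numbers first. The example below assumes that all sequences have the same length (as the Size column suggests) and that a Hamming distance, meaning the fraction of positions at which two sequences differ, is an acceptable measure. Neither assumption comes from the question itself, and the names sequences and encoded are purely illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# First three entries of the "Sequence" column (illustrative subset)
sequences = ["FJGKFLDKFJ", "FJGKFLEKFJ", "FJGEELKJFD"]

# One row per sequence, one column per position; each character is
# replaced by its integer code point so the array is numeric
encoded = np.array([[ord(ch) for ch in s] for s in sequences])

# Condensed distance vector: fraction of positions that differ per pair
condensed = pdist(encoded, metric="hamming")

# Optional: expand the condensed vector into a full square matrix
square = squareform(condensed)

print(condensed)
print(square.shape)
```

With the real data, sequences could be replaced by a_10["Sequence"]. If the sequences had different lengths, or if an edit distance were the intended measure, this encoding would not apply as written.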