"could not convert string to float" when using pdist


I am new to Python and I am trying to compute the condensed distance matrix of the elements of a DataFrame column using SciPy's pdist.

This is what the data looks like; I want to use the "Sequence" column:

In [90]: print(a_10)
        Sequence  Occurrences  Size
12     FJGKFLDKFJ         4185    10
13     FJGKFLEKFJ         4074    10
15     FJGEELKJFD         3392    10
16     AFLJSFLSKD         3240    10
22     EOAIJFFEOF         2652    10
...           ...          ...   ...
29963  ELFKAJLFKA            1    10
29975  VEOIAJSEIJ            1    10
29983  ELKSJFLSEK            1    10
29989  ESKJFSLEKF            1    10
30002  ECSKCJSOEC            1    10

[3369 rows x 3 columns]

First I reshape it:

v = a_10["Sequence"].to_numpy().reshape(-1,1)

Then I try to apply pdist:

from scipy.spatial.distance import pdist
matrix = pdist(v, "euclidean")

But I get the following error:

ValueError: could not convert string to float: 'FJGKFLDKFJ'
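A likely cause: for built-in metrics given by name, such as "euclidean", pdist first converts its input to a float64 array, and that conversion fails on text. One possible workaround, assuming every sequence has the same length (the Size column is 10 in the sample), is to turn each string into a row of character codes and use the "hamming" metric, which gives the fraction of positions whose characters differ. The three-row DataFrame below is a hypothetical subset of a_10, and encode_sequences is an illustrative helper, not part of pandas or SciPy.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Hypothetical subset of the question's a_10 DataFrame.
a_10 = pd.DataFrame({"Sequence": ["FJGKFLDKFJ", "FJGKFLEKFJ", "FJGEELKJFD"]})

def encode_sequences(seqs):
    """Turn equal-length strings into an (n, L) array of Unicode code points."""
    # Assumption: all sequences share one length, so rows line up position by position.
    lengths = {len(s) for s in seqs}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    return np.array([[ord(c) for c in s] for s in seqs], dtype=np.int64)

X = encode_sequences(a_10["Sequence"].tolist())

# "hamming" returns the fraction of positions where the characters differ.
condensed = pdist(X, "hamming")
square = squareform(condensed)  # full (n, n) matrix, if needed
print(condensed)  # [0.1 0.5 0.5]
```

Multiplying the result by the sequence length (10 here) gives the number of mismatched positions. Euclidean distance on the character codes would also run, but the difference between two letter codes has no obvious meaning, which is why a position-wise mismatch count is used in this sketch.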

Does anyone have a suggestion on how to overcome this? Thank you in advance.
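If the sequences can differ in length, or if insertions and deletions should count as single edits, an edit distance such as Levenshtein may fit better. pdist accepts a Python callable as the metric, so one sketch that keeps pdist's condensed pair ordering is to pass it a numeric index column and look the strings up inside the callable. The levenshtein and index_metric functions below are hypothetical helpers written for illustration. The callable is invoked once per pair from Python, so for the 3369 rows in the question (about 5.7 million pairs) expect this to be slow.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

# Hypothetical subset of the question's a_10 DataFrame.
a_10 = pd.DataFrame({"Sequence": ["FJGKFLDKFJ", "FJGKFLEKFJ", "FJGEELKJFD"]})
seqs = a_10["Sequence"].tolist()

# pdist only sees numbers: row k holds the position of a sequence in seqs.
idx = np.arange(len(seqs)).reshape(-1, 1)

def index_metric(u, v):
    # u and v are 1-element rows of idx; map them back to the strings.
    return levenshtein(seqs[int(u[0])], seqs[int(v[0])])

condensed = pdist(idx, index_metric)
square = squareform(condensed)
```

Because the output follows the same pair order as any other pdist result, squareform converts it to the full square matrix.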
