Counting all words in a column of a pandas dataset


I am carrying out EDA on a dataset and want to count the total number of words in a column, before and after deleting duplicates.

Here is my code:

print(train_dataset['text'].apply(lambda x: len(x.split(' '))).sum())

It is throwing this error:

AttributeError: 'float' object has no attribute 'split'

There is 1 best solution below

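The error suggests that the column contains non-string values, most likely missing values (NaN), which pandas stores as floats. You could convert the column values to string type before splitting: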

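# Option 1: convert the whole column to strings first, then count
train_dataset['text'] = train_dataset['text'].astype(str)
train_dataset['text'].apply(lambda x: len(x.split())).sum()
# or
# Option 2: convert each value on the fly, leaving the column unchanged
# (note: str() turns a missing value into the text 'nan', which counts as one word)
train_dataset['text'].apply(lambda x: len(str(x).split())).sum()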
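If you would rather have missing values count as zero words, and want the totals before and after deleting duplicates as the question asks, something like the following sketch may help. The sample data, the `total_words` helper, and the choice to deduplicate on the `text` column are assumptions made for illustration, not part of the original question.

```python
import pandas as pd

# Hypothetical sample data standing in for train_dataset;
# None ends up as a missing value in the column.
train_dataset = pd.DataFrame(
    {"text": ["hello world", None, "hello world", "one two three"]}
)

def total_words(column: pd.Series) -> int:
    # Replace missing values with an empty string so they contribute 0 words,
    # then split on whitespace and add up the number of tokens per row.
    return int(column.fillna("").astype(str).str.split().str.len().sum())

# Count before and after dropping rows with duplicate text (assumed criterion).
words_before = total_words(train_dataset["text"])
words_after = total_words(train_dataset.drop_duplicates(subset="text")["text"])
print(words_before, words_after)
```

Running `train_dataset['text'].isna().sum()` first shows how many missing values the column has, which helps decide whether to fill them or drop them.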