How to remove Pandas rows which contain non English words using Python

Asked by Todd on 18 July 2022 at 21:57 (132 views)

I want to remove rows (each row holds an already-tokenized sentence) that contain one or more non-English words.
The original data set has more than three rows; I simplified it here for brevity.

import pandas as pd

data = {
    "col1": [
        ['apartment', 'expectations', 'insinuate', 'welcome'],
        ['très', 'réactive', 'arrangeante', 'notre', 'place'],
        ['buena', 'ubicación', 'you'],
    ]
}

# Load the data into a DataFrame; each cell of col1 is a list of tokens
df = pd.DataFrame(data)

print(df)

I tried the code below. str.isascii() is available only in Python 3.7 and later, but even there it fails on this data:

# str.isascii() requires Python 3.7+; here each x is a list, not a str
df[df.col1.map(lambda x: x.isascii())]

AttributeError: 'list' object has no attribute 'isascii'
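The AttributeError comes from calling a string method on a list: each cell in col1 holds a list of tokens, not a single string. A minimal sketch of one way around it, assuming an ASCII check is an acceptable stand-in for "English", is to test every token and keep a row only if all of its tokens pass. The helper is_ascii_word is hypothetical (it is not part of pandas); it relies on str.encode("ascii") raising UnicodeEncodeError, so it also runs on Python versions before 3.7.

```python
import pandas as pd

data = {
    "col1": [
        ["apartment", "expectations", "insinuate", "welcome"],
        ["très", "réactive", "arrangeante", "notre", "place"],
        ["buena", "ubicación", "you"],
    ]
}
df = pd.DataFrame(data)


def is_ascii_word(word):
    """Hypothetical helper: True if every character of `word` is ASCII.

    Works on any Python 3 version; on Python 3.7+ it gives the same
    result as word.isascii().
    """
    try:
        word.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


# Each cell is a list of tokens, so check every token in the list
# instead of calling a string method on the list itself.
mask = df["col1"].map(lambda tokens: all(is_ascii_word(w) for w in tokens))
filtered = df[mask]
print(filtered)
```

With this sample data only row 0 survives, because "très" and "ubicación" contain non-ASCII characters. On Python 3.7+ the helper can be replaced by `all(w.isascii() for w in tokens)`.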

Is there another way to get the output below?
Many thanks.

Expected output:
                                            col1
0  [apartment, expectations, insinuate, welcome]
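An ASCII check is only a heuristic for "English": Spanish "buena" is plain ASCII, so a row such as ['buena', 'you'] would pass it. If that matters, one alternative sketch is to look each token up in an English vocabulary. ENGLISH_WORDS below is a hypothetical placeholder (a real filter would load a full English word list), and the fourth row is a made-up example added only to show the difference from the ASCII approach.

```python
import pandas as pd

# Hypothetical placeholder vocabulary; a real filter would use a complete
# English word list instead of these few illustrative entries.
ENGLISH_WORDS = {"apartment", "expectations", "insinuate", "welcome", "place", "you"}

data = {
    "col1": [
        ["apartment", "expectations", "insinuate", "welcome"],
        ["très", "réactive", "arrangeante", "notre", "place"],
        ["buena", "ubicación", "you"],
        ["buena", "you"],  # hypothetical row: all-ASCII but not all English
    ]
}
df = pd.DataFrame(data)


def all_english(tokens, vocab=ENGLISH_WORDS):
    # Lower-case each token so "Welcome" and "welcome" match the same entry.
    return all(token.lower() in vocab for token in tokens)


filtered = df[df["col1"].map(all_english)]
print(filtered)
```

Only row 0 is kept here, whereas the ASCII check would also keep the hypothetical row 3. The trade-off is that the result is only as good as the word list: English words missing from it cause rows to be dropped.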

