How to use pyspellchecker to autocorrect spelling errors in a pandas column?

580 Views Asked by Tobes At 23 January 2023 at 22:52

I have the following dataframe:

df = pd.DataFrame({'id':[1,2,3],'text':['a foox juumped ovr the gate','teh car wsa bllue','why so srious']})

I would like to generate a new column with the spelling errors fixed, using the pyspellchecker library.

I have tried the following but it did not correct any spelling errors:

import pandas as pd
from spellchecker import SpellChecker

spell = SpellChecker()

def correct_spelling(word):
    corrected_word = spell.correction(word)
    if corrected_word is not None:
        return corrected_word
    else:
        return word

df['corrected_text'] = df['text'].apply(correct_spelling)

Below is a dataframe showing the expected output:

pd.DataFrame({'id':[1,2,3],'text':['a foox juumped ovr the gate','teh car wsa bllue','why so srious'],
              'corrected_text':['a fox jumped over the gate','the car was blue','why so serious']})


There are 2 best solutions below

Jason Baker On 25 January 2023 at 04:23

I'm not familiar with this package, so I can't say how to improve its accuracy, but you can split the string in each row into a list of words and correct the words one at a time by iterating over the resulting list of lists. This example uses a list comprehension:

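# Correct each word separately; keep the original word when no correction is returned
df["text"] = [[spell.correction(word) or word for word in row] for row in df["text"].str.split(" ").to_list()]
# Join the corrected words back into one string per row
df["text"] = df["text"].apply(lambda x: " ".join(x))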

Output (as you can see, you would need to work on the accuracy):

   id                       text
0   1  a food jumped or the gate
1   2           the car was blue
2   3             why so serious
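
The attempt in the question passes each whole sentence to spell.correction, which is why nothing gets fixed; the correction has to happen word by word. A minimal sketch of that step as a reusable function is below. It assumes the corrector is any callable that returns a suggestion or None (recent pyspellchecker versions may return None when no candidate is found). The names correct_sentence and demo_corrector, and the small lookup table, are hypothetical and only there so the example runs without the library.

```python
from typing import Callable, Optional


def correct_sentence(text: str, correct_word: Callable[[str], Optional[str]]) -> str:
    """Correct a sentence one whitespace-separated word at a time."""
    fixed = []
    for word in text.split():
        suggestion = correct_word(word)
        # Fall back to the original word when no suggestion is returned.
        fixed.append(suggestion if suggestion else word)
    return " ".join(fixed)


# Stand-in corrector for illustration only (hypothetical lookup table);
# with pyspellchecker one would pass spell.correction instead.
_demo_fixes = {"teh": "the", "wsa": "was", "bllue": "blue"}


def demo_corrector(word: str) -> Optional[str]:
    return _demo_fixes.get(word)


print(correct_sentence("teh car wsa bllue", demo_corrector))  # the car was blue
```

With the objects from the question, this could be applied as df['corrected_text'] = df['text'].apply(lambda s: correct_sentence(s, spell.correction)), which leaves the original text column untouched.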

Joep On 03 March 2023 at 09:36

The accuracy is okay. A spell checker can't read the sentence; it can only detect words that aren't spelled correctly. pyspellchecker uses the Levenshtein distance to determine the 'correct' word, based on the number of edits needed to turn the misspelled word into a known one. 'foox' is one edit away from 'fox' but also one edit away from 'food'. To resolve such ties, pyspellchecker uses a word-frequency list. If 'food' has a higher frequency than 'fox', which is probably the case, it will autocorrect to 'food'. Constructing your own spell-checker dictionary with words common to your use case should improve the results.
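
To make the frequency argument concrete, here is a toy sketch of the idea: generate every string one edit away, keep those that are known words, and pick the most frequent. This is not pyspellchecker's actual implementation, it only handles a single edit (delete, replace, insert), and the frequencies in WORD_FREQ are made up for illustration.

```python
import string
from typing import Dict, Set

# Made-up frequencies for illustration; not pyspellchecker's real data.
WORD_FREQ: Dict[str, int] = {"food": 500, "fox": 120, "over": 800, "or": 5000}


def edits1(word: str) -> Set[str]:
    """All strings one edit (delete, replace, or insert) away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + replaces + inserts)


def correct(word: str, freq: Dict[str, int]) -> str:
    """Return the most frequent known word within one edit, else the input."""
    if word in freq:
        return word
    candidates = [w for w in edits1(word) if w in freq]
    if not candidates:
        return word
    return max(candidates, key=lambda w: freq[w])


print(correct("foox", WORD_FREQ))                # food
print(correct("foox", {"fox": 50, "gate": 10}))  # fox
```

With the general-purpose toy list, 'foox' becomes 'food' because 'food' is more frequent; with a small domain-specific list that has no 'food', the same input becomes 'fox'. The same tie explains 'ovr' turning into 'or' rather than 'over'.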
