Python: string to controlled vocabulary

I'm looking for a Python library that could help me align user input to a controlled vocabulary that I've defined myself (i.e., I'm not trying to do a general spell check).

Example:

controlled_voc = ['cat', 'dog', 'horse']
user_input = ['cats', 'dogo', 'orse']  # plural, similar form, spelling mistake
user_to_controlled = {'cats':'cat', 'dogo':'dog', 'orse':'horse'}
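For a vocabulary this small, the standard library may already be enough to reproduce the target mapping above. The sketch below uses `difflib.get_close_matches`, which ranks candidates by a Ratcliff/Obershelp similarity ratio rather than by edit distance. The helper name `map_to_vocab` is hypothetical, and the `cutoff=0.6` value is simply difflib's default; it would likely need tuning for a real vocabulary.

```python
import difflib

def map_to_vocab(words, vocab, cutoff=0.6):
    """Map each word to its most similar vocabulary entry, or None if none clears the cutoff."""
    mapping = {}
    for word in words:
        # n=1 keeps only the best-scoring match (if any scores >= cutoff)
        matches = difflib.get_close_matches(word.lower(), vocab, n=1, cutoff=cutoff)
        mapping[word] = matches[0] if matches else None
    return mapping

controlled_voc = ['cat', 'dog', 'horse']
user_input = ['cats', 'dogo', 'orse']
print(map_to_vocab(user_input, controlled_voc))
# {'cats': 'cat', 'dogo': 'dog', 'orse': 'horse'}
```

Returning `None` for words that match nothing (e.g. 'elephant' here) lets the caller decide whether to reject the input or ask the user again, instead of silently forcing it onto the closest term.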

Does anyone know of something that could help? I've already looked into the classic NLP libraries (NLTK, spaCy) but didn't find much.

Thanks in advance

Answer by BlackMath:

This sounds like a typo-correction problem rather than an NLP one. Have a look at pyspellchecker:

pip install pyspellchecker

"It uses a Levenshtein Distance algorithm to find permutations within an edit distance of 2 from the original word. It then compares all permutations (insertions, deletions, replacements, and transpositions) to known words in a word frequency list. Those words that are found more often in the frequency list are more likely the correct results."
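The candidate-generation idea in the quote can be sketched with the standard library alone, using the controlled vocabulary in place of a word-frequency list. This is a hedged, Norvig-style reimplementation for illustration, not pyspellchecker's actual code: it builds every string one edit away (insertions, deletions, replacements, adjacent transpositions), falls back to two edits away, and keeps only strings found in the vocabulary. The names `edits1`, `edits2` and `to_controlled`, the lowercase ASCII alphabet, and the optional `freq` weights are assumptions; without weights, ties are broken alphabetically. pyspellchecker itself also documents loading your own word list (via `word_frequency.load_words`), which is the more direct route if you want to stay with the library; check its current documentation for details.

```python
import string

LETTERS = string.ascii_lowercase  # assumption: vocabulary is lowercase ASCII

def edits1(word):
    """All strings one insertion, deletion, replacement or adjacent transposition away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def edits2(word):
    """All strings two edits away (applies edits1 twice)."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}

def to_controlled(word, vocab, freq=None):
    """Return the closest vocabulary term within 2 edits, or None."""
    vocab = set(vocab)
    word = word.lower()

    def known(words):
        return {w for w in words if w in vocab}

    # Prefer an exact match, then 1-edit candidates, then 2-edit candidates
    candidates = known([word]) or known(edits1(word)) or known(edits2(word))
    if not candidates:
        return None
    weights = freq or {}
    # sorted() makes ties deterministic; max keeps the first maximal item
    return max(sorted(candidates), key=lambda w: weights.get(w, 1))

controlled_voc = ['cat', 'dog', 'horse']
user_input = ['cats', 'dogo', 'orse']
print({w: to_controlled(w, controlled_voc) for w in user_input})
# {'cats': 'cat', 'dogo': 'dog', 'orse': 'horse'}
```

Capping the search at two edits mirrors the quoted description and keeps the candidate sets small; anything further away than that comes back as `None` rather than being forced onto some vocabulary term.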

The currently supported dictionaries are:

  1. English
  2. Spanish
  3. French
  4. Portuguese
  5. German
  6. Russian

source: https://pypi.org/project/pyspellchecker/