I'm looking for a Python library that could help me align user input to a controlled vocabulary that I've defined myself (i.e. I'm not trying to do a general spell check).
Example:
controlled_voc = ['cat', 'dog', 'horse']
user_input = ['cats', 'dogo', 'orse'] #plural, similar form, spelling mistake
user_to_controlled = {'cats':'cat', 'dogo':'dog', 'orse':'horse'}
Does anyone know of something that could help me? I've already looked into the classic NLP libs (NLTK, spaCy) but didn't find much.
Thanks in advance
Sounds like a "typos" problem, not really an NLP one. Have a look at pyspellchecker:
"It uses a Levenshtein Distance algorithm to find permutations within an edit distance of 2 from the original word. It then compares all permutations (insertions, deletions, replacements, and transpositions) to known words in a word frequency list. Those words that are found more often in the frequency list are more likely the correct results."
The currently supported dictionaries are listed on the project page.
source: https://pypi.org/project/pyspellchecker/
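To make the edit-distance idea from the quote concrete, here is a minimal standard-library sketch applied to the vocabulary from the question. It is not pyspellchecker's code: it uses plain Levenshtein distance (no transpositions, no word-frequency ranking), and the names `levenshtein`, `align_to_vocabulary`, and `max_distance` are made up for illustration. The cutoff of 2 simply mirrors the edit distance mentioned in the quote.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, replacements)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # replacement (or match)
            ))
        previous = current
    return previous[-1]


def align_to_vocabulary(words, vocabulary, max_distance=2):
    """Map each word to its nearest vocabulary term, or None if nothing is close enough."""
    mapping = {}
    for word in words:
        # Hypothetical policy: take the closest term; ties go to the first one listed.
        best = min(vocabulary, key=lambda term: levenshtein(word.lower(), term))
        close_enough = levenshtein(word.lower(), best) <= max_distance
        mapping[word] = best if close_enough else None
    return mapping


controlled_voc = ['cat', 'dog', 'horse']
user_input = ['cats', 'dogo', 'orse']
user_to_controlled = align_to_vocabulary(user_input, controlled_voc)
print(user_to_controlled)
# {'cats': 'cat', 'dogo': 'dog', 'orse': 'horse'}
```

Returning None for words with no term within the cutoff keeps unrelated input (e.g. 'elephant') from being forced onto the vocabulary. For a large vocabulary, comparing against every term gets slow, which is presumably why a dedicated library is worth a look.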