machine learning algorithm for spelling check

Asked by rohit on 22 August 2013 at 07:59 (4.2k views)

I have a list of medicine names (regular_list) and a list of new names (new_list). I want to check whether each name in new_list is already present in regular_list. The issue is that names in new_list may contain typos, and I want those names to still count as matches against regular_list. I know that using stringdist is one solution to the problem, but I need a machine learning algorithm.

Answer by lejlot on 22 August 2013 at 08:36

As already mentioned in "machine learning to overcome typo errors", machine learning tools are overkill for a task like this, but the simplest option would be to combine the two approaches.

On one hand, you can compute the edit distance between a given word x and each dictionary word d_i. Additionally, you can train a per-word classifier

c(d_i, distance(x,d_i))

that returns True (class 1) if the given edit distance has been learned to be small enough to consider x a misspelled version of d_i. This gives you a more general model than not using machine learning at all, because each dictionary word can have its own threshold (some words are misspelled more often than others). Obviously, though, you have to prepare a training set of (misspelled_word, correct_one) pairs, and also add (correct_one, correct_one) pairs.
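A minimal Python sketch of this idea, under several assumptions: plain Levenshtein distance stands in for `distance`, the per-word classifier is the simplest possible one (a learned threshold, i.e. a decision stump on the single distance feature), and the negative examples for a word d_i are taken from training pairs that belong to other dictionary words, since the answer only specifies positive pairs. The helper names (`levenshtein`, `fit_thresholds`, `match`) and the medicine names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def fit_thresholds(pairs, dictionary):
    """Learn one distance threshold per dictionary word (a decision stump).

    pairs: (observed_word, correct_word) tuples, including (w, w) for each w.
    Positives for d: pairs whose correct_word is d.
    Negatives for d (an assumption): pairs whose correct_word is another word.
    """
    thresholds = {}
    for d in dictionary:
        pos = [levenshtein(x, d) for x, c in pairs if c == d]
        neg = [levenshtein(x, d) for x, c in pairs if c != d]
        best_t, best_correct = 0, -1
        for t in range(max(pos + neg, default=0) + 1):
            # Classifier for d: "x is a variant of d" iff distance <= t.
            correct = sum(p <= t for p in pos) + sum(n > t for n in neg)
            if correct > best_correct:  # strict '>' keeps the smallest t on ties
                best_t, best_correct = t, correct
        thresholds[d] = best_t
    return thresholds


def match(x, thresholds):
    """Dictionary words whose per-word classifier accepts x."""
    return [d for d, t in thresholds.items() if levenshtein(x, d) <= t]


# Hypothetical training data in the (misspelled_word, correct_one) form.
regular_list = ["aspirin", "ibuprofen", "paracetamol"]
pairs = [
    ("aspirin", "aspirin"), ("asprin", "aspirin"), ("aspirn", "aspirin"),
    ("ibuprofen", "ibuprofen"), ("ibuprofin", "ibuprofen"),
    ("ibuprophen", "ibuprofen"),
    ("paracetamol", "paracetamol"), ("paracetamole", "paracetamol"),
    ("parcetamol", "paracetamol"),
]

thresholds = fit_thresholds(pairs, regular_list)
print(thresholds)  # → {'aspirin': 1, 'ibuprofen': 2, 'paracetamol': 1}
print(match("aspirine", thresholds))  # → ['aspirin']
```

Here "ibuprofen" ends up with a looser threshold than "aspirin" only because its sample pairs include a two-edit misspelling ("ibuprophen"); with real data, each threshold would reflect how that particular word actually gets misspelled.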

You can use any binary classifier that works on real-valued input for this task.
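As one hedged illustration of a different classifier playing the role of c, the sketch below fits a small logistic regression per dictionary word on the single feature distance(x, d_i). It uses plain Python (batch gradient descent on log-loss) so it has no dependencies. The feature standardization, learning rate, epoch count, class and helper names, the choice of negatives (pairs belonging to other words), and the sample data are all assumptions made for the example, not part of the answer.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class DistanceLogReg:
    """Hypothetical per-word logistic regression on the feature distance(x, d)."""

    def __init__(self, lr: float = 1.0, epochs: int = 2000):
        self.lr, self.epochs = lr, epochs

    def fit(self, dists, labels):
        # Standardize the feature so one learning rate works for every word.
        n = len(dists)
        self.mu = sum(dists) / n
        var = sum((v - self.mu) ** 2 for v in dists) / n
        self.sd = math.sqrt(var) or 1.0
        zs = [(v - self.mu) / self.sd for v in dists]
        self.w = self.b = 0.0
        for _ in range(self.epochs):
            gw = gb = 0.0
            for z, y in zip(zs, labels):
                err = sigmoid(self.w * z + self.b) - y  # d(log-loss)/d(logit)
                gw += err * z
                gb += err
            self.w -= self.lr * gw / n
            self.b -= self.lr * gb / n
        return self

    def predict(self, dist: float) -> bool:
        z = (dist - self.mu) / self.sd
        return sigmoid(self.w * z + self.b) >= 0.5


# Hypothetical training data in the (misspelled_word, correct_one) form.
regular_list = ["aspirin", "ibuprofen", "paracetamol"]
pairs = [
    ("aspirin", "aspirin"), ("asprin", "aspirin"), ("aspirn", "aspirin"),
    ("ibuprofen", "ibuprofen"), ("ibuprofin", "ibuprofen"),
    ("ibuprophen", "ibuprofen"),
    ("paracetamol", "paracetamol"), ("paracetamole", "paracetamol"),
    ("parcetamol", "paracetamol"),
]

# One classifier per dictionary word d_i, trained on distance(x, d_i).
models = {}
for d in regular_list:
    dists = [levenshtein(x, d) for x, _ in pairs]
    labels = [1 if c == d else 0 for _, c in pairs]
    models[d] = DistanceLogReg().fit(dists, labels)


def match(x):
    """Dictionary words whose classifier says x is a misspelling of them."""
    return [d for d, m in models.items() if m.predict(levenshtein(x, d))]


print(match("aspirine"))
```

In practice a library implementation (for example scikit-learn's `LogisticRegression`) would do the same job; the hand-written loop only keeps the sketch self-contained.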
