I have a list of cities with numerous incorrect spellings of the same city. One city is misspelled 18 times! I am trying to clean this up, but it's taking hours. Is there some algorithm that might "guess" the valid city name for each of these misspelled ones? Some form of weighting? The data is in MySQL, and I do have a table of the correct spellings to compare against.
Any ideas on this? A PHP example would help if possible.
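Read about Levenshtein distance: http://en.wikipedia.org/wiki/Levenshtein_distance. It is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another.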
Find an implementation or write your own; it's not that complex. PHP also ships a built-in levenshtein() function.
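If you do write your own, here is a minimal sketch of the standard dynamic-programming approach. It is in Python for illustration, and the function name `levenshtein` is just my choice. Each cell holds the edit distance between a prefix of `a` and a prefix of `b`, and only the previous row is kept in memory.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    # previous[j] = distance between the first i-1 chars of a and first j chars of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            replace_cost = previous[j - 1] + (ca != cb)  # free if chars match
            current.append(min(insert_cost, delete_cost, replace_cost))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))     # 3
print(levenshtein("Pittsburg", "Pittsburgh"))  # 1
```

Note that the comparison is case-sensitive as written; you may want to lowercase both strings first.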
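Use it to locate near-miss spelling errors: compare each misspelled name against every entry in your table of correct spellings, and take the one with the smallest distance as the likely intended city.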
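A rough sketch of that matching step, again in Python and under a few assumptions of my own. The `best_match` helper and its `max_distance=3` cutoff are hypothetical choices, and you would fill `valid_names` from your table of correct spellings. Names are compared case-insensitively with extra whitespace collapsed, and anything farther than the cutoff returns `None` so you can review it by hand instead of trusting a bad guess.

```python
from typing import Iterable, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    # Standard two-row dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(cur[j - 1] + 1, prev[j] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def normalize(name: str) -> str:
    # Ignore case and collapse runs of whitespace before comparing.
    return " ".join(name.lower().split())


def best_match(name: str, valid_names: Iterable[str],
               max_distance: int = 3) -> Optional[Tuple[str, int]]:
    """Return (closest valid name, distance), or None if none is close enough.
    On ties, the first candidate encountered wins."""
    key = normalize(name)
    best = None
    for candidate in valid_names:
        d = levenshtein(key, normalize(candidate))
        if best is None or d < best[1]:
            best = (candidate, d)
    if best is None or best[1] > max_distance:
        return None
    return best


valid = ["Philadelphia", "Pittsburgh", "Allentown"]
print(best_match("Philidelphia", valid))  # ('Philadelphia', 1)
print(best_match("Zzz", valid))           # None
```

In PHP the same loop works with the built-in levenshtein() in place of the helper. Since each misspelling is compared against every correct name, it helps to run this once over the distinct misspelled values rather than over every row.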