Is there a way to search for a value in a dataframe column using FuzzyWuzzy
or similar library?
I'm trying to find a value in one column that corresponds to the value in another while taking fuzzy matching into account. So
So for example, if I have State Names in one column and State Codes in another, how would I find the state code for Florida, which is FL while catering for abbreviations like "Flor"?
So in other words, I want to find a match for a State Name corresponding to "Flor" and get the corresponding State Code "FL".
Any help is greatly appreciated.
If the abbreviations are all prefixes, you can use the
.startswith()
string method against either the short or long version of the state.However, if you have more complex abbreviations,
difflib.get_close_matches
will probably do what you want!You will probably want to try/except
IndexError
around reading the first member of the returned list from difflib and possibly tweak the cutoff to get less false matches with close states (perhaps offer all the states as possibilities to some user or require more letters for close states).You may also see the best results combining the two; testing prefixes first before trying the fuzzy match.
Putting it all together
Beware of states which start with 'New' or 'M' (and probably others), which are all pretty close and will probably want special handling. Testing will do wonders here.