I want to take a list of customer names, compare them to an internal database to find the most likely match, and return its customer code.
So I would receive a list of customers like this:
| Cx Name |
|---|
| Chicken C. |
| Water Gmbh |
| Computer ldt |
| Food, Glorious Food |
and I want to compare it to an internal database like this:
| Cx Name database | Cx Number |
|---|---|
| Tech Co. | 9123 |
| Computer LTD. | 8123 |
| Chicken Co. | 7123 |
| Water Gmbh | 6123 |
and return something like this:
| Cx Name | Cx Suggestion |
|---|---|
| Chicken C. | 7123 |
| Water Gmbh | 6123 |
| Computer ldt | 8123 |
I was thinking of using a loop and `stringdist` to compare each customer name to the database and return the best-scoring match if it scores above 90%. But I'm not sure how best to approach this, and my loop skills in R are a bit rusty.
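The "score every name, keep the best one above a cut-off" idea can be written without an explicit loop. The sketch below uses base R's `utils::adist()`, which returns the whole matrix of Levenshtein distances in one call, instead of the `stringdist` package mentioned above (`stringdist` offers more metrics). Turning a distance into a 0 to 1 similarity by dividing by the longer string's length is an assumption, one of several reasonable normalisations, and `suggest_match()` is a hypothetical helper name.

```r
# Example data mirroring the tables in the question
cx_names <- c("Chicken C.", "Water Gmbh", "Computer ldt", "Food, Glorious Food")
db <- data.frame(
  name   = c("Tech Co.", "Computer LTD.", "Chicken Co.", "Water Gmbh"),
  number = c(9123, 8123, 7123, 6123),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each name, find the closest database name and
# return its number only if the similarity reaches `threshold`.
suggest_match <- function(names, db_names, db_numbers, threshold = 0.9) {
  # Levenshtein distances: one row per input name, one column per db name
  d <- utils::adist(names, db_names, ignore.case = TRUE)
  # Normalise by the longer string so the score runs from 0 to 1
  longest <- pmax(outer(nchar(names), nchar(db_names), pmax), 1)
  sim <- 1 - d / longest
  # Column index of the best-scoring db name in each row
  best <- max.col(sim, ties.method = "first")
  best_sim <- sim[cbind(seq_along(names), best)]
  data.frame(
    cx_name       = names,
    best_match    = db_names[best],
    similarity    = round(best_sim, 3),
    # NA when even the best candidate is below the cut-off
    cx_suggestion = ifelse(best_sim >= threshold, db_numbers[best], NA),
    stringsAsFactors = FALSE
  )
}

print(suggest_match(cx_names, db$name, db$number))
```

With the example tables and `threshold = 0.9`, `Chicken C.` and `Water Gmbh` get 7123 and 6123, while `Computer ldt` scores about 0.85 (two edits over 13 characters) and falls below the cut-off; `threshold = 0.8` picks up 8123 as well. `Food, Glorious Food` has no close candidate and stays `NA`.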
This is obviously a very crude example. Typically I would do some data cleaning beforehand, and I would be working with about 500 customers matched against a database of 5,000 to 10,000 customer names.
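The cleaning step and the scale mentioned here can also be sketched in base R. At 500 by 10,000 names the full distance matrix has 5 million entries (roughly 40 MB per matrix of doubles), so a single `utils::adist()` call is probably manageable; processing the input in chunks keeps memory bounded if the lists grow. `clean_name()` and `best_match_chunked()` are hypothetical helpers, and the cleaning rules are only an example.

```r
# Hypothetical cleaning step: lower-case, turn punctuation into spaces,
# collapse repeated whitespace and trim the ends
clean_name <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]", " ", x)
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

# Best match per name, computed in chunks of input rows so only a
# chunk_size x length(db_names) distance matrix is held at once
best_match_chunked <- function(names, db_names, chunk_size = 100) {
  best_idx <- integer(length(names))
  best_sim <- numeric(length(names))
  chunks <- split(seq_along(names), ceiling(seq_along(names) / chunk_size))
  for (rows in chunks) {
    d <- utils::adist(names[rows], db_names)
    longest <- pmax(outer(nchar(names[rows]), nchar(db_names), pmax), 1)
    sim <- 1 - d / longest
    best <- max.col(sim, ties.method = "first")
    best_idx[rows] <- best
    best_sim[rows] <- sim[cbind(seq_along(rows), best)]
  }
  data.frame(
    name       = names,
    match      = db_names[best_idx],
    similarity = best_sim,
    stringsAsFactors = FALSE
  )
}

cx_names <- c("Chicken C.", "Water Gmbh", "Computer ldt", "Food, Glorious Food")
db_names <- c("Tech Co.", "Computer LTD.", "Chicken Co.", "Water Gmbh")
print(best_match_chunked(clean_name(cx_names), clean_name(db_names), chunk_size = 2))
```

Cleaning alone does not rescue every case: `computer ldt` and `computer ltd` are still two edits apart (similarity about 0.83), because plain Levenshtein distance counts a swapped pair of letters as two substitutions. The `stringdist` package's `"osa"` and `"dl"` methods count a transposition as a single edit.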
You could try something like this:
You will need to experiment to find a threshold that works for you. I set it to 0.4 here, but you could go lower if the matches need to be closer. I also recommend looking into `fuzzy_join`.

Data: