How to match household data from two different data sources based on geographic and demographic info in R?

I have household data from two different data sources. Sample data looks like this:

df1:

| hh_id_1 | geo_location | hh_size | hh_race  | hh_income |
| ------- | ------------ | ------- | -------- | --------- |
| 111     | 12345        | 1       | white    | 100k-149k |
| 222     | 12387        | 2       | black    | 75k-99k   |
| 333     | 12356        | 3       | asian    | 100k-149k |
| 444     | 20534        | 4       | hispanic | 50k-74k   |

df2:

| hh_id_2 | geo_location | hh_size | hh_race  | hh_income |
| ------- | ------------ | ------- | -------- | --------- |
| aaa     | 12345        | 3       | white    | 100k-149k |
| bbb     | 12387        | 4       | black    | 75k-99k   |
| ccc     | 22309        | 2       | other    | 50k-74k   |
| ddd     | 21687        | 5       | hispanic | 50k-74k   |

df1 and df2 share the columns shown above, and each also has additional features that are useful for later analysis. I would like to match households between df1 and df2. Each matched pair should be similar enough to plausibly be the same household: same location, and similar household size, race, income, etc. I don't know which matching methodology would be best. Are there any packages or methodologies for this?
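To make the goal concrete, here is a minimal base R sketch of what I mean, in the spirit of what is usually called record linkage: block on `geo_location`, score each candidate pair on the remaining columns, then keep the best one-to-one pairs. The equal weighting, `max_size_gap`, and `min_score` are assumptions made up for illustration, not a recommended methodology.

```r
# Sample data copied from the tables above
df1 <- data.frame(
  hh_id_1      = c("111", "222", "333", "444"),
  geo_location = c("12345", "12387", "12356", "20534"),
  hh_size      = c(1, 2, 3, 4),
  hh_race      = c("white", "black", "asian", "hispanic"),
  hh_income    = c("100k-149k", "75k-99k", "100k-149k", "50k-74k"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  hh_id_2      = c("aaa", "bbb", "ccc", "ddd"),
  geo_location = c("12345", "12387", "22309", "21687"),
  hh_size      = c(3, 4, 2, 5),
  hh_race      = c("white", "black", "other", "hispanic"),
  hh_income    = c("100k-149k", "75k-99k", "50k-74k", "50k-74k"),
  stringsAsFactors = FALSE
)

# Blocking: only households in the same geo_location become candidate pairs
pairs <- merge(df1, df2, by = "geo_location", suffixes = c("_1", "_2"))

# Hypothetical tuning values (assumptions, not taken from any package)
max_size_gap <- 2    # size similarity reaches 0 once the gap exceeds this
min_score    <- 0.7  # minimum average similarity to accept a pair

# Per-column similarity in [0, 1], averaged with equal weights
pairs$size_sim   <- pmax(0, 1 - abs(pairs$hh_size_1 - pairs$hh_size_2) / (max_size_gap + 1))
pairs$race_sim   <- as.numeric(pairs$hh_race_1 == pairs$hh_race_2)
pairs$income_sim <- as.numeric(pairs$hh_income_1 == pairs$hh_income_2)
pairs$score      <- (pairs$size_sim + pairs$race_sim + pairs$income_sim) / 3

# Greedy one-to-one assignment: best-scoring pairs first, each household used once
pairs <- pairs[order(-pairs$score), ]
used1 <- character(0)
used2 <- character(0)
keep  <- logical(nrow(pairs))
for (i in seq_len(nrow(pairs))) {
  if (pairs$score[i] >= min_score &&
      !(pairs$hh_id_1[i] %in% used1) &&
      !(pairs$hh_id_2[i] %in% used2)) {
    keep[i] <- TRUE
    used1 <- c(used1, pairs$hh_id_1[i])
    used2 <- c(used2, pairs$hh_id_2[i])
  }
}
matches <- pairs[keep, c("hh_id_1", "hh_id_2", "geo_location", "score")]
print(matches)
```

On the sample data only locations 12345 and 12387 appear in both tables, so at most two pairs can be matched. Exact matching on `hh_income` and a linear penalty on `hh_size` are probably too crude for real data, which is why I am asking about better approaches.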
