PostgreSQL: Address matching using fuzzymatch from two tables

1.1k Views Asked by At

What I want to do;

I have two tables with two address columns , both stored as text I want to create a view returning the matching rows.

What I've tried;

I've created and index on both columns and tables as below;

CREATE INDEX idx_table1_fulladdress ON table1 (LOWER(fulladdress_ppd));

Then run the following;

CREATE OR REPLACE VIEW view_adresscheck AS
SELECT 
--from table1
    table1.postcode,
    table1.fulladdress_ppd,
--from table2
    table2.epc_postcode,
    table2.fulladdress_epc 
FROM
    table1,
    table2
WHERE 
    table1.postcode = table2.epc_postcode
AND 
    table2.fulladdress_epc = table1.fulladdress_ppd ::text;

What hasn't worked The above returned fewer records than I know to be there. On inspection this is because the address format is not consistent between the two tables ie.

table1.fulladdress_ppd = Flat 2d The building the street
table2.fulladdress_epc = Flat 2/d The building the street, the town

The address isn't consistently formatted within the table either ie in table not all addresses include town so I can't use regex or trim to bulk clean.

I've then seen the fuzzystrmatch module in postgres and this sounds like it might resolve my problem.

Question Which of Soundex, Levenshtein, Metaphone is most appropriate. Most records are in English by some place names are Gaelic running on 9.6.

1

There are 1 best solutions below

5
Gurmokh On

talking from experience of matching address from different sources. What you could do is index each address. Regardless of formatting the above address would return the same number. You then match on these indexes.

eg in the UK you have what are called UDPRN numbers for each postal address in the country.