I am trying to do some left joins on multiple tables and facing the following issue.
Row Counts of tables
Table 1: 1.6M
Table 2: 1.7M
Table 3: 1.5M
When I am doing left Join using Table 1 and 2 and following query, I get data count as 1.8 M (acceptable):
SELECT Table1.ID1, Table1.ID2, Table2.Name, Table2.City
FROM Table1
LEFT JOIN Table2
ON Table1.ID1 = Table2.ID1
AND Table1.ID2 = Table2.ID2
AND Table1.Source_System = Table2.Source_System
;
Similarly when I am doing left Join using Table 1 and 3 and following query, I get data count as 1.9 M (acceptable):
SELECT Table1.ID1, Table1.ID2, Table3.Name, Table3.City
FROM Table1
LEFT JOIN Table3
ON Table1.ID1 = Table3.ID1
AND Table1.ID2 = Table3.ID2
AND Table1.Source_System = Table3.Source_System
;
But when I am doing left Join using Table 1, 2 and 3 and following query, I get data count as 11.9 G (ISSUE):
SELECT
Table1.ID1, Table1.ID2,
Table2.Name, Table2.City,
Table3.Name as Name1, Table3.City as City1
FROM Table1
LEFT JOIN Table2
ON Table1.ID1 = Table2.ID1
AND Table1.ID2 = Table2.ID2
AND Table1.Source_System = Table2.Source_System
LEFT JOIN Table3
ON Table1.ID1 = Table3.ID1
AND Table1.ID2 = Table3.ID2
AND Table1.Source_System = Table3.Source_System
;
When your left join is producing more records than the referenced table it should not be acceptable! that should signal warning in your join condition and data. Either you investigate those records in the table to avoid it in the first place or you would need to keep tweaking your SQL to satisfy clean join that produces exact reference table row count. otherwise, it is very common that left joining to another table with a small duplicate records will produce exponential row count as you are facing here.
Try reading these questions here to help here and here
Just to add about investigating and finding those rows, use following SQL to find in each table what rows that have same ID1, ID2 and Source_System columns
i.e. :-
Use the same for each of the tables to find those records and either add another unique condition/ aggregate the table on the joining keys or ask for data cleansing ! for those records