I have two tables in Postgres 9.4.
- table1 contains records of website visits.
table1 has three columns: ip (type inet, holding IPv6 addresses), date, and event_id. It receives ~50K records a day.
||ip||date||event_id||
|300:401:149:8000::|20231128|881948183929899|
- table2 contains geo lookup data in the form of IP ranges.
table2 has three columns: start_ip, end_ip, and country_code. It contains 300 million rows, and there is an index on start_ip and end_ip (simplified DDL for both tables is below). A single range can span multiple network prefixes, as in the first sample row.
||start_ip||end_ip||country_code||
|100:0:0:1::|600:803:29b:ffff:ffff:ffff:ffff:ffff|**|
|2605:ffc1::|2606:3f:ffff:ffff:ffff:ffff:ffff:ffff|us|
|2609:e400::|260b:ffff:ffff:ffff:ffff:ffff:ffff:ffff|us|
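For reference, the tables look roughly like this (a simplified sketch; the exact types of date and event_id are my guess from the samples and are incidental to the problem):

    create table table1 (
        ip       inet,    -- IPv6 visitor address
        date     date,    -- visit date
        event_id bigint   -- event identifier
    );

    create table table2 (
        start_ip     inet, -- first address of the range
        end_ip       inet, -- last address of the range
        country_code text  -- country code; '**' in some rows (see sample)
    );

    create index on table2 (start_ip, end_ip);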
The final output needs to be a join of all data points where table1.ip is between table2.start_ip and table2.end_ip. My problem is that, due to the size of the dataset and the inet type, my queries keep timing out. I have tried multiple join types; a cross join is the fastest on a small data set, but on the full data set even simple queries like the following are too slow:
    select *
    from table1 t1
    left join table2 t2
      on t1.ip between t2.start_ip and t2.end_ip;
I have also tried converting the leading group of the address to bits to shrink table2 before the comparison (roughly as sketched below), but it did not help much. I would love to use max(start_ip), but 9.4 does not support max() on inet.
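The pre-filter attempt was roughly equivalent to matching on the leading bits before the range check, something like the sketch below (the /16 mask length is illustrative; set_masklen and network are the standard inet functions). It breaks down because wide ranges like the '**' row span many /16 prefixes:

    -- pre-filter on the /16 prefix, then do the exact range check
    select t1.*, t2.country_code
    from table1 t1
    left join table2 t2
      on network(set_masklen(t1.ip, 16)) = network(set_masklen(t2.start_ip, 16))
     and t1.ip between t2.start_ip and t2.end_ip;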
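What max(start_ip) was meant to express is "the greatest start_ip that is still <= the visitor ip". A lateral join (available in 9.4) can say the same thing without the aggregate, assuming the ranges do not overlap:

    -- for each visit, grab the candidate range with the greatest
    -- start_ip <= ip via the start_ip index, then verify end_ip
    select t1.*, t2.country_code
    from table1 t1
    left join lateral (
        select country_code, end_ip
        from table2
        where start_ip <= t1.ip
        order by start_ip desc
        limit 1
    ) t2 on t1.ip <= t2.end_ip;

If ranges can overlap, this only checks the innermost candidate and can miss a wider enclosing range.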