DeDuping millions of rows using LOAD DATA INFILE or other solution

134 Views Asked by At

Good day to all. I know this topic comes up a lot and apologize for any redundancy but I need you MYSQL gurus.

I have tried several solutions that have been posted here to no avail. The solutions either take too long and/or more likely I just don't have the knowledge base to get this task done on deadline and need guidance. Most common answers that seem to make sense is to use LOAD DATA INFILE from a temporary table but should I break up the file and do this into chunks, use looping scripts? I know this would help a lot of newbies around here for complete example and explanation and perhaps put this to bed once and for all for you guys trying to help. My stump may be coming from using parameters with IGNORE or REPLACE. Here are the two tables:

Table1 has 27 million rows and duplicates based on individual with more than one ailment like so:

first|last|zip|gender|address|city|state|zip|zip4|birthdate|ailment
-------------------------------------------------------------------
Jay   R    123  M     2 Help  LA   CA    123 123  8/23/86   21
Tom   L    123  M     2 Help  LA   CA    123 123  8/23/86   1

Table2 has 3 million rows and two null columns but no internal duplicates by ailment.

first|last|zip|gender|address|city|state|zip|zip4|ailment|birthdate
-------------------------------------------------------------------
Jay   R    123  M     2 Help  LA   CA   123 NULL  NULL     8/23/86

I am looking to dupe based on first, last and zip code. I'll lose a little data to process this faster if it helps. I am more than grateful for this assistance.

0

There are 0 best solutions below