SQL GROUP BY Results - Salesforce Marketing Cloud

1.6k Views Asked by At

I'm looking to find all duplicate records and then select all duplicates minus the oldest record from each set so that I can delete duplicates and keep one unique record.

When I run this query, I get the results I want. Leaves me with a unique email address and the oldest create date.

SELECT 
    EmailAddress,
    MIN(CreatedDate)
FROM [_ListSubscribers]
WHERE EmailAddress IN
    (
        SELECT EmailAddress
        FROM _ListSubscribers
        GROUP BY EmailAddress
        HAVING COUNT(EmailAddress) > 1
    )
GROUP BY EmailAddress

When I add SubscriberKey to the query, the results DOUBLE! Why is that? I just want to see the SubscriberKey tied to the EmailAddress I found that has the oldest date in the subquery.

SELECT 
    EmailAddress,
    SubscriberKey,
    MIN(CreatedDate)
FROM [_ListSubscribers]
WHERE EmailAddress IN
    (
        SELECT EmailAddress
        FROM _ListSubscribers
        GROUP BY EmailAddress
        HAVING COUNT(EmailAddress) > 1
    )
GROUP BY EmailAddress, SubscriberKey
2

There are 2 best solutions below

0
On BEST ANSWER

You're getting multiple records because you're grouping by SubscriberKey. You'll need to match by EmailAddress and CreatedDate. Try doing a sub query and joining it back to your original table.

select 
[_ListSubscribers].EmailAddress,
[_ListSubscribers].SubscriberKey,
[_ListSubscribers].CreatedDate,
from
(
SELECT 
    EmailAddress,
    MIN(CreatedDate) as CreatedDate
    FROM [_ListSubscribers]
    GROUP BY EmailAddress, SubscriberKey
    Having count(EmailAddress)>1
) SubTbl
inner join
[_ListSubscribers] on
[_ListSubscribers].EmailAddress = SubTbl.EmailAddress
and
[_ListSubscribers].CreatedDate = SubTbl.CreatedDate
0
On

I'm looking to find all duplicate records and then select all duplicates minus the oldest record from each set so that I can delete duplicates and keep one unique record.

Use ROW_NUMBER():

select l.*
from (select l.*,
             row_number() over (partition by EmailAddress order by CreatedDate desc) as seqnum
      from _ListSubscribers l
     ) l
where seqnum > 1;

However, if you want to delete all but the newest record, you can use:

delete from _ListSubscribers
    where CreatedDate < (select max(CreatedDate)
                         from _ListSubscribers l2
                         where l2.EmailAddress = _ListSubscribers.EmailAddress
                        );

If you want the oldest records, you would flip the logic using min() instead of max().