SAS: If first row equals next row, return only first row until last

52 Views Asked by At

I'm trying to return unique emails for specific IDs. Some IDs have multiple emails but sometimes there are duplicates. I'm thinking a loop might do it but I can't seem to figure it out.

HAVE
ID  ACCOUNT_EMAIL
114 [email protected]
114 [email protected]
114 [email protected]
114 .

WANT
ID  ACCOUNT_EMAIL
114 [email protected]
114 [email protected]
CREATE TABLE test AS 
SELECT DISTINCT
ID
,UPPER(ACCOUNTEMAILADDRESS) AS Account_Email
FROM DATABASE
;QUIT;

PROC SORT DATA=test;
BY ID
descending Account_Email;
Run;

DATA test;
SET test;
BY ID descending Account_Email;
IF last.Account_Email AND Account_Email = '' THEN dupIND = "Y";
ELSE dupIND = "N";
;RUN;
2

There are 2 best solutions below

0
On

If the data is not sorted you can use a hash object to track observed keys (id+email)

data want;
  set have;
  uc_email = upcase(email);
  if 0 then do;
    declare hash h();
    h.defineKey('id', 'uc_email')
    h.defineDone();
  end;

  if h.check() ne 0 then do;
    h.add();
    status = '1st';
  end;
  else 
    status = 'dup';
  
  drop uc_email;
run;
2
On

This is more clearly stated as find unique entries in a table.

Three common ways are proc sort, data step and SQL using Distinct.

Sort is one of the easiest, use the NODUPKEY option.

proc sort data=have out=want nodupkey;
by id account_email;
run;

Data step requires a sort anyways.

proc sort data=have;
by id account_email;
run;

data want;
set have;
by id account_email;
if first.account_email;
run;

SQL distinct

proc sql;
create table want as 
select distinct id, account_email
from have;
quit;