SAS Hash Objects to track the history/relationships of ID changes across time

76 Views Asked by At

I have ID's as of today and the history of ID changes. I want to know what the ID most recent ID is and all the ID's associated with it at certain specified historical asof dates; and also have a count of the number of times the ID has changed.

The following code produces results for the "Want" dataset, but it is not correct across time.

data have;
attrib OldID NewID length=8 ChangeDate informat=mmddyy10. format=mmddyy10.;
 input OldID NewID ChangeDate;
 datalines;
 4 .  8/1/10
 12 . 8/1/10
 11 12 8/1/10
 3 4 7/10/10
2 3 7/1/10
 1 2 1/1/10
 10 11 1/1/10
;
 data want(keep=asof origID currID changeCount);
 attrib asof format=mmddyy10. origID currID length=8;

 declare hash roots();
 roots.defineKey('OldID'); 
roots.defineData('OldID', 'ChangeDate'); 
roots.defineDone();

declare hash changes(); 
changes.defineKey('NewID');
 changes.defineData('OldID', 'ChangeDate'); 
changes.defineDone();

 do while (not done);
 set have end=done;
 if missing(NewID) then roots.add();
 else changes.add(); 
end;
array asofs (7) _temporary_ (

 '15-MAR-2010'd
 '02-JUL-2010'd
 '15-JUL-2010'd
 '15-AUG-2010'd
 );
 declare hiter hi('roots');

 do index = 1 to dim(asofs);
 asof = asofs(index); 

do while (hi.next() eq 0);
 origID = OldID; 
currID = .;

 do changeCount = 0 by 1 while (ChangeDate <= asof);
 currID = OldID;
 if changes.find(key:OldID) ne 0 then leave;
 End;

 output; 
end; 
end; 
stop; 
run;

The dataset Want looks like this:

asof origID currID changeCount
03/15/2010 12 11 2
03/15/2010 4 2 3
07/02/2010 12 11 2
07/02/2010 4 3 2
07/15/2010 12 11 2
07/15/2010 4 4 1
08/15/2010 12 . 0
08/15/2010 4 . 0
. 12 10 2
. 4 1 3
. 12 10 2
. 4 1 3
. 12 10 2
. 4 1 3

I would like dataset Want to look something like this:

asof origID currID changeCount
03/15/2010 4 2 1
03/15/2010 12 11 1
07/02/2010 4 3 2
07/02/2010 12 11 1
07/15/2010 4 4 3
07/15/2010 12 11 1
08/15/2010 4 4 3
08/15/2010 12 12 2
1

There are 1 best solutions below

0
On

So, you need to think backwards. This seems to do about what you want, although you'd have to add some code to handle the final 2 rows - the 8/15 date is after your final date here. I imagine you can easily deal with that.

data want(keep=asof origID currID changeCount);
    attrib asof format=mmddyy10. origID currID length=8;

    declare hash roots();
    roots.defineKey('OldID'); 
    roots.defineData('OldID', 'ChangeDate'); 
    roots.defineDone();

    declare hash changes(); 
    changes.defineKey('NewID');
    changes.defineData('OldID', 'ChangeDate'); 
    changes.defineDone();

    do while (not done);
        set have end=done;
        if missing(NewID) then roots.add();
        else changes.add(); 
    end;
    array asofs (7) _temporary_ (

     '15-MAR-2010'd
     '02-JUL-2010'd
     '15-JUL-2010'd
     '15-AUG-2010'd
    );
 declare hiter hi('roots');

 *add code to deal with asofs dimensions not equal to actual contents;
 do index = 1 to dim(asofs) while (asofs[index] ne .);
     asof = asofs(index); 
    
    do while (hi.next() eq 0);
        origID = OldID; 
        currID = .;
        
         *start at -1 instead of 0 because it always increments 1x too many;
         *And, look while changedate is GREATER than, bc you are coming from the larger end;
         do changeCount = -1 by 1 while (changeDate > asof);
             currID = OldID;
             if changes.find(key:OldID) ne 0 then leave;
         End;

         output; 
    end; 
 end; 
 stop; 
run;