Distinguish ID from start & end datetime

82 Views Asked by At

The first table contains Key-value & time like below

Table:Time_Stamp

Second Table contains each IDs have its start and end date.

Table:Time_Table

I'd like to find out ID for each row from time_stamp.

Expected Result

There are fixed numbers of Categories. but there are a lot of ID.

Would you help me how to write a SQL query? (Any SQL Style is fine. I can convert that. PROC SQL in SAS compatible would be better)

2

There are 2 best solutions below

1
On BEST ANSWER

If you're doing this in SAS, you're better off using a format. Formats have the advantage of taking start/end ranges, and are REALLY fast - something around o(1) time if I recall correctly. This doesn't require sorting the larger dataset (and even could avoid sorting the smaller dataset if that was an issue), which most of the SQL solutions probably will unless they can hold the smaller dataset in memory (as a hash table).

The first two data steps just create your data above, the format_two data step is the first one that does anything new.

If there are more categories, as long as they're alpha and not numeric this should still work fine; the only difference you'd want to change is the if _n_ le 2 should have the 2 equal (total number of categories).

data time_Stamp;   *Making up the test dataset;
  category='A';
  do value=1 to 6;
    time = intnx('HOUR','01NOV2014:00:00:00'dt,value-1);
    output;
  end;
  category='B';
  do value = 7 to 12;
    time = intnx('HOUR','01NOV2014:00:00:00'dt,value-4);
    output;
  end;
run;

data time_table;    *Making up the ID dataset;
  informat start_time end_time datetime18.;
  input id category $ start_time end_time;
  datalines;
  1 A 01NOV2014:00:00:00 01NOV2014:03:00:00
  1 B 01NOV2014:00:03:00 01NOV2014:06:00:00
  2 A 01NOV2014:03:00:00 01NOV2014:06:00:00
  2 B 01NOV2014:06:00:00 01NOV2014:09:00:00
  ;
quit;


*This restructures time_table into the needed structure for a format lookup dataset;
data format_two;
  set time_table;
  fmtname=cats('KEYFMT',category);   *This is how we handle A/B - different formats.  If it were numeric would need to end with 'F'.;
  start=start_time;
  end=end_time;
  label=id;
  eexcl='Y';         *This makes it exclusive of the end value, so 03:00 goes with the latter ID and not the former.;
  hlo=' ';
  output;
  if _n_ le 2 then do;  *This allows it to return missing if the ID is not found. ;
                        *le 2 is because we want one for each category - if more categories, needs to be hifgher;
    hlo='o';
    label=' ';
    call missing(of start end);
    output;
  end;
run;


*Have to sort to group formats together, but at least this is the small dataset;
*If even this is a time concern, this could be done differently (make 2 different datasets above);
proc sort data=format_two;
  by fmtname;
run;

*Import the format lookups;
proc format cntlin=format_two;
quit;

*Apply using PUTN which allows specifying a format at runtime;
data table_one_ids;
  set time_stamp;
  id = putn(time,cats('KEYFMT',category));
run;
1
On
SELECT        Time_stamp.Category, Time_stamp.Time, Time_stamp.Value, Time_Table.ID
FROM            Time_stamp INNER JOIN
                         Time_Table 
ON Time_stamp.Category = Time_Table.Category 
  AND Time_stamp.Time BETWEEN Time_Table.Start_time AND DATEADD(SS,-1,Time_Table.End_time)
ORDER BY Time_stamp.Category,TIME