How to add sequential ID based on condition SAS

219 Views Asked by At

I have the dataset with Time and Interval variable as below. I would like to add a sequential ID (Indicator) with SAS based on a condition that Interval is greater than 0.1, as follows:

Time Interval Indicator
11:40:38 0.05 .
11:40:41 0.05 .
11:40:44 0.05 .
11:40:47 0.05 .
11:40:50 0.05 .
11:42:50 2 1
11:42:53 0.05 2
11:42:56 0.05 3
11:42:59 0.05 4
11:43:02 0.05 5
11:43:05 0.05 6
11:43:08 0.05 7
11:43:18 0.16667 1
11:43:21 0.05 2
11:43:24 0.05 3
11:43:27 0.05 4
11:43:30 0.05 5
11:43:33 0.05 6

If I use the code

`data out1; set out ;
 by Time;
 retain indicator;
 if Interval > 0.1 then indicator=1;
 indicator+1;
 run;`

Indicator is not missing for the first five observations. I would like that it starts counting only when the condition is met (Interval > 0.1).

Thanks!

3

There are 3 best solutions below

1
whymath On

You can do it with a little modification:

data out1;
  set out ;
  retain indicator;
  if Interval>0.1 then indicator=0;
  if indicator^=. then indicator+1;
run; 

The summuation will start after the condition Interval>0.1 has been met, because indicator is equal to missing value before that, so indicator+1 would not be calculated.
And you need to initial indicator as 0, not 1. If indicator is equal to 0, indicator^=. will be satisfied and indicator+1 will be calculated.

0
Tom On

If you want to retain INDICATOR it cannot be on the input dataset, otherwise the SET statement will overwrite the retained value with the value read from the existing dataset.

If you want INDICATOR to start as missing when using the SUM statement then you need to explicitly say so in the RETAIN statement. Otherwise the SUM statement will cause the variable to be initialized to zero.

If looks like you only want to increment when the new variable has already been assigned at least one value.

data want;
  set have;
  retain new .;
  if interval>0.1 then new=1; 
  else if new > 0 then new+1;
run;

Results:

OBS        Time    Interval    Indicator    new

  1    11:40:38     0.05000        .         .
  2    11:40:41     0.05000        .         .
  3    11:40:44     0.05000        .         .
  4    11:40:47     0.05000        .         .
  5    11:40:50     0.05000        .         .
  6    11:42:50     2.00000        1         1
  7    11:42:53     0.05000        2         2
  8    11:42:56     0.05000        3         3
  9    11:42:59     0.05000        4         4
 10    11:43:02     0.05000        5         5
 11    11:43:05     0.05000        6         6
 12    11:43:08     0.05000        7         7
 13    11:43:18     0.16667        1         1
 14    11:43:21     0.05000        2         2
 15    11:43:24     0.05000        3         3
 16    11:43:27     0.05000        4         4
 17    11:43:30     0.05000        5         5
 18    11:43:33     0.05000        6         6
0
Richard On

For yucks, here is a one-liner of @WhyMath logic.

data want;
  set have;
  retain seq;
  seq = ifn(interval > 0.1, 1, ifn(seq, sum(seq,1), seq));
run;