rolling up groups in a matrix

105 Views Asked by At

Here is the data I have, I use proc tabulate to present it how it is presented in excel, and to make the visualization easier. The goal is to make sure groups strictly below the diagonal (i know it's a rectangle, the (1,1) (2,2)...(7,7) "diagonal") to roll up the column until it hits the diagonal or makes a group size of at least 75.

      1   2   3    4    5    6   7  (month variable)
(age)
  1  80  90  100  110  122  141 88
  2  80  90  100  110   56   14 88
  3  80  90   87   45   12   41 88
  4  24  90  100  110   22  141 88
  5  0   1    0    0    0    0   2
  6  0   1    0    0    0    0   6
  7  0   1    0    0    0    0   2
  8  0   1    0    0    0    0  11

Ive already used if/thens to regroup certain data values, but I need a general way to do it for other sets. Thanks in advance

desired results

   1  2   3    4    5    6   7  (month variable)
(age)
  1  80  90  100  110  122  141 88
  2  80  90  100  110   56   14 88
  3  104 90   87   45   12   41 88
  4  0   94  100  110   22  141 88
  5  0   0    0    0    0    0   2
  6  0   0    0    0    0    0   6
  7  0   0    0    0    0    0   13
  8  0   0    0    0    0    0   0
1

There are 1 best solutions below

0
On

Mock up some categorical data for some patients who have to be counted

data mock;
  do patient_id = 1 to 2500;
    month = ceil(7*ranuni(123));
    age = ceil(8*ranuni(123));
    output;
  end;
  stop;
run;

Create a tabulation of counts (N) similar to the one shown in the question:

options missing='0';

proc tabulate data=mock;
  class month age;
  table age,month*n=''/nocellmerge;
run;

For each month get the sub-diagonal patient count

proc sql;
/*  create table subdiagonal_column as */
  select month, count(*) as subdiag_col_freq
  from mock
  where age > month
  group by month;

For each row get the pre-diagonal patient count

/*  create table prediagonal_row as */
  select age, count(*) as prediag_row_freq
  from mock
  where age > month
  group by age;

other sets can be tricky if the categorical values are not +1 monotonic. To do a similar process for non-montonic categorical values you will need to create surrogate variables that are +1 monotonic. For example:

data mock;
  do item_id = 1 to 2500;
    pet = scan ('cat dog snake rabbit hamster', ceil(5*ranuni(123)));
    place = scan ('farm home condo apt tower wild', ceil(6*ranuni(123)));
    output;
  end;
run;

proc tabulate data=mock;
  class pet place;
  table pet,place*n=''/nocellmerge;
run;

proc sql;
  create table unq_pets as select distinct pet from mock;
  create table unq_places as select distinct place from mock;

data pets;
  set unq_pets;
  pet_num = _n_;
run;

data places;
  set unq_places;
  place_num = _n_;
run;

proc sql;
  select distinct place_num, mock.place, count(*) as subdiag_col_freq
  from mock 
  join pets on pets.pet = mock.pet
  join places on places.place = mock.place
  where pet_num > place_num
  group by place_num
  order by place_num
  ;