I have a dataset of this kind, this dataset contains some missing values (I'm representing them with X).
ID VA VB ... Vn | ID VA VB ... Vn |
1 a1 b1 ... n1 | 1 a1 X ... n1 |
2 a2 b2 ... n2 |=== 2 X b2 ... X |
3 a3 b3 ... n3 |=== 3 a3 b3 ... n3 |
.................. | .................. |
N aN bN ... nN | N X bN ... nN |
I want to add observations by ID using only one variable column, I will call that variable VAR: something like an inverted proc format by ID; var VAR;.
ID VAR
1 a1
..
1 n1
2 b2
..
3 a3
3 b3
..
3 n3
..
N bN
..
N nN
so I tried to split OLDdataset in different datasets (NEWa NEWb ...NEWn) where, in each dataset we have all the not-missing observations stored in a column called VAR. Then I will merge NEWa NEWb ... NEWn in NEWdataset and I will apply a proc sort for restoring the order by ID.
The problem arised when I realized that "n" is not known prior the analysis 'cause I want to setup a generalized code that won't work for only one dataset, and VA VB VN are the result of a proc format step.
If n is knows I would use something like this:
data NEWa NEWb NEWc;
set OLDdataset;
array try[3] VA VB VC ;
if try[1] ne '.' then output NEWa;
if try[1] ne '.' then output NEWb;
if try[1] ne '.' then output NEWc;
run;
But now i need some iterative naming system that maybe could sounds like:
data NEW_i; <-------- "i" must be assigned
set OLDdataset;
array try[*] V: ;
do i=1 dim(try);
if try[i] ne '.' then output NEW_i; <---- "i" must be assigned
run;
Hope this sounds clear. Any hints? Thanks.
Try a proc transpose instead, you can delete the missing in a second step or add a where clause to the out data set. If you want to create subsets after that based on ID separate them out in a data step.