How to replace a whole observation working on a SAS file through IML

406 Views Asked by At

Long time user, first time poster. I am very new to IML and have played around in R before. I am currently trying to create an adjacency list for easier computation of networks in SAS/IML. The file i am working on is huge. I am doing an implementation that involve using a SASfile and not have the adjacency list in memory. Creating an empty file and reading from specific rows (which corresponds to particular agents) all goes well until the "final" step: updating the entire observation.

Below is IML code that works, up until the last stage.

    proc iml;
    /* initialize vars*/
    checkObs = 2;
    numCol = 5;
    db = "myTestDataBase";
    nObs = 5;
    temp = {};
    myList = J(1, numCol, 0);
    nVarToUpdate = 2;

    /* create empty database */
    create (db) from myList;
        append from myList;
    close (db);
    do i = 1 to (nObs-1);
        edit (db);
            append from myList;
        close (db);
    end;

    /* read index checkObs and write to temp*/
    edit (db);
        read point (checkObs) into temp; /* Read an entire row*/
        temp[nVarToUpdate] = 1; /* I would like to update some values*/
        /* I want to replace point chekObs with the whole of vector temp*/
        replace point checkObs var _all_;
    close (db);
    print temp;

My aim is to replace/update a whole observation (row), while keeping the order of the rows intact. Any ideas?

2

There are 2 best solutions below

3
On BEST ANSWER

@Joe's solution will work for this problem, but combining macro and IML in that way is like kissing your sister: it's not pleasant and people will look at you strangely. Joe had the right idea to get the name of the variables, but he forgot that you can use the VALSET call to perform indirect assignment. In other words, by having the name of a variable, you can change it's value.

If you want to avoid the macro, you can get the names of the variables one time (outside the EDIT loop) and then loop over the names of the variales, like this:

/* get column names ONE TIME */
use (db);
   read next var _ALL_ into temp[colname=varNames]; /* get names of cols */
close (db);

/* read index checkObs and write to temp*/
edit (db);
read point (checkObs) into temp; /* Read an entire row*/
temp[nVarToUpdate] = 1; /* I would like to update some values*/
do i = 1 to ncol(temp);
   call valset(varNames[i], temp[i]); /* update scalar variables */
end;
replace point (checkObs) var _all_;
close (db);

The main advantage of this technique is that you can discover the variable names at run time.

Be aware that this method (creating variable names) can be dangerous because if the data set has a variable named X, you are overwriting any pre-existing variable of that name in your program.

Also be aware that using EDIT and READ POINT on a huge data set to change one row at a time will be slower than molasses flowing uphill in the winter time. If possible, you should read in a big block of data, operate on all rows in that block, and write out the block.

If disk space allows, you might want to try using the SETIN and SETOUT statements to read from one data set while writing to another. That would completely eliminate the need for the REPLACE statement and the VALSET call. In general, it is more efficient to open a data set read-only and process it sequentially than to open it read/write and process it by using random access.

2
On

Your problem is this, from the documentation for REPLACE:

The REPLACE statement replaces the values of observations in a SAS data set with current values of matrices with the same name.

So this would work:

edit (db);
    read point (checkObs) into temp; /* Read an entire row*/
    col2 = 1; /* I would like to update some values*/
    /* I want to replace point chekObs with the whole of vector temp*/
    replace point (checkObs) var('col2');
close (db);

Of course, that probably isn't what you had in mind.

I don't know if there's an IML way to do this - perhaps Rick will happen along and have an answer. I'm not sure why there wouldn't be, but I am nowhere near an expert in IML, so perhaps there is - or is a reason why not. My feeling is that what you're trying to do might be better done in an entirely different way; it's not clear exactly what you're doing, though. If you're doing a neural network or similar, there's a lot of code out there - another approach might present itself. I find that people coming to SAS from other languages often expect what worked in another language to work here - when there's an entirely different way to do it in SAS efficiently (which works well, if not better - just need to realize what it is).

Anyway, you can do this with a bit of setup and the macro language. Here is an example. You'll have to do the setup ahead of time - and if you have dynamic variable lists, this may be complicated (you may have to have a double-resolved macro variable name, for example).

The main thing to remember with the macro approach is that the macro values have to entirely be available before IML runs; so you can't put a variable in the argument, it has to be a hard-coded value or a macro variable which itself is known before IML runs. DOSUBL/RUN_MACRO might let you get around this to some extent, but it's probably going to be complicated to do things that way.

proc sql;
  select name into :namelist separated by ','
    from dictionary.columns
    where libname='WORK' and memname='MYTESTDATABASE';
quit;

%macro replace_var(num_Vars=1);
        %do _i = 1 %to &num_vars.;
          %let _var = %scan(%bquote(&namelist),&_i,%str(,));
          &_var. = temp[&_i.];
        %end;
%mend;

proc iml;
    /* initialize vars*/
    checkObs = 4;
    numCol = 5;
    db = "myTestDataBase";
    nObs = 5;
    temp = {};
    myList = J(1, numCol, 0);
    nVarToUpdate = 2;

    /* create empty database */
    create (db) from myList;
        append from myList;
    close (db);

    do i = 2 to (nObs);
        edit (db);
            append from myList;
        close (db);
    end;
    /* read index checkObs and write to temp*/
    edit (db);
        read point (checkObs) into temp[colname=temp_names]; /* Read an entire row*/
        temp[nVarToUpdate] = 1; /* I would like to update some values*/
        %replace_var(num_vars=&sqlobs.);   *call the macro which sets the various variable names to their values.  Semicolon is just for syntax coloring to work.;
        /* I want to replace point chekObs with the whole of vector temp*/
        replace point (checkObs) var _all_;
    close (db);
    print temp;
    print temp_names;
  quit;