SAS year function not working inside macro

945 Views Asked by At

hello am trying access columns from library with specific date format and using year function on the columns in my macro code but it produces duplicate values... but the year function displays duplicate values and does not provide desired results. my code should return only the year from the input dates.

%macro dteyear(lib=,outdsn=);
proc sql noprint;
select distinct catx(".",libname,memname), name 
into :dsns separated by " ", :varname separated by " "
from dictionary.columns
where libname = upcase("&lib") and format=('YYMMDD10.')
order by 1;
quit;
%put &dsns;
%put &varname;
%local olddsn curdsn curvbl i;
data &outdsn.;
set 
%let olddsn=;
%do i=1 %to &sqlobs;
%let curdsn=%scan(&dsns,&i,%str( ));
%let curvbl=%scan(&varname,&i,%str( ));
%if &curdsn NE &olddsn
%then %do;
%if &olddsn NE
%then %do;
)
%end;
%let olddsn=&curdsn.;
&curdsn (keep=&curvbl
%end;
%else %do;
&curvbl
%end;
%end;
);
%do i=1 %to &sqlobs;
%scan(&varname,&i,%str( ))=year(&varname.);
%end;
run;
proc print data=&outdsn;run;
%MEND;
%dteyear(lib=dte3,outdsn=dtetst);


the input data is as follows
1975-12-04 
1977-11-03
1989-09-15
1998-06-17
1999-05-31
2000-08-14
2001-03-11
2007-03-11
2007-12-28
2008-10-07
2009-12-03

duplicate output from my code is-->

Obs RFDTC 
1 1965-05-19 
2 1965-05-19 
3 1965-05-19 
4 1965-05-19
5 1965-05-19
6 1965-05-19
7 1965-05-19
8 1965-05-19
9 1965-05-19
10 1965-05-19
11 1965-05-19
12 1965-05-19
13 1965-05-19
2

There are 2 best solutions below

2
On BEST ANSWER

The basic problem is that the YEAR() function returns a 4-digit number, and the variable's format is YYMMDD10., so the result is formatted as a SAS date very close to 1960 (SAS's beginning of all time).

What I did in the code below was change the format to 4.0, so it displays as a 4-digit number.

If you want to have access to the original date variable, you'll have to create a new variable for the year. I'll leave that to you.

There was an additional problem--that is, YEAR(&varname.) inserts the entire list of variables, not just the one you're working with. It works if there is only one date variable, but not if there are more than one. I fixed this, too.

%macro dteyear(lib=,outdsn=);
proc sql noprint;
select distinct catx(".",libname,memname), name 
into :dsns separated by " ", :varname separated by " "
from dictionary.columns
where libname = upcase("&lib") and format=('YYMMDD10.')
order by 1;
quit;
%put &dsns;
%put &varname;
%local olddsn curdsn curvbl i;
data &outdsn.;
set 
%let olddsn=;
%do i=1 %to &sqlobs;
%let curdsn=%scan(&dsns,&i,%str( ));
%let curvbl=%scan(&varname,&i,%str( ));
%if &curdsn NE &olddsn
%then %do;
%if &olddsn NE
%then %do;
)
%end;
%let olddsn=&curdsn.;
&curdsn (keep=&curvbl
%end;
%else %do;
&curvbl
%end;
%end;
);
%do i=1 %to &sqlobs;
%let curvbl=%scan(&varname,&i,%str( ));
&curvbl=year(&curvbl.);
format &curvbl 4.0;
%end;
run;
proc print data=&outdsn;run;
%MEND;

data have;
    input datevar yymmdd10.;
    format datevar yymmdd10.;
cards;
1975-12-04 
1977-11-03
1989-09-15
1998-06-17
1999-05-31
2000-08-14
2001-03-11
2007-03-11
2007-12-28
2008-10-07
2009-12-03
run;

options mprint;

%dteyear(lib=work,outdsn=want)

The result, then, is:

Obs datevar 
1 1975 
2 1977 
3 1989 
4 1998 
5 1999 
6 2000 
7 2001 
8 2007 
9 2007 
10 2008 
11 2009 
0
On

To convert a date value to just a year you can use the YEAR() function, but you also need to change the format attached to the variable since you will have essentially divided the value stored in it by 365 to convert it from the number of days to the number of years.

rfdtc = year(rfdtc);
format rfdtc 4. ;

Your macro is attempting to read many variables from many datasets and generate a single output dataset. I am not sure the resulting dataset will be of much value to you since it will look like a checker board of missing values. Also if the same variable name appears in more than one input dataset you will get corrupted values because of applying the YEAR() function to value that has already been converted from a date value to a year value.

For example you could end up generating a data step like this:

data WANT ;
  set ds1 (keep=datevar1)
      ds1 (keep=datevar2)
      ds2 (keep=datevar3)
      ds3 (keep=datevar3)
  ;
  datevar1=year(datevar1);
  datevar2=year(datevar2);
  datevar3=year(datevar3);
  datevar3=year(datevar3);
  format datevar1 datevar2 datevar3 datevar3 4.;
run;

Since both input datasets DS2 and DS3 have a variable named DATEVAR3 you will be applying the YEAR() function to the value twice. That will convert everything to the year 1965.

To eliminate the problem with running the YEAR() function on the same value multiple times and losing the actual year perhaps you just want to apply the YEAR. format instead of converting the stored value.

 format datevar1 datevar2 datevar3 datevar4 year. ;

That would still leave the underlying different date values. If you really need to values to be identical perhaps you could convert the value to the first day of the year? You could use INTNX() function

 datevar1 = intnx('year',datevar1,0,'b');

or the MDY() function

 datevar1 = mdy(1,1,year(datevar1));