SAS Merge By Not Working

1.1k Views Asked by At

The following is a macro I wrote to left out join two datasets.

/*
* Macro to add the t-rate to the data
*/
%MACRO RFRATE(WITHOUT_RATE, WITH_RATE);
/*read the tbill rate data*/
data TBRATE;
  INFILE "T-RATE/T-RATE-FORMATTED.csv"
  DELIMITER = "," 
  MISSOVER DSD 
  FIRSTOBS=2
  LRECL=32767;
  INPUT DATE one_M three_M six_M one_Y two_Y three_Y five_Y seven_Y ten_Y twenty_Y thirty_Y;
  format DATE yymmddn8.;
  length Date 4;
run;

data &WITH_RATE;
  merge &WITHOUT_RATE(IN=A) TBRATE(IN=B);
  by Date;
  if A;
run;

/*I have also tried the following way (SQL)*/
/*
PROC SQL;
  Create table &WITH_RATE as
  Select a.*,b.*
  from
    &WITHOUT_RATE a
  left join
    TBRATE b
  on a.Date=b.Date;
quit;
*/

proc export data=&WITH_RATE (obs=99999)
  outfile= 'samplesmall.csv'
  dbms=CSV REPLACE;
  putname=YES;
run;

proc contents data= TBRATE  position; 
run;

proc contents data= &WITHOUT_RATE  position; 
run;

%mend;

The TBRATE is some data similar to the following format:

| Date     | Rate     |
|----------|----------|
| 20120101 | 1.0      | 
| 20120102 | 1.5      | 
| 20120103 | 1.5      |
| 20120104 | 1.3      |
| 20120105 | 1.1      |

The WITHOUT_RATE is some data similar to the following:

| Date     | Other Data    |
|----------|---------------|
| 20120101 | 7.0           | 
| 20120101 | 3.5           | 
| 20120101 | 4.5           |
| 20120101 | 2.3           |
| 20120101 | 11.1          |
| 20120102 | 23.0          | 
| 20120102 | 12.5          | 
| 20120102 | 12.5          |
| 20120102 | 11.3          |
| 20120102 | 11.1          |

I wanted to create something similar to the following:

| Date     | Other Data    |   Rate  |
|----------|---------------|----------
| 20120101 | 7.0           | 1.0     | 
| 20120101 | 3.5           | 1.0     |
| 20120101 | 4.5           | 1.0     |
| 20120101 | 2.3           | 1.0     |
| 20120101 | 11.1          | 1.0     |
| 20120102 | 23.0          | 1.5     |
| 20120102 | 12.5          | 1.5     |
| 20120102 | 12.5          | 1.5     |
| 20120102 | 11.3          | 1.5     |
| 20120102 | 11.1          | 1.5     |

The format of the date variable in WITHOUT_RATE is:

 Variable      Type    Len    Format       Label
 DATE          Num       4    YYMMDDN8.    Quote date

The format of the date variable in TBRATE is:

 Variable    Type    Len    Format
 DATE        Num       4    YYMMDDN8.

It seems that they are the same.

However, the code keep producing null results:

| Date     | Other Data    |   Rate  |
|----------|---------------|----------
| 20120101 | 7.0           |         | 
| 20120101 | 3.5           |         |
| 20120101 | 4.5           |         |
| 20120101 | 2.3           |         |
| 20120101 | 11.1          |         |
| 20120102 | 23.0          |         |
| 20120102 | 12.5          |         |
| 20120102 | 12.5          |         |
| 20120102 | 11.3          |         |
| 20120102 | 11.1          |         |

Therefore, I was wondering where did I go wrong?

Update #2

The code used to call the macro: 

/*The output dataset*/
%NBBO(20130102, tempoutputset);
%NBBOReturn(tempoutputset, NBBODATA);
%RFRATE(NBBODATA, RFRATEDATA);
3

There are 3 best solutions below

2
On BEST ANSWER

I solved it. Thanks to @Quentin and @India.Rocket's suggestions, I was able to find the solution.

The problem was reading the dates from csv file. It seems that dates in the format of yyyymmdd or yyyy-mm-dd cannot be read correctly. I had to reformat the dates in the csv to mm/dd/yyyy with python, then input them as DATE:mmddyy10 in sas. However, the other dataset I was merging with has the date format in yymmddn8. Therefore, I had to reformat DATE:mmddyy10. to yymmddn8. to make the whole thing work.

Following code was used.

data TBRATE;
  INFILE "T-RATE/T-RATE-FORMATTED.csv"
  DELIMITER = ","
  MISSOVER DSD
  FIRSTOBS=2
  LRECL=32767;
  INPUT DATE:mmddyy10. DateRaw one_M three_M six_M one_Y two_Y three_Y five_Y seven_Y ten_Y twenty_Y thirty_Y;
  format DATE yymmddn8.;
run;
5
On

Remove set from data merge statement. You cannot write set in Data Merge statement. Also you need to sort datasets, if they are not already sorted by default by date for them to Merge on Date:-

Proc sort data=&WITHOUT_RATE; by Date; run;
Proc sort data=TBRATE; by Date; run;

data &WITH_RATE;
  merge &WITHOUT_RATE(IN=A) TBRATE(IN=B);
  by Date;
  if A;
run;

OR 

PROC SQL:
Create table &WITH_RATE as
Select a.*,b.Rate
from
&WITHOUT_RATE a
left join
TBRATE b
on a.Date=b.Date;
quit;
8
On

Agree with @Reeza, I would remove the macro code to work on finding the source of the problem. Once you have it working outside of the macro language, go back to working on the macro. From a quick look, I would guess your BY values are not matching.

Try something like below to look at the mismatched records:

data WITH_RATE
     Aonly
     Bonly
     BothAandB
  ;
  merge WITHOUT_RATE(IN=A) TBRATE(IN=B);
  by Date;
  if A then output With_Rate;

  if A=1 and B=1 then output BothAandB;  
  else if B=0 then output Aonly;
  else if A=0 then output Bonly;

run;

Then remove the formats on date and look at the values in Aonly and Bonly to see why they don't match. The output dataset Aonly will have records from WITHOUT_RATE which do not have a matching DATE in TBRATE. The output dataset Bonly will have records from TBRATE which do not have a matching DATE in WITHOUT_RATE. The output dataset BothAandB will have the records that did match.

If you see values in Aonly and Bonly that look like they should match, and assuming these are dates, perhaps round them before the merge to make sure they are integers to avoid any possible numeric precision / floating point differences.