How to convert date format from imported CSV to be able to merge data

894 Views Asked by At

I'm working on an assignment & not allowed to use PROC SQL. I imported data from two csv files & renamed columns at the same time (see code below) but when I tried to PROC APPEND the two tables, there's an issue with the date format. When I PROC CONTENTS I get the following info:

Work.2019data: Variable:date Type:Num Length:8 Format/Informat:MMDDYY10 When I open this file in notepad, the dates appears like this: 12/31/2019

For the second table:

Work.2020data: Variable:date Type:Num Length:8 Format/Informat:YYMMDD10 But when I open this file in notepad, the dates appears like this: 2020-11-16

PROC IMPORT
    DATAFILE= "&export_mtl/2019data.csv"
    OUT= WORK.2019data
    (RENAME=(new_cases=nouveaux_cas
             new_deaths=nouveaux_deces
             new_tests=nouveaux_tests
             total_tests=nb_total_tests
             female_smokers=femmes_fumeuses
             male_smokers=hommes_fumeurs
            ))
    DBMS= csv
    REPLACE;guessingrows=10000;
    GETNAMES= YES;
RUN;

PROC IMPORT
    DATAFILE= "&export_mtl/2020data.csv"
    OUT= WORK.2020data
(RENAME=(new_cases=nouveaux_cas
             new_deaths=nouveaux_deces
             new_tests=nouveaux_tests
             total_tests=nb_total_tests
             female_smokers=femmes_fumeuses
             male_smokers=hommes_fumeurs
            ))
    DBMS= csv
    REPLACE;guessingrows=10000;
    GETNAMES= YES;
RUN;

What's the simplest way to cast the date in the 2020data table so that I can concatenate the two tables after? I've seen so many ways of doing this & tried them with no luck.

2

There are 2 best solutions below

0
On BEST ANSWER

If you combine two datasets that have the same variable then the default format attached to the variable will be first non missing format that the data step compiler sees. So for your example if you run

data want;
   set '2019data'n '2020data'n ;
run;

then the MMDDYY10. format will be used. But if you reverse the order that the input datasets are referenced

data want;
   set '2020data'n '2019data'n ;
run;

then the YYMMDD10. format will be used.

But if you add a FORMAT statement to the data step then you can control what format will be attached in the new dataset. So if you used:

data want;
   set '2020data'n '2019data'n ;
   format date date9.;
run;

Now the default style that will be used to display the values of DATE in WANT will be ddMONyyyy.

3
On

If you just want to stack both files, you can go with a data step. It will do the task, assigning to the second dataset the same formats of the first:

data data_2019_2020;
set 2019data 2020data;
run;

If you want to change the format of date, you can also do it with a datastep. For example:

data data_2019_2020;
set 2019data 2020data;
FORMAT date YYMMDD10.;
run;

Or, if you prefer, you can change just in one dataset first and then use PROC APPEND to stack.