My data are structured as follows (these are just sample data as the original data are secret)
id | crime | location | crimedate
------------------------------
1 | Theft | public | 2019-01-04
1 | Theft | public | 2019-02-06
1 | Theft | public | 2019-02-20
1 | Theft | private | 2019-03-10
1 | Theft | private | 2019-03-21
1 | Theft | public | 2019-03-01
1 | Theft | private | 2019-03-14
1 | Theft | public | 2019-06-15
1 | Murder | private | 2019-01-04
1 | Murder | private | 2019-10-20
1 | Murder | private | 2019-11-18
1 | Murder | private | 2019-01-01
1 | Assault | private | 2019-03-19
1 | Assault | private | 2019-01-21
1 | Assault | public | 2019-04-11
1 | Assault | public | 2019-01-10
… | … | … | …
My goal is to create a lineplot (time series plot) showing how the numbers of the three crimes have changed over the year. Therefore on the x-axis I would like to show the monthes (1-12) and on the y-axis the number of crimes in each month. There should be two lines (one for each location).
I started with this code:
DATA new;
SET old;
month=month(datepart(crimedate));
RUN;
PROC sgplot DATA=new;
series x=month y=no_of_crimes / group=location;
run;
But I have no idea, how I can aggregate the number of crimes per month. Could anyone please give me a hint? I have been looking in the internet for a solution, but usually the examples just use data that are already aggregated.
The SG routines will aggregate Y axis values for a
VBAR
orHBAR
statement. The same aggregate information displayed in aSERIES
statement would have to be from a apriori aggregate computation, easily done withProc SUMMARY
.Additionally, to plot the counts for each crime in a separate visual, you would want a
BY CRIME
statement, orProc SGPANEL
withPANELBY crime
.The crime datetime value does not have to be down transformed to a date value, you can use the appropriate
datetime
format in the procedures and they will auto-aggregate based on the formatted value.Example with some simulated crime data: