How to plot a cumulative frequency line graph using ggplot2?


Forgive me if this question is self-explanatory, but I am still trying to get to grips with some of R's features.

I am currently trying to use R to recreate a cumulative frequency line graph that I originally plotted in Excel.

Here is a picture of the graph I am trying to recreate

I think a lot of my problems are coming from having a lot of cells with no data, as I keep getting the warning:

Warning messages:
1: Removed 81 row(s) containing missing values (geom_path).
2: Removed 81 row(s) containing missing values (geom_path).
3: Removed 81 row(s) containing missing values (geom_path).

This is because each column represents a recording frequency which only occurred for 21 days, with a 20-day rest period between each recording period.

My data table

I have tried using geom_step() and geom_point() but I end up with these:

Graphic produced using geom_step

graphic produced with geom_point

When I use the geom_line() function the axes are created but nothing is plotted.

Graphic produced using geom_line

The dates on the x axis also look horrendous. I tried adding + theme(axis.text.x = element_text(angle = 90)) to rotate the labels, but it still looks terrible. I am not sure if it's just too many dates.

Here is the code I have been trying to get to work for the various geom functions:

ggplot() +
    geom_point(aes(x = Date, y = d2s1, group = 1), data = cf) +
    geom_point(aes(x = Date, y = d20s1, group = 1), data = cf) +
    geom_point(aes(x = Date, y = d10s1, group = 1), data = cf) +
    theme(axis.text.x = element_text(angle = 90))

ggplot() +
    geom_step(aes(x = Date, y = d2s1, group = 1), data = cf) +
    geom_step(aes(x = Date, y = d20s1, group = 1), data = cf) +
    geom_step(aes(x = Date, y = d10s1, group = 1), data = cf) +
    theme(axis.text.x = element_text(angle = 90))

ggplot() +
    geom_line(aes(x = Date, y = d2s1, group = 1), data = cf) +
    geom_line(aes(x = Date, y = d20s1, group = 1), data = cf) +
    geom_line(aes(x = Date, y = d10s1, group = 1), data = cf) +
    theme(axis.text.x = element_text(angle = 90))

I hope this all makes sense and thank you all in advance for any help you can provide!

I read in the data using read.csv("cf.csv").

I have attached the output of dput(cf) below.

structure(list(Date = c("08/11/2019", "09/11/2019", "10/11/2019", 
"11/11/2019", "12/11/2019", "13/11/2019", "14/11/2019", "15/11/2019", 
"16/11/2019", "17/11/2019", "18/11/2019", "19/11/2019", "20/11/2019", 
"21/11/2019", "22/11/2019", "23/11/2019", "24/11/2019", "25/11/2019", 
"26/11/2019", "27/11/2019", "28/11/2019", "29/11/2019", "30/11/2019", 
"01/12/2019", "02/12/2019", "03/12/2019", "04/12/2019", "05/12/2019", 
"06/12/2019", "07/12/2019", "08/12/2019", "09/12/2019", "10/12/2019", 
"11/12/2019", "12/12/2019", "13/12/2019", "14/12/2019", "15/12/2019", 
"16/12/2019", "17/12/2019", "18/12/2019", "19/12/2019", "20/12/2019", 
"21/12/2019", "22/12/2019", "23/12/2019", "24/12/2019", "25/12/2019", 
"26/12/2019", "27/12/2019", "28/12/2019", "29/12/2019", "30/12/2019", 
"31/12/2019", "01/01/2020", "02/01/2020", "03/01/2020", "04/01/2020", 
"05/01/2020", "06/01/2020", "07/01/2020", "08/01/2020", "09/01/2020", 
"10/01/2020", "11/01/2020", "12/01/2020", "13/01/2020", "14/01/2020", 
"15/01/2020", "16/01/2020", "17/01/2020", "18/01/2020", "19/01/2020", 
"20/01/2020", "21/01/2020", "22/01/2020", "23/01/2020", "24/01/2020", 
"25/01/2020", "26/01/2020", "27/01/2020", "28/01/2020", "29/01/2020", 
"30/01/2020", "31/01/2020", "01/02/2020", "02/02/2020", "03/02/2020", 
"04/02/2020", "05/02/2020", "06/02/2020", "07/02/2020", "08/02/2020", 
"09/02/2020", "10/02/2020", "11/02/2020", "12/02/2020", "13/02/2020", 
"14/02/2020", "15/02/2020", "16/02/2020", "17/02/2020"), d2s1 = c(6L, 
11L, 13L, 20L, 25L, 35L, 42L, 49L, 49L, 51L, 53L, 54L, 60L, 65L, 
69L, 73L, 76L, 80L, 85L, 86L, 86L, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), d10s2 = c(0L, 6L, 8L, 
10L, 11L, 14L, 14L, 15L, 18L, 19L, 21L, 21L, 22L, 22L, 24L, 24L, 
26L, 27L, 31L, 32L, 32L, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA), d20s1 = c(NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, 3L, 9L, 13L, 19L, 24L, 26L, 32L, 38L, 44L, 46L, 48L, 
50L, 56L, 62L, 64L, 64L, 73L, 83L, 92L, 99L, 105L, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA), d20s2 = c(NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, 0L, 2L, 2L, 3L, 4L, 14L, 15L, 23L, 25L, 27L, 36L, 37L, 38L, 
43L, 43L, 45L, 47L, 50L, 53L, 56L, 57L, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA), d10s1 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, 2L, 15L, 19L, 22L, 33L, 34L, 37L, 
37L, 39L, 41L, 48L, 50L, 52L, 56L, 62L, 64L, 65L, 68L, 72L, 77L, 
84L), d2s2 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, 4L, 4L, 4L, 4L, 4L, 7L, 9L, 9L, 12L, 12L, 
14L, 17L, 17L, 23L, 24L, 24L, 24L, 26L, 26L, 30L, 33L)), class = "data.frame", row.names = c(NA, 
-102L))

There is 1 best solution below

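The function geom_step() has an argument na.rm that controls how NA values are handled; it is FALSE by default, which means missing values are removed with a warning. Changing this to TRUE removes them silently, which should get rid of the warnings. Alternatively you could change the NA data to zeroes, although each series would then be drawn at zero outside its recording period.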
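Another way to keep NA values away from the geoms altogether is to reshape the table into long format and drop the empty rows before plotting. The following is only a sketch under assumptions: the data frame is a shortened toy stand-in for cf (same layout, only two series), Date is assumed to be a Date already, and the names series, cf_long and count are made up for illustration.

```r
library(ggplot2)

# Toy stand-in for the question's `cf`: same layout, shortened values,
# and Date assumed to be stored as a Date already.
cf <- data.frame(
  Date  = as.Date("2019-11-08") + 0:5,
  d2s1  = c(6L, 11L, 13L, NA, NA, NA),
  d20s1 = c(NA, NA, NA, 3L, 9L, 13L)
)

# Stack the series columns into long format: one row per date and series.
series <- setdiff(names(cf), "Date")
cf_long <- data.frame(
  Date   = rep(cf$Date, times = length(series)),
  series = rep(series, each = nrow(cf)),
  count  = unlist(cf[series], use.names = FALSE)
)

# Drop the rest-period rows so that no layer ever sees an NA.
cf_long <- cf_long[!is.na(cf_long$count), ]

# One layer is enough: mapping colour to `series` also groups the steps.
p <- ggplot(cf_long, aes(x = Date, y = count, colour = series)) +
  geom_step()
print(p)
```

With the data in this shape, swapping geom_step() for geom_line() or geom_point() only means changing one line, and each recording period gets its own colour and legend entry.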

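The crowded x-axis is typical of what happens when the dates are stored as a factor (or as character strings) rather than as a Date, because every distinct value then gets its own label. This will be related to how you read in your data; read.csv() leaves a column like this as text unless told otherwise.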
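A minimal sketch of the date conversion, assuming the dates are day/month/year text as in the dput() output. The data frame here is a small toy stand-in for cf, and the weekly break interval and the label format are arbitrary choices rather than anything taken from the question.

```r
library(ggplot2)

# Toy stand-in for the question's `cf`, with dates as day/month/year text.
cf <- data.frame(
  Date = c("08/11/2019", "09/11/2019", "10/11/2019", "01/12/2019"),
  d2s1 = c(6L, 11L, 13L, 20L)
)

# Parse the text into Date; as.character() also covers the factor case.
cf$Date <- as.Date(as.character(cf$Date), format = "%d/%m/%Y")

p <- ggplot(cf, aes(x = Date, y = d2s1)) +
  geom_line() +
  # One labelled break per week instead of one label per row.
  scale_x_date(date_breaks = "1 week", date_labels = "%d %b") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
print(p)
```

Once the column is a Date, ggplot2 treats the x axis as continuous, so the points are placed in calendar order and the number of labels can be controlled through scale_x_date().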