Cumulative Frequency Graph in R

I've looked through other questions to see if they helped; they didn't.

I've imported my data as follows:

  • Data <- read.csv("Module 1.csv")

Data Output:

structure(list(ID = 1:50, Data_Points = c(41L, 42L, 43L, 44L, 
45L, 45L, 45L, 46L, 47L, 48L, 48L, 49L, 50L, 50L, 52L, 53L, 54L, 
55L, 55L, 57L, 57L, 57L, 58L, 58L, 58L, 59L, 60L, 62L, 62L, 63L, 
65L, 67L, 68L, 69L, 70L, 71L, 71L, 72L, 73L, 75L, 75L, 77L, 82L, 
83L, 83L, 85L, 85L, 86L, 87L, 89L), LCL = c(40L, 48L, 56L, 64L, 
72L, 80L, 88L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
    UCL = c(47L, 55L, 63L, 71L, 79L, 87L, 95L, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA), LCB = c(39.5, 47.5, 
    55.5, 63.5, 71.5, 79.5, 87.5, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA), UCB = c(47.5, 55.5, 63.5, 71.5, 
    79.5, 87.5, 95.5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA), MP = c(43.5, 51.5, 59.5, 67.5, 75.5, 83.5, 91.5, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), Frequency = c(9L, 
    10L, 11L, 7L, 5L, 7L, 1L, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA), Cumulative_Frequency = c(9L, 19L, 30L, 
    37L, 42L, 49L, 50L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA)), class = "data.frame", row.names = c(NA, -50L
))
  • I do not know why R placed the letter "L" after many of the numbers.
  • Please ignore any "L" that you see.
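
For context, the "L" suffix is R's notation for an integer literal, and the values behave as ordinary numbers in arithmetic and plotting. A small sketch of the difference (the variable names are only illustrative):

```r
# Values written with an L suffix are stored as integers
with_suffix <- c(41L, 42L, 43L)
class(with_suffix)      # "integer"

# The same values without the suffix are stored as doubles
without_suffix <- c(41, 42, 43)
class(without_suffix)   # "numeric"

# Numerically the two vectors are equal
all(with_suffix == without_suffix)   # TRUE
```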

Code for Cumulative Frequency Version 1:

  • So, I have done this in two ways, both of which are wrong.
  • The first code I tried is as follows:
ggplot(Data, aes(x = Data_Points, y = cumsum(Data_Points))) +
  geom_line() +
  geom_point() +
  labs(x = "Data Points",
       y = "Frequency",
       title = "Cumulative Frequency Polygon of the Data Provided") +
  scale_x_continuous(breaks = seq(39.5, 95.5, by = 8)) +
  scale_y_continuous(breaks = c(0:12)) +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 16)) +
  theme(axis.title.y.right = element_text(margin = margin(0, 0, 0, 10)))

It looks something like this: [image: "Weird, not a smooth line"]

  • The line doesn't show the individual frequencies between the class boundaries at all.
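
A likely reason for the odd shape is that cumsum(Data_Points) adds up the data values themselves instead of counting observations. A minimal sketch using the first five values of Data_Points from the data above (the names data_points, running_total and running_count are made up for illustration):

```r
# First five values of Data_Points from the question's data
data_points <- c(41, 42, 43, 44, 45)

# Running total of the values themselves:
# this is what Version 1 maps to the y-axis
running_total <- cumsum(data_points)
print(running_total)   # 41 83 126 170 215

# Running count of observations:
# the quantity a cumulative frequency plot is meant to show
running_count <- seq_along(data_points)
print(running_count)   # 1 2 3 4 5
```

Because the running total starts at 41 and only grows from there, none of the breaks requested with scale_y_continuous(breaks = c(0:12)) fall inside the plotted range.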

Code for Cumulative Frequency Version 2:

  • I've also tried using geom_line() and geom_point() to see if that helps (newsflash! It does not).

  • I wrote something like this for the code (I've changed it multiple times by this point, with no luck).

ggplot(Data, aes(x = Data_Points, y = Cumulative_Frequency)) +
  geom_line() +
  geom_point() +
  labs(x = "Data Points",
       y = "Frequency",
       title = "Cumulative Frequency Polygon of the Data Provided") +
  scale_x_continuous(breaks = seq(39.5, 95.5, by = 8)) +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 16)) +
  theme(axis.title.y.right = element_text(margin = margin(0, 0, 0, 10)))

Any help is much appreciated.

There is 1 solution below

The usual way to show an empirical cumulative distribution function (ECDF) for a particular data set is to use stat_ecdf:

library(ggplot2)

ggplot(Data, aes(x = Data_Points)) +
  stat_ecdf() +
  labs(x = "Data Points",
       y = "Cumulative proportion",
       title = "") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 16)) +
  theme(axis.title.y.right = element_text(margin = margin(0, 0, 0, 10)))

The steps are expected: the ECDF jumps at each distinct value in the data, which is quite normal for a sample of this size. However, if you want a smoother version, you could create a cumulative frequency polygon like this:

ggplot(as.data.frame(table(Data$Data_Points)),
       aes(x = as.numeric(as.character(Var1)), y = cumsum(Freq)/sum(Freq))) +
  geom_line() +
  labs(x = "Data Points",
       y = "Cumulative relative frequency",
       title = "Cumulative Frequency Polygon of the Data Provided") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 16)) +
  theme(axis.title.y.right = element_text(margin = margin(0, 0, 0, 10)))

Created on 2022-06-08 by the reprex package (v2.0.1)
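
If the goal is the classic cumulative frequency polygon (ogive) built from the class table in the question, one possible sketch is to plot Cumulative_Frequency against the upper class boundaries (UCB) rather than against Data_Points. This is not part of the answer above. The data frame names classes and ogive are made up for illustration, the values are copied from the first seven rows of the question's data, and the extra starting point at 39.5 with a count of 0 is an assumption based on the usual convention for ogives.

```r
library(ggplot2)

# Class summary copied from the first seven rows of the question's data
classes <- data.frame(
  UCB = c(47.5, 55.5, 63.5, 71.5, 79.5, 87.5, 95.5),
  Cumulative_Frequency = c(9, 19, 30, 37, 42, 49, 50)
)

# Assumed convention: start at zero at the lower boundary of the first class
ogive <- rbind(
  data.frame(UCB = 39.5, Cumulative_Frequency = 0),
  classes
)

# One point per class boundary, joined by straight line segments
p <- ggplot(ogive, aes(x = UCB, y = Cumulative_Frequency)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = seq(39.5, 95.5, by = 8)) +
  labs(x = "Upper class boundary",
       y = "Cumulative frequency",
       title = "Cumulative Frequency Polygon of the Data Provided") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 16))

p
```

With the full data frame from the question, the same class table could presumably be taken directly with na.omit(Data[, c("UCB", "Cumulative_Frequency")]), which drops the rows that are NA in those columns.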