I've looked around to see whether other questions would help, but they didn't.
I've imported my data as follows:
- Data <- read.csv("Module 1.csv")
Data Output:
structure(list(ID = 1:50, Data_Points = c(41L, 42L, 43L, 44L,
45L, 45L, 45L, 46L, 47L, 48L, 48L, 49L, 50L, 50L, 52L, 53L, 54L,
55L, 55L, 57L, 57L, 57L, 58L, 58L, 58L, 59L, 60L, 62L, 62L, 63L,
65L, 67L, 68L, 69L, 70L, 71L, 71L, 72L, 73L, 75L, 75L, 77L, 82L,
83L, 83L, 85L, 85L, 86L, 87L, 89L), LCL = c(40L, 48L, 56L, 64L,
72L, 80L, 88L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
UCL = c(47L, 55L, 63L, 71L, 79L, 87L, 95L, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), LCB = c(39.5, 47.5,
55.5, 63.5, 71.5, 79.5, 87.5, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA), UCB = c(47.5, 55.5, 63.5, 71.5,
79.5, 87.5, 95.5, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA), MP = c(43.5, 51.5, 59.5, 67.5, 75.5, 83.5, 91.5,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), Frequency = c(9L,
10L, 11L, 7L, 5L, 7L, 1L, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA), Cumulative_Frequency = c(9L, 19L, 30L,
37L, 42L, 49L, 50L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA)), class = "data.frame", row.names = c(NA, -50L
))
- I do not know why R placed the letter "L" after many of the numbers.
- Please ignore any "L" that you see.
Code for Cumulative Frequency Version 1:
- So, I have done this in two ways, both of which are wrong.
- The first code I tried is as follows:
ggplot(Data, aes(x = Data_Points, y = cumsum(Data_Points))) +
  geom_line() +
  geom_point() +
  labs(x = "Data Points",
       y = "Frequency",
       title = "Cumulative Frequency Polygon of the Data Provided") +
  scale_x_continuous(breaks = seq(39.5, 95.5, by = 8)) +
  scale_y_continuous(breaks = c(0:12)) +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 16)) +
  theme(axis.title.y.right = element_text(margin = margin(0, 0, 0, 10)))
It looks something like this: [image: "Weird, not a smooth line"]
- The line doesn't show the individual frequency for each class at all.
Code for Cumulative Frequency Version 2:
I've also tried using geom_line() and geom_point() to see if that helps (newsflash! It does not).
I wrote something like this for the code (I've changed it multiple times by this point, with no luck).
ggplot(Data, aes(x = Data_Points, y = Cumulative_Frequency)) +
  geom_line() +
  geom_point() +
  labs(x = "Data Points",
       y = "Frequency",
       title = "Cumulative Frequency Polygon of the Data Provided") +
  scale_x_continuous(breaks = seq(39.5, 95.5, by = 8)) +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 16)) +
  theme(axis.title.y.right = element_text(margin = margin(0, 0, 0, 10)))
- Here is what this looks like: [image: "What the heck?"]
Any help is much appreciated.
The usual way to show an empirical cumulative distribution function for a particular data set is to use stat_ecdf.
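A minimal sketch of that approach, assuming the 50 raw values from the Data_Points column in the question (retyped here so the example runs on its own). The labels and theme are borrowed from the question's code, and the object name p_ecdf is only illustrative:

```r
library(ggplot2)

# Raw observations copied from the Data_Points column in the question
Data <- data.frame(Data_Points = c(
  41, 42, 43, 44, 45, 45, 45, 46, 47, 48, 48, 49, 50, 50, 52, 53, 54,
  55, 55, 57, 57, 57, 58, 58, 58, 59, 60, 62, 62, 63, 65, 67, 68, 69,
  70, 71, 71, 72, 73, 75, 75, 77, 82, 83, 83, 85, 85, 86, 87, 89
))

# stat_ecdf() computes the cumulative proportion itself,
# so only the x aesthetic is mapped
p_ecdf <- ggplot(Data, aes(x = Data_Points)) +
  stat_ecdf(geom = "step") +
  labs(x = "Data Points",
       y = "Cumulative proportion",
       title = "Empirical Cumulative Distribution of the Data Provided") +
  scale_x_continuous(breaks = seq(39.5, 95.5, by = 8)) +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 16))

p_ecdf
```

Note that the y axis here is a proportion between 0 and 1 rather than a count, because that is what stat_ecdf() returns.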
The steps are to be expected given the density of the data points and are quite normal. However, if you want a smoother version, you could create a cumulative frequency polygon instead.
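One way to build such a polygon, as a sketch rather than a definitive recipe: count the raw values in each class, take the running total, and plot it against the class boundaries (39.5, 47.5, ..., 95.5, taken from the LCB and UCB columns in the question). The names breaks, counts and ogive are made up for this example. Plotting Cumulative_Frequency against Data_Points directly does not work because that column holds only seven values, which get paired with the first seven rows of the raw data.

```r
library(ggplot2)

# Raw observations copied from the Data_Points column in the question
Data_Points <- c(
  41, 42, 43, 44, 45, 45, 45, 46, 47, 48, 48, 49, 50, 50, 52, 53, 54,
  55, 55, 57, 57, 57, 58, 58, 58, 59, 60, 62, 62, 63, 65, 67, 68, 69,
  70, 71, 71, 72, 73, 75, 75, 77, 82, 83, 83, 85, 85, 86, 87, 89
)

# Class boundaries: 39.5, 47.5, ..., 95.5 (seven classes of width 8)
breaks <- seq(39.5, 95.5, by = 8)

# Count the observations in each class without drawing the histogram
counts <- hist(Data_Points, breaks = breaks, plot = FALSE)$counts

# One row per boundary: 0 at the lowest boundary, then the running total
# cum_freq is 0, 9, 19, 30, 37, 42, 49, 50, matching the
# Cumulative_Frequency column in the question
ogive <- data.frame(boundary = breaks, cum_freq = c(0, cumsum(counts)))

p_ogive <- ggplot(ogive, aes(x = boundary, y = cum_freq)) +
  geom_line() +
  geom_point() +
  labs(x = "Data Points",
       y = "Cumulative Frequency",
       title = "Cumulative Frequency Polygon of the Data Provided") +
  scale_x_continuous(breaks = breaks) +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 16))

p_ogive
```

Because there is one point per class boundary instead of one per observation, the line is made of eight straight segments' worth of points rather than fifty small steps.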
Created on 2022-06-08 by the reprex package (v2.0.1)