I have data of samples from a bunch of time series. I have a spreadsheet of start year of each, end year of each, and the years that measurements were taken between. I want to plot them as line segments with dots showing years when measurements were taken, but I can't think of how in R since the number / pattern of measurement years is inconsistent between rows (some are NA because no years were measured between first and last).
For now, they're saved in excel in a column called "middle_years" that literally has "c(1990,1995,2000,2007)" in the excel cell. I am happy to changing this format, but I don't know what is most efficient here. What's the best way to import these so I can show them on a plot?
# sample data
test <- data.frame(
category = c("a","a","b","b","c","c"),
start_year = c(1920, 1970, 1980, 1977, 1950, 1982),
end_year = c(2019, 2008, 2010, 2001, 2000, 2010),
middle_years = c("c(1945,1960,1988,2002)","c(1981,1988,1995:1998,2004)",
"c(1981,1999)", NA, "c(1970)", NA ))
# plot as segments, colored by category
# but this is where I want to add sample dots on top of each line segment
test %>%
arrange(start_year) %>%
mutate(order = c(1:nrow(.)))%>%
ggplot() +
geom_segment(aes(x=start_year,xend=end_year,y=order,yend=order,color=category),
size=3,lineend = "round")+
theme_minimal()
I suppose what I'll need is to change the original excel sheet from 'c()' to just the years separated by columns, then import, coerce comma-separated strings into new columns, pivot_longer by ID, then filter out NA rows. But some of the values are like, 1950:2000, so if there is a way to do this without individually typing our every year in the excel sheet, I would definitely prefer that.
Thanks!
Ok! I figured it out.
Instead of 'c()' in the spreadsheet, I just listed numbers with commas.
So I have the following data:
then I used
splitstackshape::cSplit
to coerce these lists into individual columns.