Import vector in one CSV cell R

136 Views Asked by At

I have data of samples from a bunch of time series. I have a spreadsheet of start year of each, end year of each, and the years that measurements were taken between. I want to plot them as line segments with dots showing years when measurements were taken, but I can't think of how in R since the number / pattern of measurement years is inconsistent between rows (some are NA because no years were measured between first and last).

For now, they're saved in excel in a column called "middle_years" that literally has "c(1990,1995,2000,2007)" in the excel cell. I am happy to changing this format, but I don't know what is most efficient here. What's the best way to import these so I can show them on a plot?

# sample data
test <- data.frame(
          category = c("a","a","b","b","c","c"),
          start_year = c(1920, 1970, 1980, 1977, 1950, 1982),
           end_year = c(2019, 2008, 2010, 2001, 2000, 2010),
           middle_years = c("c(1945,1960,1988,2002)","c(1981,1988,1995:1998,2004)",
                            "c(1981,1999)", NA, "c(1970)",  NA ))

# plot as segments, colored by category 
# but this is where I want to add sample dots on top of each line segment
test %>%
  arrange(start_year) %>%
  mutate(order = c(1:nrow(.)))%>%
  ggplot() +
  geom_segment(aes(x=start_year,xend=end_year,y=order,yend=order,color=category),
               size=3,lineend = "round")+
  theme_minimal()

I suppose what I'll need is to change the original excel sheet from 'c()' to just the years separated by columns, then import, coerce comma-separated strings into new columns, pivot_longer by ID, then filter out NA rows. But some of the values are like, 1950:2000, so if there is a way to do this without individually typing our every year in the excel sheet, I would definitely prefer that.

Thanks!

1

There are 1 best solutions below

0
On

Ok! I figured it out.

Instead of 'c()' in the spreadsheet, I just listed numbers with commas.

So I have the following data:

# sample data
test <- data.frame(
          category = c("a","a","b","b","c","c"),
          start_year = c(1920, 1970, 1980, 1977, 1950, 1982),
           end_year = c(2019, 2008, 2010, 2001, 2000, 2010),
           middle_years = c("1945,1960,1988,2002", 
                            "1981,1988,1995,1996,1998,1998,2004",
                            "1981,1999", NA, "1970",  NA ))

then I used splitstackshape::cSplit to coerce these lists into individual columns.


test %>%
  arrange(start.year) %>%
  mutate(order = c(1:nrow(.)))%>%

  # split each comma-separated string into n columns
  splitstackshape::cSplit('middle.years',sep=",") %>%

  # pivot these new columns into rows
  pivot_longer(cols = starts_with("middle.years"),
               names_to = "middle.year.order",
               values_to = "middle.year") %>%
  
  # slice so that you don't have a bunch of NAs.
  group_by(order,middle.year) %>%
  slice(1) %>%
  ungroup() %>%
  
  mutate(Category = as.character(Category)) %>%


  # plot
  ggplot()+
  geom_segment(aes(x=start.year,xend=end.year,y=order,yend=order,color=Category),
               size=3,lineend = "round")+
  geom_point(aes(y=order,x=middle.year),color="black")+
  theme_minimal()