Extract 3 dimensional data from csv file in R

258 Views Asked by At

I am trying to extract my data from a csv file into R. The data is currently formatted in the following:

,"[{time=2014-01-01T00:00:00, NDVI=0.3793765496776215}, {time=2014-02-01T00:00:00, NDVI=...

,"[{time=2014-01-01T00:00:00, NDVI=0.4071076986818826}, {time=2014-02-01T00:00:00, ...

,"[{time=2014-01-01T00:00:00, NDVI=0.3412131556625801}, {time=2014-02-01T00:00:00, NDVI=...

Each of these lines is data for a different region. I want to have it in this format:

Region [1]

Time       NDVI

[1]        [1]

[2]        [2]

[3]        [3]

[...]      [...]


Region [2]

Time       NDVI

[1]        [1]

[2]        [2]

[3]        [3]

[...]      [...]


Region [3]

Time       NDVI

[1]        [1]

[2]        [2]

[3]        [3]

[...]      [...]

How can I do this?

1

There are 1 best solutions below

2
On BEST ANSWER

Maybe there is a package that can parse this. However, you could do some data transformation using the tidyverse package.

You can read in your data with readLines():

dat <- readLines("test.txt")

Which in this case looks like this:

dat <- c(",\"[{time=2014-01-01T00:00:00, NDVI=0.3793765496776215}, {time=2014-02-01T00:00:00, NDVI=2}]\"", 
"", ",\"[{time=2014-01-01T00:00:00, NDVI=0.4071076986818826}, {time=2014-02-01T00:00:00, NDVI=3}]\"", 
"", ",\"[{time=2014-01-01T00:00:00, NDVI=0.3412131556625801}, {time=2014-02-01T00:00:00, NDVI=4}]\""
)

Then you can do some data transformations with a for loop and store the result in a list.

library(tidyverse)
dat <- c(",\"[{time=2014-01-01T00:00:00, NDVI=0.3793765496776215}, {time=2014-02-01T00:00:00, NDVI=2}]\"", 
         "", ",\"[{time=2014-01-01T00:00:00, NDVI=0.4071076986818826}, {time=2014-02-01T00:00:00, NDVI=3}]\"", 
         "", ",\"[{time=2014-01-01T00:00:00, NDVI=0.3412131556625801}, {time=2014-02-01T00:00:00, NDVI=4}]\""
)

l <- list()
counter <- 1
for (line in dat){
  if(nchar(line)>0){
    line <- as.data.frame(line) %>%
        # We need to remove some unwanted strings
      mutate(line = str_replace_all(line, 
                                    c("\\\""="", ",\\["= "", "\\]" = ""))) %>% 
        # The lines can be separated into rows where the string "}, {" occurs
               separate_rows(line, line, sep = "\\}, \\{") %>% 
        # again removing some unwanted strings
      mutate(line = str_replace_all(line, c("\\{"="", "\\}"=""))) %>% 
        # add a unique identifier for each observation
      mutate(observation = row_number()) %>% 
        # separete the rows where a "," occurs
      separate_rows(line, line, sep =",") %>% 
      separate(., line, into = c("category", "value"), sep = "=") %>% 
        # put it into the long format
      pivot_wider(names_from = category, values_from = value)
    l[[counter]] <- line
    counter <- counter+1
  }
}

l
#> [[1]]
#> # A tibble: 2 x 3
#>   observation time                ` NDVI`           
#>         <int> <chr>               <chr>             
#> 1           1 2014-01-01T00:00:00 0.3793765496776215
#> 2           2 2014-02-01T00:00:00 2                 
#> 
#> [[2]]
#> # A tibble: 2 x 3
#>   observation time                ` NDVI`           
#>         <int> <chr>               <chr>             
#> 1           1 2014-01-01T00:00:00 0.4071076986818826
#> 2           2 2014-02-01T00:00:00 3                 
#> 
#> [[3]]
#> # A tibble: 2 x 3
#>   observation time                ` NDVI`           
#>         <int> <chr>               <chr>             
#> 1           1 2014-01-01T00:00:00 0.3412131556625801
#> 2           2 2014-02-01T00:00:00 4

Created on 2020-03-24 by the reprex package (v0.3.0)