Apply different scaling factor to subsets of rows in a column in a dataframe

83 Views Asked by At

I generating a dataframe by combining a load of csv's into one. I generate a path column, which corresponds to the ID of each CSV, and then aim to do simple math transformations to scale the path column into the timestamp for each csv (each csv is data from a particular time in an experiment). The combined_files tibble has 14 columns and path is the first.

I have some data where I have a gap of 0.5 hours between path 1-7 and path 8-100 (1-7 corresponds to the first 336 rows of my dataframe). I am trying to convert the path for 1-7 to time in hours using

files <- list.files(pattern = "*.csv")
files <- stringr::str_sort(files, numeric = TRUE)
files
combined_files <- bind_rows(lapply(files, fread))

combined_files <- readr::read_csv(files, id = "path") %>%
  mutate(path = str_remove(path, "cap_data_"))%>%
  mutate(path = str_remove(path, ".csv"))
combined_files$path <- as.numeric(combined_files$path)

(combined_files[combined_files[c(1:336), 1]] - 1)*0.5

and then do the same transformation on the remaining rows, just shifted by 0.5 hours:

(combined_files[combined_files[c(1:336), 1]] - 1)*0.5

When I do this, I get the following error:

Error in `vectbl_as_col_location()`:
! Can't subset columns with `combined_files[c(1:336), 1]`.
x `combined_files[c(1:336), 1]` must be logical, numeric, or character, not a <tbl_df/tbl/data.frame> object.

Any help would be much appreciated! Edit: an example from my data is too large to attach, so here is a table with similar properties: |path|p1|p2| |:---|:-:|-:| |1|10|20| |1|20|30| |1|30|54| |2|10|20| |2|20|30| |2|30|54| |3|10|20| |3|20|30| |3|30|54|

In this example, I would want to change the values for path between 1 and 2 to be (i-1)*0.5 (converting into hours, where path = 1 is t = 0) and path = 3 to a value of 2 (through (i -1)*0.5 +1 (where i is the value of path for each row) I found a bad way to do it - just split the tibble and then rebind it, but if anyone has any route to avoid splitting that would be amazing!

combined_files_1st <- subset(combined_files, combined_files$path < 7.5 )
combined_files_2nd <- subset(combined_files, combined_files$path > 7)
combined_files_1st$path <- (combined_files_1st$path - 1)*0.5
combined_files_2nd$path <- (combined_files_2nd$path-1)*0.5 +0.5

combined_join <- rbind(combined_files_1st, combined_files_2nd)
1

There are 1 best solutions below

3
PBulls On

Unlike base matrices/data.frames the [ operator doesn't subset a vector from a tibble if you ask for a single column, it'll still be a tibble. You'll have to use [[, $, or dplyr::pull.

Untested, but this would be the 'tidyversiest' way (using the base pipe):

(combined_files[combined_files[1:336, 1] |> dplyr::pull()] - 1)*0.5