Apply different scaling factor to subsets of rows in a column in a dataframe

Question

Apply different scaling factor to subsets of rows in a column in a dataframe

83 Views Asked by Kieran At 31 December 2023 at 21:50

I generating a dataframe by combining a load of csv's into one. I generate a path column, which corresponds to the ID of each CSV, and then aim to do simple math transformations to scale the path column into the timestamp for each csv (each csv is data from a particular time in an experiment). The combined_files tibble has 14 columns and path is the first.

I have some data where I have a gap of 0.5 hours between path 1-7 and path 8-100 (1-7 corresponds to the first 336 rows of my dataframe). I am trying to convert the path for 1-7 to time in hours using

files <- list.files(pattern = "*.csv")
files <- stringr::str_sort(files, numeric = TRUE)
files
combined_files <- bind_rows(lapply(files, fread))

combined_files <- readr::read_csv(files, id = "path") %>%
  mutate(path = str_remove(path, "cap_data_"))%>%
  mutate(path = str_remove(path, ".csv"))
combined_files$path <- as.numeric(combined_files$path)

(combined_files[combined_files[c(1:336), 1]] - 1)*0.5

and then do the same transformation on the remaining rows, just shifted by 0.5 hours:

(combined_files[combined_files[c(1:336), 1]] - 1)*0.5

When I do this, I get the following error:

Error in `vectbl_as_col_location()`:
! Can't subset columns with `combined_files[c(1:336), 1]`.
x `combined_files[c(1:336), 1]` must be logical, numeric, or character, not a <tbl_df/tbl/data.frame> object.

Any help would be much appreciated! Edit: an example from my data is too large to attach, so here is a table with similar properties: |path|p1|p2| |:---|:-:|-:| |1|10|20| |1|20|30| |1|30|54| |2|10|20| |2|20|30| |2|30|54| |3|10|20| |3|20|30| |3|30|54|

In this example, I would want to change the values for path between 1 and 2 to be (i-1)*0.5 (converting into hours, where path = 1 is t = 0) and path = 3 to a value of 2 (through (i -1)*0.5 +1 (where i is the value of path for each row) I found a bad way to do it - just split the tibble and then rebind it, but if anyone has any route to avoid splitting that would be amazing!

combined_files_1st <- subset(combined_files, combined_files$path < 7.5 )
combined_files_2nd <- subset(combined_files, combined_files$path > 7)
combined_files_1st$path <- (combined_files_1st$path - 1)*0.5
combined_files_2nd$path <- (combined_files_2nd$path-1)*0.5 +0.5

combined_join <- rbind(combined_files_1st, combined_files_2nd)

Original Q&A

There are 1 best solutions below

**PBulls** · Answer 1 · 2023-12-31T22:10:47.780000

Unlike base matrices/data.frames the [ operator doesn't subset a vector from a tibble if you ask for a single column, it'll still be a tibble. You'll have to use [[, $, or dplyr::pull.

Untested, but this would be the 'tidyversiest' way (using the base pipe):

(combined_files[combined_files[1:336, 1] |> dplyr::pull()] - 1)*0.5

Apply different scaling factor to subsets of rows in a column in a dataframe

There are 1 best solutions below

Related Questions in R

Related Questions in TIME-SERIES

Related Questions in TIBBLE

Trending Questions

Popular # Hahtags

Popular Questions