Pass a list containing names to the `pmap` function in R and name the resulting data frames or tibbles


I am attempting to write a function in R that is called with the pmap function and that names the nested data frames (or tibbles) it creates, using an argument passed to pmap in a list. I think this is best explained with a reproducible toy example. The example below assumes you are running Windows and that the directory C:\temp\ already exists and is empty, although you could set the paths below to any directory of your choosing:

library(purrr)  # provides pmap(), set_names() and %>%

#create some toy sample input data files
write.csv(x=data.frame(var1=c(42,43),var2=c(43,45)), file="C:\\temp\\AL.csv")
write.csv(x=data.frame(var1=c(22,43),var2=c(43,45)), file="C:\\temp\\AK.csv")
write.csv(x=data.frame(var1=c(90,98),var2=c(97,96)), file="C:\\temp\\AZ.csv")
write.csv(x=data.frame(var1=c(43,55),var2=c(85,43)), file="C:\\temp\\PossiblyUnknownName.csv")

#Get list of files in c:\temp directory - assumes only files to be read in exist there
pathnames<-list.files(path = "C:\\temp\\", full.names=TRUE)
ListIdNumber<-c("ID3413241", "ID3413242", "ID3413243", "ID3413244")

#Create a named list.  In reality, my problem is more complex, but this gets at the root of the issue
mylistnames<-list(pathnames_in=pathnames, ListIdNumber_in=ListIdNumber)

#Functions that I've tried, where I'm passing the name ListIdNumber_in into the function so
#the resulting data frames are named.

#Attempt 1
get_data_files1<-function(pathnames_in, ListIdNumber_in){
  tempdf <- read.csv(pathnames_in) %>% set_names(nm=ListIdNumber_in)
}

#Attempt 2
get_data_files2<-function(pathnames_in, ListIdNumber_in){
  tempdf <- read.csv(pathnames_in) 
  names(tempdf)<-ListIdNumber_in
  tempdf
}

#Attempt 3
get_data_files3<-function(pathnames_in, ListIdNumber_in){
  tempdf <- read.csv(pathnames_in) 
  tempdf
}

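#Fails: inside the function, set_names() tries to rename the columns of the data frame, and one name does not match its three columns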
pmap(mylistnames, get_data_files1)->myoutput1

#Almost, but doesn't name the data frames it creates; instead the ID ends up as a column name
pmap(mylistnames, get_data_files2)->myoutput2

#This gets me the end result that I want, but I want to set the names inside the function
pmap(mylistnames, get_data_files3) %>% set_names(nm=mylistnames$ListIdNumber_in)->myoutput3
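A minimal sketch of why attempts 1 and 2 misbehave, assuming each file reads in with the columns X, var1 and var2 shown below: inside the function, both `set_names()` and `names<-` operate on the columns of the single data frame being read, not on the outer list that `pmap()` assembles. The data frame is built in memory to mimic one file, so no files are needed.

```r
library(purrr)

# An in-memory stand-in for what read.csv() returns for one of the toy files
# (columns X, var1, var2)
tempdf <- data.frame(X = 1:2, var1 = c(22, 43), var2 = c(43, 45))

# Attempt 2: names<- acts on the columns; a single value renames only the
# first column and the remaining names are padded with NA
attempt2 <- tempdf
names(attempt2) <- "ID3413241"
print(names(attempt2))

# Attempt 1: set_names() also acts on the columns, and signals an error
# because one name is supplied for three columns
attempt1 <- tryCatch(set_names(tempdf, "ID3413241"),
                     error = function(e) conditionMessage(e))
print(attempt1)
```

In both cases the name never reaches the list that `pmap()` returns, which is why the list elements stay unnamed.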

So when I run pmap I'd like to get the following result, except that I'd like the naming of the nested data frames/tibbles to be done inside the function (and I don't really need the 'X' variable, which I think is being created erroneously):

$ID3413241
  X var1 var2
1 1   22   43
2 2   43   45

$ID3413242
  X var1 var2
1 1   42   43
2 2   43   45

$ID3413243
  X var1 var2
1 1   90   97
2 2   98   96

$ID3413244
  X var1 var2
1 1   43   85
2 2   55   43

Any ideas how this can be accomplished?

Thanks!

Best answer

How about creating your own pmap for this purpose?

# assume that your names are always stored in `ListIdNumber_in`
named_pmap <- function(.l, .f, ...) set_names(pmap(.l, .f, ...), .l$ListIdNumber_in)

Then you can directly call named_pmap(mylistnames, get_data_files3). Except for the naming part, this named_pmap is basically the same as pmap.
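If the name itself must be attached inside the function, as the question asks, one alternative sketch (not from either answer) is to have the function return a one-element named list and then drop one level of nesting with base R's `unlist(recursive = FALSE)`. The helper `get_data_files_named` is hypothetical, and the files are written to `tempdir()` instead of C:\temp so the example runs anywhere.

```r
library(purrr)

# Recreate the toy files in a temporary directory (instead of C:\temp);
# row.names = FALSE avoids the extra X column
dir <- file.path(tempdir(), "pmap_named_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(var1 = c(42, 43), var2 = c(43, 45)), file.path(dir, "AL.csv"), row.names = FALSE)
write.csv(data.frame(var1 = c(22, 43), var2 = c(43, 45)), file.path(dir, "AK.csv"), row.names = FALSE)
write.csv(data.frame(var1 = c(90, 98), var2 = c(97, 96)), file.path(dir, "AZ.csv"), row.names = FALSE)
write.csv(data.frame(var1 = c(43, 55), var2 = c(85, 43)), file.path(dir, "PossiblyUnknownName.csv"), row.names = FALSE)

mylistnames <- list(
  pathnames_in = list.files(dir, pattern = "\\.csv$", full.names = TRUE),
  ListIdNumber_in = c("ID3413241", "ID3413242", "ID3413243", "ID3413244")
)

# Hypothetical helper: the name is attached inside the function by wrapping
# the data frame in a one-element named list
get_data_files_named <- function(pathnames_in, ListIdNumber_in) {
  set_names(list(read.csv(pathnames_in)), ListIdNumber_in)
}

# pmap() returns a list of one-element lists; unlist(recursive = FALSE)
# removes exactly that one level, keeping each data frame and its name
myoutput <- unlist(pmap(mylistnames, get_data_files_named), recursive = FALSE)
str(myoutput)
```

The flattening step still happens outside the function, so whether this is preferable to `named_pmap()` is largely a matter of taste.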

Second answer
  • Use map here.
  • There is no need to create a named list: a function that reads one CSV file cannot attach a name at the top level of the output list, so add the names separately.
library(purrr)
map(pathnames, read.csv) %>% set_names(ListIdNumber)

#$ID3413241
#  var1 var2
#1   22   43
#2   43   45

#$ID3413242
#  var1 var2
#1   42   43
#2   43   45

#$ID3413243
#  var1 var2
#1   90   97
#2   98   96

#$ID3413244
#  var1 var2
#1   43   85
#2   55   43
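A small variation, sketched under the assumption that the files are the toy CSVs (only two are recreated here, in `tempdir()`): `map()` copies the names of its input to its output, so you can also name the paths first and let the result inherit those names. The sketch checks that this gives the same list as reading first and naming afterwards.

```r
library(purrr)

# Two of the toy files, written to a temporary directory instead of C:\temp
dir <- file.path(tempdir(), "map_named_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(var1 = c(22, 43), var2 = c(43, 45)), file.path(dir, "AK.csv"), row.names = FALSE)
write.csv(data.frame(var1 = c(42, 43), var2 = c(43, 45)), file.path(dir, "AL.csv"), row.names = FALSE)

pathnames <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
ListIdNumber <- c("ID3413241", "ID3413242")

# Name the paths first: map() carries the names of .x over to its output
out_named_first <- pathnames %>% set_names(ListIdNumber) %>% map(read.csv)

# The answer's order: read first, then name the result
out_named_after <- map(pathnames, read.csv) %>% set_names(ListIdNumber)

identical(out_named_first, out_named_after)
```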

In base R, this can be done as:

setNames(lapply(pathnames, read.csv), ListIdNumber)

You get the additional X column because write.csv also writes the row names by default. Set row.names = FALSE and that column will not be created.

write.csv(x=data.frame(var1=c(42,43),var2=c(43,45)), 
          file="C:\\temp\\AL.csv", row.names = FALSE)
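A quick sketch to compare the two behaviours, using temporary files rather than C:\temp. It also shows `read.csv(..., row.names = 1)`, which reads the first column back as row names when existing files already contain that extra column.

```r
# The same data frame written with and without row names
df <- data.frame(var1 = c(42, 43), var2 = c(43, 45))
f_default <- tempfile(fileext = ".csv")
f_norows <- tempfile(fileext = ".csv")
write.csv(df, file = f_default)                    # header starts with an empty name for the row names
write.csv(df, file = f_norows, row.names = FALSE)  # only var1 and var2 are written

names(read.csv(f_default))                  # the unnamed column comes back as X
names(read.csv(f_norows))                   # no X column
names(read.csv(f_default, row.names = 1))   # first column used as row names instead
```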