How to change a column name to the data frame variable name with lapply?


I'm trying to change the name of column 2 of multiple data frames to the name of the variable that holds each data frame. My function works on its own, but when I use lapply, the column is renamed "X[[i]]" instead of "df1" or "df2". Thanks in advance!

    df1 = data.frame("a" = c(1,2,3), "b" = c(4,5,6))
    df2 = data.frame("a" = c(1,2,3), "b" = c(4,5,6))

    nameChange = function(x){
      # Capture the expression the caller passed as x, as text
      name = deparse(substitute(x))
      colnames(x)[2] = name
      return(x)
    }

    head(nameChange(df1)) # works: column 2 is named "df1"
    lapply(list(df1, df2), nameChange) # column is named "X[[i]]"

I tried using lapply, sapply, mapply, etc. and the USE.NAMES=T/F parameter but nothing seems to do the trick.


There are 2 best solutions below

0
Isaac On

One way to do it with lapply is to pass the new name explicitly. Note that this gives every data frame the same column name:

# Define a function 
change_second_column_name <- function(new_name, ...) {

  modified_dfs <- lapply(list(...), function(df) {
    # Change the name of the second column to the specified new_name
    colnames(df)[2] <- new_name
    return(df)
  })
  # Return the modified data frames as a list
  return(modified_dfs)
}

# Apply the function to change the name of the second column
change_second_column_name("Yay!", df1, df2)

[[1]]
  a Yay!
1 1    4
2 2    5
3 3    6

[[2]]
  a Yay!
1 1    4
2 2    5
3 3    6
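
If the column should carry each variable's own name rather than one fixed string, the same `...` pattern can capture the argument expressions before they are evaluated. This is only a sketch: `rename_second_col_by_var` is a hypothetical name, not part of the answer above, and it assumes each argument is passed as a plain variable name.

```r
df1 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
df2 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Hypothetical helper: renames column 2 of each data frame to the
# expression the caller typed for that argument
rename_second_col_by_var <- function(...) {
  dfs <- list(...)
  # substitute() returns the unevaluated call list(df1, df2);
  # dropping its first element (the function `list`) leaves the arguments
  arg_exprs <- as.list(substitute(list(...)))[-1]
  var_names <- vapply(
    arg_exprs,
    function(e) paste(deparse(e), collapse = ""),
    character(1)
  )
  names(dfs) <- var_names
  # Iterate over data frames and their names together
  Map(function(df, nm) {
    colnames(df)[2] <- nm
    df
  }, dfs, var_names)
}

renamed <- rename_second_col_by_var(df1, df2)
print(names(renamed))
print(colnames(renamed$df1))
```

Because the names are taken from the call itself, passing something other than a bare variable (for example `head(df1)`) would produce that expression's text as the column name.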

3
knitz3 On

In your expression lapply(list(df1, df2), nameChange), the first argument is evaluated to a list of data frames, and the variable names df1 and df2 are lost:

list(df1, df2)
[[1]]
  a b
1 1 4
2 2 5
3 3 6

[[2]]
  a b
1 1 4
2 2 5
3 3 6

When you run lapply(), the function is applied to the contents of each list item. Internally, lapply() uses [[ to extract each item and calls your function on X[[i]], so that expression is what deparse(substitute(x)) captures and stores in your name variable.

lapply(list(df1, df2), nameChange)
[[1]]
  a X[[i]]
1 1      4
2 2      5
3 3      6

[[2]]
  a X[[i]]
1 1      4
2 2      5
3 3      6
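
The difference between the direct call and the lapply() call can be reproduced in isolation. A minimal sketch, where `captured_expr` is a hypothetical helper name that is not in the original post:

```r
# Hypothetical helper: returns the text of the expression passed as `x`
captured_expr <- function(x) deparse(substitute(x))

df1 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Called directly, substitute() sees the symbol the caller typed
direct <- captured_expr(df1)

# Called through lapply(), the argument expression is lapply()'s own X[[i]]
via_lapply <- lapply(list(df1), captured_expr)[[1]]

print(direct)      # "df1"
print(via_lapply)  # "X[[i]]"
```

So the function never receives the original variable name when it is called through lapply(); the name has to be supplied some other way.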

I find myself constructing lists a lot, where the name of each list item carries some important information. Instead of using lapply() to iterate through the list items, you can use base::Map() or purrr::imap() to iterate through both the list items and the list names. The way you specify arguments differs a bit between the two. This approach relies on naming your list items.

# Make sure to store that naming information in the `names()` of the list
df_list <- list(
  df1 = df1,
  df2 = df2
)

# Function that takes the list item contents and the list item name
nameChangeNew <- function(df, name) {
  colnames(df)[2] <- name
  df
}

# For base::Map(), provide the lists (or vectors) to simultaneously iterate over
base::Map(nameChangeNew, df_list, names(df_list))

# For purrr::imap(), the list item contents and names are both provided to
# the function
purrr::imap(df_list, nameChangeNew)

Both of these functions give this output:

$df1
  a df1
1 1   4
2 2   5
3 3   6

$df2
  a df2
1 1   4
2 2   5
3 3   6
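
If typing each name twice in list(df1 = df1, df2 = df2) feels error-prone, base R's mget() can build the named list from a character vector of variable names. A minimal sketch, assuming the data frames live in the current environment; the inline renaming function mirrors nameChangeNew above:

```r
df1 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
df2 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# mget() looks up each name and returns a list named after the variables;
# environment() is the environment where df1 and df2 were defined
df_list <- mget(c("df1", "df2"), envir = environment())

# Same renaming step as nameChangeNew, written inline
renamed <- Map(function(df, name) {
  colnames(df)[2] <- name
  df
}, df_list, names(df_list))

print(names(renamed))
print(colnames(renamed$df2))
```

This keeps the variable names as the single source of truth, at the cost of referring to the data frames by string rather than by symbol.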