lapply a function for a specific column in each dataframe in a list of dataframes?


I'm trying to use lapply to apply a function that orders each dataframe in a list of dataframes by a specific column. I am mostly confused about how to refer to a specific column of each dataframe while lapply iterates over the list.

My code for the function, which orders a dataframe by a column in either increasing or decreasing order depending on a logical condition:

my_function = function(individual_dataframe, logical_condition, column_to_order){
  output_dataframe = if(logical_condition == T){
    individual_dataframe[order(column_to_order,decreasing = F),]
  }else {
    individual_dataframe[order(column_to_order,decreasing = T),]
  }
    return(ordered_dataframe)
  }

I now want to apply this function to a list of dataframes using lapply, ordering a specific column in each dataframe.

Data replication using the iris dataset:

iris_species = split(iris, iris$Species)

Code I have tried, attempting to sort by Sepal.Length:

ordered_iris_species = lapply(iris_species, 
       FUN = my_function(df = iris_species[[]],
                            increasing_order = F,
                            column_to_order = Sepal.length))

ordered_iris_species = lapply(iris_species, 
       FUN = my_function,
             increasing_order = F,
             column_to_order = iris_species[[1]]$Sepal.length))

The first attempt produces an error asking for a subscript. The second attempt orders only the first dataframe. How do I order each dataframe within iris_species by its Sepal.Length column?

MrFlick On

First, make sure your function returns the correct value and that its parameter names match the ones you use when calling it. Also set it up so that you can pass the column name as a character string rather than a bare symbol.

my_function = function(individual_dataframe, increasing_order, column_to_order){
  # [[ ]] looks the column up by name, so column_to_order can be a string
  output_dataframe = if(increasing_order == TRUE){
    individual_dataframe[order(individual_dataframe[[column_to_order]], decreasing = FALSE),]
  }else {
    individual_dataframe[order(individual_dataframe[[column_to_order]], decreasing = TRUE),]
  }
  return(output_dataframe)
}

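Then you can call it as follows, putting the column name in quotes. lapply passes each dataframe as the first argument and forwards the remaining named arguments to my_function: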

ordered_iris_species = lapply(iris_species, my_function,
                              increasing_order = F,
                              column_to_order = "Sepal.Length")
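
As a side note, since the two branches differ only in the value of decreasing, the function could probably be shortened by negating the flag. The following is just a sketch of that idea, not part of the code above: order_by_column and its argument df are names made up for illustration. The last line is a quick check that each dataframe ended up sorted from largest to smallest Sepal.Length.

```r
# Hypothetical compact variant; order_by_column and df are illustrative names
order_by_column <- function(df, column_to_order, increasing_order = TRUE) {
  # order() sorts ascending by default; decreasing = TRUE reverses the direction
  df[order(df[[column_to_order]], decreasing = !increasing_order), ]
}

# Same data as in the question: one dataframe per species
iris_species <- split(iris, iris$Species)

# lapply supplies each dataframe as df; the named arguments are passed through
ordered_iris_species <- lapply(iris_species, order_by_column,
                               column_to_order = "Sepal.Length",
                               increasing_order = FALSE)

# Reversing a descending column should give an ascending (sorted) one
sapply(ordered_iris_species, function(df) !is.unsorted(rev(df$Sepal.Length)))
```

Passing the column name as a string keeps the lookup inside the function, so each dataframe is ordered by its own values rather than by a vector taken from one fixed dataframe.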