I need to create my own class object that takes a data frame and has the methods `get_data` to choose the data frame, `select` to select columns by their names, and `filter` to filter rows with certain values. `select` and `filter` are meant to work similarly to their dplyr counterparts, but without using dplyr.
I would like them to be chainable like this:

```r
result <- df_object$get_data(df)$select(col1, col2, period)$filter(period)
```
What can I do so that the `filter` method filters the values that have already been selected? Right now it filters the initial dataset. Also, how can I change the methods so that `select` and `filter` don't need a `data` argument? Please give me some tips, as I feel like I'm going about this the wrong way. Do I need to add some fields to the class?
```r
library(R6)

dataFrame <- R6Class("dataFrame",
  list(data = "data.frame"),
  public = list(
    get_data = function(data) {data},
    select_func = function(data, columns) {data[columns]},
    filter_func = function(data, var) {data[var, ]}
  ))

# Create new object
df_object <- dataFrame$new()

# Call methods
df_object$get_data(df)
df_object$select_func(df, c("month", "forecast"))
df_object$filter_func(df[df$month %in% c(1, 2), ])
```
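If you want to chain member functions, those member functions need to return `self` (conventionally as `invisible(self)`, so the object is not printed after every call). This means that the R6 object has to modify its own state, the data it holds, rather than returning a new data frame. Since one benefit of R6's reference semantics is avoiding copies, I would probably keep a single full copy of the data and have `select_func` and `filter_func` update a set of row and column indices instead of subsetting the data itself. Because each method returns `self`, the filter and select calls can then be chained.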
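A minimal sketch of this index-based approach, assuming for now that columns are selected by numeric position. The fields `rows` and `cols`, the `result()` method that returns the current view, the `(column, values)` signature of `filter_func`, and the small example data frame are illustrative choices made here, not necessarily the original answer's code.

```r
library(R6)

# Sketch: store the full data once and track which rows/columns are "active".
# Field names rows/cols and the result() method are illustrative assumptions.
dataFrame <- R6Class("dataFrame",
  public = list(
    data = NULL,  # full, unmodified copy of the input
    rows = NULL,  # integer indices of the rows currently kept
    cols = NULL,  # indices of the columns currently kept

    get_data = function(data) {
      self$data <- data
      self$rows <- seq_len(nrow(data))
      self$cols <- seq_along(data)
      invisible(self)  # returning self is what makes chaining possible
    },

    # Keep only the given column positions (positions in the original data)
    select_func = function(columns) {
      self$cols <- intersect(self$cols, columns)
      invisible(self)
    },

    # Keep rows, among those already kept, whose `column` value is in `values`
    filter_func = function(column, values) {
      keep <- self$data[[column]][self$rows] %in% values
      self$rows <- self$rows[keep]
      invisible(self)
    },

    # Build the current view as an ordinary data frame
    result = function() {
      self$data[self$rows, self$cols, drop = FALSE]
    }
  )
)

# Hypothetical example data using column names from the question
df <- data.frame(
  month    = c(1, 2, 3, 1),
  forecast = c(10, 20, 30, 40),
  period   = c("a", "b", "a", "b")
)

df_object <- dataFrame$new()
df_object$get_data(df)$select_func(c(1, 2))$filter_func("month", c(1, 2))
df_object$result()  # rows 1, 2 and 4; columns month and forecast
```

Because `filter_func` only looks at rows that are still kept, a second filter narrows the result of the first, while the `data` field itself is never modified.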
Our select method can take column names too, not just positions.
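One way to let the select method accept names is to translate them into the same integer positions before updating the column indices. `resolve_cols()` below is a hypothetical base-R helper written for illustration; it uses `match()` to look names up in `names(data)` and stops on any name it cannot find.

```r
# Hypothetical helper: turn column names *or* positions into integer positions
resolve_cols <- function(data, columns) {
  if (is.character(columns)) {
    idx <- match(columns, names(data))  # NA for names that don't exist
    if (anyNA(idx)) {
      stop("Unknown column(s): ", paste(columns[is.na(idx)], collapse = ", "))
    }
    idx
  } else {
    as.integer(columns)
  }
}

df <- data.frame(month = c(1, 2), forecast = c(10, 20), period = c("a", "b"))
resolve_cols(df, c("month", "period"))  # returns 1 3
resolve_cols(df, c(1, 2))               # returns 1 2
```

Inside the class, `select_func` could then set `self$cols <- intersect(self$cols, resolve_cols(self$data, columns))`, so both `select_func(c(1, 2))` and `select_func(c("month", "forecast"))` would work.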
For completeness, you would want some type checking of the arguments, and I would also add a `reset` method that removes all filtering and selecting. This effectively gives you a data frame on which filtering and selecting are non-destructive, which could be very useful.
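A sketch of what a fuller version might look like, with basic argument checks via `stopifnot()`/`stop()` and a `reset()` method. The names `reset()`, `result()` and the private helper `check_data()`, as well as the example data, are assumptions made for this illustration, and the checks are deliberately simple rather than exhaustive.

```r
library(R6)

dataFrame <- R6Class("dataFrame",
  public = list(
    data = NULL,  # full copy of the data, never modified after get_data()
    rows = NULL,  # integer indices of kept rows
    cols = NULL,  # integer indices of kept columns

    get_data = function(data) {
      stopifnot(is.data.frame(data))
      self$data <- data
      self$reset()  # start with every row and column selected
    },

    # Accepts column names or numeric positions
    select_func = function(columns) {
      private$check_data()
      if (is.character(columns)) {
        idx <- match(columns, names(self$data))
      } else if (is.numeric(columns)) {
        idx <- as.integer(columns)
        idx[idx < 1L | idx > ncol(self$data)] <- NA_integer_
      } else {
        stop("`columns` must be character names or numeric positions")
      }
      if (anyNA(idx)) stop("Unknown column(s) in select_func()")
      self$cols <- intersect(self$cols, idx)
      invisible(self)
    },

    # Keep rows (among those already kept) whose `column` value is in `values`
    filter_func = function(column, values) {
      private$check_data()
      stopifnot(is.character(column), length(column) == 1L,
                column %in% names(self$data))
      keep <- self$data[[column]][self$rows] %in% values
      self$rows <- self$rows[keep]
      invisible(self)
    },

    # Undo all selecting and filtering by restoring the full index vectors
    reset = function() {
      private$check_data()
      self$rows <- seq_len(nrow(self$data))
      self$cols <- seq_along(self$data)
      invisible(self)
    },

    result = function() {
      private$check_data()
      self$data[self$rows, self$cols, drop = FALSE]
    }
  ),
  private = list(
    check_data = function() {
      if (is.null(self$data)) stop("Call get_data() first")
    }
  )
)

# Hypothetical example data
df <- data.frame(
  month    = c(1, 2, 3, 1),
  forecast = c(10, 20, 30, 40),
  period   = c("a", "b", "a", "b")
)

df_object <- dataFrame$new()
df_object$get_data(df)$select_func(c("month", "forecast"))$filter_func("month", c(1, 2))
filtered <- df_object$result()      # month and forecast for rows 1, 2 and 4
full <- df_object$reset()$result()  # every row and column again
```

Because only the indices change, `reset()` is cheap: it restores the full index vectors, and the data frame stored in `data` is left untouched throughout.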
Created on 2022-05-01 by the reprex package (v2.0.1)